JP6555755B2

JP6555755B2 - Image processing apparatus, image processing method, and image processing program

Info

Publication number: JP6555755B2
Application number: JP2016037954A
Authority: JP
Inventors: 敬介野中
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2016-02-29
Filing date: 2016-02-29
Publication date: 2019-08-07
Anticipated expiration: 2036-02-29
Also published as: JP2017156880A

Description

The present invention relates to an image processing apparatus and an image processing method.

Techniques have been proposed for generating video from a free viewpoint other than the camera viewpoints (hereinafter referred to as free-viewpoint video), targeting sports scenes and the like. Based on video captured by a plurality of cameras, such a technique synthesizes video from a virtual viewpoint at which no camera is placed and displays the result on a screen, so that the scene can be viewed from various viewpoints.

Among techniques for synthesizing free-viewpoint video, there is one that synthesizes free-viewpoint video at high speed using a simple model called a billboard (see Non-Patent Document 1). In this billboard-based technique, the texture of the object to be modeled is accurately cut out of the video and stood on the ground of a virtual space as a billboard model having no thickness, thereby producing free-viewpoint video.

In the billboard method, a billboard is generally placed so that its lowest point (for example, a person's toes) touches the ground of the virtual space. When the virtual viewpoint moves horizontally, the billboard is rotated to follow that movement; when the virtual viewpoint moves vertically, the orientation of the billboard is not changed.

Non-Patent Document 1: Hayashi, K.; Saito, H., "Synthesizing Free-Viewpoint Images from Multiple View Videos in Soccer Stadium," in Computer Graphics, Imaging and Visualisation, 2006 International Conference on, pp. 220-225, 26-28 July 2006

The method described in Non-Patent Document 1 is excellent in that it can synthesize high-quality free-viewpoint video at high speed. However, because it uses a billboard (a texture-only model with no thickness), it cannot appropriately represent the posture of the target person.

The present invention has been made in view of this problem, and its object is to provide a technique that enables natural display in free-viewpoint video even when the virtual viewpoint moves in an arbitrary direction.

One aspect of the present invention relates to an image processing apparatus. This image processing apparatus includes: an acquisition unit that acquires images obtained by imaging, from a plurality of viewpoints, a subject in contact with a reference plane defined in real space; a calculation unit that estimates the ground contact position of the subject using association information with the real space obtained from the images acquired by the acquisition unit, and calculates a billboard reference figure composed of polygons that represents what posture the positions on the reference plane at which the reference plane and the subject are in contact take in the three-dimensional real space; and a deformation unit that, when a composite image corresponding to a viewpoint not included in the plurality of viewpoints is synthesized from the images acquired by the acquisition unit, deforms the image of the subject included in the images acquired by the acquisition unit on the basis of the billboard reference figure calculated by the calculation unit and information on the virtual viewpoint.

Any combination of the above constituent elements, and any mutual substitution of the constituent elements or expressions of the present invention among an apparatus, a method, a system, a computer program, a recording medium storing a computer program, and the like, are also effective as aspects of the present invention.

According to the present invention, natural display can be performed in free-viewpoint video even when the virtual viewpoint moves in an arbitrary direction.

FIG. 1 is a schematic diagram showing a free-viewpoint image distribution system including an image processing apparatus according to an embodiment.
FIG. 2 is a block diagram showing the functions and configuration of the image processing apparatus according to the embodiment.
FIG. 3 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera and field coordinates.
FIGS. 4(a) to 4(c) are explanatory diagrams showing an example of processing in the background difference unit.
FIG. 5 is an explanatory diagram showing skeleton extraction processing in the billboard control point determination unit.
FIG. 6 is an explanatory diagram showing control point identification processing in the billboard control point determination unit.
FIGS. 7(a) to 7(f) are explanatory diagrams showing ground contact position estimation processing in the billboard posture estimation unit.
FIG. 8 is an image diagram showing a posture estimation result for an original image.
FIG. 9 is an explanatory diagram showing variables related to the billboard reference figure.
FIG. 10 is an explanatory diagram showing billboard reconstruction processing in the billboard reconstruction unit.
FIG. 11 is a flowchart showing the flow of a series of processes in the image processing apparatus of FIG. 1.
FIG. 12 is a diagram showing an image from a camera viewpoint and an image from a virtual viewpoint moved vertically from the camera viewpoint.

In the following, identical or equivalent components, members, and processes shown in the drawings are denoted by the same reference numerals, and duplicate description is omitted as appropriate. In each drawing, some members that are not important for the explanation are omitted.

In the method described in Non-Patent Document 1, when a person takes a pose such as a push-up and the virtual viewpoint is moved vertically, the person's posture cannot be represented appropriately, and the person may be synthesized as if standing on the ground on their hands (the lowest point of the billboard) with their feet floating. FIG. 12 shows an image 102 from a camera viewpoint and an image 104 from a virtual viewpoint moved vertically from the camera viewpoint. In the image 102 from the camera viewpoint, the person appears to touch the ground with both hands and both feet. In the image 104 from the virtual viewpoint, however, the display of the billboard itself is unchanged and the person is interpreted as standing on the ground on their hands, so the person's feet appear to float (reference numeral 103).

There are two causes. First, the conventional method targets video of persons captured from far away (for example, video from the stands of a soccer stadium) and therefore does not consider the depth differences that arise from a person's posture. Second, it assumes that the person stands on the ground at a single point (the feet).

In contrast, the embodiment estimates the posture of a person's billboard in a simple manner using calibration data of a plurality of cameras measured in advance, and uses that information to deform the billboard in a pseudo manner, so that the display adapts even when the virtual viewpoint moves in free-viewpoint video.

FIG. 1 is a schematic diagram showing a free-viewpoint image distribution system 110 including an image processing apparatus 200 according to the embodiment. The free-viewpoint image distribution system 110 includes a plurality of cameras 116, 118, and 120, the image processing apparatus 200 connected to those cameras, and a mobile terminal 114 such as a mobile phone, tablet, smartphone, or HMD (Head Mounted Display). The image processing apparatus 200 and the mobile terminal 114 are connected via a network 112 such as the Internet. In the free-viewpoint image distribution system 110, the cameras 116, 118, and 120, arranged for example in a studio, image a dancer 124 standing on the ground 126. The cameras 116, 118, and 120 transmit the captured video to the image processing apparatus 200, which processes it. The user of the mobile terminal 114 specifies a desired viewpoint to the image processing apparatus 200; the image processing apparatus 200 synthesizes an image of the dancer 124 as seen from the specified viewpoint (virtual viewpoint) and distributes it to the mobile terminal 114 via the network 112.

Although FIG. 1 describes the case of imaging a dancer in a studio, the technical idea of the present embodiment is not limited to this and can be applied whenever a subject in contact with a reference plane in real space is imaged, for example when imaging a fitness instructor, a tennis match, or a soccer match. A stationary terminal such as a desktop PC, a laptop PC, or a TV receiver may be used instead of the mobile terminal 114.

FIG. 2 is a block diagram showing the functions and configuration of the image processing apparatus 200 according to the embodiment. In terms of hardware, each block shown here can be realized by elements and mechanical devices such as the CPU (Central Processing Unit) of a computer; in terms of software, it can be realized by a computer program or the like. What is drawn here are functional blocks realized by their cooperation. Accordingly, those skilled in the art who have read this specification will understand that these functional blocks can be realized in various forms by combinations of hardware and software.

The image processing apparatus 200 synthesizes an image from an arbitrary virtual viewpoint from images captured by a plurality of cameras. The image processing apparatus 200 includes image reading units 202, 204, and 206; background difference units 208, 210, and 212; camera calibration units 214, 216, and 218 (which may be collectively referred to as the camera calibration unit 220); billboard control point determination units 222, 224, and 226; a billboard posture estimation unit 228; billboard reconstruction units 230, 232, and 234; and a composite video output unit 236. An image reading unit, a background difference unit, a camera calibration unit, a billboard control point determination unit, and a billboard reconstruction unit are provided for each of N cameras (N is a natural number of 2 or more).

The image processing apparatus 200 generates free-viewpoint video according to the billboard method, based on video from a plurality of cameras capturing the same subject. In the conventional billboard method, the billboard is not deformed (or rotated) in response to vertical movement of the virtual viewpoint. The image processing apparatus 200 according to the present embodiment, by contrast, generates a billboard adapted to the virtual viewpoint when the virtual viewpoint moves vertically.

In the following, it is assumed that the cameras have been time-synchronized in advance. The description mainly assumes modeling of a person, but the technical idea of the present embodiment is also applicable to modeling of other subjects. The virtual viewpoint is, for example, a virtual viewpoint that the user can specify arbitrarily, and is not included among the actual viewpoints at which cameras 1 to N are placed.

[Overview of Image Processing Apparatus 200]
In the image processing performed by the image processing apparatus 200, a surface in real space, such as the ground surface, a floor, or a wall, is defined as the reference plane. For example, the ground is modeled as a two-dimensional plane. The image reading units 202, 204, and 206 function as an acquisition unit that acquires images obtained by imaging a person in contact with the ground with cameras 1 to N placed at a plurality of different viewpoints.

The camera calibration unit 220 takes images captured by the plurality of cameras as input and, for each camera, establishes and outputs the correspondence between the ground in real space (sometimes referred to as the field) and the camera image. Assuming fixed cameras, this calibration need only be performed for one camera; for the other cameras, the correspondence can be calculated using the known positional relationships between the cameras. The background difference units 208, 210, and 212 separate the background and foreground in each image using a background subtraction method and output a binarized image.

Based on the person billboards obtained by the background difference units 208, 210, and 212, the billboard control point determination units 222, 224, and 226 determine the pixels within each billboard that are controlled to obtain the final synthesis result. The billboard posture estimation unit 228 estimates the person's ground contact positions using the association information with real space obtained by the camera calibration unit 220, and estimates what posture the person actually takes in the three-dimensional real space. The billboard posture estimation unit 228 also calculates a polygon representing that posture (referred to as a billboard reference figure). The billboard reconstruction units 230, 232, and 234 construct the billboard best suited to the virtual viewpoint, based on the billboards obtained by the background difference units 208, 210, and 212, the control points obtained by the billboard control point determination units 222, 224, and 226, the billboard reference figure obtained by the billboard posture estimation unit 228, and information on the virtual viewpoint selected by the user.

[Camera Calibration Unit 220]
The camera calibration unit 220 associates characteristic points of the field in an image captured at a certain time (for example, intersections of the white lines of a tennis court or soccer pitch) with the corresponding points on the field in real space. When the subject appears in typical sports video, the dimensions of the court or the like are standardized, so it is possible to calculate which coordinates on the field in real space (world coordinate system) a point on the image plane corresponds to.

FIG. 3 is an explanatory diagram showing the correspondence between coordinates on the image plane of a camera 402 and field coordinates. When coordinates on the two-dimensional image plane of the camera 402 are (u, v) and coordinates on the field plane in the world coordinate system (field coordinates) are (x', y'), the correspondence between the two can be expressed using a homography matrix H and a scalar value s as follows.
... (Formula 1)

Inputting the above pairs of corresponding points into Formula 1 makes it possible to obtain s and H, which in turn allows mutual conversion between the coordinates of any pixel on the image plane and field coordinates. Here, the homography matrix representing the correspondence between coordinates on the image plane of the i-th camera and field coordinates is denoted H_i.
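As an illustration of how H might be obtained from corresponding points, the following is a minimal sketch, not the method of the specification. It assumes exactly four point pairs, fixes the bottom-right element of H to 1, and assumes that H maps field coordinates to image coordinates (the direction in Formula 1 is not reproduced in the text). The function names `solve_linear`, `estimate_homography`, and `apply_homography` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def estimate_homography(src_pts, dst_pts):
    """Estimate a 3x3 homography (H[2][2] fixed to 1) from four point pairs."""
    A, b = [], []
    for (x, y), (u, v) in zip(src_pts, dst_pts):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), and likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(H, pt):
    """Map a 2D point through H; w plays the role of the scalar s."""
    x, y = pt
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical example: corners of a 10 x 5 field and their image positions.
field = [(0, 0), (10, 0), (10, 5), (0, 5)]
image = [(100, 400), (500, 420), (420, 200), (160, 190)]
H_field_to_image = estimate_homography(field, image)
H_image_to_field = estimate_homography(image, field)
```

Estimating a second matrix with the point lists swapped gives the reverse conversion, which corresponds to the mutual conversion mentioned above. With more than four correspondences, a least-squares solution would be the usual choice.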

The method of camera calibration with respect to the field is not limited to the one described above.

Next, the following processing is performed independently for each camera on the image I_i^t captured at time t (m_i and n_i are the width and height, respectively, of the image of the i-th camera).

[Background Difference Units 208, 210, 212]
The background difference units 208, 210, and 212 separate I_i^t into background and foreground by classifying each pixel of I_i^t as one or the other. In the present embodiment, this separation may be realized using, for example, a known background subtraction method. The background difference units 208, 210, and 212 assign the values 0 and 1 to pixels classified as background and foreground, respectively. Separating background from foreground in this way makes it possible to extract a rough region containing the person. The background difference units 208, 210, and 212 generate a texture by cutting out of the image, as a mask, a connected set of pixels among the pixels extracted as foreground. The background difference units 208, 210, and 212 output the generated texture as a billboard. The output billboard corresponds to the image of the person contained in the image captured by the camera.
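The separation and cut-out steps can be sketched as follows. This is a simplified stand-in, not the method of the specification: it assumes grayscale images stored as nested lists, uses plain absolute-difference thresholding in place of whichever known background subtraction method is actually used, and omits the connected-component selection. The names `background_subtract` and `cut_texture` and the threshold value are hypothetical.

```python
def background_subtract(frame, background, threshold):
    """Return a binary mask: 1 where the frame differs from the background."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


def cut_texture(frame, mask):
    """Keep foreground pixel values; mark background pixels as None."""
    return [[p if m else None for p, m in zip(frame_row, mask_row)]
            for frame_row, mask_row in zip(frame, mask)]


# Hypothetical 3 x 4 grayscale example.
background = [[10, 10, 10, 10],
              [10, 10, 10, 10],
              [10, 10, 10, 10]]
frame = [[10, 200, 12, 10],
         [11, 210, 190, 10],
         [10, 9, 10, 10]]
mask = background_subtract(frame, background, threshold=30)
billboard = cut_texture(frame, mask)
```

Here `mask` plays the role of the binarized image (0 for background, 1 for foreground) and `billboard` the role of the cut-out texture.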

FIGS. 4(a) to 4(c) are explanatory diagrams showing an example of processing in the background difference units 208, 210, and 212. FIG. 4(a) shows the image I_i^t captured by the i-th camera at time t. The background difference units 208, 210, and 212 process this image I_i^t as the original image. FIG. 4(b) shows the result of applying background subtraction to the original image of FIG. 4(a). The black portion has been determined to be background and is assigned 0. The white portion has been determined to be foreground and is assigned 1. FIG. 4(c) shows the billboard that the background difference units 208, 210, and 212 output after processing the original image of FIG. 4(a).

[Billboard Control Point Determination Units 222, 224, 226]
From the billboard of each camera, the billboard control point determination units 222, 224, and 226 set or determine the pixels inside the billboard (control points) used to reconstruct the billboard for the virtual viewpoint. First, the billboard control point determination units 222, 224, and 226 generate an abstracted image of the billboard. Specifically, they apply a known shrinking process to the set of texture pixels constituting the billboard and represent the skeleton of the person in the billboard as a connected set of pixels (the skeleton).

FIG. 5 is an explanatory diagram showing skeleton extraction processing in the billboard control point determination units 222, 224, and 226. A billboard 502 is "shrunk" until it finally becomes a set of connected lines, that is, a skeleton 504. Reference numeral 506 shows the obtained skeleton 504 superimposed on the original billboard 502.

The billboard control point determination units 222, 224, and 226 identify, as control points, those pixels in the pixel set of the obtained skeleton that satisfy a predetermined condition. The predetermined condition is, for example, that the connection number is not 2. For a given pixel in the skeleton's pixel set, the connection number is the number of its eight surrounding pixels that also belong to the pixel set. Pixels selected in this way are end points or branch points of the skeleton. That is, the end points and branch points of the skeleton are used as control points.
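The connection-number condition described above can be sketched as follows, assuming the skeleton is given as a binary grid (nested lists of 0 and 1). The function name `find_control_points` is hypothetical.

```python
def find_control_points(skeleton):
    """Return (row, col) of skeleton pixels whose connection number is not 2."""
    rows, cols = len(skeleton), len(skeleton[0])
    points = []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue
            # Count skeleton pixels among the eight surrounding pixels.
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                        count += skeleton[nr][nc]
            if count != 2:
                points.append((r, c))
    return points


# Hypothetical Y-shaped skeleton: three end points and one branch point.
skeleton = [[1, 0, 0, 0, 1],
            [0, 1, 0, 1, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 1, 0, 0],
            [0, 0, 1, 0, 0]]
print(find_control_points(skeleton))  # [(0, 0), (0, 4), (2, 2), (4, 2)]
```

With this plain neighbor count, pixels directly adjacent to a right-angled junction can also have a count other than 2, so a practical implementation might merge control points that lie close together.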

FIG. 6 is an explanatory diagram showing control point identification processing in the billboard control point determination units 222, 224, and 226. It is desirable that each control point refer to the same part of the person even when the billboard moves within the video. To that end, the feature point in the billboard closest to each control point can be used (small errors in control point coordinates do not greatly affect the result). The initial values of the control points can also be given manually.

[Billboard Posture Estimation Unit 228]
The billboard posture estimation unit 228 functions as a calculation unit that calculates, from the binary masks of the person region obtained by the background difference units 208, 210, and 212 (for example, the image shown in FIG. 4(b)), the positions on the field at which the person is in contact with the field. In other words, the billboard posture estimation unit 228 estimates the person's posture using the binary masks.

First, the billboard posture estimation unit 228 extracts the locations at which the person is actually in contact with the field by the following processing. The binary mask corresponding to the image I_i^t captured by the i-th camera at time t is denoted J_i^t. Using the homography matrix H_i calculated by the camera calibration unit 220 according to Formula 1, the correspondence between coordinates on the image plane of the i-th camera and field coordinates is expressed as
... (Formula 2)
For the binary masks of all cameras at time t, the billboard posture estimation unit 228 uses Formula 2 to project them onto field coordinates and accumulate them as follows.
... (Formula 3)
Here, F^t is a matrix holding a value for each field coordinate, and F^t(x'', y'') is the value at field coordinates (x'', y'').

In the above projection, pixels on a camera's image plane that correspond to locations where the person touches the field plane in real space are projected to the same field coordinates regardless of the camera. Therefore, accumulating all projected values yields the rough positions at which the person is in contact with the ground.
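The projection and accumulation can be sketched as follows. Because Formulas 2 and 3 are not reproduced in the text, this is only one plausible reading. It assumes a discretized field grid, assumes each H_i maps field coordinates to image coordinates, and samples each camera's mask at the rounded projected position. The names `project_point` and `accumulate_on_field` are hypothetical.

```python
def project_point(H, x, y):
    """Map field coordinates (x, y) to image coordinates (u, v) through H."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def accumulate_on_field(masks, homographies, field_w, field_h):
    """Sum, over all cameras, the mask value seen at each field cell."""
    F = [[0] * field_w for _ in range(field_h)]
    for mask, H in zip(masks, homographies):
        rows, cols = len(mask), len(mask[0])
        for y in range(field_h):
            for x in range(field_w):
                u, v = project_point(H, x, y)
                ui, vi = int(round(u)), int(round(v))
                if 0 <= vi < rows and 0 <= ui < cols:
                    F[y][x] += mask[vi][ui]
    return F


# Hypothetical toy setup: camera 1 sees the field unshifted, camera 2 sees it
# shifted by one pixel in u. Only field cell (x=1, y=2) is foreground in both.
H1 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
H2 = [[1, 0, 1], [0, 1, 0], [0, 0, 1]]
mask1 = [[0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0]]
mask2 = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 0, 0]]
F = accumulate_on_field([mask1, mask2], [H1, H2], field_w=4, field_h=4)
```

In this toy case the cell that both cameras agree on receives the value 2, while cells supported by only one camera receive 1, which mirrors the reasoning that ground contact points project consistently across cameras.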

FIGS. 7(a) to 7(f) are explanatory diagrams showing ground contact position estimation processing in the billboard posture estimation unit 228. FIG. 7(a) shows the binary mask J_1^t corresponding to the image I_1^t captured by the first camera at time t. FIG. 7(b) shows the binary mask J_2^t corresponding to the image I_2^t captured by the second camera at time t. FIG. 7(c) shows the binary mask J_3^t corresponding to the image I_3^t captured by the third camera at time t. FIG. 7(d) shows the mask obtained by projecting the three binary masks J_1^t, J_2^t, and J_3^t onto the field and accumulating them. F^t takes larger values at locations 710 where the person is in contact with the field (in FIG. 7(d), the closer the color is to black, the larger the value of F^t).

The billboard posture estimation unit 228 performs the following threshold processing on F^t to estimate candidate positions on the field at which the person is in contact with the field.
... (Formula 4)

FIG. 7(e) shows f^t obtained by Formula 4, projected onto the image plane of one camera. The set 712 of black pixels corresponds to the person's ground contact positions. FIG. 7(f) shows the contact points and the circumscribed polygon. The above operation yields the set 712 of pixels on the image plane determined to be the person's ground contact positions. For this pixel set 712, the billboard posture estimation unit 228 determines the centroid of each connected set of pixels as a location where the person actually touches the ground (contact points 702, 704, 706, and 708). The circumscribed polygon α obtained by forming a complete graph on the contact points 702, 704, 706, and 708 indicates the rough posture of the person. FIG. 8(a) is an image diagram showing the posture estimation result for the original image of FIG. 4(a). FIG. 8(b) is an image diagram showing the posture estimation result for the original image of FIG. 12. In FIG. 8(b), four contact points 802, 804, 806, and 808 are obtained.
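The thresholding and the centroid step can be sketched together as follows. Formula 4 is not reproduced in the text, so the form of the threshold is assumed: a cell is kept when its accumulated value is at least a threshold `tau`. Connected sets are found with 8-connectivity, which is also an assumption. The names `threshold_field` and `contact_points` are hypothetical.

```python
from collections import deque


def threshold_field(F, tau):
    """Binarize accumulated values: 1 where the value is at least tau."""
    return [[1 if value >= tau else 0 for value in row] for row in F]


def contact_points(f):
    """Return the (row, col) centroid of each 8-connected set of 1-pixels."""
    rows, cols = len(f), len(f[0])
    seen = [[False] * cols for _ in range(rows)]
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not f[r][c] or seen[r][c]:
                continue
            # Breadth-first search collects one connected pixel set.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                pr, pc = queue.popleft()
                pixels.append((pr, pc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = pr + dr, pc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and f[nr][nc] and not seen[nr][nc]):
                            seen[nr][nc] = True
                            queue.append((nr, nc))
            n = len(pixels)
            centroids.append((sum(p[0] for p in pixels) / n,
                              sum(p[1] for p in pixels) / n))
    return centroids


# Hypothetical accumulated values from three cameras.
F = [[3, 3, 0, 0, 1],
     [3, 3, 0, 0, 0],
     [0, 0, 0, 2, 3],
     [1, 0, 0, 0, 3]]
f = threshold_field(F, tau=3)
print(contact_points(f))  # [(0.5, 0.5), (2.5, 4.0)]
```

Requiring agreement from all cameras (here `tau=3`) is only one possible choice; a lower threshold would tolerate cameras whose view of a contact point is occluded.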

FIG. 9 is an explanatory diagram showing the variables relating to the billboard reference figure. The billboard posture estimation unit 228 translates the obtained circumscribed polygon α in the y direction on the image plane to generate a figure Θ that roughly contains the person. Within this figure Θ, each parallelogram that contains two line segments traced by vertices of the circumscribed polygon α during the translation (these parallelograms look like side faces in the image) is denoted θ_j, with the index defined in terms of a natural number M. Letting ζ_j denote the parallelogram obtained by applying dilation processing to the parallelogram θ_j, the billboard posture estimation unit 228 determines ζ_j such that the union of the ζ_j circumscribes the person's billboard. The ζ_j determined in this way is referred to as a billboard reference figure. The billboard reference figure ζ_j serves as the reference when the billboard reconstruction units 230, 232, and 234 deform the billboard.
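The construction of the side parallelograms can be sketched as below. This is only an illustration under assumptions: the polygon α is given as an ordered vertex list in image coordinates, the translation distance `dy` is supplied by the caller (negative values move upward in the usual image coordinate system), and the subsequent dilation that turns θ_j into ζ_j is omitted. The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def side_parallelograms(polygon: List[Point], dy: float) -> List[List[Point]]:
    """Build one parallelogram per polygon edge by sweeping the edge by dy in y.

    Each result lists four corners: the two edge endpoints, followed by their
    translated copies, so two of its sides are the paths traced by vertices.
    """
    n = len(polygon)
    result: List[List[Point]] = []
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        result.append([(x0, y0), (x1, y1), (x1, y1 + dy), (x0, y0 + dy)])
    return result
```

Each parallelogram produced this way rests on two vertices of α, which matches the later observation that a billboard reference figure has two ground contact points.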

[Billboard reconstruction units 230, 232, 234]
The billboard reconstruction unit 230 constructs the billboard best suited to a virtual viewpoint on the basis of the billboard obtained by the background difference unit 208, the control points obtained by the billboard control point determination unit 222, the billboard reference figure obtained by the billboard posture estimation unit 228, and the information on the virtual viewpoint selected by the user. When a composite image corresponding to the virtual viewpoint is synthesized from the images acquired by the image reading unit 202, the billboard reconstruction unit 230 functions as a deformation unit that deforms the billboard on the basis of the ground contact position calculated by the billboard posture estimation unit 228. The same applies to the billboard reconstruction units 232 and 234. This specification describes the case where the virtual viewpoint is moved vertically from an actual viewpoint; horizontal movement of the virtual viewpoint is handled by rotating the billboard, as in the conventional method.

FIG. 10 is an explanatory diagram showing the billboard reconstruction processing in the billboard reconstruction units 230, 232, and 234. The image indicated by reference numeral 154 is an image captured by one of the cameras at time t. The image indicated by reference numeral 156 is a composite image synthesized as the image at time t as seen from the virtual viewpoint.

The coordinates on the image plane of the control points contained in each billboard reference figure ζ_j obtained by the billboard posture estimation unit 228 are denoted by a dedicated symbol (symbol not reproduced here).

By its definition, ζ_j clearly has two ground contact points. The field coordinates corresponding to the ground contact points have already been calculated by the billboard posture estimation unit 228, and these field coordinates naturally do not change when the virtual viewpoint moves. Using this property, the billboard reconstruction unit 230 projects the field coordinates onto the image plane of the virtual viewpoint on the basis of the positional relationship between the vertically moved virtual viewpoint and the field. The billboard reconstruction unit 230 thereby calculates where in the virtual viewpoint image the ground contact point corresponding to those field coordinates should be displayed (for example, the position of the point indicated by reference numeral 152). In other words, the billboard reconstruction unit 230 calculates how the billboard reference figure ζ_j, obtained from the image captured by a camera, should appear from the virtual viewpoint.
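The idea that a fixed field coordinate lands at a different image position once the viewpoint moves can be illustrated with a simple pinhole projection. The sketch below is an assumption for illustration only: the text does not specify the camera model, and the 3x4 projection matrix, the focal length of 100, the principal point (50, 50), and the toy overhead camera are all hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def project_point(P: Matrix, X: Sequence[float]) -> Tuple[float, float]:
    """Project a 3D point X = (x, y, z) to pixel (u, v) with a 3x4 matrix P."""
    xh = (X[0], X[1], X[2], 1.0)  # homogeneous coordinates
    u, v, w = (sum(P[r][k] * xh[k] for k in range(4)) for r in range(3))
    if w == 0:
        raise ValueError("point projects to infinity")
    return (u / w, v / w)


def overhead_camera(height: float) -> Matrix:
    """Toy camera looking straight at the field plane z = 0 from a distance.

    Hypothetical intrinsics: focal length 100, principal point (50, 50).
    """
    f, cx, cy = 100.0, 50.0, 50.0
    return [
        [f, 0.0, cx, cx * height],
        [0.0, f, cy, cy * height],
        [0.0, 0.0, 1.0, height],
    ]
```

For the field point (1, 2, 0), `project_point(overhead_camera(4.0), (1, 2, 0))` gives (75.0, 100.0), while the same point seen through `overhead_camera(8.0)` gives (62.5, 75.0): the field coordinate is unchanged, and only its position in the image moves with the viewpoint.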

Let η_j denote the deformed billboard reference figure newly obtained at the current virtual viewpoint. The billboard reconstruction unit 230 deforms the billboard using the control point coordinates projected onto the screen of the virtual viewpoint by the mapping X_j (that is, coordinates on the image plane of the virtual viewpoint). Note that the mapping X_j is applied only to the coordinates of the control points. This is to suppress unnaturalness at the boundaries of the billboard reference figures.

Finally, the billboard reconstruction unit 230 deforms the billboard by moving the control points on the billboard, including the ground contact points. In this deformation, the two-dimensional image is treated as a triangle mesh and is deformed in accordance with the movement of the control points while its original structure is preserved. The billboard deformed by this operation is adopted as the billboard to be composited into the virtual viewpoint image. The positions of some of the control points after they are moved include the ground contact positions in the virtual viewpoint image (for example, the position of the point indicated by reference numeral 152).

The composite video output unit 236 functions as a compositing unit that composites the deformed billboards, obtained as a result of the deformation by the billboard reconstruction units 230, 232, and 234, into the composite image for the virtual viewpoint. The composite video output unit 236 generates the composite image containing the deformed billboards as the image from the virtual viewpoint and transmits it to the mobile terminal 114 via the network 112.

The operation of the image processing apparatus 200 configured as described above will now be described.
FIG. 11 is a flowchart showing the sequence of processing in the image processing apparatus 200. The image processing apparatus 200 acquires images containing the image of a person from imaging devices, such as cameras, placed at each of a plurality of viewpoints set around a person in contact with the ground (S12). The image processing apparatus 200 extracts a binary mask of the person region from each of the acquired images (S14). The image processing apparatus 200 projects each of the extracted binary masks onto the field in real space (S16). The image processing apparatus 200 estimates, as the ground contact position, the position of the portion where the projected images overlap to a greater extent (S18). The image processing apparatus 200 cuts out a billboard from an image acquired from a camera and sets control points on the billboard (S20). The image processing apparatus 200 determines how to move the control points on the basis of the estimated ground contact position (S22). The image processing apparatus 200 deforms the billboard by moving the control points in the determined manner (S24). The image processing apparatus 200 composites the deformed billboard into an image and outputs it as the image from the virtual viewpoint (S26).

In the embodiment described above, examples of the holding unit are a hard disk and a semiconductor memory. A person skilled in the art who has read this specification will understand that, based on the description herein, each unit can be implemented by a CPU (not shown), a module of an installed application program, a module of a system program, a semiconductor memory that temporarily stores data read from a hard disk, or the like.

The image processing apparatus 200 according to the present embodiment can eliminate the unnatural display that becomes a problem when the virtual viewpoint is moved vertically in the conventional billboard method. In addition, although the present embodiment estimates the posture of the person in order to deform the billboard, it does not require an accurate three-dimensional model (a model with thickness), so its computational cost is lower than that of methods that construct such a three-dimensional model.

The configuration and operation of the image processing apparatus 200 according to the embodiment have been described above. A person skilled in the art will understand that this embodiment is an example, that various modifications are possible in the combination of its components and processes, and that such modifications also fall within the scope of the present invention.

The embodiment describes the case where the billboard is deformed using control points, but the invention is not limited to this; for example, the mapping X_j may be applied to all pixels contained in the billboard reference figure ζ_j.

110: free-viewpoint image distribution system; 112: network; 114: mobile terminal; 200: image processing apparatus.

Claims

An image processing apparatus comprising: an acquisition unit that acquires images obtained by imaging, from a plurality of viewpoints, a subject in contact with a reference plane defined in real space;
a calculation unit that estimates a ground contact position of the subject using information, obtained from the images acquired by the acquisition unit, that associates the images with the real space, and that calculates a billboard reference figure composed of polygons representing the posture, in three-dimensional real space, of the positions on the reference plane at which the reference plane and the subject are in contact; and
a deformation unit that, when a composite image corresponding to a viewpoint not included in the plurality of viewpoints is synthesized from the images acquired by the acquisition unit, deforms the image of the subject contained in the images acquired by the acquisition unit on the basis of virtual viewpoint information and the billboard reference figure calculated by the calculation unit.

The image processing apparatus according to claim 1, further comprising a compositing unit that composites the image obtained as a result of the deformation by the deformation unit into the composite image.

The image processing apparatus according to claim 1 or 2, wherein the deformation unit calculates, on the basis of the positional relationship between the reference plane and the viewpoint not included in the plurality of viewpoints, a corresponding position on the composite image that corresponds to the position calculated by the calculation unit, and deforms the image of the subject on the basis of the calculated corresponding position.

The image processing apparatus according to claim 3, further comprising a setting unit that sets at least one control point on the image of the subject,
wherein the deformation unit deforms the image of the subject by moving the at least one control point set by the setting unit, and
the position of the at least one control point after being moved includes the calculated corresponding position.

The image processing apparatus according to any one of claims 1 to 4, wherein the reference plane is a surface in the real space.

The image processing apparatus according to claim 4, further comprising an abstraction unit that generates an abstracted image by abstracting the image of the subject,
wherein the setting unit sets the control points on the abstracted image generated by the abstraction unit.

The image processing apparatus according to claim 6, wherein the setting unit sets, as control points, branch portions or end portions, or both, of the abstracted image generated by the abstraction unit.

The image processing apparatus according to any one of claims 1 to 7, wherein the calculation unit calculates the positions on the reference plane at which the reference plane and the subject are in contact on the basis of the correspondence between the reference plane and the images obtained by imaging the subject from each of the plurality of viewpoints.

An image processing method comprising: acquiring images obtained by imaging, from a plurality of viewpoints, a subject in contact with a reference plane defined in real space;
estimating a ground contact position of the subject using information, obtained from the acquired images, that associates the images with the real space, and calculating a billboard reference figure composed of polygons representing the posture, in three-dimensional real space, of the positions on the reference plane at which the reference plane and the subject are in contact; and
when a composite image corresponding to a viewpoint not included in the plurality of viewpoints is synthesized from the acquired images, deforming the image of the subject contained in the acquired images on the basis of virtual viewpoint information and the calculated billboard reference figure.

An image processing program for causing a computer to implement: a function of acquiring images obtained by imaging, from a plurality of viewpoints, a subject in contact with a reference plane defined in real space;
a function of estimating a ground contact position of the subject using information, obtained from the acquired images, that associates the images with the real space, and calculating a billboard reference figure composed of polygons representing the posture, in three-dimensional real space, of the positions on the reference plane at which the reference plane and the subject are in contact; and
a function of, when a composite image corresponding to a viewpoint not included in the plurality of viewpoints is synthesized from the acquired images, deforming the image of the subject contained in the acquired images on the basis of virtual viewpoint information and the calculated billboard reference figure.