JP2020173726A

JP2020173726A - Virtual viewpoint conversion device and program

Info

Publication number: JP2020173726A
Application number: JP2019076605A
Authority: JP
Inventors: 俊枝三須; Toshie Misu; 秀樹三ツ峰; Hideki Mitsumine
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2020-10-22
Anticipated expiration: 2039-04-12
Also published as: JP7352374B2

Abstract

PROBLEM TO BE SOLVED: To appropriately composite a region having a predetermined image feature, such as the shadow of a subject, and thereby create a more natural virtual viewpoint image. SOLUTION: A second subject extraction unit 12 of a virtual viewpoint conversion device 1 extracts a second subject from an input image I to create a key image F. A composition unit 13 composites the input image I onto a background image B by keying based on the key image F to create a composited background image A. A first projective transformation unit 14 applies a projective transformation to the composited background image A according to camera parameters to create a virtual viewpoint image L of the background. A second projective transformation unit 16 applies a projective transformation to the input image I and a key image K, using a billboard and the camera parameters, to create a virtual viewpoint image Mi of the foreground and a virtual viewpoint image Ni of the key. A composition unit 17 composites the virtual viewpoint image L of the background and the virtual viewpoint image Mi of the foreground based on the virtual viewpoint image Ni of the key to create and output a virtual viewpoint image J. SELECTED DRAWING: Figure 1

Description

The present invention relates to a virtual viewpoint conversion device and a program that virtually generate an image from a viewpoint different from the one at which an input image was captured.

Methods are known for virtually converting an input image into an image from a viewpoint different from that at capture time, thereby generating a virtual viewpoint image. In video games, for example, a billboard model, which represents a subject as a partial plane facing the camera, is sometimes used to reduce the size of the subject data and the computational cost (see, for example, Patent Documents 1 to 3).

As a virtual space rendering method based on live-action images, there is also a method that, to speed up computation, handles spatial data based on live-action images of virtual objects in the virtual space as billboard image data (see, for example, Patent Document 4).

The billboard model in the methods of Patent Documents 1 to 3 is widely used when subject information can be prepared as data in advance, as in video games. The method of Patent Document 4 is applied to live-action images, but it presupposes that the billboard images are stored in internal memory in advance and does not mention how the billboard images themselves are generated.

A method has therefore been proposed that generates a virtual viewpoint image without requiring subject information to be prepared as data in advance and without requiring billboard images to be stored in internal memory in advance (see, for example, Non-Patent Document 1).

The method of Non-Patent Document 1 extracts subject regions from input images captured by a plurality of cameras, associates the subject regions with one another, generates billboard models based on two-dimensional coordinates on the field plane, and generates a three-dimensional CG space.

This makes it possible to generate an image virtually captured from a viewpoint position different from that at capture time, and to realize photorealistic virtual viewpoint movement through live-action-based rendering.

Patent Document 1: Japanese Patent No. 6441843
Patent Document 2: Japanese Patent No. 6351647
Patent Document 3: Japanese Patent No. 4592087
Patent Document 4: Japanese Patent No. 3486579

Non-Patent Document 1: 三巧浩嗣, 内藤整 (Hirotsugu Sankaku, Sei Naito), "Free-viewpoint video generation of soccer by extracting and tracking player regions", Journal of the Institute of Image Information and Television Engineers, Vol. 68, No. 3, pp. J125-J134 (2014)

The method of Non-Patent Document 1 extracts a subject image and a background image from the input image by the background subtraction method, applies a projective transformation using a billboard model, and obtains an image virtually captured from a viewpoint position different from that at capture time (a virtual viewpoint image).

For example, when the input image is captured in the daytime, the subject casts a shadow, and a virtual viewpoint image in which the shadow accompanies the subject is generated. However, when the shadow is added to the subject afterward by CG processing, or when the background subtraction method is used as is, the resulting virtual viewpoint image contains shadows that look unnatural.

This is because, if the background subtraction method is used as is, for example, a subject image that includes the shadow is extracted from the input image and the shadow undergoes the same projective transformation as the subject, so the shadow does not appear at the correct position and looks unnatural.

It has therefore been desired to obtain a virtual viewpoint image that looks natural to the user by placing a region having a predetermined image feature, such as the shadow of a subject, at the correct position in the virtual viewpoint image.

The present invention has been made to solve the above problem. Its object is to provide a virtual viewpoint conversion device and a program capable of generating a more natural virtual viewpoint image by appropriately compositing a region having a predetermined image feature, such as the shadow of a subject, when virtually converting an input image into an image from a viewpoint different from that at capture time.

To solve the above problem, the virtual viewpoint conversion device of claim 1 is a virtual viewpoint conversion device that generates a virtual viewpoint image by virtually converting an input image into an image from a viewpoint different from that at capture time, the device comprising: a background generation unit that generates a background image from the input image; a first subject extraction unit that extracts a region of a first subject from the input image and generates a first key image having the shape of the first subject and predetermined pixel values; a second subject extraction unit that extracts, from the input image, a region of a second subject having a predetermined image feature and generates a second key image having the shape of the second subject and predetermined pixel values; a first projective transformation unit that takes the portion of the input image indicated by the second key image as a second subject image, applies a first projective transformation to the second subject image using the camera parameters of the input image and the camera parameters of the virtual viewpoint image, and generates a virtual viewpoint image of the second subject; a billboard setting unit that sets a billboard based on the first key image generated by the first subject extraction unit and the camera parameters of the input image; a second projective transformation unit that applies a second projective transformation to the input image and the first key image generated by the first subject extraction unit, using the camera parameters of the input image, the camera parameters of the virtual viewpoint image, and the billboard set by the billboard setting unit, and generates a virtual viewpoint image of the first subject together with a virtual viewpoint image of a first key having the shape of the first subject and the predetermined pixel values; and a composition unit that generates the virtual viewpoint image by compositing, based on the virtual viewpoint image of the first key generated by the second projective transformation unit, the virtual viewpoint image of the second subject generated by the first projective transformation unit and the virtual viewpoint image of the first subject generated by the second projective transformation unit.

The virtual viewpoint conversion device of claim 2 is a virtual viewpoint conversion device that generates a virtual viewpoint image by virtually converting an input image into an image from a viewpoint different from that at capture time, the device comprising: a background generation unit that generates a background image from the input image; a first subject extraction unit that extracts a region of a first subject from the input image and generates a first key image having the shape of the first subject and predetermined pixel values; a second subject extraction unit that extracts, from the input image, a region of a second subject having a predetermined image feature and generates a second key image representing the second subject and having predetermined pixel values; a background composition unit that generates a composited background image by compositing, onto the background image generated by the background generation unit, the portion of the input image indicated by the second key image generated by the second subject extraction unit; a first projective transformation unit that applies a first projective transformation to the composited background image generated by the background composition unit, using the camera parameters of the input image and the camera parameters of the virtual viewpoint image, and generates a virtual viewpoint image of the background; a billboard setting unit that sets a billboard based on the first key image generated by the first subject extraction unit and the camera parameters of the input image; a second projective transformation unit that applies a second projective transformation to the input image and the first key image generated by the first subject extraction unit, using the camera parameters of the input image, the camera parameters of the virtual viewpoint image, and the billboard set by the billboard setting unit, and generates a virtual viewpoint image of the first subject together with a virtual viewpoint image of a first key having the shape of the first subject and the predetermined pixel values; and a composition unit that generates the virtual viewpoint image by compositing, based on the virtual viewpoint image of the first key generated by the second projective transformation unit, the virtual viewpoint image of the background generated by the first projective transformation unit and the virtual viewpoint image of the first subject generated by the second projective transformation unit.

The virtual viewpoint conversion device of claim 3 is the virtual viewpoint conversion device according to claim 1 or 2, wherein the background generation unit generates the background image from a plurality of frames of the input image, and the second subject extraction unit extracts, as the region of the second subject, a region having a predetermined pixel value feature in a single frame of the input image.

Furthermore, the program of claim 4 causes a computer to function as the virtual viewpoint conversion device according to any one of claims 1 to 3.

As described above, the present invention makes it possible to generate a more natural virtual viewpoint image by appropriately compositing a region having a predetermined image feature, such as the shadow of a subject, when virtually converting an input image into an image from a viewpoint different from that at capture time.

FIG. 1 is a block diagram showing a configuration example of a virtual viewpoint conversion device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a processing example of the virtual viewpoint conversion device.
FIG. 3 is a block diagram showing a configuration example of the first projective transformation unit.
FIG. 4 is a diagram illustrating the operation of the first projective transformation unit.
FIG. 5 is a diagram illustrating the operation of the billboard setting unit.
FIG. 6 is a diagram illustrating the operation of the second projective transformation unit.
FIG. 7 is a block diagram showing a configuration example of the second projective transformation unit.
FIG. 8 is a block diagram showing a configuration example of a virtual viewpoint conversion device according to another embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention extracts a first subject and a second subject having a predetermined image feature, such as the shadow of the first subject, applies a different projective transformation to each, and composites the transformed images to virtually generate an image seen from a different viewpoint.

Consequently, when the input image is virtually converted into an image from a viewpoint different from that at capture time, the second subject can be composited appropriately and a more natural virtual viewpoint image can be generated.

[Virtual viewpoint conversion device]
A virtual viewpoint conversion device according to an embodiment of the present invention is described below. FIG. 1 is a block diagram showing a configuration example of the virtual viewpoint conversion device according to the embodiment. The virtual viewpoint conversion device 1 includes a background generation unit 10, a first subject extraction unit 11, a second subject extraction unit 12, a composition unit (background composition unit) 13, a first projective transformation unit 14, a billboard setting unit 15, a second projective transformation unit 16, and a composition unit 17.

Based on the input image I, the camera parameters of the input image I, and the camera parameters of the virtual viewpoint image J, the virtual viewpoint conversion device 1 geometrically transforms the input image I; in doing so, it composites the shadow (second subject) of the subject (first subject) onto the background image B and generates the virtual viewpoint image J.

The camera parameters include viewpoint position information for the optical principal point of the camera. They may additionally include some or all of the following: orientation (for example, pan, tilt, and roll angles), angle of view (or lens focal length), lens distortion, exposure values (iris, shutter speed, sensitivity, and so on), and color correction values.

In what follows, the pixel value of an image at time t and image coordinates (x, y) is written by appending (t; x, y) to the symbol for the image. For example, the pixel value of the input image I at time t and image coordinates (x, y) is written I(t; x, y). A pixel value may be a scalar (for example, for a monochrome image) or a vector (for example, for a color image, a vector with three components: red, green, and blue).

FIG. 2 is a flowchart showing a processing example of the virtual viewpoint conversion device 1. Each component of the virtual viewpoint conversion device 1 is described below with reference to FIGS. 1 and 2.

(Background generation unit 10)
The background generation unit 10 generates, from the time-series input image I (a plurality of frames of the input image I), a background image B from which moving objects have been removed (step S201), and outputs the background image B to the first subject extraction unit 11 and the composition unit 13. Processes for generating the background image B are known; for example, the background subtraction method can be used. For details of the background subtraction method, see, for example, paragraph 44 and Equation 8 of Japanese Patent No. 5227226.
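As an illustration of removing moving objects from a sequence of frames, the sketch below estimates B as the per-pixel temporal median. This is a swapped-in technique chosen for brevity, not the method of the cited patent, and the function name and list-of-rows frame layout are assumptions.

```python
from statistics import median


def temporal_median_background(frames):
    """Estimate a background image B as the per-pixel median over time.

    frames: list of grayscale frames, each a list of rows of numbers.
    Assumption: each pixel shows the background in more than half of the frames.
    """
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


# Three 1x3 frames; a bright moving object (200) covers a different pixel each time.
frames = [
    [[200, 10, 10]],
    [[10, 200, 10]],
    [[10, 10, 200]],
]
background = temporal_median_background(frames)
```

Because the moving object covers each pixel in only one of the three frames, the median at every pixel is the background value.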

(First subject extraction unit 11)
The first subject extraction unit 11 receives the background image B from the background generation unit 10. Based on the input image I and the background image B, which the background generation unit 10 generated from a plurality of frames of the input image I, the first subject extraction unit 11 distinguishes the subject (first subject) from everything else (the background image B) and extracts the region of the subject. It then generates a key image K that represents the shape of the subject and has pixel values that distinguish the region of the subject from other regions (step S202), and outputs the key image K to the billboard setting unit 15 and the second projective transformation unit 16. Hereinafter, "the subject" refers to the first subject.

The key image K may be a binary image (for example, pixels belonging to the subject have the value 1 and all other pixels have the value 0) or a multi-valued image (for example, pixels belonging to the subject have the value 1 and all other pixels the value 0, except that pixels on the subject boundary take values greater than 0 and less than 1).

For example, the first subject extraction unit 11 generates the key image K by comparing the background image B generated by the background generation unit 10 with the input image I, using the following equation.

$$K(t; x, y) = \varphi\bigl(I(t; x, y),\, B(t; x, y)\bigr) \tag{1}$$

The function φ(p, q) determines whether a pixel belongs to the subject according to the difference between the pixel values p and q.

For example, φ can be a function whose output is determined by a norm of the difference between p and q (for example, the Euclidean, Manhattan, or Chebyshev distance), as in the following equation. In this case φ(p, q) is 1 when the norm of the difference between p and q is greater than a preset threshold θ, and 0 when it is less than or equal to θ.

$$\varphi(p, q) = \begin{cases} 1 & \text{if } \lVert p - q \rVert > \theta \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
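The thresholded-norm test for φ can be sketched as follows. The function names, the tuple-per-pixel layout, and the sample threshold are illustrative assumptions; only the rule "1 if the norm of the difference exceeds θ, else 0" comes from the text.

```python
import math


def norm(diff, kind="euclidean"):
    """Norm of a per-channel difference vector."""
    if kind == "euclidean":
        return math.sqrt(sum(d * d for d in diff))
    if kind == "manhattan":
        return sum(abs(d) for d in diff)
    if kind == "chebyshev":
        return max(abs(d) for d in diff)
    raise ValueError(f"unknown norm: {kind}")


def phi(p, q, theta, kind="euclidean"):
    """Return 1 if pixel values p and q differ by more than theta, else 0."""
    return 1 if norm([a - b for a, b in zip(p, q)], kind) > theta else 0


def first_key_image(input_frame, background, theta, kind="euclidean"):
    """Key image K: 1 where the input differs from the background, 0 elsewhere."""
    return [
        [phi(p, q, theta, kind) for p, q in zip(row_i, row_b)]
        for row_i, row_b in zip(input_frame, background)
    ]


# One row of RGB pixels: the middle pixel is a player, the others are grass.
I = [[(30, 120, 40), (200, 30, 30), (32, 118, 41)]]
B = [[(30, 120, 40), (31, 119, 40), (30, 120, 40)]]
K = first_key_image(I, B, theta=20.0)
```

Small differences such as sensor noise on the grass pixels stay below the threshold, so only the player pixel is keyed.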

(Second subject extraction unit 12)
The second subject extraction unit 12 extracts, from a single frame of the input image I, a region having a predetermined image feature (the region of the second subject). It generates a key image F that represents the shape of that region and has pixel values that distinguish the region from other regions (step S203), and outputs the key image F to the composition unit 13.

The region having a predetermined image feature is the region of an object related to the first subject extracted by the first subject extraction unit 11; for example, the region of the first subject's shadow, which moves together with the first subject.

The second subject extraction unit 12 generates the key image F using, for example, chroma key or luminance key techniques, which use color-vector information as the image feature.

For example, the following equation is used.

$$F(t; x, y) = \Psi\bigl(I(t; x, y)\bigr) \tag{3}$$

When pixel values are discrete, a three-dimensional lookup table is used in place of the function Ψ. The function Ψ determines the pixel values of the key image F. For example, Ψ(c₁) = 1 for a color vector c₁ that should be treated as the second subject, and Ψ(c₀) = 0 for a color vector c₀ that should not.

For example, the second subject extraction unit 12 determines whether each pixel of the input image I is green (that is, grass). If the pixel is green (grass), it sets the corresponding pixel of the key image F to 0; if the pixel is not green (not grass), it sets the corresponding pixel to 1.

When a pixel is represented by a three-dimensional color vector c = [c^(r) c^(g) c^(b)]^T (the superscript T denotes the transpose of a matrix or vector), the following equation is used for the function Ψ.

$$\Psi(c) = \begin{cases} 1 & \text{if } \theta_0^{(r)} \le c^{(r)} \le \theta_1^{(r)} \ \wedge\ \theta_0^{(g)} \le c^{(g)} \le \theta_1^{(g)} \ \wedge\ \theta_0^{(b)} \le c^{(b)} \le \theta_1^{(b)} \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

Here θ₀^(r), θ₁^(r), θ₀^(g), θ₁^(g), θ₀^(b), and θ₁^(b) are preset thresholds.
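A per-channel range test is one plausible reading of the six thresholds (a lower and an upper bound for each of r, g, and b). The sketch below assumes that reading, with inclusive bounds and a value of 1 inside the range; the threshold values and names are hypothetical.

```python
def psi(c, lower, upper):
    """Return 1 if every component of color c lies in [lower, upper], else 0.

    Assumption: bounds are inclusive and in-range colors are keyed as 1.
    """
    return 1 if all(lo <= v <= hi for v, lo, hi in zip(c, lower, upper)) else 0


def second_key_image(frame, lower, upper):
    """Key image F from a single frame, one psi() test per pixel."""
    return [[psi(c, lower, upper) for c in row] for row in frame]


# Hypothetical thresholds that select dark, shadowed grass.
LOWER = (0, 40, 0)    # (theta_0^(r), theta_0^(g), theta_0^(b))
UPPER = (40, 90, 40)  # (theta_1^(r), theta_1^(g), theta_1^(b))

# Sunlit grass, shadowed grass, red shirt.
frame = [[(30, 120, 40), (15, 60, 20), (200, 30, 30)]]
F = second_key_image(frame, LOWER, UPPER)
```

For 8-bit color the same decision could be precomputed into a 256×256×256 lookup table, which corresponds to the three-dimensional lookup table mentioned for discrete pixel values.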

The second subject extraction unit 12 may also use chroma key or luminance key techniques to make the key image F multi-valued rather than binary. For example, the pixel values of the key image F may range from 0 to 1 inclusive, with larger values defined as indicating a pixel that is more likely to belong to the second subject.

(Composition unit 13)
The composition unit 13 receives the background image B from the background generation unit 10 and the key image F from the second subject extraction unit 12. It composites the pixel values of the input image I onto the background image B by keying based on the key image F, generating a composited background image A (a background image A onto which the second subject has been composited) (step S204). The composition unit 13 outputs the composited background image A to the first projective transformation unit 14.

For example, suppose that the second subject extraction unit 12 has generated the key image F with F(t; x, y) = 1 for the colors of the shadow, which is the second subject, and F(t; x, y) = 0 elsewhere. In this case, the composition unit 13 generates the composited background image A by compositing the image indicated by the key image F (the portion of the input image I indicated by the key image F) onto the background image B, using, for example, the following equation.

$$A(t; x, y) = F(t; x, y)\, I(t; x, y) + \bigl(1 - F(t; x, y)\bigr)\, B(t; x, y) \tag{5}$$

In equation (5), the first term on the right-hand side is the input image I within the shadow region indicated by the key image F, and the second term is the background image B outside that shadow region.
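The two terms described for equation (5) amount to a per-pixel linear blend, A = F·I + (1 − F)·B. A minimal sketch, assuming frames stored as rows of RGB tuples and a hypothetical function name:

```python
def composite_background(input_frame, background, key_f):
    """Blend the input onto the background: A = F*I + (1 - F)*B.

    key_f holds one value per pixel, either binary or fractional in [0, 1].
    """
    return [
        [
            tuple(f * i + (1 - f) * b for i, b in zip(pixel_i, pixel_b))
            for pixel_i, pixel_b, f in zip(row_i, row_b, row_f)
        ]
        for row_i, row_b, row_f in zip(input_frame, background, key_f)
    ]


I = [[(30, 120, 40), (15, 60, 20)]]   # sunlit grass, shadowed grass
B = [[(31, 119, 40), (30, 121, 39)]]  # shadow-free background estimate
F = [[0, 1]]                          # only the second pixel is keyed as shadow
A = composite_background(I, B, F)
```

Where F is 0 the background pixel passes through unchanged, and where F is 1 the shadowed input pixel replaces it. A fractional key mixes the two.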

The composition unit 13 may instead composite the pixel values of the input image I onto the background image B by keying based on both the key image F and the key image K to generate the composited background image A.

For example, suppose that the second subject extraction unit 12 has generated the key image F with F(t; x, y) = 0 for the sunlit background color (for example, sunlit grass) and F(t; x, y) = 1 elsewhere. In this case, the composition unit 13 generates the composited background image A using, for example, the following equation.

$$A(t; x, y) = F(t; x, y)\bigl(1 - K(t; x, y)\bigr)\, I(t; x, y) + \Bigl(1 - F(t; x, y)\bigl(1 - K(t; x, y)\bigr)\Bigr)\, B(t; x, y) \tag{6}$$

In equation (6), the region where F(t; x, y) = 1 contains the shaded background region and the foreground (that is, shadows in the background region and shadows in the subject region), and the region where K(t; x, y) = 1 contains the foreground (the subject). The region where F(t; x, y)·(1 − K(t; x, y)) = 1 on the right-hand side therefore contains only the shaded background region (the shadows in the background region). As a result, the composited background image A is the background image B with only the shadow image composited onto it.

For example, when the shadow has the same color as the subject, the key image F, which should capture only the shadow, also captures the subject, and the composited background image A would then include the subject as well. Using equation (6) excludes the subject from the composited background image A.
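The effect of masking F with (1 − K) can be checked on a tiny example. The sketch below uses grayscale pixels for brevity; the function names and sample values are assumptions.

```python
def shadow_only_key(key_f, key_k):
    """Effective key F*(1 - K): keep F only where the first subject is absent."""
    return [
        [f * (1 - k) for f, k in zip(row_f, row_k)]
        for row_f, row_k in zip(key_f, key_k)
    ]


def blend(input_frame, background, key):
    """Per-pixel blend key*I + (1 - key)*B for grayscale frames."""
    return [
        [k * i + (1 - k) * b for i, b, k in zip(row_i, row_b, row_k)]
        for row_i, row_b, row_k in zip(input_frame, background, key)
    ]


I = [[120, 40, 45]]    # sunlit grass, shadow on grass, dark-clothed player
B = [[121, 119, 120]]  # background without shadow or player
F = [[0, 1, 1]]        # everything that is not sunlit grass
K = [[0, 0, 1]]        # the player (first subject)

key = shadow_only_key(F, K)
A = blend(I, B, key)
```

The player pixel is keyed by F but cancelled by K, so it falls back to the background, while the shadow pixel keeps its value from the input image.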

(First projective transformation unit 14)
The first projective transformation unit 14 receives the composited background image A from the composition unit 13, together with the preset camera parameters of the input image I and of the virtual viewpoint image J.

第一射影変換部１４は、合成あり背景映像Ａの各画素値が、被写界における所定の面内（例えば、地上高０の平面内、実空間上の面Ｇ内）の一点（または部分領域）を入力映像Ｉのカメラパラメータに応じて投影して撮像されたものと仮定する。そして、第一射影変換部１４は、被写界における所定の面内の一点（または部分領域）を、仮想視点（仮想視点映像Ｊ）のカメラパラメータに応じて、仮想視点映像Ｊの平面上に投影することで、背景の仮想視点映像Ｌを生成する（ステップＳ２０５）。 In the first projective transformation unit 14, each pixel value of the background image A with composition is a point (or a portion) in a predetermined plane in the field of view (for example, in a plane with a ground height of 0, in a plane G in real space). It is assumed that the area) is projected and captured according to the camera parameters of the input image I. Then, the first projective transformation unit 14 places one point (or a partial area) in a predetermined plane in the field of view on the plane of the virtual viewpoint image J according to the camera parameters of the virtual viewpoint (virtual viewpoint image J). By projecting, a virtual viewpoint image L in the background is generated (step S205).

すなわち、第一射影変換部１４は、合成あり背景映像Ａの各画素値が、被写界における所定の面内に存在することを仮定した射影変換を実行し、背景の仮想視点映像Ｌを生成する。第一射影変換部１４は、背景の仮想視点映像Ｌを合成部１７に出力する。 That is, the first projective transformation unit 14 executes a projective transformation that assumes each pixel value of the background image A with composition lies within the predetermined surface in the scene, and generates the background virtual viewpoint image L. The first projective transformation unit 14 outputs the background virtual viewpoint image L to the compositing unit 17.

実装上は、第一射影変換部１４は、仮想視点映像Ｊの画像座標から入力映像Ｉの画像座標へと光線を逆にたどることで、仮想視点映像Ｊの平面上に投影された合成あり背景映像Ａの画素値を決定し、背景の仮想視点映像Ｌを生成する。 In an implementation, the first projective transformation unit 14 traces rays backward from the image coordinates of the virtual viewpoint image J to the image coordinates of the input image I, thereby determining the pixel values of the background image A with composition as projected onto the plane of the virtual viewpoint image J, and generates the background virtual viewpoint image L.

図３は、第一射影変換部１４の構成例を示すブロック図であり、図４は、第一射影変換部１４の動作を説明する図である。第一射影変換部１４は、フレームメモリ２０，２４、走査部２１、第一逆投影部２２及び第一投影部２３を備えている。 FIG. 3 is a block diagram showing a configuration example of the first projective conversion unit 14, and FIG. 4 is a diagram illustrating the operation of the first projective conversion unit 14. The first projective conversion unit 14 includes frame memories 20, 24, a scanning unit 21, a first back projection unit 22, and a first projection unit 23.

走査部２１は、仮想視点映像Ｊのフレーム内の各画素を所定の順序で選択することで、画素の画像座標Ｐ_Jを走査し、画素の画像座標Ｐ_Jを第一逆投影部２２及びフレームメモリ２４に出力する。走査部２１は、例えばラスタ走査により、画素を順次選択する。 The scanning unit 21 scans the image coordinates P_J by selecting each pixel in the frame of the virtual viewpoint image J in a predetermined order, and outputs the image coordinates P_J of the selected pixel to the first back projection unit 22 and the frame memory 24. The scanning unit 21 selects pixels sequentially, for example by raster scanning.

第一逆投影部２２は、走査部２１から仮想視点映像Ｊの画素の画像座標Ｐ_Jを入力すると共に、仮想視点映像Ｊのカメラパラメータを入力する。そして、第一逆投影部２２は、仮想視点映像Ｊのカメラパラメータに基づいて、画像座標Ｐ_Jを、被写界における所定の面である実空間上の面Ｇに逆投影し、逆投影像の点Ｐ_Gを設定し、点Ｐ_Gの位置情報を第一投影部２３に出力する。すなわち、第一逆投影部２２は、画像座標Ｐ_Jが実空間上の面Ｇのどこに対応するかを求め、対応する点Ｐ_Gを設定する。 The first back projection unit 22 receives the image coordinates P_J of a pixel of the virtual viewpoint image J from the scanning unit 21, together with the camera parameters of the virtual viewpoint image J. Based on those camera parameters, the first back projection unit 22 back-projects the image coordinates P_J onto the surface G in real space, which is the predetermined surface in the scene, sets the back-projected point P_G, and outputs the position information of the point P_G to the first projection unit 23. That is, the first back projection unit 22 determines where on the surface G in real space the image coordinates P_J correspond, and sets the corresponding point P_G.

具体的には、第一逆投影部２２は、仮想視点映像Ｊの光学主点Ｏ_Jから画像座標Ｐ_Jの点を通る半直線を、画像座標Ｐ_Jの点方向へ伸ばし、その半直線が面Ｇと交わる点（複数の交わる点を有する場合には、光学主点Ｏ_Jに最も近い点）を求め、点Ｐ_Gを設定する。 Specifically, the first back projection unit 22 extends a ray from the optical principal point O_J of the virtual viewpoint image J through the point at the image coordinates P_J, in the direction of that point, finds the point where the ray intersects the surface G (if there are several intersections, the one closest to the optical principal point O_J), and sets it as the point P_G.

尚、面Ｇは平面であってもよいし、曲面であってもよい。面Ｇは、例えば被写界における地面、壁面、天井面（測量の結果得られる曲面であってもよいし、それを近似した平面であってもよい）とする。 The surface G may be a flat surface or a curved surface. The surface G is, for example, a ground surface, a wall surface, or a ceiling surface in the field of view (a curved surface obtained as a result of surveying, or a plane that approximates the curved surface).

第一投影部２３は、第一逆投影部２２から点Ｐ_Gの位置情報を入力すると共に、入力映像Ｉのカメラパラメータを入力する。そして、第一投影部２３は、入力映像Ｉのカメラパラメータに基づいて、点Ｐ_Gを入力映像Ｉの平面上に投影し、投影像の画像座標Ｐ_Iを設定し、画像座標Ｐ_Iをフレームメモリ２０に出力する。すなわち、第一投影部２３は、点Ｐ_Gが入力映像Ｉの平面のどこに対応するかを求め、対応する画像座標Ｐ_Iを設定する。 The first projection unit 23 receives the position information of the point P_G from the first back projection unit 22, together with the camera parameters of the input image I. Based on those camera parameters, the first projection unit 23 projects the point P_G onto the plane of the input image I, sets the image coordinates P_I of the projected image, and outputs the image coordinates P_I to the frame memory 20. That is, the first projection unit 23 determines where on the plane of the input image I the point P_G corresponds, and sets the corresponding image coordinates P_I.

具体的には、第一投影部２３は、点Ｐ_Gと入力映像Ｉを撮影したカメラの光学主点Ｏ_Iとを結ぶ線分が、入力映像Ｉの平面と交わる点を求め、これを画像座標Ｐ_Iに設定する。 Specifically, the first projection unit 23 finds the point where the line segment connecting the point P_G and the optical principal point O_I of the camera that captured the input image I intersects the plane of the input image I, and sets it as the image coordinates P_I.

フレームメモリ２０は、合成部１３から合成あり背景映像Ａを入力し、合成あり背景映像Ａを格納する。これにより、フレームメモリ２０には、合成あり背景映像Ａの画素値が保持される。フレームメモリ２０は、第一投影部２３から画像座標Ｐ_Iを入力する。そして、フレームメモリ２０は、画像座標Ｐ_I（その水平及び垂直成分をそれぞれＰ_I ^(x)及びＰ_I ^(y)とする）における合成あり背景映像Ａの画素値、すなわち入力映像Ｉの画素値Ｉ（ｔ；Ｐ_I ^(x)，Ｐ_I ^(y)）をフレームメモリ２４に出力する。 The frame memory 20 receives the background image A with composition from the composition unit 13 and stores it, so that the frame memory 20 holds the pixel values of the background image A with composition. The frame memory 20 receives the image coordinates P_I from the first projection unit 23. The frame memory 20 then outputs to the frame memory 24 the pixel value of the background image A with composition at the image coordinates P_I (whose horizontal and vertical components are P_I^(x) and P_I^(y), respectively), that is, the pixel value I(t; P_I^(x), P_I^(y)) of the input image I.

つまり、第一射影変換部１４により、フレームメモリ２０から、第一投影部２３にて設定された画像座標Ｐ_Iにおける画素値Ｉ（ｔ；Ｐ_I ^(x)，Ｐ_I ^(y)）が読み出され、フレームメモリ２４に出力される。 In other words, in the first projective transformation unit 14, the pixel value I(t; P_I^(x), P_I^(y)) at the image coordinates P_I set by the first projection unit 23 is read from the frame memory 20 and output to the frame memory 24.

フレームメモリ２４は、走査部２１から画像座標Ｐ_Jを入力すると共に、フレームメモリ２０から画素値Ｉ（ｔ；Ｐ_I ^(x)，Ｐ_I ^(y)）を入力する。そして、フレームメモリ２４は、以下の式に示すように、画像座標Ｐ_J（その水平及び垂直成分をそれぞれＰ_J ^(x)及びＰ_J ^(y)とする）の位置に、画素値Ｉ（ｔ；Ｐ_I ^(x)，Ｐ_I ^(y)）を背景の仮想視点映像Ｌの画素値として格納する。フレームメモリ２４は、背景の仮想視点映像Ｌを合成部１７に出力する。

The frame memory 24 receives the image coordinates P_J from the scanning unit 21 and the pixel value I(t; P_I^(x), P_I^(y)) from the frame memory 20. The frame memory 24 then stores the pixel value I(t; P_I^(x), P_I^(y)) at the position of the image coordinates P_J (whose horizontal and vertical components are P_J^(x) and P_J^(y), respectively) as the pixel value of the background virtual viewpoint image L, that is, L(t; P_J^(x), P_J^(y)) = I(t; P_I^(x), P_I^(y)). The frame memory 24 outputs the background virtual viewpoint image L to the compositing unit 17.

つまり、第一射影変換部１４により、フレームメモリ２４から、走査部２１にて設定された画像座標Ｐ_Jの位置に画素値Ｉ（ｔ；Ｐ_I ^(x)，Ｐ_I ^(y)）が格納され、背景の仮想視点映像Ｌとして読み出され、合成部１７に出力される。 In other words, in the first projective transformation unit 14, the pixel value I(t; P_I^(x), P_I^(y)) is stored in the frame memory 24 at the position of the image coordinates P_J set by the scanning unit 21, and is then read out as the background virtual viewpoint image L and output to the compositing unit 17.
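
The backward mapping performed by the first projective transformation unit 14 (scan P_J, back-project to P_G on the surface G, project to P_I, copy the pixel) can be sketched as follows. This is an illustrative sketch under several assumptions that are not in the source: a pinhole camera described by a hypothetical dictionary with keys `O` (optical principal point), `R` (world-to-camera rotation), `f`, `cx`, `cy`; the surface G taken as the plane z = 0; and nearest-neighbour sampling.

```python
def _rot(R, v):
    """Return R v (world coordinates to camera coordinates)."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def _rot_t(R, v):
    """Return R^T v (camera coordinates to world coordinates)."""
    return tuple(sum(R[r][c] * v[r] for r in range(3)) for c in range(3))


def back_project_to_ground(cam, u, v):
    """Intersect the ray from cam['O'] through pixel (u, v) with the plane z = 0."""
    d_cam = ((u - cam["cx"]) / cam["f"], (v - cam["cy"]) / cam["f"], 1.0)
    d = _rot_t(cam["R"], d_cam)
    if abs(d[2]) < 1e-12:
        return None  # ray parallel to the plane
    s = -cam["O"][2] / d[2]
    if s <= 0:
        return None  # intersection lies behind the camera
    return tuple(cam["O"][k] + s * d[k] for k in range(3))


def project(cam, P):
    """Project world point P into the image of cam; None if behind the camera."""
    p = _rot(cam["R"], tuple(P[k] - cam["O"][k] for k in range(3)))
    if p[2] <= 0:
        return None
    return (cam["f"] * p[0] / p[2] + cam["cx"], cam["f"] * p[1] / p[2] + cam["cy"])


def warp_background(A, cam_I, cam_J, width, height, fill=0):
    """Build L by tracing each pixel of J back to A (nearest-neighbour lookup)."""
    L = [[fill] * width for _ in range(height)]
    for y in range(height):          # raster scan over the virtual view
        for x in range(width):
            P_G = back_project_to_ground(cam_J, x, y)
            P_I = project(cam_I, P_G) if P_G is not None else None
            if P_I is None:
                continue
            xi, yi = round(P_I[0]), round(P_I[1])
            if 0 <= yi < len(A) and 0 <= xi < len(A[0]):
                L[y][x] = A[yi][xi]
    return L


# Two downward-looking cameras 10 units above the ground; J is shifted by +1 in x.
R_DOWN = ((1, 0, 0), (0, -1, 0), (0, 0, -1))
cam_I = {"O": (0.0, 0.0, 10.0), "R": R_DOWN, "f": 10.0, "cx": 1.0, "cy": 1.0}
cam_J = {"O": (1.0, 0.0, 10.0), "R": R_DOWN, "f": 10.0, "cx": 1.0, "cy": 1.0}
A = [[1, 2, 3], [4, 5, 6]]
L = warp_background(A, cam_I, cam_J, width=3, height=2)
```

With these example cameras one world unit on the ground corresponds to one pixel, so the virtual view is the source image shifted by one pixel, and pixels that fall outside A keep the fill value. A curved surface G, or the "closest intersection" rule for multiple intersections, would need a different intersection routine.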

（ビルボード設定部１５） (Billboard setting unit 15)
図１及び図２に戻って、ビルボード設定部１５は、第一被写体抽出部１１からキー映像Ｋを入力すると共に、予め設定された入力映像Ｉのカメラパラメータを入力する。そして、ビルボード設定部１５は、キー映像Ｋの示す被写体領域（例えば、Ｋ（ｔ；ｘ，ｙ）＝１を満たす領域）の各連結領域Ｃ_i（ｉは、連結領域の個々を区別するためのインデックスとする。）に対して、それぞれ所定のモデルによるビルボードの面Π_iを設定する（ステップＳ２０６）。所定のモデルによるビルボードの面Π_iとは、例えば、平面、円筒面または球面とする。 Returning to FIGS. 1 and 2, the billboard setting unit 15 receives the key image K from the first subject extraction unit 11, together with the preset camera parameters of the input image I. For each connected region C_i (where i is an index distinguishing the individual connected regions) of the subject region indicated by the key image K (for example, the region satisfying K(t; x, y) = 1), the billboard setting unit 15 sets a billboard surface Π_i according to a predetermined model (step S206). The billboard surface Π_i according to the predetermined model is, for example, a plane, a cylindrical surface, or a spherical surface.

ビルボード設定部１５は、ビルボードの面Π_iのパラメータ（例えば、面の方程式の各係数）をビルボードパラメータとして設定し、ビルボードパラメータを第二射影変換部１６に出力する。ここでは、ビルボード設定部１５は、連結領域Ｃ_iの総数（Ｄ個とする）のビルボードパラメータを出力するものとする。 The billboard setting unit 15 sets the parameters of the billboard surface Π_i (for example, the coefficients of the surface equation) as billboard parameters, and outputs them to the second projective transformation unit 16. Here, the billboard setting unit 15 is assumed to output as many billboard parameters as there are connected regions C_i (D in total).

図５は、ビルボード設定部１５の動作を説明する図である。ビルボード設定部１５は、所定のモデルによるビルボードの面Π_iを平面とする場合には、例えば、以下の（ａ）、（ｂ）及び（ｃ）の全ての条件を満たすように、ビルボードの面Π_iを設定する。 FIG. 5 is a diagram illustrating the operation of the billboard setting unit 15. When the billboard surface Π_i according to the predetermined model is a plane, the billboard setting unit 15 sets the surface Π_i so that, for example, all of the following conditions (a), (b), and (c) are satisfied.

以下、図５を参照して説明する。 The following description refers to FIG. 5.
（ａ）面Π_iは、連結領域Ｃ_iの代表点（例えば、連結領域Ｃ_i（図５に示す黒塗りの領域）のバウンディングボックスの底辺の中点）を面Ｇ上に逆投影した点Ｘ（入力映像Ｉを撮影したカメラの光学主点Ｏ_Iを始点とし、入力映像Ｉの平面上の前記代表点を通る半直線が面Ｇと交差する点Ｘ）を含む。 (a) The surface Π_i contains the point X obtained by back-projecting onto the surface G a representative point of the connected region C_i (for example, the midpoint of the bottom edge of the bounding box of the connected region C_i, shown as the black region in FIG. 5). The point X is where the ray that starts at the optical principal point O_I of the camera that captured the input image I and passes through the representative point on the plane of the input image I intersects the surface G.

（ｂ）面Π_iの法線ベクトルは、点Ｘにおける面Ｇの法線ベクトルと直交する。 (b) The normal vector of the surface Π_i is orthogonal to the normal vector of the surface G at the point X.
（ｃ）面Π_iは、前記（ａ）及び（ｂ）を満たす平面のうち、点Ｘから光学主点Ｏ_Iへのベクトルと、当該面Π_iの法線ベクトルとの間の成す角が最小となるものである。 (c) Among the planes satisfying (a) and (b), the surface Π_i is the one for which the angle between the vector from the point X to the optical principal point O_I and the normal vector of the surface Π_i is smallest.

尚、ビルボード設定部１５は、前記（ｃ）の代わりに、以下の（ｄ）または（ｅ）の条件を満たすように、ビルボードの面Π_iを設定するようにしてもよい。 Instead of (c), the billboard setting unit 15 may set the billboard surface Π_i so as to satisfy the following condition (d) or (e).
（ｄ）面Π_iは、前記（ａ）及び（ｂ）を満たす平面のうち、点Ｘから仮想視点映像Ｊの光学主点Ｏ_Jへのベクトルと、当該面Π_iの法線ベクトルとの間の成す角が最小となるものである。 (d) Among the planes satisfying (a) and (b), the surface Π_i is the one for which the angle between the vector from the point X to the optical principal point O_J of the virtual viewpoint image J and the normal vector of the surface Π_i is smallest.
（ｅ）点Ｘから光学主点Ｏ_Iへのベクトルをｖ_Iとし、点Ｘから光学主点Ｏ_Jへのベクトルをｖ_Jとする。面Π_iは、前記（ａ）及び（ｂ）を満たす平面のうち、ベクトル（αｖ_I＋（１−α）ｖ_J）と、当該面Π_iの法線ベクトルとの間の成す角が最小となるものである。パラメータαは、０＜α＜１を満たす実数とする（例えばα＝０．５）。 (e) Let v_I be the vector from the point X to the optical principal point O_I, and let v_J be the vector from the point X to the optical principal point O_J. Among the planes satisfying (a) and (b), the surface Π_i is the one for which the angle between the vector (αv_I + (1 − α)v_J) and the normal vector of the surface Π_i is smallest. The parameter α is a real number satisfying 0 < α < 1 (for example, α = 0.5).

前記（ｅ）の条件を満たすように面Π_iが設定されることにより、後述する合成部１７にて生成される仮想視点映像Ｊに含まれる被写体は、前記（ｃ）または前記（ｄ）を満たす場合に比べ、実際に近い形態で表現することができる。 By setting the surface Π _i so as to satisfy the condition (e), the subject included in the virtual viewpoint image J generated by the synthesis unit 17 described later is the above (c) or the above (d). It can be expressed in a form closer to the actual one than when it is satisfied.

このようにして設定されたビルボードの面Π_iのパラメータは、連結領域Ｃ_iの総数をＤ個とした場合、Ｄ個のビルボードパラメータとして第二射影変換部１６へ出力される。 The parameters of the billboard surface Π _i set in this way are output to the second projective conversion unit 16 as D billboard parameters when the total number of connecting regions C _i is D.
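
For a planar billboard, conditions (a) to (e) admit a closed-form choice of normal, which the following sketch illustrates. It rests on one observation that is not stated in the source but follows from the conditions: among unit vectors orthogonal to the normal g of the surface G at X, the one making the smallest angle with a target vector v is the normalized component of v perpendicular to g. The function name `billboard_plane` and the plane representation n·P + d = 0 are hypothetical.

```python
import math


def billboard_plane(X, g, O_I, O_J=None, alpha=1.0):
    """Return (n, d) of the plane n.P + d = 0 through X with n orthogonal to g.

    O_J is None       -> condition (c): n faces the input camera O_I.
    alpha = 0 with O_J -> condition (d): n faces the virtual camera O_J.
    0 < alpha < 1      -> condition (e): n faces alpha*v_I + (1 - alpha)*v_J.
    """
    v = tuple(O_I[k] - X[k] for k in range(3))                # v_I
    if O_J is not None:
        v = tuple(alpha * v[k] + (1.0 - alpha) * (O_J[k] - X[k]) for k in range(3))
    g_len = math.sqrt(sum(c * c for c in g))
    g_hat = tuple(c / g_len for c in g)
    along = sum(v[k] * g_hat[k] for k in range(3))
    v_perp = tuple(v[k] - along * g_hat[k] for k in range(3))  # drop the part along g
    length = math.sqrt(sum(c * c for c in v_perp))
    if length < 1e-12:
        raise ValueError("target vector is parallel to the surface normal at X")
    n = tuple(c / length for c in v_perp)
    d = -sum(n[k] * X[k] for k in range(3))                    # plane passes through X
    return n, d


# Ground plane z = 0, subject standing at the origin.
n_c, d_c = billboard_plane((0, 0, 0), (0, 0, 1), O_I=(3, 0, 4))
n_e, d_e = billboard_plane((0, 0, 0), (0, 0, 1), O_I=(3, 0, 4), O_J=(0, 3, 4), alpha=0.5)
```

In the example, condition (c) yields a billboard facing along the x axis toward the input camera, while condition (e) with α = 0.5 yields one facing halfway between the two cameras.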

（第二射影変換部１６） (Second projective transformation unit 16)
図１及び図２に戻って、第二射影変換部１６は、予め設定された入力映像Ｉのカメラパラメータ及び仮想視点映像Ｊのカメラパラメータを入力する。また、第二射影変換部１６は、第一被写体抽出部１１からキー映像Ｋを入力すると共に、ビルボード設定部１５からＤ個のビルボードパラメータを入力する。 Returning to FIGS. 1 and 2, the second projective transformation unit 16 receives the preset camera parameters of the input image I and of the virtual viewpoint image J. The second projective transformation unit 16 also receives the key image K from the first subject extraction unit 11 and the D billboard parameters from the billboard setting unit 15.

第二射影変換部１６は、入力映像Ｉ及びキー映像Ｋの各画素がビルボード（Ｄ個のビルボードパラメータが示す面Π_i）上にあるという仮定の下で、入力映像Ｉのカメラパラメータ、仮想視点映像Ｊのカメラパラメータ及びビルボードを用いて射影変換を実行する。 Under the assumption that each pixel of the input image I and of the key image K lies on a billboard (a surface Π_i indicated by the D billboard parameters), the second projective transformation unit 16 executes a projective transformation using the camera parameters of the input image I, the camera parameters of the virtual viewpoint image J, and the billboards.

第二射影変換部１６は、前景の仮想視点映像（第一被写体の仮想視点映像）Ｍ₁〜Ｍ_D及びキーの仮想視点映像（第一キーの仮想視点映像）Ｎ₁〜Ｎ_Dを生成する（ステップＳ２０７）。第二射影変換部１６は、前景の仮想視点映像Ｍ₁〜Ｍ_D及びキーの仮想視点映像Ｎ₁〜Ｎ_Dを合成部１７に出力する。ここで、キーの仮想視点映像Ｎ₁〜Ｎ_Dは、第一被写体の形状を表し、かつ当該第一被写体の領域と他の領域とを区別する画素値を有するキー映像である。 The second projective transformation unit 16 generates the foreground virtual viewpoint images (virtual viewpoint images of the first subject) M_1 to M_D and the key virtual viewpoint images (virtual viewpoint images of the first key) N_1 to N_D (step S207), and outputs them to the compositing unit 17. Here, the key virtual viewpoint images N_1 to N_D are key images that represent the shape of the first subject and have pixel values that distinguish the region of the first subject from other regions.

以下、各ビルボードの法線の向きは、ビルボード設定部１５により設定されたビルボードの面Π_iの各法線ベクトルの方向のまま固定する場合で説明する。尚、各ビルボードの法線の向きは、各ビルボードを例えば仮想視点映像Ｊの光学主点Ｏ_Jに指向させる等、その法線方向に修正を加えるものであってもよい。 Hereinafter, the direction of the normal of each billboard will be described in the case of fixing the direction of each normal vector of the surface Π _i of the billboard set by the billboard setting unit 15. The direction of the normal of each billboard may be modified in the normal direction, for example, by directing each billboard to the optical principal point O _J of the virtual viewpoint image J.

図６は、第二射影変換部１６の動作を説明する図である。Ｄ個のビルボードパラメータのそれぞれについて、射影変換が実行される。以下、Ｄ個のビルボードパラメータのうちｉ番目のビルボードパラメータについての射影変換について説明する。仮想視点映像Ｊの平面上のある注目画素の画像座標をＰ_Jとし、画像座標Ｐ_Jに対応する入力映像Ｉの平面上にある画素の画像座標をＲ_iとする。 FIG. 6 is a diagram illustrating the operation of the second projective transformation unit 16. A projective transformation is executed for each of the D billboard parameters. The projective transformation for the i-th of the D billboard parameters is described below. Let P_J be the image coordinates of a pixel of interest on the plane of the virtual viewpoint image J, and let R_i be the image coordinates of the pixel on the plane of the input image I that corresponds to the image coordinates P_J.

第二射影変換部１６は、仮想視点映像Ｊの光学主点Ｏ_Jを始点として、始点から注目画素の画像座標Ｐ_Jを通る半直線が、ｉ番目のビルボードの面Π_iと交わる点Ｑ_iを求める。そして、第二射影変換部１６は、点Ｑ_iを入力映像Ｉの平面上に投影し、その像の画像座標Ｒ_iを求める。 The second projective transformation unit 16 finds the point Q_i where the ray that starts at the optical principal point O_J of the virtual viewpoint image J and passes through the image coordinates P_J of the pixel of interest intersects the surface Π_i of the i-th billboard. The second projective transformation unit 16 then projects the point Q_i onto the plane of the input image I and obtains the image coordinates R_i of its image.

具体的には、第二射影変換部１６は、点Ｑ_iと入力映像Ｉの光学主点Ｏ_Iとを結ぶ線分が入力映像Ｉの平面と交差する点の画像座標を求め、これを画像座標Ｒ_iに設定する。第二射影変換部１６は、画像座標Ｒ_iにおける入力映像Ｉの画素値を、仮想視点映像Ｊの平面上の注目画素の画像座標Ｐ_Jにおける画素値に設定する。また、第二射影変換部１６は、画像座標Ｒ_iにおけるキー映像Ｋの画素値を、仮想視点映像Ｊの平面上の注目画素の画像座標Ｐ_Jにおけるキー値に設定する。 Specifically, the second projective transformation unit 16 finds the image coordinates of the point where the line segment connecting the point Q_i and the optical principal point O_I of the input image I intersects the plane of the input image I, and sets them as the image coordinates R_i. The second projective transformation unit 16 sets the pixel value of the input image I at the image coordinates R_i as the pixel value at the image coordinates P_J of the pixel of interest on the plane of the virtual viewpoint image J. Likewise, it sets the pixel value of the key image K at the image coordinates R_i as the key value at the image coordinates P_J of the pixel of interest.
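
The per-billboard mapping P_J → Q_i → R_i described above can be sketched as follows for a planar billboard. As before, this is only an illustration under assumptions not found in the source: a pinhole camera given as a hypothetical dictionary (`O`, `R`, `f`, `cx`, `cy`), a billboard given as (n, d) with n·P + d = 0, and nearest-neighbour sampling of I and K.

```python
def _rot(R, v):
    """Return R v (world coordinates to camera coordinates)."""
    return tuple(sum(R[r][c] * v[c] for c in range(3)) for r in range(3))


def _rot_t(R, v):
    """Return R^T v (camera coordinates to world coordinates)."""
    return tuple(sum(R[r][c] * v[r] for r in range(3)) for c in range(3))


def intersect_billboard(cam, u, v, plane):
    """Point Q where the ray from cam['O'] through pixel (u, v) meets the plane."""
    n, d = plane
    d_cam = ((u - cam["cx"]) / cam["f"], (v - cam["cy"]) / cam["f"], 1.0)
    direction = _rot_t(cam["R"], d_cam)
    denom = sum(n[k] * direction[k] for k in range(3))
    if abs(denom) < 1e-12:
        return None  # ray parallel to the billboard
    s = -(sum(n[k] * cam["O"][k] for k in range(3)) + d) / denom
    if s <= 0:
        return None  # billboard is behind the virtual camera
    return tuple(cam["O"][k] + s * direction[k] for k in range(3))


def project(cam, P):
    """Project world point P into the image of cam; None if behind the camera."""
    p = _rot(cam["R"], tuple(P[k] - cam["O"][k] for k in range(3)))
    if p[2] <= 0:
        return None
    return (cam["f"] * p[0] / p[2] + cam["cx"], cam["f"] * p[1] / p[2] + cam["cy"])


def warp_billboard(I, K, plane, cam_I, cam_J, width, height):
    """Return (M_i, N_i): foreground and key images seen from the virtual camera."""
    M = [[0] * width for _ in range(height)]
    N = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            Q = intersect_billboard(cam_J, x, y, plane)
            R_i = project(cam_I, Q) if Q is not None else None
            if R_i is None:
                continue
            xi, yi = round(R_i[0]), round(R_i[1])
            if 0 <= yi < len(I) and 0 <= xi < len(I[0]):
                M[y][x] = I[yi][xi]   # pixel value taken from the input image
                N[y][x] = K[yi][xi]   # key value taken from the key image
    return M, N


IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
cam_I = {"O": (0.0, 0.0, 0.0), "R": IDENTITY, "f": 5.0, "cx": 0.0, "cy": 0.0}
cam_J = {"O": (1.0, 0.0, 0.0), "R": IDENTITY, "f": 5.0, "cx": 0.0, "cy": 0.0}
billboard = ((0.0, 0.0, 1.0), -5.0)   # the plane z = 5
M_1, N_1 = warp_billboard([[1, 2, 3]], [[0, 1, 1]], billboard, cam_I, cam_J, 3, 1)
```

Pixels whose ray misses the billboard, or whose R_i falls outside the input frame, are left at 0 in both outputs; that choice is an assumption of this sketch.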

図７は、第二射影変換部１６の構成例を示すブロック図である。この第二射影変換部１６は、走査部３０，３４、ビルボード選択部３１、フレームメモリ３２，３３，３７，３８、第二逆投影部３５及び第二投影部３６を備えている。 FIG. 7 is a block diagram showing a configuration example of the second projective transformation unit 16. The second projective conversion unit 16 includes scanning units 30, 34, a billboard selection unit 31, frame memories 32, 33, 37, 38, a second back projection unit 35, and a second projection unit 36.

走査部３０は、Ｄ個のビルボードパラメータの示すビルボードを所定の順序で選択することで、ビルボードのインデックスｉ（選択したビルボードを識別するためのインデックス）を走査する。走査部３０は、インデックスｉをビルボード選択部３１に出力する。 The scanning unit 30 scans the billboard index i (index for identifying the selected billboard) by selecting the billboards indicated by the D billboard parameters in a predetermined order. The scanning unit 30 outputs the index i to the billboard selection unit 31.

尚、走査部３０は、Ｄ個のビルボードから１つを選択する際に、仮想視点映像Ｊの光学主点Ｏ_Jからの距離が遠いもの（光学主点Ｏ_Jからビルボードの面Πの代表点（例えば重心座標）までの距離（例えばユークリッド距離）が遠いもの）ほど先に選択するようにしてもよい。 When selecting one of the D billboards, the scanning unit 30 may select billboards that are farther from the optical principal point O_J of the virtual viewpoint image J earlier, where the distance is, for example, the Euclidean distance from the optical principal point O_J to a representative point (for example, the centroid) of the billboard surface Π.

これにより、後述する合成部１７において、複数のビルボードによる映像が重なり合う場合に、この順番で映像が合成されることで、近くの画素を優先することができ、遠くのビルボードを近くのビルボードで隠すいわゆる陰面処理を実現することができる。 As a result, when images from a plurality of billboards overlap in the compositing unit 17 described later, compositing them in this order gives priority to nearer pixels, which realizes so-called hidden surface removal, in which a distant billboard is hidden by a nearer one.
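
The far-to-near selection order can be sketched in a few lines. The function name `order_far_to_near` and the choice of one representative point per billboard are illustrative assumptions; `math.dist` computes the Euclidean distance.

```python
import math


def order_far_to_near(representative_points, O_J):
    """Return billboard indices sorted from farthest to nearest to O_J."""
    return sorted(
        range(len(representative_points)),
        key=lambda i: math.dist(representative_points[i], O_J),
        reverse=True,
    )


# Three billboards at distances 1, 5 and 3 from the virtual camera.
order = order_far_to_near([(0, 0, 1), (0, 0, 5), (0, 0, 3)], (0, 0, 0))
```

Compositing the billboards in this order means that later (nearer) billboards are drawn over earlier (farther) ones.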

ビルボード選択部３１は、ビルボード設定部１５からＤ個のビルボードパラメータを入力すると共に、走査部３０からインデックスｉを入力する。そして、ビルボード選択部３１は、Ｄ個のビルボードパラメータのうち、インデックスｉの示すビルボードパラメータを選択し、選択したビルボードパラメータを第二逆投影部３５に出力する。 The billboard selection unit 31 inputs D billboard parameters from the billboard setting unit 15 and inputs the index i from the scanning unit 30. Then, the billboard selection unit 31 selects the billboard parameter indicated by the index i from the D billboard parameters, and outputs the selected billboard parameter to the second back projection unit 35.

走査部３４は、出力すべき前景の仮想視点映像Ｍ及びキーの仮想視点映像Ｎの平面上において画素位置を走査することで、各画素を所定の順序で選択し、画素の画像座標Ｐ_Jを第二逆投影部３５及びフレームメモリ３７，３８に出力する。走査部３４は、例えばラスタ走査により、画素を順次選択する。 The scanning unit 34 scans pixel positions on the plane of the foreground virtual viewpoint image M and the key virtual viewpoint image N to be output, thereby selecting each pixel in a predetermined order, and outputs the image coordinates P_J of the selected pixel to the second back projection unit 35 and the frame memories 37 and 38. The scanning unit 34 selects pixels sequentially, for example by raster scanning.

第二逆投影部３５は、走査部３４から、前景の仮想視点映像Ｍ及びキーの仮想視点映像Ｎにおける画素の画像座標Ｐ_Jを入力すると共に、ビルボード選択部３１からビルボードパラメータを入力する。また、第二逆投影部３５は、予め設定された仮想視点映像Ｊのカメラパラメータを入力する。 The second back projection unit 35 inputs the image coordinates P _J of the pixels in the virtual viewpoint image M of the foreground and the virtual viewpoint image N of the key from the scanning unit 34, and inputs the billboard parameters from the billboard selection unit 31. .. Further, the second back projection unit 35 inputs the camera parameters of the virtual viewpoint image J set in advance.

第二逆投影部３５は、仮想視点映像Ｊのカメラパラメータに基づいて、画像座標Ｐ_Jを、ビルボードパラメータの示すビルボードの面Π_iに逆投影し、逆投影像の点Ｑ_iを設定し、点Ｑ_iの位置情報を第二投影部３６に出力する。すなわち、第二逆投影部３５は、画像座標Ｐ_Jがビルボードの面Π_iのどこに対応するかを求め、対応する点Ｑ_iを設定する。 The second back projection unit 35 back-projects the image coordinates P _J onto the billboard surface Π _i indicated by the billboard parameters based on the camera parameters of the virtual viewpoint image J, and sets the point Q _i of the back projection image. Then, the position information of the point Q _i is output to the second projection unit 36. That is, the second back projection unit 35 finds where the image coordinates P _J correspond to the surface Π _i of the billboard, and sets the corresponding points Q _i .

具体的には、第二逆投影部３５は、仮想視点映像Ｊの光学主点Ｏ_Jから画像座標Ｐ_Jの点を通る半直線を、画像座標Ｐ_Jの点方向へ伸ばし、その半直線がビルボードの面Π_iと交わる点（複数の交わる点を有する場合には、光学主点Ｏ_Jに最も近い点）を求め、点Ｑ_iを設定する。 Specifically, the second back projection unit 35 extends a ray from the optical principal point O_J of the virtual viewpoint image J through the point at the image coordinates P_J, in the direction of that point, finds the point where the ray intersects the billboard surface Π_i (if there are several intersections, the one closest to the optical principal point O_J), and sets it as the point Q_i.

第二投影部３６は、第二逆投影部３５から点Ｑ_iの位置情報を入力すると共に、入力映像Ｉのカメラパラメータを入力する。そして、第二投影部３６は、入力映像Ｉのカメラパラメータに基づいて、点Ｑ_iを入力映像Ｉの平面上に投影し、投影像の画像座標Ｒ_iを設定し、画像座標Ｒ_iをフレームメモリ３２，３３に出力する。すなわち、第二投影部３６は、点Ｑ_iが入力映像Ｉの平面のどこに対応するかを求め、対応する画像座標Ｒ_iを設定する。 The second projection unit 36 inputs the position information of the point Q _i from the second back projection unit 35, and also inputs the camera parameters of the input image I. Then, the second projection unit 36 projects the point Q _i on the plane of the input image I based on the camera parameters of the input image I, sets the image coordinates R _i of the projected image, and sets the image coordinates R _i to the frame. Output to memories 32 and 33. That is, the second projection unit 36 finds where the point Q _i corresponds to the plane of the input video I, and sets the corresponding image coordinates R _i .

具体的には、第二投影部３６は、点Ｑ_iと入力映像Ｉを撮影したカメラの光学主点Ｏ_Iとを結ぶ線分が、入力映像Ｉの平面と交わる点を求め、これを画像座標Ｒ_iに設定する。 Specifically, the second projection unit 36 finds a point where the line segment connecting the point Q _i and the optical principal point O _I of the camera that captured the input image I intersects the plane of the input image I, and obtains an image. Set to the coordinate R _i .

フレームメモリ３２は、入力映像Ｉを格納する。これにより、フレームメモリ３２には、入力映像Ｉの画素値が保持される。フレームメモリ３２は、第二投影部３６から画像座標Ｒ_iを入力する。そして、フレームメモリ３２は、画像座標Ｒ_i（その水平及び垂直成分をそれぞれＲ_i ^(x)及びＲ_i ^(y)とする）における入力映像Ｉの画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）をフレームメモリ３７に出力する。 The frame memory 32 stores the input image I, so that it holds the pixel values of the input image I. The frame memory 32 receives the image coordinates R_i from the second projection unit 36. The frame memory 32 then outputs to the frame memory 37 the pixel value I(t; R_i^(x), R_i^(y)) of the input image I at the image coordinates R_i (whose horizontal and vertical components are R_i^(x) and R_i^(y), respectively).

つまり、第二射影変換部１６により、フレームメモリ３２から、第二投影部３６にて設定された画像座標Ｒ_iにおける画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）が読み出され、フレームメモリ３７に出力される。 In other words, in the second projective transformation unit 16, the pixel value I(t; R_i^(x), R_i^(y)) at the image coordinates R_i set by the second projection unit 36 is read from the frame memory 32 and output to the frame memory 37.

フレームメモリ３７は、走査部３４から画像座標Ｐ_Jを入力すると共に、フレームメモリ３２から画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）を入力する。そして、フレームメモリ３７は、以下の式に示すように、画像座標Ｐ_J（その水平及び垂直成分をそれぞれＰ_J ^(x)及びＰ_J ^(y)とする）の位置に、画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）を格納し、これを前景の仮想視点映像Ｍ_iの画素値Ｍ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）とする。

The frame memory 37 receives the image coordinates P_J from the scanning unit 34 and the pixel value I(t; R_i^(x), R_i^(y)) from the frame memory 32. The frame memory 37 then stores the pixel value I(t; R_i^(x), R_i^(y)) at the position of the image coordinates P_J (whose horizontal and vertical components are P_J^(x) and P_J^(y), respectively) and takes it as the pixel value of the foreground virtual viewpoint image M_i, that is, M_i(t; P_J^(x), P_J^(y)) = I(t; R_i^(x), R_i^(y)).

つまり、第二射影変換部１６により、フレームメモリ３７において、走査部３４にて設定された画像座標Ｐ_Jの位置に、画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）が前景の仮想視点映像Ｍ_iの画素値Ｍ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）として格納される。 In other words, in the second projective transformation unit 16, the pixel value I(t; R_i^(x), R_i^(y)) is stored in the frame memory 37, at the position of the image coordinates P_J set by the scanning unit 34, as the pixel value M_i(t; P_J^(x), P_J^(y)) of the foreground virtual viewpoint image M_i.

走査部３０により全てのインデックスｉが走査され、全てのインデックスｉについての画素値Ｉ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）が前景の仮想視点映像Ｍ_iの画素値Ｍ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）として、フレームメモリ３７に格納される。 The scanning unit 30 scans all indexes i, and for every index i the pixel values I(t; R_i^(x), R_i^(y)) are stored in the frame memory 37 as the pixel values M_i(t; P_J^(x), P_J^(y)) of the foreground virtual viewpoint image M_i.

フレームメモリ３７は、全てのインデックスｉ（ｉ＝１〜Ｄ）について格納した前景の仮想視点映像Ｍ₁〜Ｍ_Dを、合成部１７に出力する。 The frame memory 37 outputs the foreground virtual viewpoint images M_1 to M_D, stored for all indexes i (i = 1 to D), to the compositing unit 17.

つまり、第二射影変換部１６により、フレームメモリ３７から前景の仮想視点映像Ｍ₁〜Ｍ_Dが読み出され、合成部１７に出力される。 In other words, in the second projective transformation unit 16, the foreground virtual viewpoint images M_1 to M_D are read from the frame memory 37 and output to the compositing unit 17.

フレームメモリ３３は、第一被写体抽出部１１からキー映像Ｋを入力して格納する。これにより、フレームメモリ３３には、キー映像Ｋの画素値が保持される。フレームメモリ３３は、第二投影部３６から画像座標Ｒ_iを入力する。そして、フレームメモリ３３は、画像座標Ｒ_i（その水平及び垂直成分をそれぞれＲ_i ^(x)及びＲ_i ^(y)とする）におけるキー映像Ｋの画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）をフレームメモリ３８に出力する。 The frame memory 33 receives the key image K from the first subject extraction unit 11 and stores it, so that it holds the pixel values of the key image K. The frame memory 33 receives the image coordinates R_i from the second projection unit 36. The frame memory 33 then outputs to the frame memory 38 the pixel value K(t; R_i^(x), R_i^(y)) of the key image K at the image coordinates R_i (whose horizontal and vertical components are R_i^(x) and R_i^(y), respectively).

つまり、第二射影変換部１６により、フレームメモリ３３から、第二投影部３６にて設定された画像座標Ｒ_iにおける画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）が読み出され、フレームメモリ３８に出力される。 In other words, in the second projective transformation unit 16, the pixel value K(t; R_i^(x), R_i^(y)) at the image coordinates R_i set by the second projection unit 36 is read from the frame memory 33 and output to the frame memory 38.

フレームメモリ３８は、走査部３４から画像座標Ｐ_Jを入力すると共に、フレームメモリ３３から画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）を入力する。そして、フレームメモリ３８は、以下の式に示すように、画像座標Ｐ_J（その水平及び垂直成分をそれぞれＰ_J ^(x)及びＰ_J ^(y)とする）の位置に、画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）を格納し、これをキーの仮想視点映像Ｎ_iの画素値Ｎ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）とする。

The frame memory 38 receives the image coordinates P_J from the scanning unit 34 and the pixel value K(t; R_i^(x), R_i^(y)) from the frame memory 33. The frame memory 38 then stores the pixel value K(t; R_i^(x), R_i^(y)) at the position of the image coordinates P_J (whose horizontal and vertical components are P_J^(x) and P_J^(y), respectively) and takes it as the pixel value of the key virtual viewpoint image N_i, that is, N_i(t; P_J^(x), P_J^(y)) = K(t; R_i^(x), R_i^(y)).

つまり、第二射影変換部１６により、フレームメモリ３８において、走査部３４にて設定された画像座標Ｐ_Jの位置に、画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）がキーの仮想視点映像Ｎ_iの画素値Ｎ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）として格納される。 In other words, in the second projective transformation unit 16, the pixel value K(t; R_i^(x), R_i^(y)) is stored in the frame memory 38, at the position of the image coordinates P_J set by the scanning unit 34, as the pixel value N_i(t; P_J^(x), P_J^(y)) of the key virtual viewpoint image N_i.

走査部３０により全てのインデックスｉが走査され、全てのインデックスｉについての画素値Ｋ（ｔ；Ｒ_i ^(x)，Ｒ_i ^(y)）がキーの仮想視点映像Ｎ_iの画素値Ｎ_i（ｔ；Ｐ_J ^(x)，Ｐ_J ^(y)）として、フレームメモリ３８に格納される。 The scanning unit 30 scans all indexes i, and for every index i the pixel values K(t; R_i^(x), R_i^(y)) are stored in the frame memory 38 as the pixel values N_i(t; P_J^(x), P_J^(y)) of the key virtual viewpoint image N_i.

フレームメモリ３８は、全てのインデックスｉ（ｉ＝１〜Ｄ）について格納したキーの仮想視点映像Ｎ₁〜Ｎ_Dを、合成部１７に出力する。 The frame memory 38 outputs the key virtual viewpoint images N_1 to N_D, stored for all indexes i (i = 1 to D), to the compositing unit 17.

つまり、第二射影変換部１６により、フレームメモリ３８からキーの仮想視点映像Ｎ₁〜Ｎ_Dが読み出され、合成部１７に出力される。 In other words, in the second projective transformation unit 16, the key virtual viewpoint images N_1 to N_D are read from the frame memory 38 and output to the compositing unit 17.

（合成部１７） (Compositing unit 17)
図１及び図２に戻って、合成部１７は、第一射影変換部１４から背景の仮想視点映像Ｌを入力すると共に、第二射影変換部１６から前景の仮想視点映像Ｍ₁〜Ｍ_D及びキーの仮想視点映像Ｎ₁〜Ｎ_Dを入力する。そして、合成部１７は、キーの仮想視点映像Ｎ₁〜Ｎ_Dに基づいて、背景の仮想視点映像Ｌ及び前景の仮想視点映像Ｍ₁〜Ｍ_Dを合成し、仮想視点映像Ｊを生成して出力する（ステップＳ２０８）。 Returning to FIGS. 1 and 2, the compositing unit 17 receives the background virtual viewpoint image L from the first projective transformation unit 14, and the foreground virtual viewpoint images M_1 to M_D and the key virtual viewpoint images N_1 to N_D from the second projective transformation unit 16. Based on the key virtual viewpoint images N_1 to N_D, the compositing unit 17 composites the background virtual viewpoint image L and the foreground virtual viewpoint images M_1 to M_D, and generates and outputs the virtual viewpoint image J (step S208).

合成部１７は、背景の仮想視点映像Ｌ及び前景の仮想視点映像Ｍ₁〜Ｍ_Dを合成する際に、例えば以下の式で表す処理を行う。具体的には、合成部１７は、キーの仮想視点映像Ｎ₁〜Ｎ_Dにおける当該画素位置の画素値を参照し、ｉ＝１〜Ｄの順番に、その画素値が大きいほど、前景の仮想視点映像Ｍ₁〜Ｍ_Dを低い透明度で重畳し、その画素値が小さいほど、前景の仮想視点映像Ｍ₁〜Ｍ_Dを高い透明度で重畳することで、仮想視点映像Ｊを生成する。

When compositing the background virtual viewpoint image L and the foreground virtual viewpoint images M_1 to M_D, the compositing unit 17 performs, for example, the processing expressed by the following equation. Specifically, the compositing unit 17 refers to the pixel value at the relevant pixel position in the key virtual viewpoint images N_1 to N_D and, in the order i = 1 to D, superimposes the foreground virtual viewpoint images M_1 to M_D with lower transparency the larger that pixel value is, and with higher transparency the smaller that pixel value is, thereby generating the virtual viewpoint image J.
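
One way to realize the key-weighted superimposition described above is ordinary "over" blending, sketched below. The equation referred to in the text is not reproduced in this passage, so the recurrence used here (J starts as L, then J ← N_i·M_i + (1 − N_i)·J for i = 1 to D, with key values in the range 0 to 1) is an assumption that merely matches the verbal description; the function name `composite_over` is hypothetical.

```python
def composite_over(L, M, N):
    """Superimpose foregrounds M[0..D-1] on background L using keys N as opacity.

    L is a 2-D list; M and N are lists of D 2-D lists of the same size.
    Key values are assumed to lie in [0, 1]: 1 is opaque, 0 is transparent.
    """
    J = [row[:] for row in L]              # start from the background
    for M_i, N_i in zip(M, N):             # i = 1 .. D, in the given order
        for y in range(len(J)):
            for x in range(len(J[0])):
                a = N_i[y][x]
                J[y][x] = a * M_i[y][x] + (1 - a) * J[y][x]
    return J


L_bg = [[10, 10]]
M = [[[50, 50]], [[90, 90]]]
N = [[[1, 0]], [[0.5, 0]]]
J = composite_over(L_bg, M, N)
```

In the example, the first pixel is fully covered by billboard 1 and then half covered by billboard 2, while the second pixel keeps the background value because both keys are 0 there.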

尚、合成部１７は、キーの仮想視点映像Ｎ₁〜Ｎ_Dを用いることなく、背景の仮想視点映像Ｌを下地として、その上に前景の仮想視点映像Ｍ₁〜Ｍ_Dを画素位置毎に重畳し、仮想視点映像Ｊを生成するようにしてもよい。 Alternatively, the compositing unit 17 may generate the virtual viewpoint image J without using the key virtual viewpoint images N_1 to N_D, by taking the background virtual viewpoint image L as a base and superimposing the foreground virtual viewpoint images M_1 to M_D on it at each pixel position.

また、合成部１７は、仮想視点映像Ｊの各画素について、当該画素の各ビルボード上の対応点Ｑ_iと光学主点Ｏ_Jとの間の距離を算出し、全ビルボード中最も距離の短いビルボードの画素値を特定し、この画素値を用いて仮想視点映像Ｊを生成するようにしてもよい。 Further, the compositing unit 17 calculates the distance between the corresponding point Q _i on each billboard and the optical principal point O _J of each pixel of the virtual viewpoint image J, and is the longest distance among all the billboards. A pixel value of a short billboard may be specified, and the virtual viewpoint image J may be generated using this pixel value.
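
The nearest-billboard alternative can be sketched as a per-pixel minimum over distances. The inputs `dist[i][y][x]` (the distance between Q_i and O_J for billboard i at that pixel, or `None` where the ray misses the billboard) and the function name `composite_nearest` are hypothetical. The sketch additionally skips billboards whose key value is 0 at the pixel and falls back to the background L when none remains; the source does not state these two rules, so they are assumptions.

```python
def composite_nearest(L, M, N, dist):
    """For each pixel, take the value of the closest billboard that covers it."""
    height, width = len(L), len(L[0])
    J = [row[:] for row in L]
    for y in range(height):
        for x in range(width):
            best_i, best_d = None, None
            for i in range(len(M)):
                d = dist[i][y][x]
                if d is None or N[i][y][x] == 0:
                    continue               # ray misses, or pixel is outside the subject
                if best_d is None or d < best_d:
                    best_i, best_d = i, d
            if best_i is not None:
                J[y][x] = M[best_i][y][x]
    return J


L_bg = [[0, 0]]
M = [[[5, 5]], [[7, 7]]]
N = [[[1, 0]], [[1, 0]]]
dist = [[[3.0, 3.0]], [[2.0, 2.0]]]
J_near = composite_nearest(L_bg, M, N, dist)
```

Unlike the ordered blending, this variant does not depend on the order in which the billboards are processed.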

As described above, according to the virtual viewpoint conversion device 1 of the embodiment of the present invention, the background generation unit 10 generates the background image B from a plurality of frames of the input image I, and the first subject extraction unit 11 extracts the region of the first subject based on the plurality of frames of the input image I and the background image B, and generates the key image K.
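
One common way to realize these two steps is a per-pixel temporal median followed by a thresholded background difference. The sketch below uses that approach purely as an assumption; this passage does not state which method units 10 and 11 use, and the sketch compares only a single frame against B for brevity.

```python
from statistics import median


def estimate_background(frames):
    """Per-pixel temporal median over a list of equally sized 2-D frames."""
    height, width = len(frames[0]), len(frames[0][0])
    return [[median(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def extract_first_subject(frame, background, threshold):
    """Key image K: 1 where the frame differs from the background, else 0."""
    return [[1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


frames = [
    [[10, 10, 10, 10]],
    [[10, 90, 10, 10]],   # a moving subject covers pixel 1
    [[10, 10, 90, 10]],   # then pixel 2
]
B = estimate_background(frames)                            # [[10, 10, 10, 10]]
K = extract_first_subject(frames[1], B, threshold=30)      # [[0, 1, 0, 0]]
```

A background built this way drops anything that moves with the subject, including its shadow, which is why a separate extraction of the second subject is useful.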

The second subject extraction unit 12 extracts, from a single frame of the input image I, the region of a second subject having a predetermined image feature, and generates the key image F. The compositing unit 13 composites the pixel values of the input image I onto the background image B by keying based on the key image F, and generates the composited background image A.
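
A minimal sketch of this extraction and keying follows. It assumes that the "predetermined image feature" is simply a pixel value lying in a given range (for example, the darkness of a shadow) and that the key image F is binary; both are illustrative assumptions, as are the function names.

```python
def extract_second_subject(frame, low, high):
    """Key image F: 1 where the pixel value lies in [low, high], else 0."""
    return [[1 if low <= p <= high else 0 for p in row] for row in frame]


def key_into_background(background, frame, key):
    """Composited background A: frame pixel where key is 1, else background."""
    return [[k * p + (1 - k) * b for b, p, k in zip(brow, frow, krow)]
            for brow, frow, krow in zip(background, frame, key)]


frame_I = [[100, 30, 35, 220]]   # pixels 1-2: shadow, pixel 3: first subject
image_B = [[100, 100, 100, 100]] # background image without the shadow
key_F = extract_second_subject(frame_I, low=20, high=40)   # [[0, 1, 1, 0]]
image_A = key_into_background(image_B, frame_I, key_F)     # [[100, 30, 35, 100]]
```

The shadow pixels are copied from the input image into the background, while the bright first-subject pixel is left out and remains to be handled by the billboard path.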

The first projective transformation unit 14 assumes that each pixel value of the composited background image A was captured by projecting a point in the surface G in the scene according to the camera parameters of the input image I, and projects that point in the surface G onto the plane of the virtual viewpoint image J according to the camera parameters of the virtual viewpoint image J, thereby generating the virtual viewpoint image L of the background.
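
When the surface G is a plane, this back-projection followed by re-projection reduces to a plane-induced homography. The sketch below assumes that G is the world plane Z = 0 and that each camera is described by a 3×4 pinhole projection matrix; these choices, the matrix values and the function names are assumptions made for illustration.

```python
def plane_homography(P):
    """3x3 map from plane coordinates (X, Y, 1) on Z = 0 to image (u, v, w)."""
    return [[row[0], row[1], row[3]] for row in P]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular homography (plane seen edge-on)")
    return [[(e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det],
            [(f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det],
            [(d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det]]


def apply3(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[r][k] * v[k] for k in range(3)) for r in range(3)]


def warp_pixel(P_in, P_virtual, u, v):
    """Map pixel (u, v) of the input camera to the virtual camera via Z = 0."""
    X, Y, W = apply3(inverse3(plane_homography(P_in)), [u, v, 1.0])
    x, y, w = apply3(plane_homography(P_virtual), [X, Y, W])
    return x / w, y / w


# Both cameras look straight down from height 10; the virtual one is shifted
# by 2 units along X (focal length 100, principal point at the origin).
P_in = [[100.0, 0.0, 0.0, 0.0], [0.0, -100.0, 0.0, 0.0], [0.0, 0.0, -1.0, 10.0]]
P_virtual = [[100.0, 0.0, 0.0, -200.0], [0.0, -100.0, 0.0, 0.0], [0.0, 0.0, -1.0, 10.0]]
u_virtual, v_virtual = warp_pixel(P_in, P_virtual, 30.0, -10.0)  # about (10, -10)
```

Input pixel (30, −10) corresponds to the point (3, 1, 0) on the plane, which the shifted virtual camera sees at approximately (10, −10).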

The billboard setting unit 15 sets, for each connected region C_i of the subject region indicated by the key image K, a billboard surface Π_i according to a predetermined model, and thereby sets D billboard parameters.
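
The connected regions C_i can be obtained by ordinary connected-component labeling of the key image K. The sketch below assumes a binary key image and 4-connectivity; it only finds the regions and their count D and does not model how the billboard surface Π_i is then placed.

```python
from collections import deque


def label_connected_regions(key):
    """Label 4-connected nonzero regions of key image K; return (labels, D)."""
    height, width = len(key), len(key[0])
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y0 in range(height):
        for x0 in range(width):
            if key[y0][x0] == 0 or labels[y0][x0] != 0:
                continue
            count += 1                      # start a new region C_count
            labels[y0][x0] = count
            queue = deque([(y0, x0)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and key[ny][nx] != 0 and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


K = [[1, 1, 0, 0],
     [0, 1, 0, 1],
     [0, 0, 0, 1]]
labels, D = label_connected_regions(K)   # D == 2
```

Each label value i from 1 to D then identifies the pixels for which one billboard would be set.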

The second projective transformation unit 16 executes a projective transformation under the assumption that each pixel of the input image I and the key image K lies on a billboard (the surface Π_i indicated by the D billboard parameters), and generates the virtual viewpoint images M₁ to M_D of the foreground and the virtual viewpoint images N₁ to N_D of the key.

The compositing unit 17 composites the virtual viewpoint image L of the background and the virtual viewpoint images M₁ to M_D of the foreground on the basis of the virtual viewpoint images N₁ to N_D of the key, and generates and outputs the virtual viewpoint image J.

As a result, by applying different projective transformations to the background and to the foreground, which is the first subject, contained in the input image I, a virtual viewpoint image J seen from a different viewpoint can be virtually generated. In this case, a second subject such as a shadow, which would otherwise be missing from the background image B, is extracted by the second subject extraction unit 12 and composited onto the background image B by the compositing unit 13, so that the compositing unit 17 can obtain a more natural virtual viewpoint image J.

Therefore, when the input image I captured at the time of shooting is virtually converted into an image of a viewpoint different from that at the time of shooting, a region containing a second subject, such as the shadow of the first subject, can be composited appropriately, and a more natural virtual viewpoint image J can be generated.

[Other Embodiments]
Next, another embodiment of the virtual viewpoint conversion device 1 will be described. FIG. 8 is a block diagram showing a configuration example of a virtual viewpoint conversion device according to another embodiment of the present invention. This virtual viewpoint conversion device 2 includes a background generation unit 10, a first subject extraction unit 11, a second subject extraction unit 12, a billboard setting unit 15, a second projective transformation unit 16, a compositing unit 17, and a first projective transformation unit 18.

Comparing the virtual viewpoint conversion device 1 shown in FIG. 1 with this virtual viewpoint conversion device 2, the two devices are the same in that both include the background generation unit 10, the first subject extraction unit 11, the second subject extraction unit 12, the billboard setting unit 15, the second projective transformation unit 16, and the compositing unit 17. On the other hand, the virtual viewpoint conversion device 2 differs from the virtual viewpoint conversion device 1, which includes the compositing unit 13 and the first projective transformation unit 14, in that it does not include the compositing unit 13 and includes the first projective transformation unit 18 in place of the first projective transformation unit 14.

Instead of receiving the composited background image A, in which the second subject (for example, a shadow) has been composited, the first projective transformation unit 18 receives from the second subject extraction unit 12 the key image F representing the shape and the like of the second subject. The first projective transformation unit 18 also receives the input image I, together with the preset camera parameters of the input image I and the preset camera parameters of the virtual viewpoint image J.

The first projective transformation unit 18 extracts from the input image I the image indicated by the key image F, and generates a second subject image. That is, the first projective transformation unit 18 generates the portion of the input image I indicated by the key image F as the second subject image, performs on the second subject image the same processing as the first projective transformation unit 14, and generates the virtual viewpoint image L' of the second subject.
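
Cutting out the second subject image amounts to masking the input image with the key image. The sketch below assumes a binary key image F and fills pixels outside the second subject with a placeholder value; the placeholder and the function name are hypothetical, since the text does not say how such pixels are represented.

```python
def cut_out_second_subject(frame, key, fill=0):
    """Keep input pixels where key image F is nonzero; use `fill` elsewhere."""
    return [[p if k else fill for p, k in zip(frow, krow)]
            for frow, krow in zip(frame, key)]


frame_I = [[100, 30, 35, 220]]
key_F = [[0, 1, 1, 0]]
second_subject = cut_out_second_subject(frame_I, key_F)   # [[0, 30, 35, 0]]
```

The resulting image is what the first projective transformation would then be applied to in this embodiment.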

Specifically, the first projective transformation unit 18 assumes that each pixel value of the second subject image was captured by projecting a point (or a partial region) in the surface G in real space according to the camera parameters of the input image I. The first projective transformation unit 18 then projects that point (or partial region) in the surface G onto the plane of the virtual viewpoint image J according to the camera parameters of the virtual viewpoint image J, thereby generating the virtual viewpoint image L' of the second subject.

In other words, the first projective transformation unit 18 executes a projective transformation that assumes each pixel value of the second subject image lies in the surface G, generates the virtual viewpoint image L' of the second subject, and outputs the virtual viewpoint image L' of the second subject to the compositing unit 17.

The compositing unit 17 receives the virtual viewpoint image L' of the second subject from the first projective transformation unit 18 in place of the virtual viewpoint image L of the background, and performs the processing described above. That is, the compositing unit 17 composites the virtual viewpoint image L' of the second subject and the virtual viewpoint images M₁ to M_D of the foreground on the basis of the virtual viewpoint images N₁ to N_D of the key, and generates and outputs the virtual viewpoint image J.

As described above, according to the virtual viewpoint conversion device 2 of the other embodiment of the present invention, the background generation unit 10 generates the background image B, the first subject extraction unit 11 generates the key image K, and the second subject extraction unit 12 generates the key image F.

The first projective transformation unit 18 generates the portion of the input image I indicated by the key image F as the second subject image. The first projective transformation unit 18 then assumes that each pixel value of the second subject image was captured by projecting a point in the surface G in real space according to the camera parameters of the input image I, and projects that point in the surface G onto the plane of the virtual viewpoint image J according to the camera parameters of the virtual viewpoint image J, thereby generating the virtual viewpoint image L' of the second subject.

The billboard setting unit 15 sets D billboard parameters, and the second projective transformation unit 16 generates the virtual viewpoint images M₁ to M_D of the foreground and the virtual viewpoint images N₁ to N_D of the key.

The compositing unit 17 composites the virtual viewpoint image L' of the second subject and the virtual viewpoint images M₁ to M_D of the foreground on the basis of the virtual viewpoint images N₁ to N_D of the key, and generates and outputs the virtual viewpoint image J.

As a result, by applying different projective transformations to the first subject and the second subject contained in the input image I, a virtual viewpoint image J seen from a different viewpoint can be virtually generated. In this case, the region of the second subject is extracted by the second subject extraction unit 12 and the virtual viewpoint image L' of the second subject is generated by the first projective transformation unit 18, so that the compositing unit 17 can obtain a more natural virtual viewpoint image J.

The present invention has been described above with reference to embodiments; however, the present invention is not limited to these embodiments and can be modified in various ways without departing from its technical idea.

An ordinary computer can be used as the hardware configuration of the virtual viewpoint conversion devices 1 and 2 according to the embodiments of the present invention. The virtual viewpoint conversion devices 1 and 2 are each configured by a computer provided with a CPU, a volatile storage medium such as a RAM, a non-volatile storage medium such as a ROM, an interface, and the like.

The functions of the background generation unit 10, the first subject extraction unit 11, the second subject extraction unit 12, the compositing units 13 and 17, the first projective transformation unit 14, the billboard setting unit 15, and the second projective transformation unit 16 provided in the virtual viewpoint conversion device 1 are each realized by causing the CPU to execute a program describing these functions.

Likewise, the functions of the background generation unit 10, the first subject extraction unit 11, the second subject extraction unit 12, the billboard setting unit 15, the second projective transformation unit 16, the compositing unit 17, and the first projective transformation unit 18 provided in the virtual viewpoint conversion device 2 are each realized by causing the CPU to execute a program describing these functions.

These programs are stored in the storage medium described above, and are read and executed by the CPU. The programs can also be stored and distributed on storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, etc.), and semiconductor memories, and can be transmitted and received via a network.

1, 2 Virtual viewpoint conversion device
10 Background generation unit
11 First subject extraction unit
12 Second subject extraction unit
13 Compositing unit (background compositing unit)
14, 18 First projective transformation unit
15 Billboard setting unit
16 Second projective transformation unit
17 Compositing unit
20, 24, 32, 33, 37, 38 Frame memory
21, 30, 34 Scanning unit
22 First back-projection unit
23 First projection unit
31 Billboard selection unit
35 Second back-projection unit
36 Second projection unit
I Input image
K, F Key image
A Composited background image
B Background image
C_i Connected region
L Virtual viewpoint image of the background
L' Virtual viewpoint image of the second subject
J Virtual viewpoint image
M₁ to M_D Virtual viewpoint images of the foreground (virtual viewpoint images of the first subject)
N₁ to N_D Virtual viewpoint images of the key (virtual viewpoint images of the first key)
O_I, O_J Optical principal point
G Surface in real space

Claims

A virtual viewpoint conversion device that generates a virtual viewpoint image by virtually converting an input image captured at the time of shooting into an image of a viewpoint different from that at the time of shooting, the device comprising:
a background generation unit that generates a background image from the input image;
a first subject extraction unit that extracts a region of a first subject from the input image and generates a first key image having the shape of the first subject and a predetermined pixel value;
a second subject extraction unit that extracts, from the input image, a region of a second subject having a predetermined image feature and generates a second key image having the shape of the second subject and a predetermined pixel value;
a first projective transformation unit that takes the portion of the input image indicated by the second key image as a second subject image, performs a first projective transformation on the second subject image using camera parameters of the input image and camera parameters of the virtual viewpoint image, and generates a virtual viewpoint image of the second subject;
a billboard setting unit that sets a billboard based on the first key image generated by the first subject extraction unit and the camera parameters of the input image;
a second projective transformation unit that performs a second projective transformation on the input image and the first key image generated by the first subject extraction unit, using the camera parameters of the input image, the camera parameters of the virtual viewpoint image, and the billboard set by the billboard setting unit, thereby generating a virtual viewpoint image of the first subject and a virtual viewpoint image of the first key having the shape of the first subject and the predetermined pixel value; and
a compositing unit that generates the virtual viewpoint image by compositing, based on the virtual viewpoint image of the first key generated by the second projective transformation unit, the virtual viewpoint image of the second subject generated by the first projective transformation unit and the virtual viewpoint image of the first subject generated by the second projective transformation unit.

A virtual viewpoint conversion device that generates a virtual viewpoint image by virtually converting an input image captured at the time of shooting into an image of a viewpoint different from that at the time of shooting, the device comprising:
a background generation unit that generates a background image from the input image;
a first subject extraction unit that extracts a region of a first subject from the input image and generates a first key image having the shape of the first subject and a predetermined pixel value;
a second subject extraction unit that extracts, from the input image, a region of a second subject having a predetermined image feature and generates a second key image having the second subject and a predetermined pixel value;
a background compositing unit that generates a composited background image by compositing, onto the background image generated by the background generation unit, the portion of the input image indicated by the second key image generated by the second subject extraction unit;
a first projective transformation unit that performs a first projective transformation on the composited background image generated by the background compositing unit, using camera parameters of the input image and camera parameters of the virtual viewpoint image, and generates a virtual viewpoint image of the background;
a billboard setting unit that sets a billboard based on the first key image generated by the first subject extraction unit and the camera parameters of the input image;
a second projective transformation unit that performs a second projective transformation on the input image and the first key image generated by the first subject extraction unit, using the camera parameters of the input image, the camera parameters of the virtual viewpoint image, and the billboard set by the billboard setting unit, thereby generating a virtual viewpoint image of the first subject and a virtual viewpoint image of the first key having the shape of the first subject and the predetermined pixel value; and
a compositing unit that generates the virtual viewpoint image by compositing, based on the virtual viewpoint image of the first key generated by the second projective transformation unit, the virtual viewpoint image of the background generated by the first projective transformation unit and the virtual viewpoint image of the first subject generated by the second projective transformation unit.

The virtual viewpoint conversion device according to claim 1 or 2, wherein
the background generation unit generates the background image from a plurality of frames of the input image, and
the second subject extraction unit extracts, as the region of the second subject, a region having a predetermined pixel value feature in a single frame of the input image.

A program for causing a computer to function as the virtual viewpoint conversion device according to any one of claims 1 to 3.