JP2014102746A - Subject recognition device and subject recognition program

Info

Publication number: JP2014102746A
Application number: JP2012255375A
Authority: JP
Inventors: Hiroko Yabushita (藪下 浩子); Tatsuya Osawa (大澤 達哉); Jun Shimamura (島村 潤); Yukinobu Taniguchi (谷口 行信)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2012-11-21
Filing date: 2012-11-21
Publication date: 2014-06-05

Abstract

PROBLEM TO BE SOLVED: To provide a subject recognition device capable of recognizing a subject more accurately in high-speed subject recognition. SOLUTION: The subject recognition device includes: query feature representing means which extracts feature parts from a subject development image and computes first local descriptors of the feature parts in the subject development image; database feature representing means which extracts feature parts from database images that are image search objects and computes second local descriptors of the feature parts in the database images; and collation means which obtains a homography matrix from pairs of closest feature points generated based on results of comparison between the first local descriptors and the second local descriptors, uses the homography matrix to narrow down the pairs of closest feature points, then computes similarity scores and specifies a database image similar to the subject development image on the basis of the scores.

Description

The present invention relates to a subject recognition device and a subject recognition program for recognizing a subject photographed by an imaging device.

Conventionally, there are image recognition techniques that identify a target using only images captured by an imaging device such as a camera. For example, one technique stores in advance, as a learning image group, images of a subject photographed from different angles, and identifies an arbitrary subject by comparing feature data generated from that group with features obtained from a query image that a user captures from an arbitrary direction (see, for example, Non-Patent Document 1). However, when the subject is three-dimensional, its appearance in the image changes with the observation direction, so a large number of learning images must be captured in advance, which takes considerable effort.

To address this problem, another technique replaces the capture of learning images with synthesis. Given a CG (computer graphics) model consisting of the subject's three-dimensional shape and texture, input in advance, it moves a virtual viewpoint within the computer graphics space to synthesize a learning image group of the subject observed from different directions, and identifies the target by matching features obtained from the synthesized images against features obtained from a query image captured by the user from an arbitrary direction (see, for example, Non-Patent Document 2).

However, this conventional technique must generate learning images for all directions, and when there are multiple learning targets it must do so for each target, so the amount of computation becomes enormous.

A method devised in view of these circumstances projects the subject onto a sphere enclosing it, based on the subject's three-dimensional shape and texture, and unfolds the sphere to generate a virtual-viewpoint-group integrated image that integrates the virtual viewpoint images for all observation directions. The subject is then recognized by matching this integrated image against a small number of learning images, each observed from only one arbitrary direction (see Non-Patent Document 3). Because the input side holds information on the subject's entire circumference, correspondences can be established without a large number of observed images on the learning side. In addition, because the three-dimensional shape and texture are unfolded into a two-dimensional image, two-dimensional matching against the learning images becomes possible, which lowers the computation cost. This method therefore enables high-speed recognition without generating learning images for every direction.

Non-Patent Document 1: Hiroshi Murase, S. Nayar, "Three-dimensional object recognition by two-dimensional matching: parametric eigenspace method", IEICE Transactions (D-II), vol. J77-D-II, no. 11, pp. 2179-2187, Nov. 1994.

Non-Patent Document 2: Yuji Mochido, Yoshihiro Watanabe, Takashi Komuro, Masatoshi Ishikawa, "Implementation of 3D Object Pose Estimation Method Using Analysis-Synthesis Method by GPU", 16th Image Sensing Symposium, 2010, Proceedings IS4-17.

Non-Patent Document 3: Hiroko Yabushita, Jun Shimamura, Masashi Morimoto, Hideki Koike, "A Study on 3D Object Recognition Based on Spherical Expansion of Subject Shapes", Proceedings of the IEICE General Conference 2011, Information and Systems (2), 156.

Non-Patent Document 4: Hiroko Yabushita, Jun Shimamura, Masashi Morimoto, "Three-dimensional object recognition based on spherical development of subject shape and texture", IEICE Technical Report, vol. 111, no. 379, PRMU2011-153, pp. 73-78, 2012.

However, with the technique described in Non-Patent Document 3, features change because of non-uniform image distortion introduced during three-dimensional measurement or during generation of the subject development image. If large distortion occurs in a region of the subject development image that contains the viewpoint image information for the direction from which the subject was photographed in advance, the features obtained from the development image and from the database image no longer match, feature points are mis-corresponded, and recognition accuracy drops.

Non-Patent Document 4 addresses this problem in its score voting based on texture similarity: the score is calculated from the similarity of not only the voted feature point itself but also the feature points surrounding it. This voting scheme prevents high scores from being cast for database images that do not show the subject as a result of erroneous feature point correspondences caused mainly by geometric distortion in the development image.

However, because that method does not consider the shape of the subject, if the database contains an image of an object whose texture partially resembles that of the subject, the object is erroneously recognized as the subject.

The present invention has been made in view of these circumstances, and its object is to provide a subject recognition device and a subject recognition program that can recognize a subject more accurately while keeping subject recognition fast.

The present invention is a subject recognition device that uses a subject development image, generated from three-dimensional shape data and texture data of a subject, to select and output a candidate database image for the subject from a plurality of database images that are image search targets. The device comprises: query feature representation means for extracting feature locations from the subject development image and calculating first local descriptors of the feature locations in the subject development image; database feature representation means for extracting feature locations from the database images to be searched and calculating second local descriptors of the feature locations in the database images; and collation means for obtaining a projective transformation matrix from nearest-neighbor feature point pairs generated from the result of comparing the first local descriptors with the second local descriptors, narrowing down the nearest-neighbor feature point pairs using the projective transformation matrix, then calculating a similarity score and, based on the score, identifying the database image similar to the subject development image.

In the present invention, the collation means calculates the similarity score only for database images that satisfy a predetermined condition, and identifies the similar database image among them.

In the present invention, the collation means narrows down the nearest-neighbor feature point pairs using the RANSAC method.

In the present invention, the collation means narrows down the nearest-neighbor feature point pairs using the LMedS (least median of squares) method.

In the present invention, the collation means uses the projective transformation matrix to project the three-dimensional points corresponding to the first local descriptors onto the database image, obtains the two-dimensional distance between each projected two-dimensional point and a feature point on the database image that satisfies a predetermined condition, and calculates the score using that two-dimensional distance.

The present invention is also a subject recognition program for causing a computer to function as the subject recognition device.

According to the present invention, similar images are identified by calculating the similarity score after the feature point pairs have been narrowed down during collation, so subject recognition accuracy can be improved while speed is maintained.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a flowchart showing the processing operation of the query feature representation unit 11 shown in FIG. 1.

FIG. 3 is a flowchart showing the processing operation of the database feature representation unit 12 shown in FIG. 1.

FIG. 4 is a flowchart showing the operation in which the collation unit 13 shown in FIG. 1 compares the local descriptors of the features belonging to each database image, between the query features saved in the storage unit 3 by the query feature representation unit 11 and the database features saved in the storage unit 3 by the database feature representation unit 12, and outputs a subject recognition result.

FIG. 5 is a diagram showing the process of removing mis-corresponding feature point pairs that do not satisfy the geometric relationship.

FIG. 6 is a diagram showing the process of searching for the database image feature point nearest to a projected point.

Hereinafter, a subject recognition device according to an embodiment of the present invention is described with reference to the drawings. FIG. 1 is a block diagram showing the configuration of this embodiment. The subject recognition device is implemented on a computer. In FIG. 1, reference numeral 1 denotes a subject recognition unit that performs subject recognition processing. Reference numeral 2 denotes an image input unit that inputs two-dimensional images captured with a camera or the like. Reference numeral 3 denotes a storage unit that stores the data needed for subject recognition processing. Reference numeral 4 denotes an input unit consisting of a keyboard or the like. Reference numeral 5 denotes a display unit consisting of a display device or the like.

Reference numeral 11 denotes a query feature representation unit that generates a development image from the subject's three-dimensional model (three-dimensional coordinate data of the shape) and texture (pattern data of the three-dimensional shape's surface) input via the image input unit 2, extracts feature locations from the development image, and calculates local descriptors of those feature locations. Reference numeral 12 denotes a database feature representation unit that reads from the storage unit 3 the database images to be searched, which were captured in advance for various objects, one object per image, from an arbitrary angle, extracts feature locations in all of the database images, and calculates local descriptors of those feature locations. Reference numeral 13 denotes a collation unit that compares the query features calculated by the query feature representation unit 11 with the database features calculated by the database feature representation unit 12, performs voting based on geometric constraints, identifies the database image most similar to the query features, and outputs the id of that database image as the subject recognition result.

Next, the processing operation of the subject recognition device shown in FIG. 1 is described. The description assumes that the three-dimensional shape and texture of a subject are input as the query, and that one or more two-dimensional images are stored in advance in the storage unit 3 as the search target database. Each two-dimensional image shows a target subject photographed from an arbitrary angle and is stored in association with various information, such as a subject ID uniquely assigned in advance to each target subject and a name. The three-dimensional shape data and texture data of the subject may be generated by image processing from images of the target captured with an imaging device mounted on, for example, a mobile phone, or may be measured with a sensor such as a range finder. They may also be created manually using computer graphics (CG) techniques.

Next, the processing operation of the query feature representation unit 11 shown in FIG. 1 is described with reference to FIG. 2, a flowchart of that operation. First, the query feature representation unit 11 inputs three-dimensional model data, that is, the subject's three-dimensional model (three-dimensional coordinate data of the shape) and texture (pattern data of the three-dimensional shape's surface), from the image input unit 2 (step S1). The query feature representation unit 11 then generates a development image from the input three-dimensional model and texture (step S2).

Next, the query feature representation unit 11 detects feature locations in the generated development image (step S3) and calculates local descriptors of those feature locations (step S4). The query feature representation unit 11 saves the calculated local descriptors in the storage unit 3 (step S5). The local descriptors of the feature locations of the input three-dimensional model data are thus stored in the storage unit 3 as the query features.

Next, the processing operation of the database feature representation unit 12 shown in FIG. 1 is described with reference to FIG. 3, a flowchart of that operation. First, the database feature representation unit 12 inputs all of the two-dimensional image data (referred to as database images) from the storage unit 3 (step S11). These two-dimensional images were captured in advance for various objects, one object per image, from an arbitrary angle.

Next, the database feature representation unit 12 detects feature locations in an input database image (step S12) and calculates a local descriptor for each of those feature locations (step S13). This processing is repeated for every input database image, and the database feature representation unit 12 saves the calculated local descriptors in the storage unit 3 (step S14). The local descriptors of the feature locations of the database images to be searched are thus stored in the storage unit 3 as the database features.

Next, with reference to FIG. 4, the operation of the collation unit 13 shown in FIG. 1 is described. The collation unit 13 takes the query features saved in the storage unit 3 by the query feature representation unit 11 and the database features saved in the storage unit 3 by the database feature representation unit 12, compares the local descriptors of the features belonging to each database image, and outputs a subject recognition result. FIG. 4 is a flowchart showing this operation.

First, the collation unit 13 reads the query features and the database features from the storage unit 3 (step S21). Next, from the database features it has read, the collation unit 13 takes only those belonging to a single database image (the target database features) and compares them with the query features. The comparison first generates a set of nearest-neighbor feature point pairs between the query features and the target database features (step S22). The pairs are generated, for example, as follows. For each query feature, the feature point with the smallest inter-vector distance is selected from the target database features. Conversely, for each target database feature, the feature point with the smallest inter-vector distance is selected from the query features. When two feature points select each other as their nearest neighbor, the feature point in the query features and the feature point in the target database features are taken as a feature point pair.

Here, two feature points (one from the query, the other from a single database image) are paired only when each is the other's nearest neighbor in inter-vector distance, so that points likely to represent the same point on the subject are paired correctly. Alternatively, the nearest neighbor may be found from the feature points of one side only and used directly as the feature point pair. Computing from one side only reduces the amount of computation accordingly.
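The pairing in step S22 can be sketched as follows. This is a minimal illustration that assumes descriptors are plain tuples of floats compared by Euclidean distance; the function names `nearest`, `mutual_nearest_pairs` and `one_sided_pairs` are hypothetical and do not come from the patent, which does not fix a descriptor type or distance.

```python
import math

def nearest(src, dst):
    """For each descriptor in src, return the index of the closest descriptor in dst."""
    return [min(range(len(dst)), key=lambda j: math.dist(s, dst[j])) for s in src]

def mutual_nearest_pairs(query_desc, db_desc):
    """Return (query_index, db_index) pairs that select each other as nearest neighbour."""
    if not query_desc or not db_desc:
        return []
    q_to_d = nearest(query_desc, db_desc)
    d_to_q = nearest(db_desc, query_desc)
    return [(i, j) for i, j in enumerate(q_to_d) if d_to_q[j] == i]

def one_sided_pairs(query_desc, db_desc):
    """Cheaper variant: pair every query descriptor with its nearest database descriptor."""
    if not query_desc or not db_desc:
        return []
    return list(enumerate(nearest(query_desc, db_desc)))

# Toy 2-D descriptors (assumed data, for illustration only).
query = [(0.0, 0.0), (1.0, 1.0), (1.2, 1.2)]
database = [(0.9, 1.1), (0.1, -0.1)]
print(mutual_nearest_pairs(query, database))
print(one_sided_pairs(query, database))
```

In the toy data the third query descriptor is nearest to the first database descriptor, but that database descriptor prefers the second query descriptor, so the mutual check drops the pair while the one-sided variant keeps it.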

Next, the PnP (Perspective-n-Point) problem described in Non-Patent Document 4 is solved between the three-dimensional coordinates of the query feature points and the two-dimensional coordinates of the database image feature points that make up the generated feature point pair set, yielding the geometric relation (projective transformation matrix) (step S23). The required internal camera parameters are assumed to be given in advance. A robust estimation method is used here to reject outliers. For example, as in Non-Patent Document 4, the Random Sample Consensus (RANSAC) method, a kind of robust estimation, is applied. This processing calculates the geometric relationship of the feature point pair set and removes from the set the mis-corresponding feature point pairs that do not satisfy the geometric constraint.

Although RANSAC is used here to remove mis-corresponding feature point pairs, it can be replaced by another method (for example, LMedS). FIG. 5 shows the process of removing mis-corresponding feature point pairs that do not satisfy the geometric relationship. In FIG. 5, the feature point pairs drawn with broken lines are the mis-corresponding pairs.
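The outlier rejection of step S23 can be illustrated with the sketch below. It is a simplification, not the patent's procedure: instead of solving PnP with known camera intrinsics, it estimates a full 3x4 projection matrix from six 3D-2D pairs by the direct linear transform (DLT) inside a basic RANSAC loop. The function names, the iteration count, the 2-pixel threshold and the fixed seed are all assumptions chosen for illustration. For the calibrated case, OpenCV provides `cv2.solvePnPRansac`.

```python
import numpy as np

def fit_projection(pts3d, pts2d):
    """DLT: least-squares 3x4 matrix P such that x ~ P X (up to scale)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(pts3d, pts2d):
        Xh = [X, Y, Z, 1.0]
        rows.append([*Xh, 0, 0, 0, 0, *(-u * c for c in Xh)])
        rows.append([0, 0, 0, 0, *Xh, *(-v * c for c in Xh)])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 4)  # singular vector of the smallest singular value

def project(P, pts3d):
    """Project Nx3 points with P and return Nx2 pixel coordinates."""
    h = np.c_[pts3d, np.ones(len(pts3d))] @ P.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return h[:, :2] / h[:, 2:3]

def ransac_projection(pts3d, pts2d, iters=200, thresh=2.0, seed=0):
    """Return (P, inlier_mask); P is None if fewer than 6 inliers are found."""
    rng = np.random.default_rng(seed)
    pts3d, pts2d = np.asarray(pts3d, float), np.asarray(pts2d, float)
    best = np.zeros(len(pts3d), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(pts3d), size=6, replace=False)
        P = fit_projection(pts3d[sample], pts2d[sample])
        err = np.linalg.norm(project(P, pts3d) - pts2d, axis=1)
        inliers = err < thresh  # NaN errors compare as False
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 6:
        return None, best
    # Refit on all inliers of the best hypothesis.
    return fit_projection(pts3d[best], pts2d[best]), best
```

The returned mask plays the role of the inlier feature point pair set: pairs flagged `False` correspond to the broken-line pairs in FIG. 5.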

Next, the similarity (score S) is calculated from the features of the inlier feature point pair set, from which mis-correspondences have been removed (step S24), and voting is performed (step S25). Here the score is calculated by Equation (1), where d denotes the distance between feature vectors.

[Equation (1): not reproduced in this text]

Equation (1) uses the reciprocal of the distance between feature vectors to calculate the score, but the score calculation formula may instead be defined using a Gaussian function or an exponential function, and is not limited to these.

The feature point pair set generation, outlier rejection, and voting are then performed on the features of every database image, yielding a voting result for each database image. The image with the largest total score among all images in the search target database is determined and output as the matching result (voting result) (step S26).
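Steps S24 to S26 can be sketched as follows. Equation (1) is not reproduced in this text, so the reciprocal form below, the small `eps` guard against division by zero, and the Gaussian alternative with its `sigma` parameter are assumptions based only on the surrounding description. The function names and the sample image ids are hypothetical.

```python
import math

def reciprocal_score(d, eps=1e-9):
    """Assumed form of the reciprocal score: larger when descriptors are closer."""
    return 1.0 / (d + eps)

def gaussian_score(d, sigma=1.0):
    """One possible Gaussian alternative mentioned in the description."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def vote(inlier_distances, score=reciprocal_score):
    """Total score of one database image: sum of per-pair scores over its inlier pairs."""
    return sum(score(d) for d in inlier_distances)

# Descriptor distances of the inlier pairs for each database image (assumed data).
inlier_distances = {"cup": [0.5, 0.25], "box": [2.0], "lamp": []}
totals = {image_id: vote(ds) for image_id, ds in inlier_distances.items()}
best_id = max(totals, key=totals.get)  # step S26: image with the largest total score
print(best_id, totals)
```

An image with no surviving inlier pairs receives a total of zero, so it can never win against an image with at least one geometrically consistent pair.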

Alternatively, for each database image, the geometric relation may be calculated between the three-dimensional and two-dimensional feature points that form the feature point pair set with the query feature points, and the database images may be compared by the number of inlier feature point pairs that remain after robust estimation. Score voting by Equation (2) may then be performed only on a prescribed number of database images, taken in descending order of inlier pair count. This reduces the cost of the collation processing.
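The pre-selection by inlier count can be sketched in a few lines. The function name `shortlist` and the parameter `keep` (the prescribed number of images) are hypothetical, and the counts are made-up values.

```python
def shortlist(inlier_counts, keep):
    """Return the ids of the `keep` database images with the most inlier pairs."""
    ranked = sorted(inlier_counts, key=inlier_counts.get, reverse=True)
    return ranked[:keep]

# Inlier pair counts per database image after robust estimation (assumed data).
counts = {"a": 12, "b": 40, "c": 3, "d": 25}
print(shortlist(counts, keep=2))  # only these images go on to score voting
```

Only the shortlisted images then need the more expensive score voting, which is where the cost saving comes from.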

In the description above, voting uses only the inlier feature point pair set. When the database contains images of several similar subjects, the following additional processing can be used to raise collation accuracy further.

First, using the projective transformation obtained from the feature point pair set, the three-dimensional points of all query features are projected onto the two-dimensional database image. Then, as shown in FIG. 6, the database image feature point nearest to each projected point is searched for, and the score S obtained by Equation (2) is voted, based on the two-dimensional spatial distance D between the projected point and that feature point and on the feature vector distance d between the features of that feature point and of the three-dimensional point from which the projection originated. FIG. 6 shows the process of searching for the database image feature point nearest to a projected point.

[Equation (2): not reproduced in this text]

Equation (2) uses the reciprocals of the feature vector distance d and the two-dimensional spatial distance D to calculate the score, but the score calculation formula may instead be defined using a Gaussian function or an exponential function, and is not limited to these. Performing this voting on the features of every database image yields a voting result for each database image. The image with the largest total score among all images in the search target database is determined and output as the matching result. In this way, feature points that were not paired when the feature point pair set was generated, because of distortion in the features extracted from the query image, can be re-matched on the basis of geometric consistency, allowing a more detailed comparison.
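The reprojection-based voting can be sketched as follows. Equation (2) is not reproduced in this text, so the way the two reciprocals are combined here (their product) is purely an assumption, as are the `eps` guard, the function names, and the representation of the projective transformation as a 3x4 nested list `P`.

```python
import math

def project_point(P, X):
    """Apply a 3x4 matrix P (nested lists) to a 3D point and return 2D coordinates."""
    Xh = (*X, 1.0)
    u, v, w = (sum(p * x for p, x in zip(row, Xh)) for row in P)
    return (u / w, v / w)

def geometric_vote(P, query_pts3d, query_desc, db_pts2d, db_desc, eps=1e-6):
    """Total score of one database image using both D (2D distance) and d (descriptor distance)."""
    if not db_pts2d:
        return 0.0
    total = 0.0
    for X, q in zip(query_pts3d, query_desc):
        x = project_point(P, X)
        # Database feature point nearest to the projected point (FIG. 6).
        j = min(range(len(db_pts2d)), key=lambda k: math.dist(x, db_pts2d[k]))
        D = math.dist(x, db_pts2d[j])   # two-dimensional spatial distance
        d = math.dist(q, db_desc[j])    # feature vector distance
        total += 1.0 / ((d + eps) * (D + eps))  # assumed combination of the reciprocals
    return total

# Toy example: identity-like camera, one query point, two database feature points.
P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
score = geometric_vote(P, [(2.0, 4.0, 2.0)], [(0.0, 0.0)],
                       [(1.0, 2.5), (10.0, 10.0)], [(0.0, 2.0), (0.0, 0.0)])
print(score)
```

In the toy example the query point projects to (1, 2), so the first database feature point is chosen on geometric grounds even though the second one has the closer descriptor.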

As described above, performing voting with geometric constraints during collation realizes subject recognition that considers subject shape and texture at the same time, and improves subject recognition accuracy.

Subject recognition processing may also be performed by recording a program that implements the functions of the processing units in FIG. 1 on a computer-readable recording medium, and having a computer system read and execute the program recorded on that medium. The term "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system equipped with a homepage providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. It further includes media that hold the program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may realize only part of the functions described above. Furthermore, it may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

Embodiments of the present invention have been described above with reference to the drawings, but these embodiments are merely illustrative of the present invention, and it is clear that the present invention is not limited to them. Accordingly, components may be added, omitted, replaced, or otherwise changed without departing from the technical idea and scope of the present invention.

The present invention can be applied to uses in which recognizing a subject from an image captured by an imaging device is essential.

DESCRIPTION OF SYMBOLS 1 ... subject recognition unit, 2 ... image input unit, 3 ... storage unit, 4 ... input unit, 5 ... display unit, 11 ... query feature representation unit, 12 ... database feature representation unit, 13 ... collation unit

Claims

A subject recognition device that uses a subject development image, generated based on three-dimensional shape data and texture data of a subject, to select and output a subject-candidate database image from among a plurality of database images to be searched, the device comprising:
query feature representation means for extracting feature locations from the subject development image and calculating first local descriptors of the feature locations in the subject development image;
database feature representation means for extracting feature locations from a database image to be searched and calculating second local descriptors of the feature locations in the database image; and
collation means for obtaining a projective transformation matrix from pairs of nearest-neighbor feature points generated based on a result of comparing the first local descriptors with the second local descriptors, narrowing down the pairs of nearest-neighbor feature points using the projective transformation matrix, then calculating similarity scores, and identifying, based on the scores, the database image similar to the subject development image.

The subject recognition device according to claim 1, wherein the collation means calculates the similarity scores only for database images that satisfy a predetermined condition, and identifies the similar database image.

The subject recognition device according to claim 1 or 2, wherein the collation means narrows down the pairs of nearest-neighbor feature points using the RANSAC method.

The subject recognition device according to claim 1 or 2, wherein the collation means narrows down the pairs of nearest-neighbor feature points using the LMedS method.

The subject recognition device according to any one of claims 1 to 4, wherein the collation means obtains a two-dimensional distance between a two-dimensional point, obtained by using the projective transformation matrix to project a three-dimensional point corresponding to the first local descriptor onto the database image, and a feature point that satisfies a predetermined condition among the feature points on the database image, and calculates the score using the two-dimensional distance.

A subject recognition program for causing a computer to function as the subject recognition device according to any one of claims 1 to 5.