JP4443722B2 - Image recognition apparatus and method

Info

Publication number: JP4443722B2
Application number: JP2000123604A
Authority: JP
Inventors: 大器増本; 直毅指田; 博紀北川; 茂美長田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2000-04-25
Filing date: 2000-04-25
Publication date: 2010-03-31
Anticipated expiration: 2020-04-25
Also published as: JP2001307096A; US6888955B2; US20010038714A1

Description

【０００１】
【発明の属する技術分野】
本発明は、物体の画像情報を変換した物体モデルをデータベースに蓄積し、画像認識時にデータベースを照会して物体を認識する画像認識装置に関する。
【０００２】
【従来の技術】
インターネット等に代表されるコンピュータネットワークの進展に伴い、誰でも容易に様々な情報へアクセスすることができるようになった反面、アクセスしているのが本人であるかどうか確認する技術、すなわち認証技術の重要性が高まっている。詐称者を本人と誤らない、あるいは本人を詐称者として棄却する確率を最小限にする必要があるからである。
【０００３】
かかる技術分野で最近注目されている技術の一つが、顔の画像による認証技術である。顔は指紋や声紋と同様、本人に固有のものだからであり、画像処理技術の進展によって識別判断の基準として用いる対象となりうるからである。
【０００４】
顔画像を認証判断の基準とする方法については、従来から種々の方法が開示されている。例えば、特願平１１−１１００２０号においては、入力画像から撮影環境の状態を示す環境パラメータと対象物の状態を示す対象状態パラメータ値とを推定し、その値を用いて、入力画像の撮影環境及び対象物の状態が登録画像の撮影環境及び対象物の状態に一致するように補正した「照合用画像」を用いて認識を行う技術が開示されている。
【０００５】
以下、開示されている環境パラメータ及び対象状態パラメータを用いた当該画像認識処理について図１から図４を参照しながら説明する。まず図１は、当該画像認識処理のデータベースへの登録フェーズにおける処理の流れを示す。
【０００６】
図１において、まず登録対象となる画像を入力する（ステップＳ１１）。ここでの画像入力は、正面から撮影した顔画像１枚で良いが、認識精度を高めるためには、正面画像の他に、様々な方向から撮影した顔画像を用意することが望ましい。
【０００７】
次に、入力した画像から、顔領域を切り出して（ステップＳ１２）、顔領域の画像を得る（ステップＳ１３）。すなわち、図２に示すように、登録対象となる画像上で顔領域を矩形領域で切り出す。
【０００８】
そして、得られた顔領域画像を各ピクセルを要素に持つＮ次元ベクトルとみなし、そのベクトルをｎ次元（ｎ≦Ｎ）の部分空間に射影し（ステップＳ１４）、その射影点をＰと表す。図２では、「ｓａｓｈｉｄａ」の１点に射影されている。
【０００９】
さらに、撮影環境の状態を示す環境パラメータ値ｅ、対象物の状態を表す対象状態パラメータ値ｓを推定し、その値と射影点Ｐとをペアにしてデータベースに登録する（ステップＳ１５）。ここで、画像から撮影環境の状態を示す環境パラメータ値ｅ、対象物の状態を表す対象状態パラメータ値ｓを推定する一般性のある方法は開示されていない。
【００１０】
次に、図３は当該画像認識処理における認識フェーズにおける処理の流れを示している。図３において、画像の入力から顔領域画像の切り出しまで（ステップＳ３１〜ステップＳ３３）は、図１に示した登録フェーズの場合（ステップＳ１１〜ステップＳ１３）と同様になる。
【００１１】
したがって、部分空間への射影は、図４に示すように「ｓａｓｈｉｄａ」の１点に射影される。
【００１２】
一方、入力画像から撮影環境の状態を示す環境パラメータ値ｅと対象物の状態を表す対象状態パラメータ値ｓを推定する。次に、あらかじめ登録されている登録画像の環境パラメータ値ｅと対象状態パラメータ値ｓと一致するように、入力画像から推定したパラメータ値を調整する。この調整によって、入力画像の撮影環境及び対象物の状態が、登録画像の撮影環境及び対象物の状態に一致するような照合用画像を生成する。この照合用画像を部分空間に射影して点Ｑを得る（ステップＳ３４）。
【００１３】
そうすることで、登録画像と照合用画像とは、照明等の撮影環境、対象物の位置や姿勢等の状態について同一条件で比較することになる。しかし、パラメータ値を調整して、入力画像の撮影環境及び対象物の状態が、登録画像の撮影環境及び対象物の状態に一致するような照合用画像を生成する一般性のある方法は開示されていない。
【００１４】
次に、登録されているＰとＱの部分空間上での距離を計算する（ステップＳ３５）。登録画像すべてについて、同様にして当該空間的距離を算出して、最近接点Ｐ_mを探す（ステップＳ３６）。
【００１５】
最後に、最近接点Ｐ_mに該当する登録画像を入力画像に対応するものとして認識することになる（ステップＳ３７）。
【００１６】
【発明が解決しようとする課題】
しかし、上述したような方法では、（１）画像から撮影環境の状態を示す環境パラメータ値、対象物の状態を表す対象状態パラメータ値を推定すること、（２）パラメータ値を調整して、入力画像の撮影環境及び対象物の状態が、登録画像の撮影環境及び対象物の状態に一致するような照合用画像を生成することがポイントとなっているにもかかわらず、これらの処理を実現する一般的な方法は知られていない。
【００１７】
特願平１１−１１００２０号においては、環境パラメータのうち照明パラメータを、顔領域画像の輝度値の平均値、分散、ヒストグラムから推定することや、環境パラメータのうちカメラパラメータとして、撮影に利用したカメラの解像度や、フォーカス、露出を用いることが提案されている。また、対象状態パラメータを、顔領域画像内の肌色占有面積を利用して推定することが提案されている。
【００１８】
しかし、（１）このようなパラメータ値を正しく推定することは一般に困難である。さらに、（２）１枚あるいは少数の画像からこれらのパラメータが変化したときに画像がどのように変化するかをモデル化することも困難である。したがって、上述の方法を実際に認識処理に適用するのは困難であると考えられる。
【００１９】
そのため、画像登録時に正面から撮影した顔画像を利用していることから、入力時に顔の向きが異なっていたり、照明条件が異なっている場合には、本人を詐称者であるものとして、あるいは詐称者を本人であるものとして誤認識する可能性があるという問題点があった。
【００２０】
本発明は、上記問題点を解消するために、画像認識時における入力画像の撮影条件に依存せずに、登録画像との照合を精度良く行うことができる画像認識装置及び方法を提供することを目的とする。
【００２１】
【課題を解決するための手段】
上記目的を達成するために本発明にかかる画像認識装置は、撮影環境の変動による物体の見え方の変化を推定してモデル化する物体モデル化実行部と、物体モデル化実行部において得られた物体モデルを事前にデータベースへ登録しておく物体モデル登録部とを有し、認識対象となる物体の画像情報を入力する画像情報入力部と、物体モデル登録部において事前に登録されている物体モデルと入力された画像情報を照合して、登録されている物体モデルとの類似度を割り当てる類似度判断部と、割り当てられた物体モデルの中で最も類似していると判断された認識対象となる物体の種別を出力する物体認識部とを含み、物体モデル化実行部において、固定された画像情報入力部に対する物体の相対的な位置及び姿勢を変化させて撮影した複数の画像情報を入力し、入力された複数の画像情報に基づいて将来起こりうる撮影環境変動による物体の見え方の変化を推定してモデル化することを特徴とする。
【００２２】
かかる構成により、物体モデル登録時と入力画像認識時における物体の姿勢の相異による見え方の変動や照明条件の相異による見え方の変動等に左右されることなく、登録されている物体モデルとの照合を精度良く行うことが可能となる。
【００２３】
また、本発明にかかる画像認識装置は、認識対象となる物体の表面特性としてランバーシャン反射モデルを仮定することが好ましい。照明変動による見え方の変動を予測しやすいからである。
【００２４】
また、本発明にかかる画像認識装置は、画像情報入力部において、画像から認識対象となる物体が存在する部分を切り出し、切り出された部分画像を用いて認識対象となる物体のモデル化を行うことが好ましい。余分な画像情報による誤認識を防止することができるからである。
【００２５】
また、本発明にかかる画像認識装置は、画像情報入力部において、画像から認識対象となる物体中の特徴的な小領域を選択し、選択された小領域に含まれる情報と小領域の配置情報に基づいて認識対象となる物体のモデル化を行うことが好ましい。特徴部分が画像によって部分的に隠された状態となっている場合も対応することができるからである。
【００２６】
また、サンプルデータが少ない場合、本発明にかかる画像認識装置は、物体モデル化実行部において、入力された画像情報に基づいて、物体の姿勢変化による見え方の変動と照明条件の変化による見え方の変動とを分離してモデル化を行うことが好ましい。サンプルデータが少ない場合であっても、正確に見え方の変動を推定することができるからである。
【００２７】
また、サンプルデータが十分に有る場合、本発明にかかる画像認識装置は、物体モデル化実行部において、入力された画像情報に基づいて、物体の姿勢変化による見え方の変動と照明条件の変化による見え方の変動とを分離せずにモデル化を行うことが好ましい。サンプルデータが十分にある場合においては、あえて分離してモデル化を行うことによって、近似的にモデル化する必要はなく、直接的に見え方の変動を求めることができるからである。
【００２８】
また、本発明は、上記のような画像認識装置の機能をコンピュータの処理ステップとして実行するソフトウェアを特徴とするものであり、具体的には、撮影環境の変動による物体の見え方の変化を推定してモデル化する工程と、得られた物体モデルを事前にデータベースへ登録しておく工程とを有し、認識対象となる物体の画像情報を入力する工程と、事前に登録されている物体モデルと入力された画像情報を照合して、登録されている物体モデルとの類似度を割り当てる工程と、割り当てられた物体モデルの中で最も類似していると判断された認識対象となる物体の種別を出力する工程とを含み、物体の相対的な位置及び姿勢を変化させて撮影した複数の画像情報を入力し、入力された複数の画像情報に基づいて将来起こりうる撮影環境変動による物体の見え方の変化を推定してモデル化する画像認識方法並びにそのような工程をプログラムとして記録したコンピュータ読み取り可能な記録媒体であることを特徴とする。
【００２９】
かかる構成により、コンピュータ上へ当該プログラムをロードさせ実行することで、物体モデル登録時と入力画像認識時における物体の姿勢の相異による見え方の変動や照明条件の相異による見え方の変動等に左右されることなく、登録されている物体モデルとの照合を精度良く行うことができる画像認識装置を実現することが可能となる。
【００３０】
【発明の実施の形態】
（実施の形態１）
以下、本発明の実施の形態１にかかる画像認識装置について、図面を参照しながら説明する。図５は本発明の実施の形態１にかかる画像認識装置の構成図である。図５において、５１は画像情報入力部を、５２は物体モデル化実行部を、５３は物体モデル登録部を、５４は物体モデルデータベースを、５５は類似度判断部を、５６は物体認識部を、それぞれ示す。
【００３１】
図５において、画像情報入力部５１は、認識対象となる画像を撮影するカメラや、当該カメラによって撮影された写真等を読み込むスキャナ、撮影された画像を圧縮して磁気記録媒体に保存しているファイルの読込装置等の、画像情報を入力するために用意された部分を示している。そして、画像情報入力部５１から入力された画像情報に基づいて、物体モデル化実行部５２において認識対象となるべき物体について、モデル化することになる。
【００３２】
物体モデル化実行部５２における画像情報のモデル化手法には種々の方法が考えられる。例えば特願平１１−１１００２０号においては、上述したように特徴パラメータを用いて物体モデルを一義的に表す方法が開示されている。
【００３３】
しかし、このようなモデル化手法では、以下の問題が発生する。まず、モデル化する際に入力する画像が、一つの物体について一つのみであることから、光源の位置や照度等の相異によって、同一の物体を同一のカメラ位置で撮影した場合であっても、異なる物体であると誤認識されてしまう可能性が残されるという問題である。
【００３４】
また、光源の位置や照度等が同一であっても、カメラと物体の位置が相異すれば、同様に異なる物体であると認識されてしまう可能性が高いという問題である。すなわち、カメラの角度やカメラとの距離が相異すれば、撮影される画像の大小や角度が大きく変化し、固有空間上の位置が大きく前後することで、異なる物体であると誤認識されてしまう可能性が高くなることも十分に予想されるからである。
【００３５】
かかる問題を解決するため、本実施の形態においては、登録時に、固定された画像情報入力部に対して物体の姿勢を連続的に変化させて、その連続画像に基づいて、入力時の環境変動、すなわち照明条件、対象物体の状態（カメラとの相対姿勢や相対距離）の相異によって画像がどのように変化するのかを予測し、当該予測に基づいた物体モデルを、部分空間として物体モデルデータベース５４に登録する点に特徴を有する。
【００３６】
以下、本実施の形態にかかる画像認識装置におけるモデル化手法について、図６及び図７を参照しながら説明する。まず、図６は本実施の形態にかかる画像認識装置における登録フェーズのモデル化処理の流れを示している。
【００３７】
図６に示すように、まず画像を入力するのであるが（ステップＳ６１）、入力するのは一つの画像自体ではなく、連続した複数の画像である。すなわち、図７に示すように、顔画像であっても正面から撮影した画像だけではなく、登録用の画像系列として徐々に首を振った連続的な画像を入力することになる。
【００３８】
次に、入力された画像系列について、各小領域を連続した複数の画像について追跡することで、連続した複数の画像から小領域の系列として選択することになる（ステップＳ６２）。すなわち、入力された画像系列について、「目」なら「目」を表す小領域について小領域系列として選択することになる。
【００３９】
そして、選択された小領域系列に基づいて、部分空間を新たに生成する（ステップＳ６３）。具体的には、図７に示すように、連続した画像について対応する部分、例えば顔画像で有れば「目の領域」について、部分空間を同定し、それを窓部分空間と呼ぶ。
【００４０】
当該窓部分空間においては、物体の位置や姿勢等の幾何学的変動や照明の位置や照度等の変動によって生じる小領域画像の見え方の変動をカバーしている。このような窓部分空間を、「目の領域」、「鼻の領域」、・・のように個々の領域に対応して同定し、そのセットを、物体モデルとして物体モデルデータベース５４に登録する（ステップＳ６４）。
【００４１】
次に、入力された画像を実際に認識する際の処理について図８及び図９を参照しながら説明する。図８は当該画像認識処理における画像認識処理の流れ図を示している。
【００４２】
図８において、物体モデルデータベース５４の照会対象となる画像を入力する（ステップＳ８１）。次に、その画像から顔領域を切り出し（ステップＳ８２）、さらに、顔領域の中から特徴的な部分である小領域（窓）を複数選択する（ステップＳ８３）。窓の選択方法の具体例としては、特願平１１−１１００２０号において実施形態２で用いられている「エッジ強度」を用いる方法等が考えられる。そして、図９のように、各窓の画素値を要素として持つベクトル（窓ベクトル）を、物体モデルデータベース５４に登録されている各窓部分空間に射影する（ステップＳ８４）。
【００４３】
類似度判断部５５において、窓ベクトルを窓部分空間に射影したときの垂線の足の長さを計算し、その長さに基づいて小領域と窓部分空間との類似度を定義する（ステップＳ８５）。そして、最も当該小領域に近い窓部分空間を見出し（ステップＳ８６）、かかる部分空間を有する登録物体モデルを入力画像中の物体の候補とする。入力画像中のすべての窓について同様の処理を行い、最終的に物体認識部５６で結果を統合して認識を行う（ステップＳ８７）。
【００４４】
なお、本実施の形態にかかる画像認識装置におけるモデル化手法においては、光源がどこにあるのか等はモデル化時点では問わない。しかし、連続画像撮影時においては光源の位置や角度は変化させないことが必要条件となる。変化してしまうと、入力時の撮影条件の変化に対する画像変化の予測計算が困難になるからである。
【００４５】
次に、登録時の窓部分空間の同定について、より詳細に説明する。まず、物体表面上の画素に対応する小領域である面素Ｑ_iを考える。面素Ｑ_iは反射係数ａ_iを有するランベーシャン面（Lambertian）であるものと仮定する。ここでランベーシャン面とは、鏡面反射のない反射面であることを意味する。
【００４６】
一般に、登録時と同じ顔を撮影する場合であっても、面素Ｑ_iとカメラ位置の相対関係や照明条件等が登録時に撮影したときの状況と一致することはまずあり得ない。したがって、入力時の撮影条件の変化によって、対応する窓内の対応する位置の画素値も変化することになる。
【００４７】
例えば、窓を固定した座標系において、座標ベクトルｘにおける変化前の画素値をＩ（ｘ）、変化後の画素値をＩ’（ｘ）とする。照明変動がないものと仮定した上で、選択された窓において回転量、サイズ変化量等が小さい場合には、窓固定座標系において対応する点の移動量Δｘは（数１）で表される。なお、（数１）において、Ａはアフィン変換のパラメータを要素として持つ２×２行列を、ｄはアフィン変換のパラメータを要素として持つ２×１の列ベクトルを、Ｄ＝Ｉ−ＡにおいてＩは２×２の単位行列を、それぞれ示す。
【００４８】
【数１】

【００４９】
かかるΔｘが微少であるという範囲内で有ればアフィン変換で近似可能な非剛体変形も取り扱うことが可能となる。移動の前後で画素値が保存されるものと仮定して、テイラー展開を行うと、変化後の画素値をＩ’（ｘ）は変化前の画素値Ｉ（ｘ）を用いて（数２）のように近似できる。
【００５０】
【数２】

【００５１】
したがって、変化後の画素値をＩ’（ｘ）は変化前の画素値をＩ（ｘ）を用いて（数３）のように表すことができることから、右辺第２項を幾何学的変化のみに基づいた窓内各画素値の変化量ベクトルΔＩ_gとして、（数４）のように整理できる。
【００５２】
【数３】

【００５３】
【数４】

【００５４】
以上より、変化量ベクトルΔＩ_gの自由度は‘６’であり、窓画像空間における部分空間は（数５）で表すことができる以下の６つの基底ベクトルω₁、ω₂、ω₃、ω₄、ω₅、ω₆で張ることができる。
【００５５】
【数５】

【００５６】
一方、照明条件のみが変動する場合について考えると、面素Ｑ_iのレンズ方向への放射光度Ｌ_iは（数６）のように表すことができる。ここで、ベクトルｎ_iは面素Ｑ_iにおける法線ベクトルを、ベクトルｓは光線ベクトルを、それぞれ意味する。
【００５７】
【数６】

【００５８】
撮影するフォトディテクタの開口面積をｂ、ＣＣＤの光電変換特性が線形であると仮定し比例定数をｋとすると、画素値Ｉ（ｘ_i）は（数７）のように表すことができる。
【００５９】
【数７】

【００６０】
ここで、ｄはレンズの直径、ｆは焦点距離、ベクトルｕは光軸方向の単位ベクトル、ベクトルｖは面素Ｑ_iからレンズの中心に向かう単位ベクトルを意味する。
【００６１】
（数７）において、ベクトルｕ、ｂｋ、ｆ、ｄはカメラが変更されない限り一定であり、窓が十分に小さい場合にはベクトルｖは窓内のすべての画素について同一であるものと考えられ、ベクトルｓも窓内すべての画素について同一であるものと考えられることから、画素値Ｉ（ｘ_i）は対応する面素の法線ベクトルｎ_iにその面素の反射係数ａ_iを乗じたベクトルａ_iｎ_i＝（ａ_iｎ_ix、ａ_iｎ_iy、ａ_iｎ_iz）^Tとベクトルｓとの内積に共通の係数を乗じたものと考えられる。
【００６２】
したがって、画素値Ｉ（ｘ_i）の自由度はベクトルａ_iｎ_iの有する自由度である‘３’であり、照明変動のみの場合の窓画像ベクトルの変動は、（数８）で表すことができる以下の３つの基底ベクトルν_x、ν_y、ν_zで張ることができる３次元の部分空間で表すことができる。
【００６３】
【数８】

【００６４】
したがって、照明条件が変化、あるいは画素Ｑ_iとカメラ位置の相対関係が変化する場合はベクトルω₁、ω₂、ω₃、ω₄、ω₅、ω₆、ν_x、ν_y、ν_zによって形成される９次元の部分空間内で変動する。したがって、画素Ｑ_iとカメラ位置の相対関係が変化する場合について十分なサンプルデータを得ることによって、ＫＬ変換を用いて９次元の窓部分空間を同定することが可能となる。
【００６５】
一例として、カメラ及び照明を固定して、面素Ｑ_iとカメラ位置の相対関係が変化する場合について説明する。まず面素Ｑ_iが形状変化せずに移動し、その結果として法線ベクトルｎが（ｎ＋Δｎ）に、レンズ中心への単位ベクトルｖが（ｖ＋Δｖ）に変化したものとする。また、面素Ｑの投影位置もベクトルｘ_tからｘに移動したものとする。
【００６６】
また、面素Ｑ_iの投影位置もベクトルｘ_i＾からｘ_iに移動したものとする。変化後の面素Ｑ_iのレンズ方向への表面放射光度Ｌ_i’は（数６）を用いて（数９）のように表すことができる。
【００６７】
【数９】

【００６８】
したがって、対応する画素の放射照度を求めることで、画素値Ｉ’（ｘ_i）は（数１０）のように表すことができる。ここでΔＩ_vをカメラとの相対位置変化に基づく窓内各画素値の変化量ベクトルとし、ΔＩ_nをカメラとの相対位置変化による照明条件変化に基づく窓内各画素値の変化量ベクトルとする。
【００６９】
【数１０】

【００７０】
ここで、先述した物体とカメラ位置の相対変化のみによる画素値の変化の関係（数４）を考慮すると、Ｉ（ｘ_i＾）＝Ｉ（ｘ）＋ΔＩ_gと考えることができるので、（数１０）は（数１１）のように表すことができる。
【００７１】
【数１１】

【００７２】
ここでΔＩ_gの自由度は‘６’であるのに対して、ΔＩ_n及びΔＩ_vの自由度は‘３’であり、かつΔＩ_n及びΔＩ_vの意味する部分空間は結局同一の部分空間を表していることから、結局変化量ベクトルΔＩ＝Ｉ’（ｘ）−Ｉ（ｘ）の変動範囲は、最大９次元の部分空間内であることがわかる。
【００７３】
この場合、サイズの変化や物体の回転に代表される幾何学的な変動について、十分なサンプルデータを取得することは現実的には困難である。しかしながら、ベクトルω₁、ω₂、ω₃、ω₄、ω₅、ω₆によって形成される幾何学的変動に対応する部分空間（以下、「ジオメトリック変動部分空間」という。）については、一枚の小領域のみから推定することが可能である。
【００７４】
そこで、サンプルデータに基づいて、まずジオメトリック変動部分空間を求め、求めたジオメトリック変動部分空間の成分を取り除いた成分の分布を求める。この分布をＫＬ変換することで、ν_x、ν_y、ν_zによって形成される測光学的変動に対応する部分空間（以下、「フォトメトリック変動部分空間」という。）を求めることができる。こうすることで、任意の部分空間をジオメトリック変動部分空間とフォトメトリック変動部分空間とを用いて表すことが可能となる。
【００７５】
また、部分空間の同定には大別して２つの方法が考えられる。一つはジオメトリック変動部分空間とフォトメトリック変動部分空間とが直交しているものと仮定する方法、今一つは十分なサンプルデータが有る場合に用いるジオメトリック変動部分空間とフォトメトリック変動部分空間とを分けずに直接同定する方法である。
【００７６】
まず、ジオメトリック変動部分空間とフォトメトリック変動部分空間とが直交しているものと仮定する方法について説明する。初めに、顔画像に関するサンプルデータの収集は、登録対象者に首を振ってもらい、顔の姿勢を変化させることで行う。
【００７７】
基準小領域は、小領域空間にプロットした一つの小領域変化系列におけるデータ点分布の平均位置、もしくは変動範囲の中心を基準とし、基準小領域ベクトルｘ_sとして保存する。かかる基準としたのは、サンプルデータの中には偽りのデータも混在し、また幾何学的変形の線形近似の限界やランベーシャン表面との仮定からの逸脱、あるいは雑音の存在等によって、本来の部分空間から逸脱しているデータも存在するからである。
【００７８】
求めた基準小領域ベクトルｘ_sから、ベクトルω₁、ω₂、ω₃、ω₄、ω₅、ω₆を（数５）に基づいて計算する。画素値の微分はソーベルフィルタ（Sobel Filter）の畳み込みによって近似的に計算するものとする。
【００７９】
このようにベクトルω₁、ω₂、ω₃、ω₄、ω₅、ω₆が求まることで、ジオメトリック変動部分空間ベクトルΩを同定することができる。ただし、これらのベクトルは必ずしも一次独立とは限らないので、行列Ｇ＝［ω₁、ω₂、ω₃、ω₄、ω₅、ω₆］^Tを特異値分解することで、部分空間ベクトルΩの正規直交基底ベクトルｕ_p（１≦ｐ≦６）を求める。ｐは行列Ｇの階数である。
【００８０】
次に、任意の窓画像ベクトルｘのジオメトリック変動部分空間Ωと直交する成分は、図１０に従って求めることができる。図１０において、ジオメトリック変動部分空間Ωの基準画像ベクトルをｘ_sとし、ベクトルｘとベクトルｘ_sとの差をジオメトリック変動部分空間Ωに直交射影したものをベクトルｘ’とする。
【００８１】
ジオメトリック変動部分空間Ωの直交射影行列Ｐは、正規直交基底ベクトルｕ_p（１≦ｐ≦６）を用いて（数１２）のように表すことができる。
【００８２】
【数１２】

【００８３】
また、図１０のベクトル関係より、ｘ’＝Ｐ＊（ｘ−ｘ_s）である。ここで、記号‘＊’は行列とベクトルの乗算を意味するものとする。
【００８４】
一方、ジオメトリック変動部分空間Ωの直交補空間Ω^Tへの直交射影行列ＱはＱ＝Ｉ−Ｐ（Ｉは単位行列）と表すことができることから、任意の小領域ベクトルｘのジオメトリック変動部分空間Ωと直交する成分は、（ｘ−ｘ_s）−ｘ’＝Ｑ＊（ｘ−ｘ_s）として求めることができる。
【００８５】
こうして求まったＱ＊（ｘ−ｘ_s）の分布からＫＬ展開することによってフォトメトリック変動部分空間Ψを同定する。まず、小領域変化系列に属する全ての小領域ベクトルｘ_jからｙ_j＝Ｑ＊（ｘ_j−ｘ_s）（ｊは１≦ｊの自然数）を計算する。そして、（数１３）によって、ベクトルｙの自己相関行列Ｒを求める。
【００８６】
【数１３】

【００８７】
求まった行列Ｒの固有値・固有ベクトルを求め、降順にλ₁、λ₂、・・λ_Nとし、各固有値に対応する正規直交固有ベクトルをｖ₁、ｖ₂、・・、ｖ_Nとする。ここで、固有値を降順に所定の個数ｎまで加算した値が固有値の総和に対してしめる割合を累積寄与率と定義すると、累積寄与率が所定のしきい値を超えたときのｑ（個数）を部分空間の次元数と定める。したがって、フォトメトリック変動部分空間Ψの正規直交基底ベクトルはｖ₁、ｖ₂、・・、ｖ_qとなる。
【００８８】
このように、ジオメトリック変動部分空間Ωとフォトメトリック変動部分空間Ψが同定されるので、これらをベクトル結合することによって環境変動部分空間Γと窓部分空間Λを同定する。すなわち、（数１４）のように表すことができる。
【００８９】
（数１４）
Γ＝Ω＋Ψ
Λ＝ｘ_s＋Γ
【００９０】
よって、環境変動部分空間Γの正規直交基底ベクトルは、ジオメトリック変動部分空間Ωの正規直交基底ベクトルを並べた行列Ｕ＝［ｕ₁、ｕ₂、・・、ｕ_p］とフォトメトリック変動部分空間Ψの正規直交基底ベクトルを並べた行列Ｖ＝［ｖ₁、ｖ₂、・・、ｖ_q］になる。したがって、ベクトルｗ_i＝ｕ_i（ｉは１≦ｉ≦ｐの自然数）、ベクトルｗ_p+j＝ｖ_j（ｊは１≦ｊ≦ｑの自然数）として、環境変動部分空間Γの正規直交基底ベクトルを並べた行列Ｗ＝［ｗ₁、ｗ₂、・・、ｗ_r］（ｒ＝ｐ＋ｑ）を定めることで、環境変動部分空間Γとして部分空間を定めることが可能となる。
【００９１】
次に、十分なサンプルデータが有る場合においては、ジオメトリック変動部分空間とフォトメトリック変動部分空間とを分けずに直接部分空間を同定する方法を用いる。
【００９２】
この方法は、サンプルデータの収集や基準小領域の決定手法は、上述した方法と同様である。部分空間の同定は、ベクトル（ｘ−ｘ_s）の分布から直接ＫＬ展開することによって同定する。
【００９３】
まず、小領域変化系列に属する全ての小領域ベクトルｘ_jからｙ_j＝Ｑ＊（ｘ_j−ｘ_s）（ｊは１≦ｊ≦Ｍの自然数）を計算する。そして、ジオメトリック変動部分空間とフォトメトリック変動部分空間とが直交しているものと仮定する方法と同様に、（数１３）によってベクトルｙの自己相関行列Ｒを求める。
【００９４】
求まった行列Ｒの固有値・固有ベクトルを求め、降順にλ₁、λ₂、・・λ_Nとし、各固有値に対応する正規直交固有ベクトルをｖ₁、ｖ₂、・・、ｖ_Nとする。ここで、固有値を降順に所定の個数ｎまで加算した値が固有値の総和に対してしめる割合を累積寄与率と定義すると、累積寄与率が所定のしきい値を超えたときのｒ（個数）を部分空間の次元数と定める。したがって、環境変動部分空間Γの正規直交基底ベクトルを並べた行列Ｗ＝［ｗ₁、ｗ₂、・・、ｗ_r］として部分空間を定めることが可能となる。
【００９５】
このように、入力された画像と登録されている物体モデルとの照合は、上述した方法のいずれかを用いて物体モデルを同定することにより、入力された画像に最も近接した部分空間を同定することによって行われる。
【００９６】
以上のように本実施の形態によれば、物体モデル登録時と入力画像認識時における物体の姿勢の相異による見え方の変動や照明条件の相異による見え方の変動等に左右されることなく、登録されている物体モデルとの照合を精度良く行うことが可能となる。
【００９７】
また、本発明の実施の形態にかかる画像認識装置を実現するプログラムを記憶した記録媒体は、図１１に示す記録媒体の例に示すように、ＣＤ−ＲＯＭ１１２−１やフロッピーディスク１１２−２等の可搬型記録媒体１１２だけでなく、通信回線の先に備えられた他の記憶装置１１１や、コンピュータ１１３のハードディスクやＲＡＭ等の記録媒体１１４のいずれでも良く、プログラム実行時には、プログラムはローディングされ、主メモリ上で実行される。
【００９８】
また、本発明の実施の形態にかかる画像認識装置により生成された物体モデルデータ等を記録した記録媒体も、図１１に示す記録媒体の例に示すように、ＣＤ−ＲＯＭ１１２−１やフロッピーディスク１１２−２等の可搬型記録媒体１１２だけでなく、通信回線の先に備えられた他の記憶装置１１１や、コンピュータ１１３のハードディスクやＲＡＭ等の記録媒体１１４のいずれでも良く、例えば本発明にかかる画像認識装置を利用する際にコンピュータ１１３により読み取られる。
【００９９】
【発明の効果】
以上のように本発明にかかる画像認識装置によれば、物体モデル登録時と入力画像認識時における物体の姿勢の相異による見え方の変動や照明条件の相異による見え方の変動等に左右されることなく、登録されている物体モデルとの照合を精度良く行うことが可能となる。
【図面の簡単な説明】
【図１】従来の画像認識装置における物体モデル登録処理の流れ図
【図２】従来の画像認識装置における物体モデル登録処理の概念図
【図３】従来の画像認識装置における処理の流れ図
【図４】従来の画像認識装置における処理の概念図
【図５】本発明の実施の形態にかかる画像認識装置のブロック構成図
【図６】本発明の実施の形態にかかる画像認識装置における物体モデル登録処理の流れ図
【図７】本発明の実施の形態にかかる画像認識装置における物体モデル登録処理の概念図
【図８】本発明の実施の形態にかかる画像認識装置における処理の流れ図
【図９】本発明の実施の形態にかかる画像認識装置における処理の概念図
【図１０】ジオメトリック変動部分空間と直交する小領域ベクトルの求め方の説明図
【図１１】記録媒体の例示図
【符号の説明】
５１画像情報入力部
５２物体モデル化実行部
５３物体モデル登録部
５４物体モデルデータベース
５５類似度判断部
５６物体認識部
１１１回線先の記憶装置
１１２ＣＤ−ＲＯＭやフロッピーディスク等の可搬型記録媒体
１１２−１ＣＤ−ＲＯＭ
１１２−２フロッピーディスク
１１３コンピュータ
１１４コンピュータ上のＲＡＭ／ハードディスク等の記録媒体

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to an image recognition apparatus for accumulating an object model obtained by converting image information of an object in a database and recognizing the object by referring to the database at the time of image recognition.
[0002]
[Prior art]
With the development of computer networks such as the Internet, anyone can now easily access a wide variety of information. At the same time, technology for confirming that the person accessing the information is who they claim to be, that is, authentication technology, has grown in importance. This is because the probability of mistaking an impostor for the genuine person, or of rejecting the genuine person as an impostor, must be minimized.
[0003]
One technique that has recently attracted attention in this technical field is authentication based on face images. Like a fingerprint or a voiceprint, a face is unique to the individual, and advances in image processing technology have made it usable as a criterion for identification.
[0004]
Various methods that use a face image as the basis for authentication have been disclosed. For example, Japanese Patent Application No. Hei 11-110020 discloses a technique that estimates, from the input image, an environmental parameter value indicating the state of the shooting environment and a target state parameter value indicating the state of the object, and then performs recognition using a “collation image” corrected with those values so that the shooting environment and object state of the input image match those of the registered image.
[0005]
The image recognition process using the disclosed environmental parameters and target state parameters will be described below with reference to FIGS. 1 to 4. First, FIG. 1 shows the flow of processing in the phase of this image recognition process that registers images in the database.
[0006]
In FIG. 1, first, an image to be registered is input (step S11). The image input here may be a single face image taken from the front, but in order to improve recognition accuracy, it is desirable to prepare face images taken from various directions in addition to the front image.
[0007]
Next, a face area is cut out from the input image (step S12), and an image of the face area is obtained (step S13). That is, as shown in FIG. 2, a face area is cut out as a rectangular area on an image to be registered.
[0008]
Then, the obtained face area image is regarded as an N-dimensional vector having each pixel as an element, and the vector is projected onto an n-dimensional (n ≦ N) subspace (step S14); the projection point is denoted P. In FIG. 2, the image is projected onto the single point labeled “sashida”.
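The projection in step S14 can be sketched as follows. This is a minimal illustration, not the method of the cited application: the source does not say how the subspace is obtained, so the orthonormal basis `basis`, the offset `origin`, and the function name `project_to_subspace` are all assumptions made for the example.

```python
import numpy as np

def project_to_subspace(image, basis, origin):
    """Return the n coordinates of `image` in the subspace spanned by `basis`.

    image  : 2-D array (face area), flattened to an N-vector
    basis  : (N, n) array with orthonormal columns (assumed to be given)
    origin : N-vector subtracted before projecting (assumption, e.g. a mean image)
    """
    x = np.asarray(image, dtype=float).ravel()   # N-dimensional pixel vector
    return basis.T @ (x - origin)                # projection point P (n values)

# Toy data: an 8x8 "face area" (N = 64) and a random 5-dimensional subspace.
rng = np.random.default_rng(0)
N, n = 64, 5
basis, _ = np.linalg.qr(rng.standard_normal((N, n)))  # orthonormal columns
origin = np.zeros(N)

face = rng.random((8, 8))
P = project_to_subspace(face, basis, origin)
print(P.shape)  # (5,)
```

In the registration phase this point P would be stored together with the label of the person; the recognition phase computes a point Q in the same way.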
[0009]
Further, the environmental parameter value e indicating the state of the shooting environment and the target state parameter value s indicating the state of the object are estimated, and the value and the projection point P are paired and registered in the database (step S15). Here, a general method for estimating the environmental parameter value e indicating the state of the photographing environment and the target state parameter value s indicating the state of the object from the image is not disclosed.
[0010]
Next, FIG. 3 shows the flow of processing in the recognition phase of this image recognition process. In FIG. 3, the processing from the input of the image to the cutting out of the face area image (steps S31 to S33) is the same as in the registration phase (steps S11 to S13) shown in FIG. 1.
[0011]
Accordingly, as shown in FIG. 4, the image is projected onto the subspace at the single point labeled “sashida”.
[0012]
Meanwhile, an environmental parameter value e indicating the state of the shooting environment and a target state parameter value s indicating the state of the object are estimated from the input image. Next, the parameter values estimated from the input image are adjusted so that they match the environmental parameter value e and the target state parameter value s of the registered image registered in advance. Through this adjustment, a collation image is generated in which the shooting environment and object state of the input image match those of the registered image. This collation image is projected onto the subspace to obtain a point Q (step S34).
[0013]
In this way, the registered image and the collation image are compared under the same conditions with respect to the shooting environment, such as illumination, and the state of the object, such as its position and pose. However, no general method is disclosed for adjusting the parameter values to generate a collation image in which the shooting environment and object state of the input image match those of the registered image.
[0014]
Next, the distance in the subspace between a registered point P and the point Q is calculated (step S35). This distance is calculated in the same way for all registered images, and the nearest point P_m is found (step S36).
[0015]
Finally, the registered image corresponding to the nearest point P_m is recognized as the one corresponding to the input image (step S37).
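Steps S35 to S37 amount to a nearest-neighbor search in the subspace. The sketch below assumes Euclidean distance, which the text does not state explicitly; the function name `recognize`, the second label `person_b`, and the coordinate values are hypothetical.

```python
import numpy as np

def recognize(Q, registered):
    """Return (label, distance) of the registered point nearest to Q.

    registered : dict mapping a label to its projection point P (n-vector)
    """
    labels = list(registered)
    points = np.stack([registered[k] for k in labels])   # (num_entries, n)
    dists = np.linalg.norm(points - Q, axis=1)            # step S35: distances
    m = int(np.argmin(dists))                             # step S36: nearest P_m
    return labels[m], float(dists[m])                     # step S37: result

registered = {
    "sashida": np.array([0.9, 0.1, 0.0]),
    "person_b": np.array([0.0, 1.0, 0.2]),   # hypothetical second entry
}
Q = np.array([0.8, 0.2, 0.1])
label, dist = recognize(Q, registered)
print(label)  # sashida
```

Because each person is represented by a single point, any change in pose or lighting that moves Q away from P degrades the result, which is the weakness the following paragraphs discuss.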
[0016]
[Problems to be solved by the invention]
However, the method described above hinges on (1) estimating, from an image, the environmental parameter value indicating the state of the shooting environment and the target state parameter value indicating the state of the object, and (2) adjusting the parameter values to generate a collation image in which the shooting environment and object state of the input image match those of the registered image. Nevertheless, no general method for realizing these processes is known.
[0017]
Japanese Patent Application No. Hei 11-110020 proposes estimating the illumination parameter, one of the environmental parameters, from the mean, variance, and histogram of the luminance values of the face area image, and using the resolution, focus, and exposure of the camera used for shooting as the camera parameters, another of the environmental parameters. It also proposes estimating the target state parameter from the area occupied by skin color in the face area image.
[0018]
However, (1) it is generally difficult to estimate such parameter values correctly. Furthermore, (2) it is also difficult to model, from one image or a small number of images, how the image changes when these parameters change. It is therefore considered difficult to apply the above method to actual recognition processing.
[0019]
Consequently, because a face image taken from the front is used at registration, when the orientation of the face or the lighting conditions differ at input time, the genuine person may be misrecognized as an impostor, or an impostor as the genuine person.
[0020]
To solve the above problem, an object of the present invention is to provide an image recognition apparatus and method that can accurately collate an input image with registered images without depending on the shooting conditions of the input image at the time of image recognition.
[0021]
[Means for Solving the Problems]
To achieve the above object, an image recognition apparatus according to the present invention comprises: an object modeling execution unit that estimates and models changes in the appearance of an object due to variations in the shooting environment; an object model registration unit that registers the object model obtained by the object modeling execution unit in a database in advance; an image information input unit that inputs image information of an object to be recognized; a similarity determination unit that collates the input image information with the object models registered in advance by the object model registration unit and assigns a similarity to each registered object model; and an object recognition unit that outputs the type of the object to be recognized that is judged most similar among the assigned object models. The apparatus is characterized in that the object modeling execution unit receives a plurality of pieces of image information captured while the position and pose of the object relative to the fixed image information input unit are varied, and, based on those pieces of image information, estimates and models the changes in the appearance of the object that future variations in the shooting environment may cause.
[0022]
With this configuration, collation with the registered object models can be performed accurately, unaffected by variations in appearance caused by differences in the pose of the object or in lighting conditions between object model registration and input image recognition.
[0023]
In the image recognition apparatus according to the present invention, it is preferable to assume a Lambertian reflection model as the surface characteristics of the object to be recognized. This is because it is easy to predict changes in appearance due to illumination fluctuations.
[0024]
In the image recognition apparatus according to the present invention, the image information input unit preferably cuts out from the image the part in which the object to be recognized exists, and the object to be recognized is modeled using the cut-out partial image. This prevents misrecognition caused by extraneous image information.
[0025]
In the image recognition apparatus according to the present invention, the image information input unit preferably selects characteristic small regions in the object to be recognized from the image, and the object to be recognized is modeled based on the information contained in the selected small regions and the arrangement information of the small regions. This makes it possible to handle cases in which a characteristic part is partially hidden in the image.
[0026]
When sample data are scarce, the object modeling execution unit of the image recognition apparatus according to the present invention preferably performs modeling, based on the input image information, by separating the variation in appearance due to changes in the pose of the object from the variation in appearance due to changes in lighting conditions. This allows the variation in appearance to be estimated accurately even when sample data are scarce.
[0027]
When sufficient sample data are available, the object modeling execution unit of the image recognition apparatus according to the present invention preferably performs modeling, based on the input image information, without separating the variation in appearance due to changes in the pose of the object from the variation in appearance due to changes in lighting conditions. With sufficient sample data there is no need to model approximately by deliberately separating the two; the variation in appearance can be obtained directly.
[0028]
The present invention is also characterized by software that executes the functions of the image recognition apparatus described above as processing steps of a computer. Specifically, it is an image recognition method, and a computer-readable recording medium on which the steps are recorded as a program, comprising: a step of estimating and modeling changes in the appearance of an object due to variations in the shooting environment; a step of registering the obtained object model in a database in advance; a step of inputting image information of an object to be recognized; a step of collating the input image information with the object models registered in advance and assigning a similarity to each registered object model; and a step of outputting the type of the object to be recognized that is judged most similar among the assigned object models. In the method, a plurality of pieces of image information captured while the relative position and pose of the object are varied are input, and the changes in the appearance of the object that future variations in the shooting environment may cause are estimated and modeled based on those pieces of image information.
[0029]
With this configuration, loading the program onto a computer and executing it realizes an image recognition apparatus that can accurately collate an input image with the registered object models, unaffected by variations in appearance caused by differences in the pose of the object or in lighting conditions between object model registration and input image recognition.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
Hereinafter, an image recognition apparatus according to a first embodiment of the present invention will be described with reference to the drawings. FIG. 5 is a configuration diagram of the image recognition apparatus according to the first embodiment of the present invention. In FIG. 5, 51 denotes an image information input unit, 52 an object modeling execution unit, 53 an object model registration unit, 54 an object model database, 55 a similarity determination unit, and 56 an object recognition unit.
[0031]
In FIG. 5, the image information input unit 51 represents the part provided for inputting image information, such as a camera that captures the image to be recognized, a scanner that reads photographs taken by such a camera, or a device that reads files in which captured images are compressed and stored on a magnetic recording medium. Based on the image information input from the image information input unit 51, the object modeling execution unit 52 models the object to be recognized.
[0032]
Various methods for modeling image information in the object modeling execution unit 52 are conceivable. For example, Japanese Patent Application No. Hei 11-110020 discloses a method for uniquely representing an object model using feature parameters, as described above.
[0033]
However, this modeling method has the following problems. First, because only one image per object is input for modeling, even the same object photographed from the same camera position may be misrecognized as a different object owing to differences in the position of the light source, the illuminance, and so on.
[0034]
Second, even if the position of the light source, the illuminance, and so on are the same, the object is likewise likely to be recognized as a different object if the relative position of the camera and the object differs. If the camera angle or the distance to the camera differs, the size and angle of the captured image change greatly and the position in the eigenspace shifts substantially, so misrecognition as a different object can be expected to become highly likely.
[0035]
To solve these problems, the present embodiment is characterized as follows. At registration, the pose of the object is varied continuously relative to the fixed image information input unit. From the resulting continuous images, the embodiment predicts how the image will change with environmental variations at input time, namely differences in lighting conditions and in the state of the target object (its pose and distance relative to the camera). The object model based on that prediction is then registered in the object model database 54 as a subspace.
[0036]
Hereinafter, a modeling method in the image recognition apparatus according to the present embodiment will be described with reference to FIGS. 6 and 7. First, FIG. 6 shows the flow of registration phase modeling processing in the image recognition apparatus according to the present embodiment.
[0037]
As shown in FIG. 6, an image is input first (step S61); what is input, however, is not a single image but a plurality of continuous images. That is, as shown in FIG. 7, even for a face image, not only an image taken from the front but also a continuous series of images in which the head is gradually turned is input as the image series for registration.
[0038]
Next, each small region is tracked across the continuous images of the input image series, so that a series of small regions is selected from the continuous images (step S62). For example, for the “eyes”, the small regions representing the eyes in the successive images are selected as one small-region series.
[0039]
Then, a new subspace is generated based on the selected small-region series (step S63). Specifically, as shown in FIG. 7, a subspace is identified for each corresponding part across the continuous images, for example the “eye region” in the case of a face image, and this subspace is called a window subspace.
[0040]
The window subspace covers the variations in the appearance of the small-region image caused by geometric variations, such as the position and pose of the object, and by variations in the position and illuminance of the lighting. Such window subspaces are identified for the individual regions, such as the “eye region”, the “nose region”, and so on, and the set is registered in the object model database 54 as an object model (step S64).
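The original Japanese description (paragraphs 0091 to 0094) identifies a window subspace directly from the distribution of (x − x_s) by a KL expansion, keeping as many eigenvectors as are needed for the cumulative contribution ratio to exceed a threshold. The sketch below follows that outline under stated assumptions: the reference vector x_s is taken as the mean of the series, the autocorrelation matrix is scaled by 1/M (the source equation is not reproduced in this text), and the threshold value 0.95 and the function name are illustrative only.

```python
import numpy as np

def identify_window_subspace(windows, threshold=0.95):
    """Estimate a window subspace from a tracked small-region series.

    windows   : (M, N) array, one flattened window image per frame
    threshold : cumulative contribution ratio to exceed (assumed value)
    Returns (x_s, W): reference vector and an (N, r) orthonormal basis.
    """
    X = np.asarray(windows, dtype=float)
    x_s = X.mean(axis=0)                      # reference small-region vector
    Y = X - x_s                               # y_j = x_j - x_s
    R = Y.T @ Y / len(Y)                      # autocorrelation (assumed 1/M scaling)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigh returns ascending eigenvalues
    eigvals = np.clip(eigvals[::-1], 0.0, None)   # descending, clip tiny negatives
    eigvecs = eigvecs[:, ::-1]
    ratio = np.cumsum(eigvals) / eigvals.sum()    # cumulative contribution ratio
    # First count r whose cumulative ratio exceeds the threshold.
    r = min(int(np.searchsorted(ratio, threshold, side="right")) + 1, len(ratio))
    return x_s, eigvecs[:, :r]

# Toy series: 200 frames of a 4x4 window that varies within a 2-D subspace.
rng = np.random.default_rng(0)
true_basis, _ = np.linalg.qr(rng.standard_normal((16, 2)))
coeffs = rng.standard_normal((200, 2))
windows = 0.5 + coeffs @ true_basis.T
x_s, W = identify_window_subspace(windows)
print(W.shape)
```

One such pair (x_s, W) would be computed per tracked region, and the set of pairs stored as the object model. The sketch assumes the series actually varies; a series of identical windows would make the eigenvalue sum zero.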
[0041]
Next, the processing for actually recognizing an input image will be described with reference to FIGS. 8 and 9. FIG. 8 shows the flow of the image recognition processing.
[0042]
In FIG. 8, an image to be checked against the object model database 54 is input (step S81). Next, a face region is cut out from the image (step S82), and a plurality of small regions (windows) that are characteristic parts are selected from the face region (step S83). One possible window selection method is the one based on “edge strength” used in the second embodiment of Japanese Patent Application No. 11-110020. Then, as shown in FIG. 9, a vector whose elements are the pixel values of each window (a window vector) is projected onto each window subspace registered in the object model database 54 (step S84).
[0043]
The similarity determination unit 55 calculates the length of the perpendicular dropped from the window vector onto the window subspace, and defines the similarity between the small region and the window subspace based on that length (step S85). The window subspace closest to the small region is then found (step S86), and the registered object model containing that subspace is taken as a candidate for the object in the input image. The same processing is performed for all windows in the input image, and finally the object recognition unit 56 integrates the results to perform recognition (step S87).
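Steps S85 and S86 can be illustrated with a minimal sketch. It assumes that each registered window subspace is stored as a reference vector plus a matrix with orthonormal columns, and that the distance is measured from that reference vector; the function names, the dictionary layout, and the toy values are hypothetical and are not taken from the specification.

```python
import numpy as np

def distance_to_subspace(window_vec, ref_vec, basis):
    """Length of the perpendicular from window_vec to the affine subspace
    ref_vec + span(basis). `basis` is assumed to have orthonormal columns."""
    diff = window_vec - ref_vec
    # Component of diff that lies inside the subspace.
    inside = basis @ (basis.T @ diff)
    # The remainder is the perpendicular; its norm is the distance.
    return float(np.linalg.norm(diff - inside))

def closest_model(window_vec, models):
    """Name of the registered model whose window subspace is nearest.
    `models` maps a (hypothetical) model name to (ref_vec, basis)."""
    return min(models, key=lambda name: distance_to_subspace(window_vec, *models[name]))

# Toy example in a 3-pixel window space (illustrative values only).
models = {
    "A": (np.zeros(3), np.array([[1.0], [0.0], [0.0]])),  # line along axis 0
    "B": (np.zeros(3), np.array([[0.0], [1.0], [0.0]])),  # line along axis 1
}
w = np.array([2.0, 0.1, 0.0])
best = closest_model(w, models)
```

A shorter perpendicular corresponds to a higher similarity; how the per-window results are integrated in step S87 is not specified here and is left out of the sketch.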
[0044]
In the modeling method of the image recognition apparatus according to the present embodiment, the position of the light source does not matter at modeling time. However, the position and angle of the light source must not change while the continuous images are being captured. If they did, it would be difficult to predict how the image changes with the shooting conditions at input time.
[0045]
Next, the identification of the window subspace at the time of registration will be described in more detail. First, consider a surface element Q_i, a small region on the object surface corresponding to one pixel. The surface element Q_i is assumed to be a Lambertian surface with reflection coefficient a_i. Here, a Lambertian surface means a reflecting surface with no specular reflection.
[0046]
In general, even when the same face as at registration is photographed, it is unlikely that the relative relationship between the surface element Q_i and the camera position, the lighting conditions, and so on will coincide with those at registration. Therefore, the pixel value at the corresponding position in the corresponding window also changes with the shooting conditions at input time.
[0047]
For example, in a coordinate system fixed to the window, let I(x) be the pixel value at the coordinate vector x before the change and I′(x) the pixel value after the change. Assuming no illumination variation, and assuming that the amounts of rotation, size change, and so on within the selected window are small, the displacement Δx of the corresponding point in the window-fixed coordinate system is given by (Equation 1). In (Equation 1), A is a 2 × 2 matrix whose elements are affine transformation parameters, d is a 2 × 1 column vector whose elements are affine transformation parameters, and D = I − A, where I is the 2 × 2 unit matrix.
[0048]
[Equation 1]

[0049]
As long as Δx stays within a very small range, even non-rigid deformation can be handled, provided it can be approximated by an affine transformation. Assuming that pixel values are preserved before and after the movement, a Taylor expansion allows the pixel value after the change, I′(x), to be approximated from the pixel value before the change, I(x), as in (Equation 2).
[0050]
[Equation 2]

[0051]
Accordingly, the pixel value after the change, I′(x), can be expressed in terms of the pixel value before the change, I(x), as in (Equation 3). The second term on the right-hand side can then be written, as in (Equation 4), as the variation vector ΔI_g of the pixel values in the window due to geometric change alone.
[0052]
[Equation 3]

[0053]
[Equation 4]

[0054]
From the above, the variation vector ΔI_g has six degrees of freedom, and the corresponding subspace of the window image space can be spanned by the following six basis vectors ω_1, ω_2, ω_3, ω_4, ω_5, ω_6.
[0055]
[Equation 5]

[0056]
On the other hand, considering the case where only the lighting conditions change, the radiant intensity L_i of the surface element Q_i in the direction of the lens can be expressed as (Equation 6). Here, the vector n_i is the normal vector of the surface element Q_i, and the vector s is the light ray vector.
[0057]
[Equation 6]

[0058]
Assuming that the aperture area of the photodetector used for imaging is b, that the photoelectric conversion characteristic of the CCD is linear, and that its proportionality constant is k, the pixel value I(x_i) can be expressed as (Equation 7).
[0059]
[Equation 7]

[0060]
Here, d is the lens diameter, f is the focal length, the vector u is the unit vector in the optical axis direction, and the vector v is the unit vector from the surface element Q_i toward the center of the lens.
[0061]
In (Equation 7), the vector u and the quantities b, k, f, and d are constant unless the camera is changed. If the window is sufficiently small, the vector v can be regarded as the same for all pixels in the window, and the vector s can likewise be regarded as the same for all pixels in the window. The pixel value I(x_i) is therefore the inner product of the vector s and the vector a_i n_i = (a_i n_ix, a_i n_iy, a_i n_iz)^T, which is the normal vector n_i of the corresponding surface element multiplied by its reflection coefficient a_i, scaled by a coefficient common to all pixels in the window.
[0062]
Therefore, the pixel value I(x_i) has three degrees of freedom, the same as the vector a_i n_i, and the variation of the window image vector under illumination variation alone can be represented by a three-dimensional subspace spanned by the following three basis vectors ν_x, ν_y, ν_z given by (Equation 8).
[0063]
[Equation 8]

[0064]
Therefore, when the illumination condition changes or the relative relationship between the surface element Q_i and the camera position changes, the window image vector varies within the nine-dimensional subspace spanned by the vectors ω_1, ω_2, ω_3, ω_4, ω_5, ω_6, ν_x, ν_y, ν_z. Accordingly, by obtaining sufficient sample data for cases where the relative relationship between the surface element Q_i and the camera position changes, a nine-dimensional window subspace can be identified using the KL transform.
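The three-dimensional photometric part of this argument can be checked numerically. The following is a minimal sketch under the Lambertian assumption stated above, with pixel values modeled as a common coefficient times the inner product of a_i n_i and s. The window size, the random surface data, and the coefficient c are hypothetical, and attached shadows (negative inner products) are deliberately not clamped.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_lights = 25, 40   # e.g. a 5x5 window seen under 40 light vectors

# Hypothetical surface data: unit normals n_i scaled by reflection coefficients a_i.
normals = rng.normal(size=(n_pixels, 3))
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
albedo = rng.uniform(0.2, 1.0, size=(n_pixels, 1))
scaled_normals = albedo * normals            # row i is a_i * n_i

# One light ray vector s per column; one window image per column of `images`.
lights = rng.normal(size=(3, n_lights))
c = 1.0                                      # coefficient common to the window
images = c * (scaled_normals @ lights)       # I(x_i) = c * (a_i n_i) . s

# All 40 window images lie in a subspace of dimension 3.
rank = int(np.linalg.matrix_rank(images))
```

Because every window image is a linear combination of the three columns of `scaled_normals`, no amount of additional lighting samples raises the rank above three in this idealized model.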
[0065]
As an example, consider the case where the camera and lighting are fixed and the relative relationship between the surface element Q_i and the camera position changes. First, suppose that the surface element Q_i moves without changing shape, so that the normal vector n changes to (n + Δn) and the unit vector v toward the lens center changes to (v + Δv). The projected position of the surface element Q_i is also assumed to move, from the vector x̂_i to x_i.
[0066]
The radiant intensity L_i′ of the surface element Q_i toward the lens after this change can be expressed as (Equation 9) using (Equation 6).
[0067]
[Equation 9]

[0068]
Therefore, by obtaining the irradiance of the corresponding pixel, the pixel value I′(x_i) can be expressed as (Equation 10). Here, ΔI_v is the variation vector of the pixel values in the window due to the change in position relative to the camera, and ΔI_n is the variation vector of the pixel values in the window due to the change in illumination conditions that results from the change in position relative to the camera.
[0069]
[Equation 10]

[0070]
Here, taking into account the relationship (Equation 4) for the change in pixel value caused only by the relative change between the object and the camera position, I(x̂_i) = I(x) + ΔI_g holds, so (Equation 10) can be rewritten as (Equation 11).
[0071]
[Equation 11]

[0072]
Here, ΔI_g has six degrees of freedom, whereas ΔI_n and ΔI_v each have three degrees of freedom, and the subspaces spanned by ΔI_n and ΔI_v are ultimately the same subspace. It follows that the variation vector ΔI = I′(x) − I(x) varies within a subspace of at most nine dimensions.
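The dimension count above can be written out explicitly. This is only a sketch of the counting argument; the names S_g and S_p for the two spans are introduced here for illustration and do not appear in the specification.

```latex
\[
S_g = \operatorname{span}\{\omega_1,\dots,\omega_6\}, \qquad
S_p = \operatorname{span}\{\nu_x,\nu_y,\nu_z\},
\]
\[
\Delta I \in S_g + S_p, \qquad
\dim(S_g + S_p) = \dim S_g + \dim S_p - \dim(S_g \cap S_p) \le 6 + 3 = 9 .
\]
```

The bound is reached only when the nine vectors are linearly independent, which is why the text speaks of a subspace of at most nine dimensions.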
[0073]
In this case, it is difficult in practice to acquire sufficient sample data for geometric changes such as size changes and object rotation. However, the subspace corresponding to geometric variation spanned by the vectors ω_1, ω_2, ω_3, ω_4, ω_5, ω_6 (hereinafter referred to as the “geometric variation subspace”) can be estimated from a single small region alone.
[0074]
Therefore, based on the sample data, the geometric variation subspace is obtained first, and then the distribution of the components that remain after the components in the geometric variation subspace are removed is obtained. Applying the KL transform to this distribution yields the subspace corresponding to photometric variation spanned by ν_x, ν_y, ν_z (hereinafter referred to as the “photometric variation subspace”). In this way, any window subspace can be represented by a geometric variation subspace and a photometric variation subspace.
[0075]
There are broadly two methods for identifying the subspaces. One assumes that the geometric variation subspace and the photometric variation subspace are orthogonal. The other, used when there is sufficient sample data, identifies the subspace directly without separating the geometric variation subspace from the photometric variation subspace.
[0076]
First, the method that assumes the geometric variation subspace and the photometric variation subspace to be orthogonal will be described. Sample data for a face image are collected by having the person being registered shake his or her head to change the posture of the face.
[0077]
The reference small region is taken as the mean of the data point distribution of one small region change series plotted in the small region space, or as the center of its range of variation, and is stored as the reference small region vector x_s. This choice of reference is made because the sample data include false data, and because some data deviate from the true subspace owing to the limits of the linear approximation of geometric deformation, departures from the Lambertian surface assumption, or the presence of noise.
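The two choices of reference can be sketched as follows. Reading “the center of the range of variation” as the per-pixel midpoint between the minimum and maximum is an assumption, as are the function name and the `mode` parameter.

```python
import numpy as np

def reference_vector(series, mode="mean"):
    """Reference small region vector x_s for one small region change series.

    `series` holds one flattened small region per row, shape (M, N).
    mode="mean":     mean of the data point distribution.
    mode="midrange": per-pixel centre of the variation range (assumed reading).
    """
    X = np.asarray(series, dtype=float)
    if mode == "mean":
        return X.mean(axis=0)
    if mode == "midrange":
        return (X.max(axis=0) + X.min(axis=0)) / 2.0
    raise ValueError("mode must be 'mean' or 'midrange'")

# Toy series of three 2-pixel regions; the last row acts as an outlier.
series = [[0.0, 0.0], [2.0, 4.0], [10.0, 2.0]]
x_s_mean = reference_vector(series, "mean")
x_s_mid = reference_vector(series, "midrange")
```

The mean is pulled less by a single outlying sample than the midrange is, which is one practical reason to prefer it when the series is known to contain false data.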
[0078]
From the obtained reference small region vector x_s, the vectors ω_1, ω_2, ω_3, ω_4, ω_5, ω_6 are calculated according to (Equation 5). The derivatives of the pixel values are approximated by convolution with a Sobel filter.
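A minimal sketch of this step follows. The exact form of (Equation 5) is not assumed here; the sketch uses the usual first-order result for an affine displacement, in which the six vectors are the image gradients multiplied by 1, x, and y. The ordering and sign of the vectors, the window-centred coordinate origin, the border handling, and the function names are all assumptions; the spanned subspace does not depend on ordering or sign.

```python
import numpy as np

def sobel_gradients(img):
    """Approximate dI/dx and dI/dy with 3x3 Sobel kernels (edge-replicated border)."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    # Horizontal derivative: right column minus left column, rows weighted 1-2-1.
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2]) / 8.0
    # Vertical derivative: bottom row minus top row, columns weighted 1-2-1.
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:]) / 8.0
    return gx, gy

def geometric_basis(window):
    """Six vectors spanning first-order affine appearance change of `window`."""
    h, w = window.shape
    gx, gy = sobel_gradients(window)
    # Window-fixed coordinates with the origin at the window centre (assumption).
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    xs -= (w - 1) / 2.0
    ys -= (h - 1) / 2.0
    rows = [xs * gx, ys * gx, xs * gy, ys * gy, gx, gy]
    return np.stack([r.ravel() for r in rows])   # shape (6, h*w)

# Toy window: a horizontal intensity ramp, so the vertical gradient is zero.
window = np.tile(np.arange(5.0), (5, 1))
G = geometric_basis(window)
```

For the ramp, the three vectors built from the vertical gradient vanish, which illustrates why the six vectors need not be linearly independent.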
[0079]
Once the vectors ω_1, ω_2, ω_3, ω_4, ω_5, ω_6 have been obtained in this way, the geometric variation subspace Ω can be identified. However, since these vectors are not necessarily linearly independent, singular value decomposition is applied to the matrix G = [ω_1, ω_2, ω_3, ω_4, ω_5, ω_6]^T to obtain the orthonormal basis vectors u_p (1 ≦ p ≦ 6) of the subspace Ω. Here, p is the rank of the matrix G.
[0080]
Next, the component of an arbitrary window image vector x that is orthogonal to the geometric variation subspace Ω can be obtained as shown in FIG. 10. In FIG. 10, let x_s be the reference image vector of the geometric variation subspace Ω, and let x′ be the vector obtained by orthogonally projecting the difference between the vector x and the vector x_s onto the geometric variation subspace Ω.
[0081]
The orthogonal projection matrix P onto the geometric variation subspace Ω can be expressed as (Equation 12) using the orthonormal basis vectors u_p (1 ≦ p ≦ 6).
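The orthonormalization by singular value decomposition can be sketched as follows. The relative tolerance used to decide the rank and the function name are assumptions; the rows of G are taken to be the six vectors, as in the definition of G above.

```python
import numpy as np

def orthonormal_basis(G, tol=1e-10):
    """Orthonormal basis of the span of the rows of G, obtained by SVD.

    Returns a matrix whose columns are u_1, ..., u_p, with p the numerical
    rank of G (singular values above tol times the largest one).
    """
    _, sing, vt = np.linalg.svd(G, full_matrices=False)
    rank = int(np.sum(sing > tol * sing[0]))
    # Right singular vectors with nonzero singular values span the row space.
    return vt[:rank].T

# Toy G with a linearly dependent row (row 2 is twice row 1).
G = np.array([[1.0, 0.0, 0.0, 0.0],
              [2.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])
U = orthonormal_basis(G)
```

In this toy case the rank is 2, so only two basis vectors are returned even though G has three rows.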
[0082]
[Equation 12]
\[ P = \sum_{p} u_p u_p^{T} \]
[0083]
From the vector relationship in FIG. 10, x′ = P * (x − x_s). Here, the symbol “*” means multiplication of a matrix and a vector.
[0084]
On the other hand, the orthogonal projection matrix Q onto the orthogonal complement Ω^⊥ of the geometric variation subspace Ω can be expressed as Q = I − P (I is the unit matrix), so the component of an arbitrary small region vector x that is orthogonal to the geometric variation subspace Ω is (x − x_s) − x′ = Q * (x − x_s).
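The two projections can be illustrated with a small numerical sketch. It takes P = U U^T, the standard orthogonal projector built from a matrix U with orthonormal columns, as the assumed form of (Equation 12); the three-pixel vectors are toy values.

```python
import numpy as np

# Orthonormal basis of a toy geometric variation subspace (columns u_1, u_2).
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

P = U @ U.T                # orthogonal projector onto the subspace
Q = np.eye(3) - P          # projector onto its orthogonal complement

x = np.array([3.0, 4.0, 5.0])      # arbitrary small region vector
x_s = np.array([1.0, 1.0, 1.0])    # reference small region vector

x_prime = P @ (x - x_s)            # component inside the subspace
residual = Q @ (x - x_s)           # component orthogonal to the subspace
```

The two components add back up to x − x_s, and the residual is what the photometric variation subspace is later estimated from.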
[0085]
The photometric variation subspace Ψ is identified from the distribution of Q * (x − x_s). First, y_j = Q * (x_j − x_s) is calculated from every small region vector x_j belonging to the small region change series (j is a natural number with 1 ≦ j ≦ M). Then, the autocorrelation matrix R of the vectors y is obtained by (Equation 13).
[0086]
[Equation 13]

[0087]
The eigenvalues and eigenvectors of the obtained matrix R are computed. Let the eigenvalues in descending order be λ_1, λ_2, ..., λ_N, and let the orthonormal eigenvectors corresponding to them be v_1, v_2, ..., v_N. Define the cumulative contribution ratio as the ratio of the sum of the n largest eigenvalues to the sum of all eigenvalues. The number q at which the cumulative contribution ratio exceeds a predetermined threshold is taken as the number of dimensions of the subspace. The orthonormal basis vectors of the photometric variation subspace Ψ are therefore v_1, v_2, ..., v_q.
[0088]
In this way, the geometric variation subspace Ω and the photometric variation subspace Ψ are identified, and the environment variation subspace Γ and the window subspace Λ are identified by combining them. That is, they can be expressed as (Equation 14).
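The KL step with the cumulative contribution ratio can be sketched as follows. The 1/M normalization of the autocorrelation matrix, the threshold value 0.95, and the function name are assumptions, since (Equation 13) and the threshold are not given numerically in the text.

```python
import numpy as np

def kl_basis(Y, threshold=0.95):
    """KL expansion of the rows of Y (shape (M, N), one vector y_j per row).

    Returns (basis, eigvals): the eigenvectors kept as columns, and all
    eigenvalues of the autocorrelation matrix in descending order.
    """
    M = Y.shape[0]
    R = (Y.T @ Y) / M                        # autocorrelation matrix (1/M assumed)
    eigvals, eigvecs = np.linalg.eigh(R)     # ascending order for symmetric R
    order = np.argsort(eigvals)[::-1]        # re-sort into descending order
    eigvals = np.clip(eigvals[order], 0.0, None)
    eigvecs = eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # cumulative contribution ratio
    # Smallest q whose cumulative ratio strictly exceeds the threshold.
    q = min(int(np.searchsorted(ratio, threshold, side="right")) + 1, len(ratio))
    return eigvecs[:, :q], eigvals

# Toy data: large variation along axis 0, small along axis 1, none along axis 2.
Y = np.array([[3.0, 0.0, 0.0],
              [-3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, -1.0, 0.0]])
basis, eigvals = kl_basis(Y, threshold=0.95)
```

With this toy data the first eigenvalue alone contributes 0.9 of the total, so two eigenvectors are kept at a threshold of 0.95 and one at a threshold of 0.8.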
[0089]
[Equation 14]
Γ = Ω + Ψ
Λ = x_s + Γ
[0090]
Therefore, the orthonormal basis vectors of the environment variation subspace Γ are given by the matrix U = [u_1, u_2, ..., u_p], in which the orthonormal basis vectors of the geometric variation subspace Ω are arranged, together with the matrix V = [v_1, v_2, ..., v_q], in which the orthonormal basis vectors of the photometric variation subspace Ψ are arranged. Hence, by setting w_i = u_i (i is a natural number with 1 ≦ i ≦ p) and w_{p+j} = v_j (j is a natural number with 1 ≦ j ≦ q), the matrix W = [w_1, w_2, ..., w_r] (r = p + q) in which the orthonormal basis vectors of the environment variation subspace Γ are arranged is determined, and with it the subspace.
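Combining the two bases into W amounts to placing the columns of U and V side by side. The sketch below adds an orthonormality check, which is an addition for illustration and not a step named in the specification; the function name and toy matrices are hypothetical.

```python
import numpy as np

def environment_basis(U, V, atol=1e-8):
    """Stack the geometric basis U (N x p) and photometric basis V (N x q)
    into W (N x r), r = p + q, and verify that the columns are orthonormal."""
    W = np.hstack([U, V])
    if not np.allclose(W.T @ W, np.eye(W.shape[1]), atol=atol):
        raise ValueError("columns of U and V are not mutually orthonormal")
    return W

# Toy bases in a 4-pixel space: V is orthogonal to U by construction.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0],
              [0.0, 0.0]])
V = np.array([[0.0], [0.0], [1.0], [0.0]])
W = environment_basis(U, V)
```

When V has been estimated from vectors already projected by Q, its columns are orthogonal to those of U, so the check passes without any further orthogonalization.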
[0091]
Next, the method used when there is sufficient sample data, which identifies the subspace directly without separating the geometric variation subspace and the photometric variation subspace, will be described.
[0092]
In this method, the collection of sample data and the determination of the reference small region are the same as in the method described above. The subspace is identified by applying the KL expansion directly to the distribution of the vectors (x − x_s).
[0093]
First, y_j = x_j − x_s is calculated from every small region vector x_j belonging to the small region change series (j is a natural number with 1 ≦ j ≦ M). Then, the autocorrelation matrix R of the vectors y is obtained by (Equation 13), in the same manner as in the method that assumes the geometric variation subspace and the photometric variation subspace to be orthogonal.
[0094]
The eigenvalues and eigenvectors of the obtained matrix R are computed. Let the eigenvalues in descending order be λ_1, λ_2, ..., λ_N, and let the orthonormal eigenvectors corresponding to them be v_1, v_2, ..., v_N. Define the cumulative contribution ratio as the ratio of the sum of the n largest eigenvalues to the sum of all eigenvalues. The number r at which the cumulative contribution ratio exceeds a predetermined threshold is taken as the number of dimensions of the subspace. The subspace is thereby determined by the matrix W = [w_1, w_2, ..., w_r], in which the orthonormal basis vectors of the environment variation subspace Γ are arranged.
[0095]
Thus, the input image is collated against the registered object models by identifying each object model with either of the methods described above and then finding the subspace closest to the input image.
[0096]
As described above, according to the present embodiment, the input image can be collated accurately against the registered object models without being affected by variations in appearance caused by differences in the posture of the object between object model registration and input image recognition, by differences in illumination conditions, and the like.
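The direct method can be sketched compactly. Following the statement that the KL expansion is applied directly to the distribution of (x − x_s), the sketch uses no projection Q. It computes the eigenvectors through a singular value decomposition of the centred data instead of an explicit eigendecomposition of R, which is a swapped-in but equivalent route; using the mean as x_s, the 1/M normalization, the threshold, and the function name are assumptions.

```python
import numpy as np

def direct_window_subspace(X, threshold=0.95):
    """Direct identification of a window subspace from samples X (shape (M, N)).

    Returns (x_s, W): the reference vector and a matrix whose columns are the
    r leading orthonormal basis vectors.
    """
    x_s = X.mean(axis=0)                 # reference small region vector (mean)
    Y = X - x_s                          # no projection Q in the direct method
    # Right singular vectors of Y are eigenvectors of R = Y^T Y / M, with
    # eigenvalues sing**2 / M, already sorted in descending order.
    _, sing, vt = np.linalg.svd(Y, full_matrices=False)
    eigvals = sing ** 2 / Y.shape[0]
    ratio = np.cumsum(eigvals) / eigvals.sum()
    r = min(int(np.searchsorted(ratio, threshold, side="right")) + 1, len(ratio))
    return x_s, vt[:r].T

# Toy samples in a 3-pixel space.
X = np.array([[4.0, 1.0, 0.0],
              [-2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [1.0, 0.0, 0.0]])
x_s, W = direct_window_subspace(X)
```

The window subspace is then x_s plus the span of the columns of W, the same representation that the distance computation at recognition time would use.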
[0097]
The recording medium storing the program that implements the image recognition apparatus according to the embodiment of the present invention may be, as shown in the example of recording media in FIG. 11, not only a portable recording medium 112 such as a CD-ROM 112-1 or a floppy disk 112-2, but also another storage device 111 at the far end of a communication line, or a recording medium 114 such as a hard disk or RAM of the computer 113. When the program is executed, it is loaded and runs in main memory.
[0098]
Likewise, the recording medium on which the object model data generated by the image recognition apparatus according to the embodiment of the present invention are recorded may be, as shown in the example of recording media in FIG. 11, not only a portable recording medium 112 such as a CD-ROM 112-1 or a floppy disk 112-2, but also another storage device 111 at the far end of a communication line, or a recording medium 114 such as a hard disk or RAM of the computer 113. The data are read by the computer 113, for example, when the image recognition apparatus according to the present invention is used.
[0099]
[Effect of the invention]
As described above, according to the image recognition apparatus of the present invention, the input image can be collated accurately against the registered object models without being affected by variations in appearance caused by differences in the posture of the object between object model registration and input image recognition, by differences in illumination conditions, and the like.
[Brief description of the drawings]
FIG. 1 is a flowchart of object model registration processing in a conventional image recognition apparatus.
FIG. 2 is a conceptual diagram of object model registration processing in a conventional image recognition apparatus.
FIG. 3 is a flowchart of processing in a conventional image recognition apparatus.
FIG. 4 is a conceptual diagram of processing in a conventional image recognition apparatus.
FIG. 5 is a block diagram of an image recognition apparatus according to an embodiment of the present invention.
FIG. 6 is a flowchart of object model registration processing in the image recognition apparatus according to the embodiment of the invention.
FIG. 7 is a conceptual diagram of object model registration processing in the image recognition apparatus according to the embodiment of the invention.
FIG. 8 is a flowchart of processing in the image recognition apparatus according to the embodiment of the invention.
FIG. 9 is a conceptual diagram of processing in the image recognition apparatus according to the embodiment of the invention.
FIG. 10 is an explanatory diagram of how to obtain a small region vector orthogonal to a geometrical variation subspace.
FIG. 11 is an exemplary diagram of a recording medium.
[Explanation of symbols]
51 Image information input unit
52 Object modeling execution unit
53 Object model registration unit
54 Object model database
55 Similarity determination unit
56 Object recognition unit
111 Storage device at the far end of a communication line
112 Portable recording medium such as CD-ROM or floppy disk
112-1 CD-ROM
112-2 Floppy disk
113 Computer
114 Recording medium such as RAM or hard disk of the computer

Claims

An image recognition apparatus comprising:
an object modeling execution unit that estimates and models changes in the appearance of an object caused by variations in the imaging environment; and
an object model registration unit that registers the object model obtained by the object modeling execution unit in a database in advance,
the apparatus further comprising:
an image information input unit that inputs image information of an object to be recognized;
a similarity determination unit that collates the input image information with the object models registered in advance by the object model registration unit and assigns a similarity to each registered object model; and
an object recognition unit that outputs the type of the object to be recognized that is determined to be most similar among the object models to which similarities have been assigned,
wherein the object modeling execution unit receives a plurality of images captured continuously while the position and posture of the object relative to the camera are changed, selects and tracks a plurality of small regions contained continuously in each of the plurality of images so that each small region is selected as a series of small regions, identifies, based on the selected series of small regions, a subspace covering the variation in appearance due to differences in the posture of the object between object model registration and input image recognition and the variation in the appearance of the object due to differences in illumination conditions, and generates, as the object model, one set of the subspaces identified for the respective small regions.

The image recognition apparatus according to claim 1, wherein the subspace is identified assuming a Lambertian reflection model as the surface characteristic of the object to be recognized.

The image recognition apparatus according to claim 1, wherein the image information input unit cuts out from an image the portion in which the object to be recognized exists, and the object to be recognized is modeled using the cut-out partial image.

The image recognition apparatus according to claim 1, wherein the image information input unit selects from an image characteristic small regions in the object to be recognized, and the object to be recognized is modeled based on the information contained in the selected small regions and the arrangement information of the small regions.

The image recognition apparatus according to any one of claims 1 to 4, wherein, based on the input image information, the variation in appearance due to a change in the posture of the object and the variation in appearance due to a change in illumination conditions are modeled separately.

The image recognition apparatus according to any one of claims 1 to 4, wherein the object modeling execution unit identifies, by KL expansion of the small region vectors belonging to the series of small regions, a subspace covering the variation in the appearance of pixels due to changes in the posture of the object and changes in illumination conditions.

An image recognition method comprising:
a step of estimating and modeling changes in the appearance of an object caused by variations in the imaging environment; and
a step of registering the obtained object model in a database in advance,
the method further comprising:
a step of inputting image information of an object to be recognized;
a step of collating the input image information with the object models registered in advance and assigning a similarity to each registered object model; and
a step of outputting the type of the object to be recognized that is determined to be most similar among the object models to which similarities have been assigned,
wherein, in the modeling step, a plurality of images captured continuously while the position and posture of the object relative to the camera are changed are input, a plurality of small regions contained continuously in each of the plurality of images are selected and tracked so that each small region is selected as a series of small regions, a subspace covering the variation in appearance due to differences in the posture of the object between object model registration and input image recognition and the variation in the appearance of the object due to differences in illumination conditions is identified based on the selected series of small regions, and one set of the subspaces identified for the respective small regions is generated as the object model.

A computer-readable recording medium on which is recorded a program that causes a computer to execute:
a process of estimating and modeling changes in the appearance of an object caused by variations in the imaging environment;
a process of registering the obtained object model in a database in advance;
a process of inputting image information of an object to be recognized;
a process of collating the input image information with the object models registered in advance and assigning a similarity to each registered object model; and
a process of outputting the type of the object to be recognized that is determined to be most similar among the object models to which similarities have been assigned,
wherein the modeling process inputs a plurality of images captured while the position and posture of the object relative to the camera are changed, selects and tracks a plurality of small regions contained continuously in each of the plurality of images so that each small region is selected as a series of small regions, identifies, based on the selected series of small regions, a subspace covering the variation in appearance due to differences in the posture of the object between object model registration and input image recognition and the variation in the appearance of the object due to differences in illumination conditions, and generates, as the object model, one set of the subspaces identified for the respective small regions.