JP3994819B2

JP3994819B2 - Image identification device, image identification method, and image identification program

Info

Publication number: JP3994819B2
Application number: JP2002222723A
Authority: JP
Inventors: 典司加藤; 洋次鹿志村; 仁池田
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-07-31
Filing date: 2002-07-31
Publication date: 2007-10-24
Anticipated expiration: 2022-07-31
Also published as: JP2004062721A

Description

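[0001]
[Technical field of the invention]
The present invention relates to an image identification device, and in particular to an image identification device that efficiently identifies patterns in image data.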
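[0002]
[Prior art]
To identify a specific pattern in image data with a computer, two requirements must be met: the shape of the pattern itself must be detected reliably, and the method must cope with the fact that the patterns contained in the data vary in size, rotation angle, and so on.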
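[0003]
Conventional approaches to the former requirement include the linear subspace method, the support vector machine (SVM) method, and the kernel nonlinear subspace method. In the linear subspace method, a subspace is defined for each of several categories, the subspace to which an unknown pattern is most closely related is evaluated, and the category of that pattern is determined accordingly. However, when there are many categories and the pattern dimension is low, the detection accuracy of this method drops. It also identifies nonlinearly distributed patterns poorly.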
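[0004]
The SVM method maps low-dimensional patterns into a high-dimensional space through a nonlinear transformation defined via a kernel function, which makes it possible to identify nonlinear pattern distributions. However, it can separate only two categories and requires a large amount of computation.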
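[0005]
The kernel nonlinear subspace method was devised as a pattern identification method that solves these problems, and is disclosed in Japanese Patent Laid-Open No. 2002-90274. Like the SVM method, it maps patterns into a high-dimensional space through a nonlinear transformation defined using a kernel function, and then applies the subspace method in this high-dimensional nonlinear space.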
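[0006]
For the latter requirement, that is, detecting patterns of various sizes and rotation angles, the conventional solution has been to use a very large number of learning samples. The pattern identification methods described above generally learn the distribution of a category from learning samples that contain the characteristic pattern. The learning samples therefore had to include not only patterns varied in size or in angle alone, but also a very large number of patterns deformed by combinations of size and angle.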
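[0007]
[Problems to be solved by the invention]
However, in the kernel nonlinear subspace method, the basis vectors spanning the subspace are defined from the mappings of all learning samples into the nonlinear space, so a large number of learning samples still requires a large amount of computation. An object of the present invention is to provide means for identifying human or animal face patterns in an image quickly and accurately.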
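[0008]
[Means for solving the problems]
In one aspect, the image identification device of the present invention comprises: normalization means that uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; coarse-graining means that applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; face identification means provided for each resolution, which compares coarse-grained data of a given resolution with a category constructed, in the space defined by the nonlinear transformation, from learning samples of normalized face patterns having that resolution, and which, when the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; hierarchical face pattern identification means that performs hierarchical face pattern identification over multiple resolutions, starting from the face identification means of the lowest resolution and, whenever the face identification means of one resolution judges that a face pattern is likely to be present, having the face identification means of the next-lowest resolution execute; and counterexample detection means that, before or after the processing of the hierarchical face pattern identification means, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having been trained on learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present, and thereby detects that no face pattern is present.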
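[0009]
In one aspect, the image identification device of the present invention has, before or after the hierarchical face pattern identification means, counterexample detection means that detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having been trained on learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present.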
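[0010]
In one aspect, the image identification device of the present invention is characterized in that the nonlinear transformation is provided by computation means using a kernel function.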
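[0011]
In one aspect, the image identification device of the present invention is characterized in that the category is constructed as a subspace whose basis vectors are a set of vectors obtained by nonlinearly transforming a plurality of learning data. Further, when the mapping of a new learning sample under the nonlinear transformation is given, the basis vectors spanning the subspace are updated so that the relationship between this mapping and the subspace generated so far becomes stronger.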
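[0012]
In one aspect, the image identification device of the present invention is characterized in that the kernel function defining the nonlinear transformation is deformed according to the relationship between the subspace and the mapping of learning samples under the nonlinear transformation. Further, the face detection means for the respective resolutions are processed in parallel by a parallel processor provided in the device.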
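[0013]
The image identification method of the present invention is a method executed by the following means provided in a computer, and comprises: a normalization step in which normalization means uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; a coarse-graining step in which coarse-graining means applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; a storage step in which storage means stores, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution; an evaluation step in which evaluation means compares coarse-grained data of a given resolution with the category of that resolution stored in the storage means and evaluates their positional relationship in the space defined by the nonlinear transformation; a judgment step in which judgment means, when the evaluation means evaluates that the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; a hierarchical face pattern identification step in which hierarchical face pattern identification means executes the processing of the evaluation means and the judgment means starting from the coarse-grained data of the lowest resolution and, whenever a face pattern is judged likely to be present at one resolution, executes the processing on the coarse-grained data of the next-lowest resolution, thereby performing hierarchical face pattern identification over multiple resolutions; and a counterexample detection step in which counterexample detection means, before or after the hierarchical face pattern identification step, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. In the counterexample detection step, having learned from learning samples containing non-face patterns, the method detects in the same manner as the evaluation and judgment steps that a non-face pattern is likely to be present, thereby detecting that no face pattern is present.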
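The image identification program of the present invention causes a computer to function as: normalization means that uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; coarse-graining means that applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; storage means that stores, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution; evaluation means that compares coarse-grained data of a given resolution with the category of that resolution stored in the storage means and evaluates their positional relationship in the space defined by the nonlinear transformation; judgment means that, when the evaluation means evaluates that the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; hierarchical face pattern identification means that executes the processing of the evaluation means and the judgment means starting from the coarse-grained data of the lowest resolution and, whenever a face pattern is judged likely to be present at one resolution, executes the processing on the coarse-grained data of the next-lowest resolution, thereby performing hierarchical face pattern identification over multiple resolutions; and counterexample detection means that, before or after the processing of the hierarchical face pattern identification means, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having learned from learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present, thereby detecting that no face pattern is present.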
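[0014]
[Embodiments of the invention]
Preferred embodiments of the present invention are described below with reference to the drawings. Repeated descriptions of identically configured components are omitted.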
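[0015]
The block diagram of FIG. 1 shows the configuration of a device according to an embodiment of the present invention. The device comprises a CPU 2 that performs computation, a storage unit 4, a user instruction input unit 6, a display unit 8, a data input unit 10, a data output unit 12, and an application software input unit 14, all connected by a communication network that carries data between them. In other words, the device is realized by running, on a general-purpose computer, application software that implements the algorithm of the present invention. The user loads application software distributed on a storage medium such as a CD-ROM or over a network into the computer through the application software input unit 14, and has the CPU 2 execute it by means of the instruction input unit 6, such as a keyboard. The CPU 2 operates under the control of software called an operating system (OS), and instructions from the user and the application software are passed to the CPU 2 through the OS. The application software of this embodiment, the OS, and other information needed for computation are held temporarily or permanently in the storage unit 4, which consists of memory, a hard disk, and the like. Image data needed for execution is obtained through the data input unit 10, such as a CCD camera, a scanner, a storage medium such as a CD-ROM, or data acquisition over a network. Once the CPU 2 has performed the necessary computation, the processed image data is output through the data output unit 12, such as a storage medium such as an MO disk, data transfer over a network, or a printer. The user can also view the image data before and after processing on the display unit 8, such as a monitor.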
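[0016]
FIG. 2 is a block diagram outlining the image processing performed by the CPU 2. Image data input from the data input unit 10 is normalized by the image normalization unit 20, and the pattern identification unit 100 then performs detailed pattern identification on it. Here, normalization means transforming conditions of the face pattern, such as its size, rotation angle, position, and brightness, into the state that the pattern identification unit 100 assumes (referred to below as the normal form).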
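[0017]
The transformation used for normalization in the image normalization unit 20 is decomposed into elements, each a basic partial transformation, and how each partial transformation should be applied is computed by the partial conversion detection unit 26 corresponding to it. The illustrated example has three partial conversion detection units 26: a size partial conversion detection unit 26a for the size (enlargement and reduction) of the image data, a rotation partial conversion detection unit 26b for its rotation angle, and a shift partial conversion detection unit 26c for its shift (translation). As described in detail later, these units detect the state of their partial transformation from coarse-grained data obtained by applying a coarse-graining spatial filter to the image data, and pass the results to the normalization processing unit 28. The normalization processing unit 28 then applies to the image data the transformation judged to have the smallest error. This sequence is usually repeated several times, so that in the end size, rotation, and shift are all normalized. Depending on the face pattern, the process may of course be performed without repetition.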
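[0018]
The pattern identification unit 100 uses spatial filters to coarse-grain the image data normalized by the image normalization unit 20 into coarse-grained data of various resolutions, and then runs face identification units 102 that identify face patterns in this coarse-grained data. In the illustrated example, coarse-graining is performed by summing an appropriate number of principal component analysis modes; there is a face identification unit 102a for 25-dimensional coarse-grained data, a face identification unit 102b for 100-dimensional coarse-grained data, and further face identification units 102 for the resolutions in between. In addition, a counterexample detection unit 104 is provided that judges whether the 100-dimensional coarse-grained data, the highest dimension, contains something other than a face pattern. As described in detail later, face pattern identification starts from the 25-dimensional face identification unit 102a, which has the lowest resolution; if a face pattern is judged likely to be present, the face identification unit 102 of the next-lowest resolution (that is, the next level up) is used for the next judgment. If a face pattern is still judged likely to be present at 100 dimensions, the highest resolution, counterexample detection is performed last.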
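The coarse-to-fine decision order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: each resolution level is modeled by a hypothetical `Level` record whose `score` callable returns the residual distance E and which accepts the input as face-like when E falls below its `threshold`. The counterexample detector is modeled the same way, with a small residual against the non-face subspace meaning "non-face". Here the counterexample check runs last, as in this paragraph, although the claims allow it before or after the cascade.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Level:
    """One hypothetical detector level (face or counterexample)."""
    dims: int                          # dimension of the coarse-grained data at this level
    score: Callable[[object], float]   # returns the residual distance E for an input
    threshold: float                   # E below this value counts as a match


def cascade_detect(image: object,
                   levels: Sequence[Level],
                   counterexample: Optional[Level] = None) -> bool:
    """Return True only if every level, from lowest to highest dimension,
    accepts the input and the counterexample detector does not match it."""
    for level in sorted(levels, key=lambda lv: lv.dims):
        # Early exit: a rejection at a coarse level skips all finer levels.
        if level.score(image) >= level.threshold:
            return False
    # A close match to the non-face subspace means "not a face".
    if counterexample is not None and counterexample.score(image) < counterexample.threshold:
        return False
    return True


if __name__ == "__main__":
    face_levels = [Level(25, lambda img: 0.1, 0.5), Level(100, lambda img: 0.2, 0.5)]
    print(cascade_detect("patch", face_levels))                                      # True
    print(cascade_detect("patch", face_levels, Level(100, lambda img: 0.05, 0.3)))  # False
```

Because the loop returns at the first rejection, the costly high-dimensional levels only run on inputs that already passed the cheaper ones, which is the source of the speed advantage the text claims.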
[0019]
The image normalization unit 20 and the pattern identification unit 100 are described in detail below.
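[0020]
The block diagram of FIG. 3 outlines the configuration of the image normalization unit 20. Input image data is held in an image holding unit 30 provided in the storage unit 4. A normalization coarse-graining unit 32 then coarse-grains this image data, lowering its spatial resolution to extract rough features, and obtains coarse-grained data. The spatial filter used for this coarse-graining is not particularly limited; for example, one may compute the sum of a predetermined number of mode components with large contribution ratios among those obtained by principal component analysis of suitable image data, or perform a Fourier decomposition and retain only the mode components at or above a predetermined spatial scale. Coarse-graining reduces the amount of data so that the normalization described next can run quickly.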
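The Fourier option for coarse-graining can be sketched in a few lines. This is a minimal illustration, assuming a 1-D signal (for example one image row) and a hypothetical cutoff `keep`; a 2-D image would apply the same filter along both axes, and a real implementation would use an FFT rather than this naive O(n²) transform. The retained Fourier coefficients themselves could also serve as the low-dimensional coarse-grained vector; here the signal is transformed back so the smoothing is visible.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: Sequence[complex]) -> List[complex]:
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def fourier_coarse_grain(signal: Sequence[float], keep: int) -> List[float]:
    """Low-pass filter: keep only frequencies with |k| <= keep and transform back."""
    n = len(signal)
    spectrum = dft(signal)
    for k in range(n):
        # Indices k and n - k are the positive and negative copies of one frequency.
        if min(k, n - k) > keep:
            spectrum[k] = 0
    return [v.real for v in idft(spectrum)]


if __name__ == "__main__":
    row = [2.0 + (-1) ** t for t in range(8)]   # constant level plus the highest frequency
    print([round(v, 6) for v in fourier_coarse_grain(row, keep=1)])
```

With `keep=1` the alternating component is removed and only the constant level of 2.0 survives.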
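[0021]
The coarse-grained data is then sent to the partial conversion detection units 26, several of which are arranged in parallel. As shown schematically in FIG. 4, each partial conversion detection unit 26 regards the coarse-grained data as a vector x in an image space G and maps it into a space F created by a nonlinear transformation for pattern identification. A vector mapped into the space F is called a mapped vector and written Φ(x). When size, rotation, and shift are to be detected, for example, the partial conversion detection units 26 consist of the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. Each partial conversion detection unit 26 includes a normalization subspace learning unit 34, a normalization subspace projection unit 36, and a partial conversion evaluation unit 37, and the partial conversion evaluation unit 37 in turn includes a conversion magnitude evaluation unit 38 and an estimation error evaluation unit 40. In the space F, the normalization subspace learning unit 34 has constructed in advance, from learning samples, a subspace Ω representing the category characteristic of those samples, and the normalization subspace projection unit 36 projects the mapped vector Φ(x) onto this subspace Ω. The projected vector is called the projection vector and written Φ'(x). The conversion magnitude evaluation unit 38 evaluates which of the basis vectors spanning the subspace Ω the projection vector Φ'(x) is closest to, and computes the magnitude of the transformation required. For example, for the size partial conversion detection unit 26a, a lookup table is built during learning that records correspondences such as: the basis vector Φ₁ lies near learning samples enlarged about 1.5 times, and another basis vector Φ₂ lies near learning samples enlarged about 2 times. By referring to this lookup table, the conversion magnitude evaluation unit 38a can compute the magnification needed to bring the current face pattern to the normal form. The estimation error evaluation unit 40 computes an estimation error from the distance E between the mapped vector Φ(x) and the projection vector Φ'(x): a small E means the projection carries little error, whereas a large E suggests that the projection result contains a large error.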
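The lookup-table step for the size detector can be sketched as follows. This is a minimal illustration under two assumptions not fixed by the text: "the basis vector the projection is closest to" is approximated as the one with the largest coefficient |α_i|, and the required correction is the reciprocal of the scale recorded for that basis vector. The names `size_correction` and `scale_table` are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def size_correction(alphas: Sequence[float],
                    scale_table: Dict[int, float]) -> Tuple[int, float, float]:
    """Return (basis index, observed scale, correction factor).

    The basis vector the projection is closest to is approximated here as the
    one with the largest |alpha_i|; scale_table maps that index to the scale of
    the learning samples it lay near during training (hypothetical table).
    """
    best = max(range(len(alphas)), key=lambda i: abs(alphas[i]))
    observed = scale_table[best]
    # Scaling by 1/observed brings a pattern of that scale back to normal size.
    return best, observed, 1.0 / observed


if __name__ == "__main__":
    table = {0: 1.5, 1: 2.0, 2: 0.8}   # hypothetical: basis index -> observed magnification
    print(size_correction([0.1, 0.8, -0.3], table))   # (1, 2.0, 0.5)
```

A smoother variant could interpolate between several nearby table entries weighted by their coefficients; the single nearest entry keeps the sketch close to the text's description.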
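[0022]
These results are passed to the normalization processing unit 28, which includes a conversion decision unit 42 and a conversion execution unit 44. The conversion decision unit 42 determines which partial conversion detection unit 26 has the smallest estimation error. If, for example, the rotation partial conversion detection unit 26b has the smallest estimation error, the conversion execution unit 44 rotates the original image data by the corresponding transformation magnitude (that is, the rotation angle) and updates the image held in the image holding unit 30. The updated image data undergoes the same normalization several more times as needed. Various repetition criteria are possible; for example, one may repeat a preset number of times, or compare with a predetermined threshold either a correlation computed in real space against suitable reference data or the value obtained by the conversion magnitude evaluation means.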
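[0023]
Next, the construction of the space F used in the partial conversion detection units 26 by means of a kernel function is described in detail, with mathematical expressions. A characteristic feature of the kernel-function approach is that the way the mapped vector Φ(x) mentioned above is computed is never given explicitly.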
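[0024]
The nonlinear mapping of Equation (1), which maps a d-dimensional vector x in the image space G representing coarse-grained data into a d_F-dimensional space F, is determined by choosing a suitable kernel function k(x, y) so that the relation of Equation (2) holds.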
[0025]
[Equation 1]

Here, φ_i(x) are the eigenfunctions of a suitable kernel function, and λ_i are the corresponding eigenvalues (i = 1, ..., n).
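[0026]
Next, the method of spanning, in the space F, an m-dimensional subspace Ω that classifies the categories of the coarse-grained data, and the method of training it, are described. First, as initial values of the basis vectors of the subspace Ω, the vectors Φ₁, ..., Φ_m in Ω corresponding to m vectors x₁, ..., x_m in the image space G (referred to below as preimages) are chosen appropriately, for example at random using uniform random numbers. Now consider training the preimages so as to correct the subspace Ω using d-dimensional vectors x representing learning samples in the image space. The vector Φ'(x), obtained by projecting the mapping Φ(x) of a learning sample vector x into the space F onto the subspace Ω, is expressed as a linear combination of the basis vectors. Writing its coefficients as α_i, the distance E between this projection and the original mapped vector Φ(x) is given by Equations (3) to (5).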
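[0027]
[Equation 2]

Here, the kernel function definition in Equation (2) is used to derive Equation (5). By the definition of projection, the coefficients α_i are chosen so that E takes its minimum value, which gives Equation (6). The matrix K is the matrix whose (i, j) entry is k(x_i, x_j).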
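Equations (3) to (6) are not reproduced in this text, but for an orthogonal projection onto span{Φ(x_1), ..., Φ(x_m)} the kernel trick gives α = K⁻¹k_x with k_x = (k(x, x_1), ..., k(x, x_m)), and the squared residual E = k(x, x) - 2 Σ α_i k(x, x_i) + Σ α_i α_j k(x_i, x_j). The sketch below computes both using only kernel evaluations. It assumes a Gaussian kernel with a hypothetical width `sigma` and treats E as the squared distance; it is consistent with the description of Equations (5) and (6) above but is not a transcription of them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """k(x, y) = exp(-|x - y|^2 / (2 sigma^2)); sigma is an assumed width."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def project(x: Vector, preimages: Sequence[Vector],
            kernel: Kernel = gaussian_kernel) -> Tuple[List[float], float]:
    """Project Phi(x) onto span{Phi(x_1), ..., Phi(x_m)} using kernel values only.

    Returns the coefficients alpha and the squared residual distance E.
    """
    K = [[kernel(a, b) for b in preimages] for a in preimages]
    kx = [kernel(x, p) for p in preimages]
    alphas = solve(K, kx)  # alpha = K^{-1} k_x minimizes E
    m = len(alphas)
    E = (kernel(x, x)
         - 2.0 * sum(a * k for a, k in zip(alphas, kx))
         + sum(alphas[i] * alphas[j] * K[i][j] for i in range(m) for j in range(m)))
    return alphas, E


if __name__ == "__main__":
    pre = [[0.0, 0.0], [3.0, 0.0]]
    print(project([0.0, 0.0], pre))    # alpha close to [1, 0], E close to 0
    print(project([10.0, 10.0], pre))  # far from both preimages: E close to 1
```

Note that Φ(x) itself is never formed, which matches the remark in [0023] that the mapping is not given explicitly.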
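[0028]
In preimage learning, each preimage is moved by Δx_i in the direction that most reduces the distance between the subspace Ω and the learning sample. By the steepest descent method, Δx_i is given by Equation (7).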
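[0029]
[Equation 3]

Here, η is the learning coefficient, a positive constant. The matrix g_ab(x) is the metric tensor of the manifold embedded in the space F by the nonlinear mapping, and is given in terms of the kernel function by Equation (8). Because this learning is a linear optimization problem in a high-dimensional space, it converges better than a nonlinear optimization problem and finishes in a short time.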
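Preimage learning by steepest descent can be sketched as follows. Equations (7) and (8) are not reproduced in this text, so this sketch substitutes a plain Euclidean gradient, estimated by central finite differences, for the metric-tensor-weighted update; it also assumes a Gaussian kernel (so k(x, x) = 1). The functions `residual`, `mean_residual`, and `train_preimages` and the defaults for `eta`, `steps`, and `h` are hypothetical. Preimages must start at distinct points, otherwise K is singular.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian kernel (assumed); k(x, x) = 1."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / (2.0 * sigma ** 2))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def residual(x: Vector, pre: Sequence[Vector]) -> float:
    """Squared distance E from Phi(x) to span{Phi(pre_i)}.

    At the optimal alpha = K^{-1} k_x, E reduces to k(x, x) - alpha . k_x.
    """
    K = [[kernel(a, b) for b in pre] for a in pre]
    kx = [kernel(x, p) for p in pre]
    alpha = solve(K, kx)
    return kernel(x, x) - sum(a * k for a, k in zip(alpha, kx))


def mean_residual(samples: Sequence[Vector], pre: Sequence[Vector]) -> float:
    return sum(residual(s, pre) for s in samples) / len(samples)


def train_preimages(samples: Sequence[Vector], pre: Sequence[Vector],
                    eta: float = 0.5, steps: int = 50, h: float = 1e-5) -> List[List[float]]:
    """Steepest descent on the mean residual with finite-difference gradients."""
    pre = [list(p) for p in pre]
    for _ in range(steps):
        grads = []
        for p in pre:
            g = []
            for a in range(len(p)):
                orig = p[a]
                p[a] = orig + h
                up = mean_residual(samples, pre)
                p[a] = orig - h
                down = mean_residual(samples, pre)
                p[a] = orig
                g.append((up - down) / (2.0 * h))
            grads.append(g)
        # Move all coordinates simultaneously, against the gradient.
        for p, g in zip(pre, grads):
            for a in range(len(p)):
                p[a] -= eta * g[a]
    return pre


if __name__ == "__main__":
    samples = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]]
    trained = train_preimages(samples, [[0.0, 0.0]])
    print(trained, mean_residual(samples, trained))
```

In this toy run the single preimage drifts from the origin toward the sample cluster, and the mean residual falls accordingly.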
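[0030]
Next, the method of learning the kernel function is described. Initially, a known function such as a Gaussian kernel or a polynomial kernel is given as the kernel function. During learning, the kernel function is deformed by the conformal mapping of Equation (9).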
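[0031]
[Equation 4]

The learning rule chooses C(x) so that the spread of each coefficient α_i over the learning samples is uniform across all coefficients α_i. Specifically, if the spread of a coefficient α_i is larger than a preset value, the value of C(x) is increased in the neighborhood of the preimage x_i of the basis vector of the subspace corresponding to α_i. The neighborhood of x_i is then enlarged in the space F as in Equation (10).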
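[0032]
[Equation 5]

Consequently, the number of learning samples for which α_i takes a large value decreases relatively, and the spread of α_i over the learning samples decreases. Conversely, if the spread of α_i is smaller than the preset value, C(x) is decreased in the neighborhood of the preimage x_i of the basis vector corresponding to α_i. With the method described here, C(x) can only be set at the preimages of the basis of the subspace Ω, but it can be changed in the neighborhood of a preimage by extrapolating its value at the preimage as in Equation (11).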
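The kernel-deformation rule can be sketched as follows. Equations (9) to (11) are not reproduced in this text, so this sketch makes three labeled assumptions: the deformed kernel has the standard conformal form C(x) C(y) k(x, y); C is extrapolated from the preimages by Gaussian weights in log space (chosen here so that C stays positive, not taken from Equation (11)); and C_i is nudged multiplicatively according to how far the variance of α_i is from a target. All names and rates (`tau`, `rate`, `target`) are hypothetical.

```python
import math
import statistics
from typing import List, Sequence

Vector = Sequence[float]


def sqdist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def base_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Initial kernel: Gaussian, one of the options the text names."""
    return math.exp(-sqdist(x, y) / (2.0 * sigma ** 2))


def conformal_factor(x: Vector, preimages: Sequence[Vector],
                     c_values: Sequence[float], tau: float = 1.0) -> float:
    """C(x) = exp(sum_i w_i(x) log c_i), w_i(x) = exp(-|x - x_i|^2 / (2 tau^2)).

    Equals c_i at an isolated preimage x_i, tends to 1 far from every preimage,
    and stays positive (an assumed extrapolation).
    """
    return math.exp(sum(math.exp(-sqdist(x, p) / (2.0 * tau ** 2)) * math.log(c)
                        for c, p in zip(c_values, preimages)))


def deformed_kernel(x: Vector, y: Vector, preimages: Sequence[Vector],
                    c_values: Sequence[float]) -> float:
    """Conformally deformed kernel C(x) C(y) k(x, y)."""
    return (conformal_factor(x, preimages, c_values)
            * conformal_factor(y, preimages, c_values)
            * base_kernel(x, y))


def alpha_spread(alpha_rows: Sequence[Sequence[float]]) -> List[float]:
    """Population variance of each coefficient alpha_i over the learning samples."""
    return [statistics.pvariance(col) for col in zip(*alpha_rows)]


def update_c(c_values: Sequence[float], spreads: Sequence[float],
             target: float, rate: float = 0.1) -> List[float]:
    """Raise C_i when alpha_i varies more than target, lower it when less."""
    return [max(1e-6, c * (1.0 + rate * (s - target) / target))
            for c, s in zip(c_values, spreads)]


if __name__ == "__main__":
    pre = [[0.0, 0.0], [10.0, 10.0]]
    spreads = alpha_spread([[0.0, 1.0], [2.0, 1.0]])    # alpha rows for two samples
    c = update_c([1.0, 1.0], spreads, target=0.5)
    print(spreads, c, deformed_kernel([0.1, 0.0], [0.0, 0.2], pre, c))
```

With every C_i equal to 1 the deformed kernel reduces to the base kernel, so learning starts from the initial Gaussian and only reshapes it where the α statistics call for it.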
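[0033]
[Equation 6]

The way learning samples are prepared is described here. For rotation normalization, for example, a number of images are prepared in which the face pattern to be normalized is upright at the center of the image (head at the top, chin at the bottom), and these are rotated by angles drawn from uniform random numbers between -180 and 180 degrees, or by evenly spaced angles. For shift, a number of images with the face pattern upright at the center are likewise prepared and shifted vertically and horizontally according to Gaussian random numbers whose full width at half maximum is, for example, a suitable number of pixels. Instead of random numbers, the shifts may be given regularly so that the probability density is uniform. For size, images with the face pattern upright at the center are similarly enlarged and reduced. Learning in this way reveals the relationship between the transformation magnitude of a learning sample (for rotation, its angle) and the projection of that sample onto the subspace. For example, a relationship is obtained such as: a large coefficient α₁ indicates a rotation of about 90 degrees. By examining these relationships in detail and building a lookup table or a suitable function, the evaluation means of the conversion magnitude evaluation unit 38 is established.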
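The transformation parameters for these learning samples can be drawn as sketched below. This covers only the parameter sampling, not the image warping. The rotation and shift rules follow the paragraph (uniform or evenly spaced angles; Gaussian shifts specified by their full width at half maximum, using σ = FWHM / (2√(2 ln 2))). The paragraph gives no distribution for size, so the log-uniform magnification in `scale_factors` and its range are assumptions, as are all function names.

```python
import math
import random
from typing import List, Tuple

# For a Gaussian, sigma = FWHM / (2 * sqrt(2 * ln 2)), roughly FWHM / 2.3548.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def rotation_angles(n: int, rng: random.Random, evenly_spaced: bool = False) -> List[float]:
    """Angles in degrees within [-180, 180]: uniform random or evenly spaced."""
    if evenly_spaced:
        step = 360.0 / n
        return [-180.0 + i * step for i in range(n)]
    return [rng.uniform(-180.0, 180.0) for _ in range(n)]


def shift_offsets(n: int, fwhm_pixels: float, rng: random.Random) -> List[Tuple[float, float]]:
    """(dx, dy) pixel offsets drawn from a Gaussian with the given FWHM."""
    sigma = fwhm_pixels * FWHM_TO_SIGMA
    return [(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma)) for _ in range(n)]


def scale_factors(n: int, lo: float, hi: float, rng: random.Random) -> List[float]:
    """Magnifications log-uniform in [lo, hi] (assumed law and range)."""
    return [math.exp(rng.uniform(math.log(lo), math.log(hi))) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(rotation_angles(4, rng, evenly_spaced=True))  # [-180.0, -90.0, 0.0, 90.0]
    print(shift_offsets(2, fwhm_pixels=4.0, rng=rng))
    print(scale_factors(3, 0.5, 2.0, rng))
```

Seeding a dedicated `random.Random` instance keeps the generated training set reproducible.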
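[0034]
Through the learning procedure above, a subspace Ω that sorts coarse-grained data into categories is spanned in the space F onto which the nonlinear transformation maps. During learning it is desirable to alternate preimage learning and kernel function learning over several iterations, but if the learning samples are not very complex, the procedure can be simplified, for example by performing either method only once.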
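[0035]
Finally, the main part of the procedure executed by the image normalization unit 20 once learning is complete and normalization is performed is described with reference to the flowchart in FIG. 5. When image data is input (S1), the normalization coarse-graining unit 32 creates coarse-grained data using a spatial filter (S2). The coarse-grained data is sent to the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. The normalization subspace projection unit 36 obtains the coefficients α_i of the linear combination for the projection defined by Equation (6) (S3). The α_i need not be computed strictly by the definition in Equation (6); they may instead be found with a suitable iterative method that minimizes E in Equation (5). Next, the conversion magnitude evaluation unit 38 obtains the transformation magnitude, for example by comparing the resulting α_i with the lookup table (S4), and the estimation error evaluation unit 40 computes the magnitude of E, or the value of a monotonically increasing function of E, as the estimation error (S5). The conversion decision unit 42 in the normalization processing unit 28 determines which partial conversion detection unit 26 has the smallest estimation error (S6), and the original image data is transformed by the corresponding magnitude. Whether the resulting image data is transformed again is decided according to a suitable criterion (S8). As noted earlier, the nonlinear transformation defined by Equation (1) is not used directly anywhere in this computation, so its form need not be known.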
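The loop of steps S3 to S8 can be sketched generically as follows. This is a minimal illustration, not the patent's implementation: each partial detector is a hypothetical callable returning (transformation magnitude, estimation error), `apply` is a hypothetical callback that transforms the image, and two design choices are assumptions: detectors proposing a negligible change are skipped, and the loop stops when no detector proposes a change or after `max_iters` passes (one of several stopping criteria the text allows).

```python
from typing import Callable, Dict, List, Tuple, TypeVar

Img = TypeVar("Img")
# A detector returns (transformation magnitude, estimation error) for an image.
Detector = Callable[[Img], Tuple[float, float]]


def normalize(image: Img,
              detectors: Dict[str, Detector],
              apply: Callable[[Img, str, float], Img],
              max_iters: int = 20,
              tol: float = 1e-6) -> Tuple[Img, List[Tuple[str, float]]]:
    """Greedy normalization loop; returns the final image and the applied steps."""
    history: List[Tuple[str, float]] = []
    for _ in range(max_iters):
        # S3-S5: every partial detector proposes a magnitude and its estimation error.
        proposals = {name: det(image) for name, det in detectors.items()}
        # Ignore detectors that propose (almost) no change.
        pending = {name: p for name, p in proposals.items() if abs(p[0]) > tol}
        if not pending:          # S8: nothing left to correct, so stop
            break
        # S6: pick the partial transformation with the smallest estimation error.
        name = min(pending, key=lambda n: pending[n][1])
        magnitude = pending[name][0]
        image = apply(image, name, magnitude)   # transform the image by that magnitude
        history.append((name, magnitude))
    return image, history


if __name__ == "__main__":
    # Toy "image": only its pose offsets from the normal form (hypothetical).
    pose = {"size": 0.3, "rotation": 180.0, "shift": -2.0}
    errors = {"size": 0.3, "rotation": 0.1, "shift": 0.2}  # made-up error levels
    dets = {n: (lambda img, n=n: (-img[n], errors[n])) for n in pose}

    def step(img, name, mag):
        return {**img, name: img[name] + mag}

    final, steps = normalize(pose, dets, step)
    print([s[0] for s in steps])   # ['rotation', 'shift', 'size']
```

As in FIG. 6, each pass changes only one partial transformation, so the pose moves along one coordinate axis at a time.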
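[0036]
FIG. 6 shows the results of normalization composed of size, rotation, and shift components. In this experiment, as shown by the photographs on the right of the figure, two photographs of the area around the eyes were tracked after each transformation as they were normalized. In the upper-right series, the photograph, initially upside down (upper left), is rotated about 90 degrees counterclockwise in the first step, shifted slightly to the left in the next step, and so on, and is finally normalized to an upright pattern of the desired size. The three-dimensional graph on the left tracks the size (magnification), angle (degrees), and distance (pixels) step by step during this normalization. The black circle at the upper left shows that the initial photograph has been rotated 180 degrees, enlarged about 1.3 times, and shifted slightly. With each transformation the point moves along one of the three coordinate axes, finally reaching the normalized position on the right. The lower-right series of photographs and the corresponding white circles in the graph on the left show a similar progression; in this case normalization consists mainly of scaling. Although the pattern here is restricted to the area around the eyes rather than the whole face, the basic effect is exactly the same for the whole face pattern. Needless to say, however, the whole face pattern requires different learning samples from those used in the illustrated example.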
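[0037]
The resolution of the spatial filter used in the normalization coarse-graining unit 32 is arbitrary; in the example shown here, principal component analysis is used to coarse-grain to about 25 dimensions. The dimension of the subspace Ω spanned in the space F can also take various values; here it is 25. The number of learning samples depends on the required detection accuracy, but it suffices, for example, to take face patterns of about 100 people and vary each person's pattern about 100 ways in each partial conversion detection unit 26. With three partial conversion detection units 26, the total number of learning samples is then about 30,000. By contrast, giving the same degrees of freedom without this embodiment would require about 1,000,000 learning samples in total. This embodiment therefore reduces the number of learning samples dramatically. The partial transformations detected by the partial conversion detection units 26 are size, rotation, and shift here. These elements are not particularly limited, but simple transformations make the conversion easy: size and rotation should be expressed in a form describable by a linear transformation, and shift should be a uniform translation without shear. Transformations more complex than these may of course be assigned depending on the characteristics of the patterns handled, and transformations of the luminance of the image data and the like may also be assigned.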
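The factored sample count in this paragraph is simple arithmetic; the joint figure of about one million is quoted from the text rather than derived here, and corresponds to roughly 10^4 joint size, rotation, and shift variations for each of the 100 people.

```latex
N_{\text{factored}} \approx \underbrace{100}_{\text{people}} \times \underbrace{100}_{\text{variations per detector}} \times \underbrace{3}_{\text{detectors}} = 3 \times 10^{4},
\qquad
\frac{N_{\text{joint}}}{N_{\text{factored}}} \approx \frac{10^{6}}{3 \times 10^{4}} \approx 33
```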
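[0038]
The nonlinear transformation described above was defined using a kernel function, but the nonlinear transformation can be constructed in other ways. A method that performs the nonlinear transformation with an autoencoder based on a neural network algorithm is described here.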
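[0039]
FIG. 7 outlines the autoencoder. An autoencoder is a type of multilayer perceptron in which the input layer 60 and the output layer 62 have the same number of neurons, and the intermediate layer 64 has fewer neurons than either.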
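[0040]
The autoencoder is used as a partial conversion detection unit 26 as follows. First, learning samples created in the same way as for the kernel function are fed to the input layer 60 while the same values are given to the output layer 62 as teacher signals, and the synaptic weights are trained to realize the identity mapping. This training can be done with ordinary backpropagation.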
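[0041]
The output of the output layer 62 of an autoencoder trained this way can be regarded as representing the space F formed by the mapping of a nonlinear transformation. The outputs of the neurons of the intermediate layer 64 correspond to the projection onto the subspace Ω, spanned in the space F, that classifies the categories. The normalization subspace projection unit 36 can therefore be implemented by feeding coarse-grained data into the input layer 60 and reading the output of the intermediate layer 64. Likewise, the conversion magnitude evaluation unit 38 can be implemented by examining, during training, the relationship between the features of the learning samples and the outputs of the intermediate layer 64 and building a lookup table. Furthermore, the estimation error evaluated by the estimation error evaluation unit 40 can be computed as the distance between the input-layer vector and the output-layer vector, or as a monotonically increasing function of that distance. This distance corresponds to the accuracy of the transformation, since a shorter distance means the mapping into the space F approximates the input more accurately.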
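How a trained autoencoder supplies both the projection and the estimation error can be sketched as a single forward pass. This is a minimal illustration under assumptions: one hidden layer with tanh activation, a linear output layer, and weights that are already trained (the backpropagation training itself is not shown). `autoencoder_eval` and the example weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def autoencoder_eval(x: Sequence[float], W_enc: Matrix, b_enc: Sequence[float],
                     W_dec: Matrix, b_dec: Sequence[float]) -> Tuple[List[float], float]:
    """Return (hidden code, reconstruction error) for one input vector.

    hidden = tanh(W_enc x + b_enc)  plays the role of the projection onto Omega
    output = W_dec hidden + b_dec   plays the role of the point in F
    error  = |x - output|           serves as the estimation error
    """
    hidden = [math.tanh(v + b) for v, b in zip(matvec(W_enc, x), b_enc)]
    output = [v + b for v, b in zip(matvec(W_dec, hidden), b_dec)]
    error = math.sqrt(sum((a - o) ** 2 for a, o in zip(x, output)))
    return hidden, error


if __name__ == "__main__":
    W_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # hypothetical trained weights (3 -> 2)
    W_dec = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # hypothetical trained weights (2 -> 3)
    code, err = autoencoder_eval([0.0, 0.0, 2.0], W_enc, [0.0, 0.0], W_dec, [0.0, 0.0, 0.0])
    print(code, err)   # [0.0, 0.0] 2.0
```

The example input lies entirely outside what the 2-neuron bottleneck can represent, so the whole input norm appears as estimation error.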
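[0042]
The normalization of image data by the image normalization unit 20 has been described above. The pattern identification unit 100, which identifies face patterns in the image data output by the image normalization unit 20, is described next.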
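[0043]
FIG. 8 is a block diagram showing the configuration of the pattern identification unit 100. The pattern identification unit 100 consists of identification coarse-graining units 106 with multiple spatial resolutions, face identification units 102 connected to the respective identification coarse-graining units 106, and a counterexample detection unit 104. Like the normalization coarse-graining unit 32 in the image normalization unit 20, each identification coarse-graining unit 106 applies a spatial filter to image data and outputs coarse-grained data. Their resolutions can be set freely; here the lowest is 25 dimensions and the highest 100 dimensions, with several identification coarse-graining units 106 in between. A face identification unit 102 is provided for each identification coarse-graining unit 106 and identifies face patterns.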
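[0044]
Input image data is first fed to the identification coarse-graining unit 106a with the lowest spatial resolution and, in the illustrated example, converted to 25-dimensional coarse-grained data, which is then fed to the face identification unit 102a. The face identification unit 102a includes an identification subspace learning unit 108a, an identification subspace projection unit 110a, and an identification decision unit 112a, and operates much like the partial conversion detection units 26 described for the image normalization unit 20. That is, the input coarse-grained data is mapped into the space F, as in Equation (1), by the nonlinear transformation defined by the kernel function. In this space F, the identification subspace learning unit 108a has performed learning in advance, spanning a subspace Ω that characterizes the face patterns of the learning samples. The identification subspace projection unit 110a projects the vector mapped into the space F onto this subspace Ω. This determines the coefficients α_i of the linear combination for the projection vector, and the length E of the perpendicular from the mapped vector to the subspace is obtained from Equation (5). The identification decision unit 112a evaluates the positional relationship between the two, that is, the magnitude of E, against a suitable threshold or the like, and decides whether to include this data in the category of the subspace Ω. The threshold may be determined, for example, from the accuracy rate on suitable sample data. If the decision is that a face pattern is likely to be present, the identification coarse-graining unit 106 and face identification unit 102 of the next-lowest resolution (that is, the next level up) are executed.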
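Choosing the threshold on E from the accuracy on labeled sample data can be sketched as an exhaustive search over candidate cut points. This is a minimal illustration, assuming that residuals for known face and non-face samples are already available and that a sample is accepted as a face when E is strictly below the threshold; `choose_threshold` is a hypothetical name, and ties go to the smallest threshold.

```python
from typing import Sequence, Tuple


def choose_threshold(face_E: Sequence[float],
                     nonface_E: Sequence[float]) -> Tuple[float, float]:
    """Return (threshold, accuracy) maximizing accuracy, accepting a face when E < threshold."""
    candidates = sorted(set(face_E) | set(nonface_E))
    candidates.append(candidates[-1] + 1.0)   # a threshold above every observed E
    total = len(face_E) + len(nonface_E)
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        correct = sum(e < t for e in face_E) + sum(e >= t for e in nonface_E)
        acc = correct / total
        if acc > best_acc:                    # strict: keep the first (smallest) best
            best_t, best_acc = t, acc
    return best_t, best_acc


if __name__ == "__main__":
    print(choose_threshold([0.1, 0.2, 0.3], [0.6, 0.7]))   # (0.6, 1.0)
```

In practice one would likely weigh missed faces and false detections differently, which amounts to replacing plain accuracy with a weighted score in the same loop.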
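[0045]
The counterexample detection unit 104 has the same structure as the face identification units 102 and includes a counterexample subspace learning unit 114, a subspace projection unit 116, and a counterexample decision unit 118. It differs from the face identification units 102 in that the counterexample subspace learning unit 114 learns non-face patterns. That is, using learning samples that contain non-face patterns, a subspace Ω characterizing the presence of non-face patterns is formed. As in the face identification units, the subspace projection unit 116 projects the mapping of the nonlinear transformation onto this subspace Ω, and the counterexample decision unit 118 classifies according to the positional relationship between the mapped vector and the projection vector.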
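[0046]
The face identification units 102 and the counterexample detection unit 104 are trained in the same way as described for the image normalization unit. That is, for a face identification unit 102, several learning samples of the face patterns to be identified are prepared, and from them the preimages corresponding to the basis vectors of the subspace are updated and the kernel is deformed. Since the pattern identification unit 100 normally performs pattern recognition on image data already normalized by the image normalization unit 20, the face patterns can be expected to be normalized, so it is sufficient to use only learning samples normalized with respect to size, rotation angle, shift, and so on. The counterexample detection unit 104 may use anything other than face patterns as learning samples. However, since it is generally most effective when it learns patterns that the face identification units 102 find hard to distinguish, it is best trained mainly on confusing patterns that resemble normalized face patterns.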
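[0047]
FIG. 10 shows the results of a trial of the identification described here. The left side shows detection performed, without the present invention, only on data coarse-grained to 50 dimensions. The right side shows the results of hierarchical detection with this embodiment, using face identification units 102 at three resolutions: 25, 50, and 100 dimensions. The counterexample detection unit 104 was not included. Each piece of image data used contained several faces, and face patterns were detected in it. In both cases faces are detected with a probability of 90%. The horizontal axis is the number of non-face patterns mistakenly found in one piece of image data, and the vertical axis is the proportion of images. With the conventional method, 63% of images had no errors and 22% had exactly one. With the present invention these figures are 82% and 14%, respectively. As a result, the number of false detections per image improved from 0.40 to 0.24. Detection at the high resolution of 100 dimensions naturally requires much computation time, but in this embodiment, when a face pattern is judged unlikely at 25 dimensions, no detection is performed at higher resolutions, so no computation time is wasted and efficient, accurate detection is achieved. Although not shown in the figure, when the counterexample detection unit 104, which detects that no face pattern is present, was additionally included at each resolution in this experiment, the false detection rate became almost zero, confirming its effectiveness.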
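[0048]
Finally, the characteristic features of this embodiment are listed. With the image normalization unit 20 of this embodiment, face patterns in input image data can be normalized after learning from only a very small number of learning samples. When normalization is split into rotation, enlargement and reduction, translation, and so on, each part need only be trained with the corresponding learning samples, which makes learning very efficient. When normalization is performed with a neural network, even nonlinearly distributed patterns can be normalized easily. When normalization uses a nonlinear transformation defined by a kernel function, even nonlinearly distributed patterns can be normalized accurately. When normalization is performed on a parallel computer, it can be executed quickly.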
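[0049]
With the pattern identification unit 100 of this embodiment, the features of face patterns, which are inherently nonlinear, can be identified accurately using a nonlinear transformation. Because decisions are made hierarchically from low to high resolution, identification is fast: no time is spent on inputs that can easily be judged to contain no face pattern. When the counterexample detection means is used as well, decision accuracy improves. When patterns are identified with a nonlinear transformation defined by a kernel function, highly reliable identification is possible. The subspaces representing the categories can be constructed very quickly, and the basis vectors of a subspace can be re-spanned efficiently using learning samples. Because the kernel function can easily be deformed using learning samples, pattern identification can easily be improved. When a parallel computer is used, pattern identification at each resolution can be computed efficiently.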
[0050]
The image normalization unit 20 and the pattern identification unit 100 complement each other, enabling very accurate and fast identification of face patterns.
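[Brief description of the drawings]
[FIG. 1] A schematic diagram showing the configuration of the computer of this embodiment.
[FIG. 2] A block diagram outlining the image normalization unit and the pattern identification unit.
[FIG. 3] A block diagram showing details of the image normalization unit.
[FIG. 4] A schematic diagram illustrating the nonlinear transformation.
[FIG. 5] A flowchart showing the processing procedure of the image normalization unit.
[FIG. 6] A diagram showing test results of the image normalization unit.
[FIG. 7] A schematic diagram of the autoencoder used in the image normalization unit.
[FIG. 8] A block diagram outlining the pattern identification unit.
[FIG. 9] A flowchart showing the processing procedure of the pattern identification unit.
[FIG. 10] A diagram showing test results of the pattern identification unit.
[Explanation of reference signs]
20 image normalization unit, 26 partial conversion detection unit, 28 normalization processing unit, 100 pattern identification unit, 102 face identification unit, 104 counterexample detection unit.
[0001]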
BACKGROUND OF THE INVENTION
The present invention relates to an image identification device, and more particularly to an image identification device that efficiently identifies a pattern from image data.
[0002]
[Prior art]
In order to identify a specific pattern in image data using a computer, it is necessary both to reliably detect the shape of the pattern itself and to cope with the fact that the included patterns vary in size, rotation angle, and so on.
[0003]
Conventional examples of the former include a linear subspace method, a support vector machine (SVM) method, and a kernel nonlinear subspace method. In the linear subspace method, a subspace is defined for each of a plurality of categories, the subspace to which the unknown pattern is most related is evaluated, and the category to which the pattern belongs is determined. However, in this method, when there are many categories and the pattern dimension is low, the detection accuracy is lowered. In addition, there is a problem that the discrimination accuracy for a pattern distribution having nonlinearity is low.
[0004]
The SVM method makes it possible to identify a nonlinear pattern distribution by mapping low-dimensional patterns into a high-dimensional space through a nonlinear transformation defined via a kernel function. However, it can classify only two categories and requires a large amount of computation.
[0005]
The kernel nonlinear subspace method was devised as a pattern identification method that solves these problems, and is disclosed in Japanese Patent Laid-Open No. 2002-90274. Like the SVM method, it maps patterns into a high-dimensional space by a nonlinear transformation defined using a kernel function, and applies the subspace method in this high-dimensional nonlinear space.
[0006]
Conventionally, in order to detect patterns having various sizes and rotation angles, a large number of learning samples have been used. That is, in each of the pattern identification methods described above, learning is generally performed by using a learning sample having a characteristic pattern to determine a distribution of categories indicating the characteristic. Therefore, it is necessary to use not only a pattern whose size and angle are changed variously as a learning sample, but also a very large number of patterns which are modified by combining size and angle.
[0007]
[Problems to be solved by the invention]
However, in the above-mentioned kernel nonlinear subspace method, the basis vectors spanning the subspace are defined based on the mapping of all learning samples into the nonlinear space, so a large number of learning samples still requires a lot of computation. An object of the present invention is to establish means for identifying a human or animal face pattern in an image at high speed and with high accuracy.
[0008]
[Means for Solving the Problems]
In one aspect, the image identification device of the present invention comprises: normalization means that uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; coarse-graining means that applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; face identification means provided for each resolution, which compares coarse-grained data of a given resolution with a category constructed, in the space defined by the nonlinear transformation, from learning samples of normalized face patterns having that resolution, and which, when the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; hierarchical face pattern identification means that performs hierarchical face pattern identification over multiple resolutions, starting from the face identification means of the lowest resolution and, whenever the face identification means of one resolution judges that a face pattern is likely to be present, having the face identification means of the next-lowest resolution execute; and counterexample detection means that, before or after the processing of the hierarchical face pattern identification means, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having been trained on learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present, and thereby detects that no face pattern is present.
[0009]
In one aspect, the image identification device of the present invention has, before or after the hierarchical face pattern identification means, counterexample detection means that detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having been trained on learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present.
[0010]
In one aspect, the image identification device of the present invention is characterized in that the nonlinear transformation is provided by computation means using a kernel function.
[0011]
In one aspect, the image identification device of the present invention is characterized in that the category is constructed as a subspace whose basis vectors are a set of vectors obtained by nonlinearly transforming a plurality of learning data. Further, when the mapping of a new learning sample under the nonlinear transformation is given, the basis vectors spanning the subspace are updated so that the relationship between this mapping and the subspace generated so far becomes stronger.
[0012]
In one aspect, the image identification device of the present invention is characterized in that the kernel function defining the nonlinear transformation is deformed according to the relationship between the subspace and the mapping of learning samples under the nonlinear transformation. Further, the face detection means for the respective resolutions are processed in parallel by a parallel processor provided in the device.
[0013]
The image identification method of the present invention is a method executed by the following means provided in a computer, and comprises: a normalization step in which normalization means uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; a coarse-graining step in which coarse-graining means applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; a storage step in which storage means stores, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution; an evaluation step in which evaluation means compares coarse-grained data of a given resolution with the category of that resolution stored in the storage means and evaluates their positional relationship in the space defined by the nonlinear transformation; a judgment step in which judgment means, when the evaluation means evaluates that the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; a hierarchical face pattern identification step in which hierarchical face pattern identification means executes the processing of the evaluation means and the judgment means starting from the coarse-grained data of the lowest resolution and, whenever a face pattern is judged likely to be present at one resolution, executes the processing on the coarse-grained data of the next-lowest resolution, thereby performing hierarchical face pattern identification over multiple resolutions; and a counterexample detection step in which counterexample detection means, before or after the hierarchical face pattern identification step, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. In the counterexample detection step, having learned from learning samples containing non-face patterns, the method detects in the same manner as the evaluation and judgment steps that a non-face pattern is likely to be present, thereby detecting that no face pattern is present.
The image identification program of the present invention causes a computer to function as: normalization means that uses a nonlinear transformation to apply a normalizing transformation to a face pattern contained in image data, thereby normalizing the image data; coarse-graining means that applies spatial filters of different spatial resolutions to the normalized image data to generate a plurality of sets of coarse-grained data; storage means that stores, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution; evaluation means that compares coarse-grained data of a given resolution with the category of that resolution stored in the storage means and evaluates their positional relationship in the space defined by the nonlinear transformation; judgment means that, when the evaluation means evaluates that the two are in a predetermined positional relationship, judges that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category; hierarchical face pattern identification means that executes the processing of the evaluation means and the judgment means starting from the coarse-grained data of the lowest resolution and, whenever a face pattern is judged likely to be present at one resolution, executes the processing on the coarse-grained data of the next-lowest resolution, thereby performing hierarchical face pattern identification over multiple resolutions; and counterexample detection means that, before or after the processing of the hierarchical face pattern identification means, detects that the image data or the coarse-grained data of a predetermined resolution does not contain a face pattern. The counterexample detection means, having learned from learning samples containing non-face patterns, detects in the same manner as the face identification means that a non-face pattern is likely to be present, thereby detecting that no face pattern is present.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Preferred embodiments of the present invention will be described below with reference to the drawings. In the figure, description of components having the same configuration is omitted.
[0015]
The block diagram of FIG. 1 shows the configuration of an apparatus according to an embodiment of the present invention. The apparatus includes a CPU 2 that performs calculations, a storage unit 4, a user instruction input unit 6, a display unit 8, a data input unit 10, a data output unit 12, and an application software input unit 14, which are connected by a communication network for exchanging data. That is, this apparatus is realized by executing application software describing the algorithm of the present invention on a general-purpose computer. A user loads application software distributed on a storage medium such as a CD-ROM or via a network into the computer through the application software input unit 14, and has the CPU 2 execute it using the instruction input unit 6, such as a keyboard. The operation of the CPU 2 is under the control of software called an operating system (OS), and instructions from the user and the application software are transmitted to the CPU 2 through this OS. The application software of this embodiment, the OS, and other information needed for computation are held temporarily or permanently in the storage unit 4, which includes memory and a hard disk. Image data needed for execution is obtained through the data input unit 10, such as a CCD camera, a scanner, a storage medium such as a CD-ROM, or data acquisition over a network. When the CPU 2 has performed the necessary calculations, the processed image data is output through the data output unit 12, such as a storage medium such as an MO disk, data transfer over a network, or a printer. The user can also view the image data before and after processing on the display unit 8, such as a monitor.
[0016]
FIG. 2 is a block diagram showing an outline of the image processing calculation performed by the CPU 2. The image data input from the data input unit 10 is subjected to normalization by the image normalization unit 20, and the pattern identification unit 100 then performs detailed pattern identification on it. Here, normalization refers to transforming conditions such as the size, rotation angle, position, and brightness of the face pattern into the state assumed by the pattern identification unit 100 (referred to as the normal form).
[0017]
The transformation for normalization performed in the image normalization unit 20 is divided into elements consisting of basic partial transformations, and how each partial transformation should be performed is calculated by the partial conversion detection unit 26 corresponding to that partial transformation. In the illustrated example, three partial conversion detection units 26 are provided: a size partial conversion detection unit 26a related to the size (enlargement and reduction) of the image data, a rotation partial conversion detection unit 26b related to the rotation angle of the image data, and a shift partial conversion detection unit 26c related to the shift (translation) of the image data. As will be described in detail later, these partial conversion detection units 26 detect the state of their partial transformation from coarse-grained data obtained by applying a coarse-graining spatial filter to the image data, and pass the results to the normalization processing unit 28. The normalization processing unit 28 then applies to the image data the transformation judged to have the smallest error. This series of processes is usually repeated several times, so that in the end size, rotation, and shift are all normalized. Of course, depending on the face pattern, the process need not be repeated.
[0018]
The pattern identification unit 100 uses spatial filters to coarse-grain the image data normalized by the image normalization unit 20 into coarse-grained data of various resolutions, and then runs the face identification units 102, which identify face patterns in this coarse-grained data. In the illustrated example, coarse-graining is performed by summing an appropriate number of principal component analysis modes; a face identification unit 102a is provided for 25-dimensional coarse-grained data, a face identification unit 102b for 100-dimensional coarse-grained data, and further face identification units 102 for the resolutions in between. In addition, a counterexample detection unit 104 is provided that determines whether the 100-dimensional coarse-grained data, the highest dimension, contains something other than a face pattern. As will be described in detail later, face pattern identification starts from the 25-dimensional face identification unit 102a, which has the lowest resolution; if a face pattern is judged likely to be present, the face identification unit 102 of the next-lowest resolution (that is, the next level up) is used for the next judgment. If a face pattern is still judged likely to be present at 100 dimensions, the highest resolution, counterexample detection is performed last.
[0019]
Hereinafter, the image normalization unit 20 and the pattern identification unit 100 will be described in detail.
[0020]
The block diagram of FIG. 3 shows an outline of the configuration of the image normalization unit 20. The input image data is held in an image holding unit 30 provided in the storage unit 4. The normalization coarse-graining unit 32 then produces coarse-grained data by lowering the spatial resolution of the image data so that only its rough features remain. The spatial filter used for coarse-graining is not particularly limited. For example, one may take the sum of the mode components, up to a predetermined number of dimensions, that have the largest contribution ratios among the modes obtained by principal component analysis of suitable image data; alternatively, one may perform a Fourier decomposition and extract the mode components up to a predetermined resolution. Coarse-graining is performed because it reduces the amount of data, so that the normalization described below runs at high speed.
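A minimal NumPy sketch of the first filter mentioned, keeping the leading principal-component modes, follows. It assumes flattened grayscale images, mean-centring before the decomposition, and a basis computed by SVD; none of these details is specified in the text, and the Fourier alternative is not shown.

```python
import numpy as np


def fit_pca_modes(train: np.ndarray, k: int):
    """Return (mean, modes); modes has shape (k, n_pixels).

    `train` holds one flattened image per row. The rows of `vt` from the SVD
    of the centred data are the principal modes, ordered by decreasing
    contribution ratio (explained variance).
    """
    mean = train.mean(axis=0)
    _, _, vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, vt[:k]


def coarse_grain(image: np.ndarray, mean: np.ndarray, modes: np.ndarray):
    """Keep only the leading mode components of one flattened image.

    Returns the k mode coefficients (the reduced coarse-grained data) and
    the image rebuilt as the sum of those mode components.
    """
    coeffs = modes @ (image - mean)
    return coeffs, mean + coeffs @ modes
```

With k = 25 this would correspond to the roughly 25-dimensional coarse-graining used in the example; larger k keeps finer detail at higher cost.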
[0021]
The coarse-grained data is then sent to the partial conversion detection units 26 arranged in parallel. As shown schematically in FIG. 4, each partial conversion detection unit 26 regards the coarse-grained data as a vector x in the image space G and maps it into a space F created by a nonlinear conversion for pattern identification. The vector mapped into the space F is called the mapping vector and is written Φ(x). For example, to detect size, rotation, and shift, the partial conversion detection units 26 comprise a size partial conversion detection unit 26a, a rotation partial conversion detection unit 26b, and a shift partial conversion detection unit 26c. Each partial conversion detection unit 26 includes a normalization subspace learning unit 34, a normalization subspace projection unit 36, and a partial conversion evaluation unit 37, and the partial conversion evaluation unit 37 in turn includes a conversion magnitude evaluation unit 38 and an estimation error evaluation unit 40. In the space F, the normalization subspace learning unit 34 constructs in advance, from learning samples, a subspace Ω representing the characteristic category of those samples, and the normalization subspace projection unit 36 projects the mapping vector Φ(x) onto this subspace Ω. The projected vector is called the projection vector and is written Φ′(x). The conversion magnitude evaluation unit 38 then evaluates which basis vector of the subspace Ω the projection vector Φ′(x) is close to, and calculates the magnitude of the conversion required. For example, in the size partial conversion detection unit 26a, a lookup table is created during learning that records correspondences such as: basis vector Φ₁ lies near learning samples about 1.5 times the normal size, and another basis vector Φ₂ lies near learning samples about twice the normal size.
By referring to this lookup table, the conversion magnitude evaluation unit 38a can calculate by what factor the current face pattern must be enlarged or reduced to convert it into the normal form. The estimation error evaluation unit 40 calculates an estimation error based on the distance E between the mapping vector Φ(x) and the projection vector Φ′(x): if E is small, the error contained in the projection is judged to be small, and if E is large, the projection result is judged to contain a large error.
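The lookup-table step can be illustrated with a hypothetical plain-Python sketch. During learning, each basis vector is associated with the conversion magnitude of the learning samples it lies closest to; at run time the magnitude is read off for the basis vector with the dominant coefficient. Measuring "closest" by the largest |α_i| and averaging magnitudes per basis vector are assumptions, not details taken from the text.

```python
from typing import Dict, Sequence


def build_lookup(train_coeffs: Sequence[Sequence[float]],
                 train_magnitudes: Sequence[float]) -> Dict[int, float]:
    """Associate each basis index with the mean conversion magnitude of the
    learning samples whose largest |alpha_i| falls on that basis vector."""
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for coeffs, magnitude in zip(train_coeffs, train_magnitudes):
        best = max(range(len(coeffs)), key=lambda i: abs(coeffs[i]))
        sums[best] = sums.get(best, 0.0) + magnitude
        counts[best] = counts.get(best, 0) + 1
    return {i: sums[i] / counts[i] for i in sums}


def estimate_magnitude(coeffs: Sequence[float], table: Dict[int, float]) -> float:
    """Return the magnitude stored for the tabulated basis vector with the
    largest |alpha_i|, i.e. the basis vector the projection is closest to."""
    best = max(table, key=lambda i: abs(coeffs[i]))
    return table[best]
```

The table stores how far the learning samples departed from the normal form; the correction applied to the image is the inverse, for example a reduction by 1/2 for a pattern matched to the basis vector of twice-size samples.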
[0022]
These results are passed to the normalization processing unit 28, which includes a conversion determination unit 42 and a conversion execution unit 44. The conversion determination unit 42 determines which partial conversion detection unit 26 has the smallest estimation error. For example, when the estimation error of the rotation partial conversion detection unit 26b is the smallest, the conversion execution unit 44 rotates the original image data by the corresponding conversion magnitude (that is, the rotation angle), and the image in the image holding unit 30 is updated. The updated image data is normalized in the same way as many times as necessary. Various criteria can decide the repetition: for example, repeating a predetermined number of times, or comparing with a predetermined threshold either a correlation calculated against suitable reference data in real space or the value obtained by the conversion magnitude evaluation unit.
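The determination and repetition performed by units 42 and 44 amount to a greedy loop. The sketch below assumes a hypothetical detector interface returning (magnitude, estimation error), expresses every magnitude so that 0 means "no conversion needed" (angle in degrees, shift in pixels, log of the scale factor), and uses two of the stopping criteria listed above: a fixed repetition count and a threshold on the selected conversion magnitude.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: a detector returns (conversion magnitude, estimation
# error); an applier performs one named partial conversion on the image.
Detector = Callable[[object], Tuple[float, float]]
Applier = Callable[[object, str, float], object]


def normalize(image: object,
              detectors: Dict[str, Detector],
              apply: Applier,
              max_iters: int = 5,
              tol: float = 1e-6) -> Tuple[object, List[Tuple[str, float]]]:
    """Greedy normalization: each round applies only the conversion whose
    detector reports the smallest estimation error."""
    history: List[Tuple[str, float]] = []
    for _ in range(max_iters):                  # criterion 1: fixed repetition count
        estimates = {name: det(image) for name, det in detectors.items()}
        name = min(estimates, key=lambda n: estimates[n][1])
        magnitude, _error = estimates[name]
        if abs(magnitude) <= tol:               # criterion 2: conversion size below threshold
            break
        image = apply(image, name, magnitude)
        history.append((name, magnitude))
    return image, history
```

Applying one conversion per round, rather than all three at once, follows the description: the most reliable estimate is acted on first, and the others are re-estimated on the updated image.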
[0023]
Next, the means for constructing the space F of the partial conversion detection units 26 using a kernel function is described in detail with mathematical expressions. A characteristic feature of the kernel-function method is that the mapping vector Φ(x) described above never has to be constructed explicitly.
[0024]
The nonlinear mapping of Expression (1), which maps a d-dimensional vector x in the image space G representing coarse-grained data into the d_F-dimensional space F, is determined by choosing a suitable kernel function k(x, y) so that the relationship of Expression (2) holds.
[0025]
[Expression 1]

Here φ_i(x) is an eigenfunction of a suitable kernel function, and λ_i (i = 1, ..., N) is the corresponding eigenvalue.
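The expressions themselves were lost in extraction; only their labels remain. Given the definitions in the surrounding text, the standard Mercer-type construction they describe would read as follows. This is a reconstruction under that assumption, not the published formula, and it takes N = d_F.

```latex
\Phi(x) = \left(\sqrt{\lambda_1}\,\phi_1(x),\ \dots,\ \sqrt{\lambda_N}\,\phi_N(x)\right) \tag{1}

\Phi(x)\cdot\Phi(y) = \sum_{i=1}^{N} \lambda_i\,\phi_i(x)\,\phi_i(y) = k(x, y) \tag{2}
```

Equation (2) is what allows every later step to be written with k alone, without ever forming Φ(x).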
[0026]
Next, the way the m-dimensional subspace Ω that classifies the categories of coarse-grained data is spanned in the space F, and how it is learned, are described. First, as initial values of the basis vectors of the subspace Ω, the vectors Φ_1, ..., Φ_m on Ω corresponding to m vectors x_1, ..., x_m in the image space G (hereinafter called pre-images) are chosen appropriately, for example at random using uniform random numbers. Now consider learning the pre-images so as to correct this subspace Ω, using a d-dimensional vector x in the image space that represents a learning sample. The vector Φ′(x), obtained by projecting the mapping Φ(x) of the learning sample vector x into the space F onto the subspace Ω, is expressed as a linear combination of the basis vectors. Denoting the combination coefficients by α_i, the distance E between this projection and the original mapping vector Φ(x) is given by Equations (3) to (5).
[0027]
[Expression 2]

Here, the kernel function definition of Equation (2) is used to arrive at Equation (5). By the definition of projection, the coefficients α_i are given by Equation (6), so that E takes its minimum value. The matrix K is the matrix whose (i, j) component is k(x_i, x_j).
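Because all quantities are inner products in F, the projection can be computed from kernel values alone. Assuming the basis vectors are the mappings Φ(x_i) of the pre-images and E is the squared distance, E = k(x,x) - 2 Σ α_i k(x_i,x) + Σ α_i α_j k(x_i,x_j), and setting its gradient to zero gives α = K⁻¹k_x with (k_x)_i = k(x_i,x). This is the standard kernel-subspace result and presumably what Equations (5) and (6) express, though their published form is not visible here. The Gaussian kernel is an example choice.

```python
import numpy as np


def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Matrix of k(a_i, b_j) = exp(-|a_i - b_j|^2 / (2 sigma^2))."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def project(x: np.ndarray, preimages: np.ndarray, sigma: float = 1.0):
    """Project Phi(x) onto span{Phi(x_1), ..., Phi(x_m)} using kernel values only.

    Returns (alpha, E): the combination coefficients and the squared
    distance between Phi(x) and its projection.
    """
    K = gaussian_kernel(preimages, preimages, sigma)            # K_ij = k(x_i, x_j)
    kx = gaussian_kernel(preimages, x[None, :], sigma)[:, 0]    # (k_x)_i = k(x_i, x)
    kxx = gaussian_kernel(x[None, :], x[None, :], sigma)[0, 0]  # k(x, x)
    alpha = np.linalg.solve(K, kx)                              # minimiser of E
    E = kxx - 2.0 * alpha @ kx + alpha @ K @ alpha
    return alpha, E
```

At the optimum, E reduces to k(x,x) - k_xᵀK⁻¹k_x, the quantity an estimation error evaluation unit would rank or threshold.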
[0028]
In pre-image learning, each pre-image x_i is moved by Δx_i in the direction that reduces the distance between the subspace Ω and the learning sample x. By the steepest descent method, Δx_i is given by Equation (7).
[0029]
[Equation 3]

Here, η is the learning coefficient, a positive constant. The matrix g_ab(x) is the metric tensor of the manifold embedded in the space F by the nonlinear mapping, and is given by Equation (8) in terms of the kernel function. Because this learning is a linear optimization problem in a high-dimensional space, it converges better than a nonlinear optimization problem and finishes in a short time.
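Pre-image learning can be sketched as follows, under clearly stated assumptions: E is the projection residual of one learning sample, the kernel is Gaussian, and ∂E/∂x_i is estimated by central differences instead of the analytic expression of Equation (7). For a Gaussian kernel, the induced metric g_ab(x) = ∂²k(x,x')/∂x_a∂x'_b at x' = x equals δ_ab/σ², so the metric-corrected steepest-descent step is σ² times the ordinary gradient step. Whether Equations (7) and (8) take exactly this form is an assumption.

```python
import numpy as np


def kernel(a, b, sigma):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


def residual(x, preimages, sigma=1.0):
    """E for one sample: squared distance from Phi(x) to span{Phi(x_i)}."""
    K = kernel(preimages, preimages, sigma)
    kx = kernel(preimages, x[None, :], sigma)[:, 0]
    return float(1.0 - kx @ np.linalg.solve(K, kx))   # k(x, x) = 1 for this kernel


def learn_preimages(samples, preimages, sigma=1.0, eta=0.1, epochs=20, h=1e-5):
    """Move every pre-image by dx_i = -eta * g^{-1} dE/dx_i for each sample.

    For the Gaussian kernel g_ab = delta_ab / sigma^2, so g^{-1} = sigma^2 I.
    dE/dx_i is estimated by central differences (a stand-in for the
    analytic gradient).
    """
    P = np.array(preimages, dtype=float)
    for _ in range(epochs):
        for x in samples:
            grad = np.zeros_like(P)
            for i in range(P.shape[0]):
                for a in range(P.shape[1]):
                    step = np.zeros_like(P)
                    step[i, a] = h
                    grad[i, a] = (residual(x, P + step, sigma)
                                  - residual(x, P - step, sigma)) / (2.0 * h)
            P -= eta * sigma ** 2 * grad
    return P
```

Numerical differentiation keeps the sketch short; an analytic gradient would be far cheaper for realistic dimensions.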
[0030]
Next, the kernel function learning method is described. As a kernel function, a known function such as a Gaussian function kernel or a polynomial kernel is given initially. During learning, the kernel function is transformed by the conformal mapping of equation (9).
[0031]
[Expression 4]

The learning rule chooses C(x) so that the variation of the coefficients α_i over the learning samples becomes uniform across every coefficient α_i. Specifically, if the variation of a coefficient α_i is large, the value of C(x) is increased in the neighborhood of the pre-image x_i of the corresponding subspace basis vector. As a result, the neighborhood of x_i is expanded in the space F as shown in Equation (10).
[0032]
[Equation 5]

Consequently, the number of learning samples for which α_i takes a large value decreases relatively, and the variation of α_i over the learning samples is reduced. Conversely, if the variation of α_i is small, the value of C(x) is decreased in the neighborhood of the pre-image x_i of the corresponding basis vector. In this method C(x) can be assigned directly only at the pre-images of the basis of the subspace Ω; in the neighborhood of a pre-image, C(x) is changed by extrapolating from its value at the pre-image, as in Equation (11).
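A sketch of the conformal kernel modification follows. It assumes Equation (9) has the common conformal form k̃(x,y) = C(x)C(y)k(x,y), uses the variance of each α_i over the learning samples as its "variation", and substitutes a hypothetical Gaussian-weighted extrapolation for Equation (11), whose exact form is not given here. The class and method names are illustrative.

```python
import numpy as np


def gaussian(a, b, sigma):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


class ConformalKernel:
    """k~(x, y) = C(x) C(y) k(x, y), with C set at the pre-images and
    extrapolated to their neighbourhoods."""

    def __init__(self, preimages, sigma=1.0, tau=0.5):
        self.P = np.asarray(preimages, dtype=float)
        self.sigma, self.tau = sigma, tau
        self.c = np.ones(len(self.P))          # value of C at each pre-image

    def C(self, X):
        # Hypothetical extrapolation: C = 1 far from all pre-images and moves
        # towards c_i near pre-image x_i (Gaussian weights of width tau).
        w = gaussian(np.asarray(X, dtype=float), self.P, self.tau)
        return np.maximum(1.0 + w @ (self.c - 1.0), 1e-6)

    def __call__(self, A, B):
        return self.C(A)[:, None] * self.C(B)[None, :] * gaussian(A, B, self.sigma)

    def update(self, alphas, rate=0.1):
        """alphas: (n_samples, m) projection coefficients of the learning samples.

        Raise C near x_i when alpha_i varies more than average over the
        samples and lower it when alpha_i varies less, pushing the
        variations towards uniformity. The multiplicative step keeps c_i > 0.
        """
        var = np.asarray(alphas, dtype=float).var(axis=0)
        self.c *= np.exp(rate * (var - var.mean()) / (var.mean() + 1e-12))
```

Since C(x)C(y) factors out symmetrically, the modified kernel stays symmetric, and it remains positive semi-definite whenever k is.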
[0033]
[Formula 6]

Next, how the learning samples are provided is described. For example, for normalization with respect to rotation, several pieces of image data are prepared in which the face pattern to be normalized stands upright at the center of the image (head at the top, chin at the bottom), and each is rotated to angles drawn as uniform random numbers in the range of -180 to 180 degrees, or to angles at equal intervals. For shift, several images with the face pattern upright at the center are likewise prepared and shifted vertically and horizontally, for example according to Gaussian random numbers whose half width is a suitable number of pixels. Instead of random numbers, the shifts may be given regularly so that the probability density is uniform. Similarly, for size, images with the face pattern upright at the center may be enlarged and reduced. Learning in this way makes clear the relationship between the magnitude of the conversion applied to a learning sample (for rotation, its angle) and the projection of that sample onto the subspace. For example, a relationship such as "a large coefficient α₁ corresponds to a 90-degree rotation" is obtained. By examining these relationships in detail and compiling a lookup table or a suitable function, the evaluation means of the conversion magnitude evaluation unit 38 is established.
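A sketch of the sample-generation recipe for the rotation and shift detectors follows: an upright, centred face image is rotated by uniformly random angles in [-180°, 180°) or shifted by Gaussian-distributed offsets, and each generated sample keeps its conversion parameter as the label later used for the lookup table. Plain NumPy with nearest-neighbour resampling keeps the block self-contained; the function names, the zero fill outside the image, and reading "half width" as a full width at half maximum are illustrative assumptions. Size variation is omitted.

```python
import numpy as np


def rotate_nn(img: np.ndarray, degrees: float) -> np.ndarray:
    """Rotate a 2-D image about its centre (nearest neighbour, zero fill)."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(degrees)
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: locate the source pixel for every output pixel.
    sx = np.cos(t) * (xx - cx) + np.sin(t) * (yy - cy) + cx
    sy = -np.sin(t) * (xx - cx) + np.cos(t) * (yy - cy) + cy
    sxi, syi = np.rint(sx).astype(int), np.rint(sy).astype(int)
    inside = (sxi >= 0) & (sxi < w) & (syi >= 0) & (syi < h)
    out = np.zeros_like(img)
    out[inside] = img[syi[inside], sxi[inside]]
    return out


def shift_nn(img: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Translate by (dy, dx) pixels without shearing, zero-filling the border."""
    h, w = img.shape
    dy, dx = max(-h, min(h, dy)), max(-w, min(w, dx))
    out = np.zeros_like(img)
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out


def make_samples(upright, n, rng, half_width_px=3.0):
    """Rotation and shift learning samples, each paired with its conversion label."""
    angles = rng.uniform(-180.0, 180.0, size=n)                  # uniform in [-180, 180)
    rotated = [(rotate_nn(upright, float(a)), float(a)) for a in angles]
    sigma = half_width_px / (2.0 * np.sqrt(2.0 * np.log(2.0)))   # half width read as FWHM
    offsets = np.rint(rng.normal(0.0, sigma, size=(n, 2))).astype(int)
    shifted = [(shift_nn(upright, int(dy), int(dx)), (int(dy), int(dx)))
               for dy, dx in offsets]
    return rotated, shifted
```

Replacing the random draws with evenly spaced angles or a regular shift grid gives the equal-interval variant the text also allows.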
[0034]
Through the learning procedure described above, the subspace Ω for categorizing the coarse-grained data is created in the space F obtained by the nonlinear transformation. During learning, it is desirable to alternate pre-image learning and kernel function learning several times. However, if the learning samples are not very complex, either one of them may be simplified.
[0035]
Finally, the main steps executed by the image normalization unit 20 when normalization is performed after learning is complete are described with reference to the flowchart of FIG. 5. When image data is input (S1), the normalization coarse-graining unit 32 creates coarse-grained data using a spatial filter (S2). The coarse-grained data is sent to the size partial conversion detection unit 26a, the rotation partial conversion detection unit 26b, and the shift partial conversion detection unit 26c. In each, the normalization subspace projection unit 36 obtains the linear combination coefficients α_i of the projection defined by Equation (6) (S3). The α_i need not strictly follow the definition of Equation (6); a suitable iterative method that minimizes E in Equation (5) may be used instead. Next, the conversion magnitude evaluation unit 38 compares the α_i obtained in this way with the lookup table to determine the magnitude of the conversion (S4), and the estimation error evaluation unit 40 calculates E, or the value of a monotonically increasing function of E, as the estimation error (S5). The conversion determination unit 42 in the normalization processing unit 28 identifies the partial conversion detection unit 26 with the smallest estimation error (S6), and the original image data is converted by the corresponding conversion magnitude. Whether the image data obtained in this way is converted again is decided according to a suitable criterion (S8). As described above, this series of operations never uses the nonlinear transformation defined by Expression (1) directly, so its explicit form need not be known.
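As the text notes, α_i can also be found iteratively by minimizing E directly. Written in kernel terms, E(α) = k(x,x) - 2αᵀk_x + αᵀKα is a convex quadratic with gradient 2(Kα - k_x), so plain gradient descent converges to the closed-form solution for a small enough step. The sketch below is one such iterative method; choosing plain gradient descent and this step size is an assumption, and conjugate gradients would be a common alternative.

```python
import numpy as np


def alpha_iterative(K: np.ndarray, kx: np.ndarray, iters: int = 500) -> np.ndarray:
    """Minimise E(alpha) = k(x,x) - 2 alpha.kx + alpha^T K alpha by gradient descent."""
    step = 1.0 / np.linalg.eigvalsh(K).max()   # keeps every mode of I - step*K in [0, 1)
    alpha = np.zeros_like(kx)
    for _ in range(iters):
        grad = 2.0 * (K @ alpha - kx)          # dE/dalpha
        alpha -= 0.5 * step * grad             # i.e. alpha += step * (kx - K alpha)
    return alpha
```

Convergence speed depends on the conditioning of K: the error shrinks by roughly a factor of 1 - λ_min/λ_max per iteration, so well-separated pre-images help.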
[0036]
FIG. 6 shows the result of a normalization consisting of size, rotation, and shift elements. In this experiment, as shown in the photographs on the right side of the figure, the normalization of two photographs showing the region around the eyes is tracked conversion by conversion. In the upper-right series, the initially inverted photograph (upper left) is rotated by about 90 degrees counterclockwise in the first step, shifted slightly to the left in the next step, and so on, until it is normalized to the desired upright size. The three-dimensional graph on the left tracks the size (magnification), angle (degrees), and distance (pixels) at each step of the normalization. The black circle at the upper left indicates that the initial photograph had undergone a 180-degree rotation, an enlargement of about 1.3 times, and a slight shift. With each conversion the point moves along one of the three coordinate axes, finally arriving at the normalized position on the right. The lower-right series of photographs and the corresponding white circles in the left graph show the same flow; in this case normalization consists mainly of enlargement. Here the face pattern is limited to the region around the eyes rather than the whole face, but the basic effect is unchanged when the whole face pattern is used. Of course, when the whole face pattern is used, the learning samples must differ from those in the illustrated example.
[0037]
The resolution of the spatial filter used in the normalization coarse-graining unit 32 is arbitrary; in the example shown here, coarse-graining to about 25 dimensions is performed by principal component analysis. The dimension of the subspace Ω spanned in the space F can also take various values, and is 25 here. The number of learning samples depends on the accuracy required for detection; for example, the face patterns of about 100 people may each be varied in about 100 ways for each partial conversion detection unit 26. With three partial conversion detection units 26, the total number of learning samples is then about 30,000 (100 × 100 × 3). By contrast, giving the same degrees of freedom without this embodiment would require about one million learning samples in total, so this embodiment greatly reduces the number of learning samples needed. The partial conversions detected by the partial conversion detection units 26 here are size, rotation, and shift. These elements are not particularly limited, but simple conversions make the conversion easier: size and rotation are preferably expressed in a form that a linear transformation can describe, and shift preferably as a uniform translation without shearing. More complex conversions can of course be assigned according to the characteristics of the pattern being handled, and a conversion related to the luminance of the image data can also be assigned.
[0038]
The nonlinear transformation described above was defined using a kernel function, but the nonlinear transformation can be constructed in any way. Described here is a method that realizes the nonlinear transformation with an auto encoder, following a neural network algorithm.
[0039]
FIG. 7 shows an outline of the auto encoder. The auto encoder is a kind of multilayer perceptron. The number of neurons in the input layer 60 and the number of neurons in the output layer 62 are the same, and the number of neurons in the intermediate layer 64 is smaller.
[0040]
To use this auto encoder as a partial conversion detection unit 26, proceed as follows. First, learning samples created in the same way as for the kernel function are input to the input layer 60, the same values are given to the output layer 62 as the teacher signal, and the weight of each synapse is learned so as to realize the identity mapping. This learning can be performed by ordinary back-propagation.
[0041]
The output of the output layer 62 of the auto encoder trained in this way can be regarded as representing the space F created by the nonlinear mapping, and the neuron outputs of the intermediate layer 64 correspond to the projection onto the subspace Ω that classifies the categories spanned in the space F. The normalization subspace projection unit 36 can therefore be realized by inputting coarse-grained data to the input layer 60 and reading the output of the intermediate layer 64. During learning, the relationship between the characteristics of the learning samples and the output of the intermediate layer 64 is examined and compiled into a lookup table, which implements the conversion magnitude evaluation unit 38. The estimation error evaluated by the estimation error evaluation unit 40 can be calculated as the distance between the vector of the input layer 60 and the vector of the output layer 62, or a monotonically increasing function of it. That this distance reflects the accuracy of the conversion follows from the fact that the shorter the distance, the more accurately the mapping into the space F reproduces the input.
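A minimal NumPy auto encoder illustrates the correspondence: a d → h → d perceptron trained by back-propagation to reproduce its input, whose intermediate-layer output plays the role of the projection coordinates and whose input-output distance serves as the estimation error. The tanh hidden layer, linear output layer, squared-error loss, and per-sample updates are assumptions; the text fixes none of them.

```python
import numpy as np


class AutoEncoder:
    """d -> h -> d perceptron trained to reproduce its input (identity mapping)."""

    def __init__(self, d, h, rng):
        self.W1 = rng.normal(0.0, 0.1, (h, d))
        self.b1 = np.zeros(h)
        self.W2 = rng.normal(0.0, 0.1, (d, h))
        self.b2 = np.zeros(d)

    def encode(self, x):
        """Intermediate-layer output: plays the role of the projection onto Omega."""
        return np.tanh(self.W1 @ x + self.b1)

    def decode(self, z):
        return self.W2 @ z + self.b2

    def error(self, x):
        """Estimation error: squared distance between input and output vectors."""
        r = self.decode(self.encode(x)) - x
        return float(r @ r)

    def train(self, X, lr=0.05, epochs=200):
        """Per-sample back-propagation on the loss 0.5 * |output - input|^2."""
        for _ in range(epochs):
            for x in X:
                z = self.encode(x)
                r = self.decode(z) - x                  # teacher signal is the input itself
                dz = (self.W2.T @ r) * (1.0 - z ** 2)   # back through the tanh layer
                self.W2 -= lr * np.outer(r, z)
                self.b2 -= lr * r
                self.W1 -= lr * np.outer(dz, x)
                self.b1 -= lr * dz
```

Because the intermediate layer is narrower than the input, the network can only reproduce inputs that resemble its training data, which is why a small reconstruction error signals a reliable projection.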
[0042]
In the foregoing, the manner in which the image normalization unit 20 normalizes the image data has been described. From here, the pattern identification unit 100 that identifies a face pattern from the image data output by the image normalization unit 20 will be described.
[0043]
FIG. 8 is a block diagram showing the configuration of the pattern identification unit 100. The pattern identification unit 100 includes identification coarse-graining units 106 with a plurality of spatial resolutions, a face identification unit 102 connected to each identification coarse-graining unit 106, and a counterexample detection unit 104. Like the normalization coarse-graining unit 32 in the image normalization unit 20, each identification coarse-graining unit 106 applies a spatial filter to the image data and outputs coarse-grained data. The resolution can be set freely; here the lowest is 25 dimensions and the highest is 100 dimensions, with several identification coarse-graining units 106 in between. A face identification unit 102 is provided for each identification coarse-graining unit 106 and identifies the face pattern.
[0044]
The input image data is first input to the identification coarse-graining unit 106a, which has the lowest spatial resolution, and is converted into 25-dimensional coarse-grained data in the illustrated example. This coarse-grained data is input to the face identification unit 102a. The face identification unit 102a includes an identification subspace learning unit 108a, an identification subspace projection unit 110a, and an identification determination unit 112a, and operates very much like the partial conversion detection unit 26 described for the image normalization unit 20. That is, the input coarse-grained data is mapped into the space F as in Expression (1) by the nonlinear transformation defined by a kernel function. In this space F, the identification subspace learning unit 108a has learned in advance and spanned a subspace Ω characterizing the face patterns of the learning samples. The identification subspace projection unit 110a projects the mapping vector in the space F onto the subspace Ω, which yields the linear combination coefficients α_i of the projection vector and, from Equation (5), the length E of the perpendicular from the mapping vector to the subspace. The identification determination unit 112a evaluates the positional relationship between the two, that is, the magnitude of E, using a suitable threshold or the like, and decides whether the data belongs to the category of the subspace Ω. The threshold may be determined from the rate of correct answers on suitable sample data. If the determination finds that a face pattern is likely to be included, the identification coarse-graining unit 106 and the corresponding face identification unit 102 with the next higher resolution are executed.
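The text leaves the threshold on E open, suggesting only that it be set from the correct-answer rate on sample data. One hypothetical way to do that: collect E for known face samples and choose the smallest threshold that still accepts a target fraction of them. The 0.9 default echoes the 90% detection rate reported in the test below but is only an example; the function names are illustrative.

```python
import math
from typing import Sequence


def threshold_for_rate(face_errors: Sequence[float], target_rate: float = 0.9) -> float:
    """Smallest threshold t such that at least `target_rate` of face samples have E <= t."""
    if not face_errors:
        raise ValueError("need at least one face sample")
    ordered = sorted(face_errors)
    k = max(1, math.ceil(target_rate * len(ordered)))   # samples that must be accepted
    return ordered[k - 1]


def is_face_likely(E: float, threshold: float) -> bool:
    """Judge the data as belonging to the face category when E is small enough."""
    return E <= threshold
```

Raising `target_rate` accepts more true faces at each resolution but lets more non-faces through to the costlier stages, which is the trade-off the cascade has to balance.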
[0045]
The counterexample detection unit 104 has the same configuration as the face identification unit 102 and includes a counterexample subspace learning unit 114, a subspace projection unit 116, and a counterexample determination unit 118. It differs from the face identification unit 102 in that the counterexample subspace learning unit 114 learns patterns other than faces: a subspace Ω containing non-face patterns is formed from learning samples that contain non-face patterns. As in the face identification unit 102, the subspace projection unit 116 projects the mapping obtained by the nonlinear transformation onto the subspace Ω, and the counterexample determination unit 118 classifies the data based on the positional relationship between the mapping vector and the projection vector.
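The counterexample test can be sketched with the same kind of kernel residual used for face identification, applied to a non-face subspace. As simplifying assumptions, the pre-images are taken directly from non-face samples instead of being learned, a small ridge term is added to K for numerical stability, and the threshold is left to the caller; all names are hypothetical.

```python
import numpy as np


def rbf(a, b, sigma=1.0):
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))


class KernelSubspace:
    """Category represented by span{Phi(p_i)} for a set of pre-images p_i."""

    def __init__(self, preimages, sigma=1.0, ridge=1e-8):
        self.P, self.sigma = np.asarray(preimages, dtype=float), sigma
        K = rbf(self.P, self.P, sigma)
        self.K_inv = np.linalg.inv(K + ridge * np.eye(len(K)))   # ridge for stability

    def residual(self, x):
        """E: squared distance from Phi(x) to the subspace (k(x, x) = 1 for RBF)."""
        kx = rbf(self.P, x[None, :], self.sigma)[:, 0]
        return float(1.0 - kx @ self.K_inv @ kx)


def counterexample_detected(x, non_face: KernelSubspace, threshold: float) -> bool:
    """True when x lies close to the non-face category, i.e. 'no face here'."""
    return non_face.residual(x) <= threshold
```

Training the non-face subspace mainly on confusable, face-like patterns, as the next paragraph recommends, concentrates it on exactly the cases the face subspaces tend to accept wrongly.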
[0046]
The face identification units 102 and the counterexample detection unit 104 learn in the same way as described for the image normalization unit. That is, for a face identification unit 102, several learning samples of the face pattern to be identified are prepared, and based on them the pre-images corresponding to the subspace basis vectors are updated and the kernel is modified. The pattern identification unit 100 normally performs pattern recognition on image data that has already been normalized by the image normalization unit 20. Because the face pattern can therefore be expected to be normalized, it suffices to use learning samples normalized with respect to size, rotation angle, shift, and so on. Any patterns other than faces may serve as learning samples for the counterexample detection unit 104. However, since it is generally effective to learn what the face identification unit 102 finds hard to identify, it is preferable to train mainly on confusable patterns that resemble a normalized face pattern.
[0047]
FIG. 10 shows the results of a test of the identification described here. The left side shows detection performed only on 50-dimensional coarse-grained data, without the present invention. The right side shows the result of hierarchical detection with this embodiment, using face identification units 102 at three resolutions of 25, 50, and 100 dimensions, but without the counterexample detection unit 104. Each image used contains several faces, and face patterns are detected among them. Both methods detect faces with a probability of 90%. The horizontal axis is the number of non-face patterns erroneously detected in one image, and the vertical axis is the proportion of images. With the conventional method, 63% of images had no errors and 22% had exactly one error; with the present invention these values are 82% and 14%, respectively. As a result, the false detection rate per image improves from 0.40 to 0.24. Detection at the high 100-dimensional resolution does require a lot of computation time, but in this embodiment, once the 25-dimensional resolution judges that a face pattern is unlikely, no detection at higher resolutions is performed, so detection is efficient and accurate without wasting computation time. Although not shown in the figure, this experiment also confirmed the effectiveness of the counterexample detection unit 104, which detects that no face pattern is included: when it was added at each resolution, the false detection rate was almost zero.
[0048]
Finally, the characteristic points of this embodiment are listed. The image normalization unit 20 of this embodiment can normalize the face pattern in input image data through learning on a very small number of learning samples. If normalization is classified into rotation, enlargement and reduction, parallel movement, and so on, it suffices to learn with only the corresponding learning samples, which makes learning very efficient. If normalization is performed with a neural network, a pattern distribution with nonlinearity can be normalized easily. If normalization uses a nonlinear transformation defined by a kernel function, even a nonlinear pattern distribution can be normalized with high accuracy. If normalization is performed on a parallel computer, it can be carried out quickly.
[0049]
The pattern identification unit 100 of this embodiment uses a nonlinear transformation to identify, with high accuracy, the features of face patterns that have nonlinearity. Because determination proceeds hierarchically from low to high resolution, data that can easily be judged not to contain a face pattern is rejected quickly without spending time on it. Combining this with the means for detecting counterexamples further improves the accuracy of determination. If pattern identification uses a nonlinear transformation defined by a kernel function, highly reliable identification is possible; a subspace representing a category can be constructed very quickly; the basis vectors of the subspace can be efficiently re-established using learning samples; and since the kernel function is easily deformed using learning samples, pattern identification is easily improved. If a parallel computer is used, the pattern identification at each resolution can be calculated efficiently.
[0050]
The image normalization unit 20 and the pattern identification unit 100 complement each other so that the facial pattern can be identified with very high accuracy and high speed.
[Brief description of the drawings]
FIG. 1 is a schematic diagram showing a configuration of a computer according to the present embodiment.
FIG. 2 is a block diagram illustrating an outline of an image normalization unit and a pattern identification unit.
FIG. 3 is a block diagram illustrating details of an image normalization unit.
FIG. 4 is a schematic diagram showing a state of nonlinear conversion.
FIG. 5 is a flowchart illustrating a processing procedure of an image normalization unit.
FIG. 6 is a diagram illustrating test results of an image normalization unit.
FIG. 7 is a schematic diagram of an auto encoder used in an image normalization unit.
FIG. 8 is a block diagram illustrating an outline of a pattern identification unit.
FIG. 9 is a flowchart illustrating a processing procedure of a pattern identification unit.
FIG. 10 is a diagram illustrating a test result of a pattern identification unit.
[Explanation of symbols]
20 image normalization unit, 26 partial conversion detection unit, 28 normalization processing unit, 100 pattern identification unit, 102 face identification unit, 104 counterexample detection unit.

Claims

An image identification device comprising:
normalization means for performing a normalization conversion on a face pattern included in image data using a nonlinear transformation, thereby normalizing the image data;
coarse-graining means for generating a plurality of pieces of coarse-grained data by applying spatial filters having different spatial resolutions to the normalized image data;
face identification means provided for each resolution, which compares coarse-grained data of one of the resolutions with a category constructed, from learning samples of normalized face patterns having that resolution, in the space defined by the nonlinear transformation, and, when the two are in a predetermined positional relationship, determines that the image data corresponding to the coarse-grained data is likely to contain the face pattern represented by the category;
hierarchical face pattern identification means for identifying face patterns hierarchically using a plurality of resolutions, starting from the face identification means with the lowest resolution, such that whenever the face identification means of a given resolution determines that a face pattern is likely to be included, the face identification means with the next lowest resolution executes its processing; and
counterexample detection means for detecting, before or after the processing of the hierarchical face pattern identification means, that image data or coarse-grained data of a predetermined resolution does not contain a face pattern,
wherein the counterexample detection means detects that a face pattern is not included by detecting, in the same manner as the face identification means and through learning with learning samples containing patterns other than faces, that a non-face pattern is likely to be included.

The image identification device according to claim 1,
wherein the nonlinear transformation is given by calculation means using a kernel function.

The image identification device according to claim 2,
wherein the category is constructed as a subspace whose basis vectors are a set of vectors obtained by nonlinearly transforming a plurality of pieces of learning data.

The image identification device according to claim 3,
wherein, when the mapping of a new learning sample by the nonlinear transformation is given, the basis vectors spanning the subspace are updated so that the relevance between this mapping and the subspace generated so far increases.

The image identification device according to claim 3,
wherein the kernel function defining the nonlinear transformation is deformed according to the relevance between the mapping of learning samples by the nonlinear transformation and the subspace.

The image identification device according to any one of claims 1 to 5,
wherein the face detection means corresponding to the respective resolutions are processed in parallel by a parallel arithmetic unit provided in the device.

An image identification method executed by the following means provided in a computer, the method comprising:
a normalization step in which normalization means performs a normalization transformation on a face pattern included in image data by using a nonlinear transformation, thereby normalizing the image data;
a coarse-graining step in which coarse-graining means applies spatial filters having different spatial resolutions to the normalized image data to generate a plurality of pieces of coarse-grained data;
a storage step in which storage means stores, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution;
an evaluation step in which evaluation means compares coarse-grained data of any resolution with the category of that resolution stored in the storage means and evaluates their positional relationship in the space defined by the nonlinear transformation;
a judgment step in which, when the evaluation means evaluates that the two are in a predetermined positional relationship, judgment means judges that the image data corresponding to the coarse-grained data is highly likely to contain the face pattern represented by the category;
a hierarchical face pattern identification step in which hierarchical face pattern identification means identifies face patterns hierarchically using a plurality of resolutions. Starting from the coarse-grained data having the lowest resolution, it executes the processing of the evaluation means and the judgment means. Whenever the face identification means of a given resolution judges that a face pattern is highly likely to be included, it then executes the processing on the coarse-grained data having the next-lowest resolution; and
a counterexample detection step in which counterexample detection means, before or after the hierarchical face pattern identification step, detects that no face pattern is included in the image data or in coarse-grained data of a predetermined resolution,
wherein, in the counterexample detection step, it is detected that a non-face pattern is highly likely to be present, in the same manner as in the evaluation step and the judgment step and through learning with learning samples that include non-face patterns, whereby it is detected that no face pattern is included.
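The claims do not specify the spatial filters used in the coarse-graining step. The sketch below assumes block averaging (a box filter followed by subsampling) as the coarse-graining filter, with hypothetical reduction factors. It produces one coarse-grained image per resolution, ordered from the lowest resolution to the highest.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]


def coarse_grain(img: Image, factor: int) -> Image:
    """Average non-overlapping factor x factor blocks (box filter + subsampling)."""
    h, w = len(img), len(img[0])
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by the reduction factor")
    return [[sum(img[y + dy][x + dx] for dy in range(factor) for dx in range(factor))
             / factor ** 2
             for x in range(0, w, factor)]
            for y in range(0, h, factor)]


def coarse_grain_pyramid(img: Image, factors: Sequence[int] = (4, 2, 1)) -> Dict[int, Image]:
    """Return {factor: coarse-grained image}, largest factor (lowest resolution) first."""
    return {f: coarse_grain(img, f) for f in sorted(factors, reverse=True)}


img = [[4.0 * y + x for x in range(4)] for y in range(4)]  # 4x4 test image
print(coarse_grain(img, 2))  # [[2.5, 4.5], [10.5, 12.5]]
print(coarse_grain(img, 4))  # [[7.5]]
```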

An image identification program that causes a computer to function as:
normalization means for performing a normalization transformation on a face pattern included in image data by using a nonlinear transformation, thereby normalizing the image data;
coarse-graining means for applying spatial filters having different spatial resolutions to the normalized image data to generate a plurality of pieces of coarse-grained data;
storage means for storing, for each resolution, a category constructed in the space defined by the nonlinear transformation from learning samples of normalized face patterns having that resolution;
evaluation means for comparing coarse-grained data of any resolution with the category of that resolution stored in the storage means and evaluating their positional relationship in the space defined by the nonlinear transformation;
judgment means for judging, when the evaluation means evaluates that the two are in a predetermined positional relationship, that the image data corresponding to the coarse-grained data is highly likely to contain the face pattern represented by the category;
hierarchical face pattern identification means for identifying face patterns hierarchically using a plurality of resolutions. Starting from the coarse-grained data having the lowest resolution, it executes the processing of the evaluation means and the judgment means. Whenever the face identification means of a given resolution judges that a face pattern is highly likely to be included, it then executes the processing on the coarse-grained data having the next-lowest resolution; and
counterexample detection means for detecting, before or after the processing of the hierarchical face pattern identification means, that no face pattern is included in the image data or in coarse-grained data of a predetermined resolution,
wherein the counterexample detection means detects that a non-face pattern is highly likely to be present, in the same manner as the face identification means and through learning with learning samples that include non-face patterns, thereby detecting that no face pattern is included.
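The hierarchical identification can be read as a coarse-to-fine cascade that stops as soon as one resolution judges a face unlikely. Under this reading, most non-face regions are discarded at the cheap, low-resolution levels. The sketch below assumes that `evaluate` returns a residual distance to the category of each level. It also assumes that a residual above a per-level threshold means rejection. Both the callback and the thresholds are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def hierarchical_identify(coarse: Dict[int, object],
                          evaluate: Callable[[int, object], float],
                          thresholds: Dict[int, float]) -> Tuple[bool, List[int]]:
    """Coarse-to-fine cascade over resolution levels (larger key = finer data).
    Returns (face_likely, levels_actually_evaluated)."""
    visited: List[int] = []
    for level in sorted(coarse):          # lowest resolution first
        visited.append(level)
        if evaluate(level, coarse[level]) > thresholds[level]:
            return False, visited         # rejected; finer levels are skipped
    return True, visited                  # every resolution judged a face likely


residuals = {1: 0.1, 2: 0.9, 3: 0.05}    # hypothetical residual per level
ok, visited = hierarchical_identify(
    {1: "16x16 data", 2: "32x32 data", 3: "64x64 data"},
    lambda level, data: residuals[level],
    {1: 0.5, 2: 0.5, 3: 0.5},
)
print(ok, visited)  # False [1, 2]
```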