JP3819236B2

JP3819236B2 - Pattern recognition method and computer-readable storage medium storing program for performing pattern recognition

Info

Publication number: JP3819236B2
Application number: JP2000378612A
Authority: JP
Inventors: 修山口; 和広福井
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2000-12-13
Filing date: 2000-12-13
Publication date: 2006-09-06
Anticipated expiration: 2020-12-13
Also published as: JP2002183732A

Description

【０００１】
【発明の属する技術分野】
本発明は、パターン認識方法及びパターン認識を行わせるプログラムを記憶したコンピュータ読み取り可能な記憶媒体に関する。
【０００２】
【従来の技術】
複数の画像間の互いの位置合わせや、画像中から特定の物体を検出する際に用いられる方法は、画像マッチングと呼ばれている。
【０００３】
この画像マッチングには、例えば、ブラウン(L.G.Brown)の"A SURVEY OF IMAGE REGISTRATION TECHNIQUES", acm computing surveys, Vol.24,No.4,pp,325-376: (白井(訳)「画像の位置合わせ手法の概観」,コンピュータサイエンス acm computing surveys'92 bit 別冊,pp.77-120)に示されたように、複数の画像の画素ごとの類似性の測度を定義して、この画像間中でその測度が最も高い場所を決定する方法がある。
【０００４】
画素ごとの濃淡情報を用いた画像マッチング方法には、類似性測度として、SSD(Sum of Square Difference)、SAD(Sum of Absolute Difference)や相互相関係数といった統計的アプローチが用いられていた。
【０００５】
例えば、相互相関係数Cの場合には、類似性測度を以下の式(1)のように表す。
【数１】

ただし、I(k,l)は、画素(k,l)の位置の画素値であり、m_１,m_２は、I_１,I_２の画素の平均値であり、σ_１,σ_２はI_１,I_２の画素の分散値である。
【０００６】
このような相互相関係数Cを用いて、画像中からあるモデル画像(以下テンプレート画像と称する)の位置を検出するテンプレートマッチングでは、テンプレート画像の平行移動、回転といった幾何学的変換のパラメータの増加に伴って検出のコストが大きくなった。
【０００７】
また、画素の明るさの変化を制御するために、ヒストグラム平坦化処理やエッジ強調フィルタ、エッジ抽出処理といった類似性測度を計算する前の前処理となる画像処理が必要となる場合が多かった。
【０００８】
さらに、これらの前処理は、使用する類似性測度(SSD、SAD、相互相関係数)やテンプレートが有する特徴との相性の問題があった。
【０００９】
例えば、SSDにおいて、濃淡情報を直接用いて類似性測度を定義する場合には、画素の濃淡値の大きさがそのまま測度に影響するため、以下に示す性質があった。
【００１０】
図19(a)〜(e)は濃淡値を類似性測度に用いた場合のテンプレートマッチングを説明する図であり、(a)は対象画像、(b)は(a)の一部に光が当たった画像、(c)は(b)をエッジ処理した画像、(d)は(a)とは異なる撮像された画像、(e)は(d)から抽出領域(手)を抽出した画像である。
【００１１】
(a)は説明のため濃淡値が1種類で表示した対象物の画像である(領域1)。(b)は対象物の右上から光が照射された時の画像であり、この光が当たった領域を領域2とし、この領域2の濃淡値は領域1と異なる。よって、濃淡値の違いが類似度に影響を及ぼす。(c)は(b)をエッジ処理した画像であり、領域1と2の境界の画素だけが影響を受けるため、(b)に比べて類似度への影響は小さいものの、エッジだけの情報を用いてテンプレートマッチングを行うことは汎用的ではなかった。
【００１２】
また、(d)のような複雑な背景を持つ画像からテンプレート(手)を抽出した後の(e)は、背景画像が複雑なため、この背景部分の濃淡値がテンプレートマッチングに大きな影響を及ぼす原因になった。
【００１３】
したがって、テンプレートマッチングを行うために、濃淡値を直接類似性測度に用いた場合には、上述したような濃淡値の変化のため、検出精度を一定以上に保っていなかった。
【００１４】
次に、濃淡値をそのまま類似性測度に用いないテンプレートマッチングには、ヴノー等（A.Venot, J.F.Lebruchec, J.C.Roucayrol）の"A New Class of Similarity Measures for Robust Image Registration ": Computer Vision, Graphics, Image Processing,28,pp.176-184(1984)がある。これは、同じ撮像条件で撮像された2つの画像を重ねて、それぞれの対応する画素の画素値(濃淡値)の差を求め、この差が負に変化した(符号が変化した)数を特徴量(類似性測度)としてマッチングを行った方法である。
【００１５】
例えば、相互相関係数を用いた相関法では、差のある位置における画素の「濃淡値」が類似性測度に影響をおよぼすが、この符号の変化は、差のある位置の「個数」に関連するために、濃淡値の変動に対して影響が少ない。
【００１６】
しかしながら、この方法は、医用画像処理に適用されているが、試行のたびに、相関法と同様に、2つの画像の特徴量の計算が必要となるため、検出コストが大きくなるといった問題があった。
【００１７】
次に、画素間の大小関係を類似性測度として用いた手法に、リプトン等の（P.Lipson, E.Grimson, P.Sinha）"Configuration Based Scene Classification and Image Indexing": CVPR'97 pp1007-1013と、シンハ（P.Sinha）の"Object Recognition via Image Invariants: A Case Study": Investigative Ophthalmology Visual Science, vol.35,pp.1735-1740,May 1994がある。
【００１８】
P.Lipson等は、類似画像の検索を目的として、画像をいくつかのブロックに分け、そのブロック間の画像特徴量の大小関係に着目した検索を行い、画像の構造化を行っている。
【００１９】
この構造化のもとになったシンハの文献は、画像中から例えば顔領域を検出するために、画素間の大小関係を抽出し、その抽出結果をテンプレートとする方法が記載されてある。このシンハの方法は、記憶された大量の画像から定性的な関係を抽出して、この抽出結果をテンプレートにして保持するという方法であるが、動画像等の画像が時系列的に複数個並んだ画像列に対して、あるテンプレートの対象領域の追跡を行う方法（トラッキング）では、テンプレートを逐次更新する必要があり、更新のための検出コストが大きくなるといった問題があった。
【００２０】
次に、画素間の濃淡値の増分符号を類似性測度とするテンプレートマッチングに、“増分符号相関による自然画像照合":金子俊一、村瀬一郎、福島孝明、五十嵐悟:電気学会研究会資料IIS-98-58,pp.31-35,1998がある。金子等は、画素間の濃淡値の増分符号に着目して、その符号の数を類似性測度にする考え方を導入している。しかしながら、2つの画像の類似性の判断を行う時、画素間の濃淡値が同一であって、濃淡値の符号が等号となるような場合には、濃淡値が増加した場合と同様に扱われているために、精密なマッチング、画像を分類するといった観点からはマッチングの精度が低くなるといった問題点があった。
【００２１】
【発明が解決しようとする課題】
上述したように、従来のマッチングは、前処理や特徴量等の照合数の増加にともなう計算コストが大きく、また撮像された画像が同一領域であっても、撮像される環境の変化によって濃淡値に変動がおきた場合、この変動に対応ができず検出精度を低下させるといった問題があった。
【００２２】
そこで本発明は上記従来の問題点に鑑みてなされたもので、検出精度を向上させた場合であっても計算コストを大幅に増加させることなく、また、画像に変動が生じた場合に、この変動の影響を受けにくい類似性測度を用いて、テンプレートマッチングの検出精度を向上させるパターン認識方法及びパターン認識を行わせるプログラムを記憶したコンピュータ読み取り可能な記憶媒体の提供を目的とする。
【００２３】
【課題を解決するための手段】
上記目的を達成するために本発明のパターン認識方法は、列方向及び行方向にそれぞれ複数の画素を配列した第1の画像中の、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較し、その大小関係および同値関係を3種類の符号で表した基準3値比較画像を求める工程と、列方向及び行方向にそれぞれ複数の画素を配列した第2の画像中の、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較し、その大小関係および同値関係を3種類の符号で表した対象3値比較画像を求める工程と、前記基準3値比較画像と前記対象3値比較画像との、それぞれが対応する位置で、前記符号が一致する個数を求める工程と、前記個数から前記対象3値比較画像と前記基準3値比較画像との類似性を判定する工程とを有する。
【００２４】
また、本発明のパターン認識方法は、各階層の１画素の画素値が１段階下位の階層の２×２画素の画素値を用いて求められ、前記１段階下位の階層の２×２画素が前記各階層の各画素を列方向および行方向にそれぞれ２分割した位置に配置されている、複数階層の階層構造によって、対象画像を表現し、前記対象画像の階層構造と同様の複数階層の階層構造によって、基準画像を表現し、前記対象画像の前記各階層ごとに、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較し、その大小関係および同値関係を3種類の符号で表した対象3値比較画像を求め、前記基準画像の前記各階層ごとに、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較し、その大小関係および同値関係を3種類の符号で表した基準3値比較画像を求め、前記対象画像の、前記階層中の少なくとも1つの前記対象3値比較画像と、この対象3値比較画像に対応する前記基準3値比較画像との、それぞれが対応する位置で、前記符号が一致する個数を求め、前記個数から前記対象3値比較画像と前記基準3値比較画像との類似性を判定することを特徴とする。
【００２５】
また、本発明の記憶媒体は、パターン認識を行うプログラムをコンピュータ読み取り可能なように記憶させた記憶媒体であって、列方向及び行方向にそれぞれ複数の画素を配列した第1の画像中の、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較させ、その大小関係および同値関係を3種類の符号で表した基準3値比較画像を求めさせ、列方向及び行方向にそれぞれ複数の画素を配列した第2の画像中の、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較させ、その大小関係および同値関係を3種類の符号で表した対象3値比較画像を求めさせ、前記基準3値比較画像と前記対象3値比較画像との、それぞれが対応する位置で、前記符号が一致する個数を求めさせ、前記個数から前記対象3値比較画像と前記基準3値比較画像との類似性を判定させることを特徴とする。
【００２６】
また、本発明の記憶媒体は、パターン認識を行うプログラムをコンピュータ読み取り可能なように記憶させた記憶媒体であって、各階層の１画素の画素値が１段階下位の階層の２×２画素の画素値を用いて求められ、前記１段階下位の階層の２×２画素が前記各階層の各画素を列方向および行方向にそれぞれ２分割した位置に配置されている、複数階層の階層構造によって、対象画像を表現させ、前記対象画像の階層構造と同様の複数階層の階層構造によって、基準画像を表現させ、前記対象画像の前記各階層ごとに、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較させ、その大小関係および同値関係を3種類の符号で表した対象3値比較画像を求めさせ、前記基準画像の前記各階層ごとに、列方向及び行方向に互いに隣接する異なる2個の画素間の濃淡値を比較させ、その大小関係および同値関係を3種類の符号で表した基準3値比較画像を求めさせ、前記対象画像の、前記階層中の少なくとも1つの前記対象3値比較画像と、この対象3値比較画像に対応する前記基準3値比較画像との、それぞれが対応する位置で、前記符号が一致する個数を求めさせ、前記個数から前記対象3値比較画像と前記基準3値比較画像との類似性を判定させることを特徴とする。
【００２９】
このような構成によれば、隣接する画素間の濃淡値の大小関係、同一関係といった画像の定性的な関係を、画像の特徴量とすることで、計算コストの低減と、より精度の高いマッチングを行うことができる。
【００３０】
【発明の実施の形態】
以下、本発明の実施の形態について図面を参照して説明する。
【００３１】
図1乃至図10は第1の実施の形態を示すものである。
【００３２】
図1(a)〜(d)はテンプレートマッチングの動作を説明するものであり、(a)は対象画像の平面図、(b)はテンプレート画像の平面図、(c)はテンプレート画像を対象画像中で移動させることを説明する平面図、(d)は対象画像中でテンプレート画像を検出した平面図である。
【００３３】
パターン認識法なるテンプレートマッチングとは、s×t行列の画素からなる対象画像中から、v×w行列の画素からなるテンプレート画像が略合致する位置(パターン領域)を検出することである。なお、説明のために、対象画像の面積は、テンプレート画像の面積よりも大きいとする。また、各画像は、2値以上の多値濃淡で表され、それぞれの画像の濃度分解能は同一である。
【００３４】
例えば、(a)の対象画像は地形図であり、(b)のテンプレート画像は池である。より詳しくは、テンプレート画像(太枠)を対象画像全体に、(c)の矢印のように移動させてそれぞれの位置での類似性を比較し、最も類似性の高い位置がテンプレート画像であるとして、(d)のように対象画像中の一部分をテンプレート画像として検出することがテンプレートマッチングである。ここで、図中左右方向が行方向であり、図中上下方向が列方向である。
【００３５】
(c)の時、テンプレート画像の、対象画像中の各位置での類似性を判断するための測度として、以下の特徴量を用いる。
【００３６】
まず、m×nの画素からなる画像Iに対して、画像I内の位置(x，y)の画素を画素値I(x，y)とする。ただし、1≦x≦m，1≦y≦nの関係を有し、濃度分解能は、2ビットであり、0〜3の4段階で表された数値とする。
【００３７】
図2は、4段階の濃度分解能を有する4×4画素の画像I_Sである。
【００３８】
この任意の画像I_Sに対して、隣接する画素間の画素値から以下の2つの3値画像Ｑ_ｈ，Ｑ_ｖを特徴量として求める。
【００３９】
これによって、列方向及び行方向に隣接する画素間の画素値の大小関係、同値関係を調べる。
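[Expression 2]
$$
Q_v(x,y)=\begin{cases}
> & \text{if } I(x,y) > I(x+1,y)\\
= & \text{if } I(x,y) = I(x+1,y)\\
< & \text{if } I(x,y) < I(x+1,y)
\end{cases}
\qquad (1\le x\le m-1,\ 1\le y\le n)
$$
$$
Q_h(x,y)=\begin{cases}
> & \text{if } I(x,y) > I(x,y+1)\\
= & \text{if } I(x,y) = I(x,y+1)\\
< & \text{if } I(x,y) < I(x,y+1)
\end{cases}
\qquad (1\le x\le m,\ 1\le y\le n-1)
$$
FIG. 3(a) shows the image Q_h obtained from image I_S (the row-direction reference or target ternary comparison pixel matrix), and FIG. 3(b) shows the image Q_v (the column-direction reference or target ternary comparison pixel matrix). In (a), the top-left pixel 00 of image I_S and the pixel 01 to its right have densities 2 and 4; the computed difference for Q_h is -2, so the symbol "<" is displayed. Likewise, in (b), pixel 00 and the pixel 10 below it have densities 2 and 3; the computed difference for Q_v is -1, so the symbol "<" is displayed.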
【００４０】
以降、画像I_S全体で、同様の計算を行う。この結果、Ｑ_ｖは(m-1)×n画素の画像つまり(3×4)画素の3値画像が得られ、Ｑ_ｈはm×(n-1)画素の画像つまり(4×3)画素の3値画像が得られる。
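As a rough illustration of the computation just described, the following Python sketch derives the two ternary images from a small grayscale image. It assumes that the image is stored as a list of rows, that Q_h compares each pixel with its right-hand neighbour and Q_v with the pixel below it, and that the three signs are written as the characters "<", "=" and ">". The function names and the sample image are hypothetical and are not taken from the specification.

```python
def sign(a, b):
    """Ternary sign for the relation between gray values a and b."""
    if a < b:
        return "<"
    if a > b:
        return ">"
    return "="


def qt(image):
    """Return (q_v, q_h) for an image given as a list of rows."""
    rows, cols = len(image), len(image[0])
    # Row-direction comparisons: each pixel against its right neighbour.
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    # Column-direction comparisons: each pixel against the pixel below it.
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


# Hypothetical 4x4 image with four gray levels (0 to 3).
image = [[2, 3, 1, 0],
         [3, 3, 2, 1],
         [0, 1, 1, 2],
         [1, 0, 3, 3]]
q_v, q_h = qt(image)
```

For a 4 × 4 input, `q_v` has 3 rows of 4 signs and `q_h` has 4 rows of 3 signs, matching the sizes stated above.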
【００４１】
この得られた2つの3値画像を以下のように表し、以下QT（Qualitative Trinary Representation）表現と称する。
QT(I)=(Ｑ_ｖ，Ｑ_ｈ)
また、各画素値pについて(ap+b)倍された画像(aI+b)については、
QT(I)=QT(aI+b)
なる関係が成立する。
【００４２】
画像の各画素値が(ap+b)倍されたとしても、各画素間の大小関係は変化しないため、同じQT表現が得られる。これは、画像全体で行うノルムの正規化、もしくは平均と分散とを用いた正規化によっても特徴量(QT表現)が変化することはないことをあらわす。
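The invariance stated above can be checked with a short Python sketch. It assumes a > 0, a condition the text does not state explicitly but which is needed for the order of the pixel values to be preserved; with a < 0 the order is reversed and the representation changes. The helper names and the sample image are hypothetical.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def qt(image):
    rows, cols = len(image), len(image[0])
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


def affine(image, a, b):
    """Apply p -> a * p + b to every pixel value."""
    return [[a * p + b for p in row] for row in image]


image = [[2, 3, 1],
         [3, 3, 0],
         [0, 1, 2]]
same = qt(affine(image, 3, 7)) == qt(image)       # order preserved for a > 0
flipped = qt(affine(image, -1, 0)) == qt(image)   # order reversed for a < 0
```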
【００４３】
また、3値画像を求める前にヒストグラム平坦化を予め行った画像HistNorm(I)について、
QT(I)=QT(HistNorm(I))
なる関係が成立する。
【００４４】
次に、QT表現の間の類似度について図4を参照して説明する。
【００４５】
図4(a)は、各画像とその画像から得られた3値画像とを表すQT表現の説明図であり、(b)は各画像の3値画像を比較した結果を示すQT表現の説明図である。
【００４６】
(a)左側の画像1は対象画像であり、画像1の下には画像1のQT表現（左側が行方向の基準3値比較画像、右側が列方向の基準3値比較画像）が表示され、右側の画像2はテンプレート画像であり、画像2の下には画像2のQT表現（左側が行方向の対象3値比較画像、右側が列方向の対象3値比較画像）が表示される。画像1のQT表現をQT(I₁)とし、画像2のQT表現をQT(I₂)とする。
【００４７】
各QT表現は、以下のように表すことができる。
QT(I₁)=(Q_v1，Q_h1)
QT(I₂)=(Q_v2，Q_h2)
このQT(I₁)の(Q_v1，Q_h1)とQT(I₂)の(Q_v2，Q_h2)との各成分を比較し、それぞれ対応する位置での符号が同一である画素の個数D_v、D_h（個数A、個数B）を以下のように求めていく。
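[Expression 3]
$$
D_v(I_1,I_2)=\sum_{x=1}^{m-1}\sum_{y=1}^{n}\delta\bigl(Q_{v1}(x,y),\,Q_{v2}(x,y)\bigr),\qquad
D_h(I_1,I_2)=\sum_{x=1}^{m}\sum_{y=1}^{n-1}\delta\bigl(Q_{h1}(x,y),\,Q_{h2}(x,y)\bigr)
$$
$$
\delta(a,b)=\begin{cases}1 & \text{if } a=b\\ 0 & \text{otherwise}\end{cases}
$$
For each pair of corresponding pixels in the ternary images, the term is 1 if the signs are the same and 0 otherwise; summing over all pixels gives the number of pixels with the same sign.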
【００４８】
なお、得られた個数の最大値は、対象画像とテンプレート画像が同一である画像を比較したときであるから、
D_v(I,I)＝n(m-1)
D_h(I,I)＝m(n-1)
となる。
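The counting step can be sketched in Python as follows. The sketch assumes the same list-of-rows layout and sign characters as before; the function names and the sample image are hypothetical. Comparing an image with itself reproduces the maximum counts n(m-1) and m(n-1) given above.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def qt(image):
    rows, cols = len(image), len(image[0])
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


def count_matches(q1, q2):
    """Number of positions at which two ternary images carry the same sign."""
    return sum(s1 == s2 for r1, r2 in zip(q1, q2) for s1, s2 in zip(r1, r2))


def match_counts(img1, img2):
    """Return (D_v, D_h) for two images of the same size."""
    q_v1, q_h1 = qt(img1)
    q_v2, q_h2 = qt(img2)
    return count_matches(q_v1, q_v2), count_matches(q_h1, q_h2)


image = [[2, 3, 1, 0],
         [3, 3, 2, 1],
         [0, 1, 1, 2],
         [1, 0, 3, 3]]
other = [row[:] for row in image]
other[0][0] = 3          # changing one corner pixel alters one sign in each image

d_v, d_h = match_counts(image, image)
```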
【００４９】
このようにして、(a)から各画素の符号が同一であるものの個数を計算する。(b)では同一の符号を有する画素にのみ網掛けをして表示している。この例では、以下の様な数が求められる。
D_v(I₁,I₂)＝1
D_h(I₁,I₂)＝6
次に、画像1と画像2との類似度QTSを求める。QTSは以下に示す式であり、その範囲は0.0〜1.0である。
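[Expression 4]
$$
\mathrm{QTS}(I_1,I_2)=\frac{D_v(I_1,I_2)+D_h(I_1,I_2)}{n(m-1)+m(n-1)}
$$
Substituting the values of m, n, D_v, and D_h given above (m = n = 4, D_v = 1, D_h = 6) gives a similarity of QTS = 7/24 ≈ 0.29.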
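The arithmetic can be reproduced with a few lines of Python. The sketch assumes that the similarity is the fraction of matching signs over the total number of signs, which is consistent with the stated range of 0.0 to 1.0 and with the values quoted in the text. The 4 × 4 size for FIG. 4 is inferred from the quoted result, and the function name is hypothetical.

```python
def qts_from_counts(d_v, d_h, m, n):
    """Similarity in [0, 1]: matching signs over the total number of signs."""
    return (d_v + d_h) / (n * (m - 1) + m * (n - 1))


fig4 = qts_from_counts(1, 6, 4, 4)    # counts quoted for FIG. 4
fig5 = qts_from_counts(12, 8, 4, 4)   # counts quoted for images 2 and 3 in FIG. 5
```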
【００５０】
また、図5に示すようなQT表現を有した画像の類似度について説明する。
【００５１】
画像1、2、3は4×4画素からなる互いに異なる画像であり、これら画像1、2、3の濃度レベルは4段階とする。
【００５２】
上述したような方法により、画像1，2，3のQT表現と、そのQT表現における同一符号の個数を求め、その結果から類似度を計算する。
【００５３】
なお、画像2，3における同一符号の個数は、
D_v(I₂,I₃)＝12
D_h(I₂,I₃)＝8
であり、そのときの類似度QTSは0.83である。この結果、画像2に対して、画像3の方が画像1よりも類似度が高いと判断される。
【００５４】
なお、テンプレート画像や対象画像の画素数が同じであれば、類似度QTSの分母は共通となるため、分子だけを比較して類似度QTSを求めても良い。
【００５５】
このようにQT表現を用いて類似度を求める場合、乗算を用いることなく比較演算、及び和演算のみで類似度を得るため、従来の方法で同精度を得る場合に比べて、計算コストを低減できる。また、統計学的な平均、分散を用いないため、マッチングの際のテンプレート画像の移動を複数回行うことなく一回で済むことも計算コストの低減に寄与する。
【００５６】
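FIG. 6(a) is a graph showing the relationship between the QT representation and cross-correlation, and FIG. 6(b) shows a plurality of target images.
[0057]
Each point in the graph represents the correlation value and the QT-based result (QTS) between one template image and one of the target images such as those shown in (b).
[0058]
Within the range enclosed by the dotted line in the graph, that is, for images that cross-correlation judges to be nearly identical, the QT representation still yields different similarities. The target images used here included images of various shapes, as shown in (b).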
【００５９】
また、従来、金子等による画素間の濃淡値の増分符号を類似性測度とする画像照合方法もあったが、この方法は、濃淡値が同じであることを示す等号が増加符号と同じ定義にしており、更に水平方向の比較しか行っていない。つまり、画像照合は、大小関係だけで表現した2値表現を用いて行っており、また、水平方向のみの計算結果から画像照合の結果を出力している。
【００６０】
このような金子等の従来の画像照合方法と本発明との差異について、図7(a)の従来の画像照合の説明図、(b)本発明の説明図を用いて説明する。
【００６１】
(a)、(b)において、各画像は、(4×4)画素からなり濃淡値は4段階である。また、各画像間に記載された数値は、画像間の類似度を示しており、1に近いほど同一画像であることを示す。
【００６２】
ここで、(a)、(b)中の左側の画像を画像1とし、右側の画像を画像2とし、中央の画像を画像3とする。
【００６３】
(a)に示す従来では、画像1の隣接する画素から求められた符号と、画像2の隣接する画素から求められた符号とを比較した結果、対応する位置での符号が全て同一であるため類似度は1.0であり、画像1と画像2とは同一画像と判断する。また、画像1と画像3は、画素(3,1)〜(3,4)が同一の濃淡値であり、符号は等号であるが、等号符号は増加符号の扱いと同一であるため、類似度を1.0とし、その結果同一画像であるとみなす。同様に、画像2と画像3との間でも画素(2,1)〜(2,4)は同一の濃淡値であるが、類似度1.0と判断され、同一の画像であるとみなす。
【００６４】
これに対して、(b)に示す本発明では、符号が等号である画素については、大小関係とは別に定義しているため、上述したような問題がなくなり、同一な濃度値を有する画素は同一であると判断され、類似度を0.5と計算される。
【００６５】
このように大小関係、同一関係を、3値表現を使って表現することにより、従来よりも、より精度の高い類似度計算を行うことができる。
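The difference between a two-valued and a three-valued code can be seen on a single row of pixels. The Python sketch below is a simplified reading of the comparison described above, not an implementation of the method of Kaneko et al.: the binary code lumps equality together with an increase, while the ternary code keeps it separate. The rows, names, and scoring function are hypothetical and do not reproduce the images of FIG. 7.

```python
def ternary(a, b):
    return "<" if a < b else (">" if a > b else "=")


def binary(a, b):
    # Simplified incremental sign: equality is treated as an increase.
    return "+" if b >= a else "-"


def row_codes(row, code):
    """Codes for each pair of horizontally adjacent pixels."""
    return [code(a, b) for a, b in zip(row, row[1:])]


def agreement(row1, row2, code):
    """Fraction of adjacent pairs whose codes agree between the two rows."""
    c1, c2 = row_codes(row1, code), row_codes(row2, code)
    return sum(x == y for x, y in zip(c1, c2)) / len(c1)


ramp = [0, 1, 2, 3, 4]   # strictly increasing
step = [0, 0, 1, 2, 2]   # contains two flat segments

binary_score = agreement(ramp, step, binary)
ternary_score = agreement(ramp, step, ternary)
```

With the binary code the two rows are indistinguishable, whereas the ternary code registers the flat segments as differences.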
【００６６】
次に、上述したようなQT表現を用いた類似度計算の動作について、図8の電子機器のブロック図と、図9のフローチャートを参照して説明する。なお、テンプレート画像、対象画像は予め撮像カメラ等により撮像される。
【００６７】
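The electronic device 1 has a monitor 2, a main body 3, a keyboard 4, and a mouse 5. The main body 3 contains a storage device 10 such as a hard disk, which stores images captured by an imaging camera such as a digital camera; a memory 11 consisting of ROM and RAM; and a CPU 12, to which the storage device 10, the memory 11, and the other elements (monitor 2, keyboard 4, mouse 5) are connected and which performs computation and controls each element.
[0068]
Images can be stored in the storage device 10 either by connecting the main body 3 directly to the imaging camera or by operating the keyboard 4 and the mouse 5 to obtain them over a line such as the Internet. The monitor 2 is used to display images.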
【００６９】
(1)まず、記憶装置10からテンプレート画像をメモリ11に読み取り、CPU12によってテンプレート画像のQT表現を求める(S1)。
【００７０】
(2)次に、CPU12によって記憶装置10から対象画像をメモリ11に読み取り、この対象画像を拡大した画像、縮小した複数の対象画像を生成し、記憶装置10もしくはメモリ11に記憶する(S2)。
【００７１】
(3)次に、元の対象画像、拡大及び縮小した複数の対象画像のQT表現を、CPU12によって求め、対象画像と対応させて記憶装置10もしくはメモリ11に記憶する(S3)。
【００７２】
(4)次に、テンプレート画像のQT表現の3値画像と、ある倍率の対象画像のQT表現の3値画像中のテンプレート画像と同じ大きさの任意の領域との間で、それぞれに対応する位置での同一符号の個数を求め、その個数から類似度をCPU12によって計算し、記憶装置10もしくはメモリ11に一時的に記憶する(S4)。
【００７３】
(5)次に、この対象画像のQT表現の3値画像内全てに対して、テンプレート画像のQT表現の3値画像の類似度を計算をしたか否かがCPU12によって判断される(S5)。全ての計算が終了していればS9へ進み、未終了であればS6へ進む。
【００７４】
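(6) If the calculation is not finished, the CPU 12 shifts the arbitrary region by one pixel in the row or column direction within the target image, computes the similarity between the ternary images at the shifted position and the ternary images of the template image at corresponding positions, and temporarily stores it in the storage device 10 or the memory 11 (S6). The similarity is first computed for every position along the row direction; the region is then shifted by one pixel in the column direction, and every position along the row direction is computed again at the shifted position.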
【００７５】
(7)次に、(6)で記憶した類似度と(4)で記憶した類似度とをCPU12によって比較する(S7)。比較された結果、(4)よりも(6)の類似度が大きければ、(8)へ進み、小さければ(5)へ進む。
【００７６】
(8)次に、類似度が大きい場合には、(4)の類似度を記憶装置10もしくはメモリ11から削除する(S8)。S5へ進む。
【００７７】
(9)ある倍率の対象画像中における全ての計算が終了した場合には、他の倍率の対象画像の3値画像と、テンプレート画像の3値画像と、がそれぞれ対応する位置での類似度を計算したか否かがCPU12によって判断される(S9)。全ての他の倍率の対象画像の計算が終わっていれば、(11)へ進み、未終了であれば(10)に進む。
【００７８】
(10)未終了である場合には、記憶装置10もしくはメモリ11に記憶される倍率の異なる対象画像をCPU12によって読み出す(S10)。(4)へ進む。
【００７９】
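(11) If the calculation is finished, the similarity stored in the storage device 10 or the memory 11 (together with the magnification of the target image and the position within the target image) is regarded as the maximum similarity (S11). The position stored with this maximum similarity is judged to be the position of the template image.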
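Steps (1) to (11) above can be sketched in Python as an exhaustive search that keeps only the best similarity found so far. This is a minimal sketch under several assumptions: the rescaled target images are prepared by the caller and passed in as a dictionary keyed by magnification (the rescaling itself is not shown), the template has at least two pixels, and all function and variable names are hypothetical. The step labels in the comments indicate only a rough correspondence with the flowchart.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def qt(image):
    rows, cols = len(image), len(image[0])
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


def window_score(target_qt, template_qt, top, left):
    """QTS between the template and the window whose corner is (top, left)."""
    matched = total = 0
    for q_target, q_template in zip(target_qt, template_qt):
        for r, row in enumerate(q_template):
            for c, s in enumerate(row):
                matched += q_target[top + r][left + c] == s
                total += 1
    return matched / total


def search(scaled_targets, template):
    """Scan every window of every rescaled target and keep the best hit."""
    template_qt = qt(template)                           # roughly S1
    th, tw = len(template), len(template[0])
    best = (-1.0, None, None)        # (similarity, magnification, position)
    for scale, target in scaled_targets.items():         # roughly S9, S10
        target_qt = qt(target)                           # roughly S3
        for top in range(len(target) - th + 1):          # column-direction shift
            for left in range(len(target[0]) - tw + 1):  # row-direction shift
                score = window_score(target_qt, template_qt, top, left)
                if score > best[0]:                      # roughly S7, S8
                    best = (score, scale, (top, left))
    return best                                          # roughly S11


template = [[0, 3],
            [2, 1]]
target = [[1, 1, 1, 1, 1],
          [1, 1, 0, 3, 1],
          [1, 1, 2, 1, 1],
          [1, 1, 1, 1, 1]]
result = search({1.0: target}, template)
```

Because only adjacent pixels are compared, the ternary images of the target are computed once per magnification and each window simply reads a sub-block of them.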
【００８０】
以上述べた様な第1の実施の形態では、3値表現のQT表現を用いることで変動に強い類似性測度を得、より類似度計算の精度が高く、かつこの精度の向上に伴う計算コストの増加を抑制する、テンプレート画像の検索を行うことができる。
【００８１】
なお、テンプレート画像は、階層的な構造を持っていても良い。例えば、テンプレート画像(画像に写る被対象物の構造)が複雑である場合には、全体の構造と各部分の構成とに分けて、複数のテンプレート画像から構成することができる。
【００８２】
ここで、図10(a)〜(i)はテンプレート画像を階層的な構造として有する場合の説明図である。例として、人間の顔を考える。
【００８３】
(a)に示すように、テンプレート画像を、顔全体、右目、左目、鼻、口の5つのテンプレート画像に分ける。階層構造は、上段に顔全体のテンプレート画像があり、顔全体の下段に右目、左目、鼻、口の各テンプレート画像が並列にある。各階層間では、テンプレート画像の解像度は、同じでも、異なっていても良い。
【００８４】
(b)は、各テンプレート画像の配置と探索範囲(黒の枠)を示している。一番外側の枠は、顔全体のテンプレート画像の範囲を示している。顔全体の枠の中で、右目、左目、鼻、口の探索範囲は、各テンプレート画像を囲む枠で表示されている。各テンプレート画像はこの枠内でマッチングが行われる。ここで右目と鼻の範囲、及び左目と鼻の範囲のように、その一部が重複していても良い。
【００８５】
対象画像は(c)であり、この対象画像から上述した各テンプレート画像を検出する。顔全体のテンプレート画像は(d)であり、まず、このテンプレート画像を対象画像中からQT表現を用いて検出する。検出結果は、(e)中の枠1，2である。
【００８６】
次に、枠1，2を対象画像から切り取り、これら枠の大きさを同一にする((f)参照)。これら枠1，2を顔全体のテンプレート画像とみなして、その画像の中で(b)の各テンプレート画像(右目、左目、鼻、口)とのマッチングを、QT表現を用いて行っていく。
【００８７】
(g)は、(b)と(f)とを重ね合わせた画像であり、マッチングの結果から右側の画像は類似度が低い(ない)ため、検出対象からはずす。左側の画像は、類似度が（閾値を超えた）高いため、テンプレート画像であると判断する。
【００８８】
したがって、(h)中に、枠で示すように、対象画像からテンプレート画像が検出される。
【００８９】
このように、テンプレート画像を階層的な構造として保持し、マッチングを行うことも可能である。また図10では各階層ごとにマッチングを行ったが、各階層のマッチングを並行して行い、最も各テンプレート画像の適合が良い位置がテンプレート画像の位置であると判断しても良い。例えば、図10では5つのテンプレート画像があるが、この5つのテンプレート画像の各位置での類似度の合計値をその位置の類似度とし、最も大きな類似度を有した位置がテンプレート画像の位置であるとすることもできる。これは以下で述べる第3の実施の形態にも適用できる。
【００９０】
次に、本発明の第2の実施の形態の構成について図11を参照して説明する。
【００９１】
なお、以下の各実施の形態において同一構成要素は同一符号を付し重複する説明は省略する。
【００９２】
第2の実施の形態の特徴は、QT表現に加えて、従来の定量的な類似度（例えば相関法による類似度）を併用したことである。
【００９３】
図11(a)〜(e)は第2の実施の形態の動作の説明図であり、(a)は対象画像とテンプレート画像、(b)はQT表現で対象画像中から各閾値でテンプレート画像を抽出したマッチング結果の図、(c)はQT表現と相関法それぞれのマッチング結果の図、(d)QT表現と相関法とを組み合わせて求めたマッチング結果の図、(e)対象画像中のテンプレート画像が一致した場所を示す画像である。
【００９４】
(a)に示すように、テンプレート画像はモニタであり、対象画像は、このモニタが配置された室内を撮像した画像である。
【００９５】
この対象画像中からQT表現を用いてテンプレート画像のマッチングを行った結果が、(b)である。(b)では、マッチングする際のQTSに対する閾値を4つ設定しており、t1〜t4に向かって大きくなる。したがって、最上段の閾値t1のマッチング結果では、テンプレート画像にマッチングする候補は多く、逆に最下段の閾値t4のマッチング結果では、マッチングする候補は少なくなっている。
【００９６】
この閾値の中から、例えば、t3のマッチング結果をQT表現によるマッチング結果とする((c)上段)。また、同じ対象画像とテンプレート画像とを用いて相関法によってマッチングした結果が(c)下段である。
【００９７】
このQT表現によるマッチング結果と、相関法によるマッチング結果とを、比較して、同時にマッチング結果として検出される場所を、テンプレート画像の位置とする。ただし、同時にマッチングしている場所が複数個存在する場合には、複数個の中で相関法による検出結果のうち最大の類似度を有する場所を真のテンプレート画像の位置とする。この結果、(d)中の実線が真のテンプレート画像の位置であり、破線が同時にマッチングされ検出された位置である。
【００９８】
このようにして、(e)に示すように、対象画像中の白枠で囲まれた領域がテンプレート画像であると判断する。
【００９９】
以上述べた様な第2の実施の形態では、QT表現と相関法とを組み合わせてテンプレートマッチングを行うことで、より検出精度を上げることができる。
【０１００】
また、類似度の計算コストが相関法に比べて低いQT表現を用いて、対象画像中の類似度を求めておき、続いて類似度が所望の閾値以上の場所のみ相関法を用いてマッチングを行うことで、相関法だけ用いた場合の計算コストよりも低く、かつ精度の高いマッチングを行うことができる。
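The two-stage procedure described above can be sketched as follows: QT similarity screens all positions, and correlation is evaluated only at the positions that pass the threshold. This is a sketch under stated assumptions, not the specification's implementation. The correlation function is the standard normalized correlation coefficient, the threshold value is arbitrary, and all names and sample data are hypothetical.

```python
from math import sqrt


def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def qt(image):
    rows, cols = len(image), len(image[0])
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


def crop(image, top, left, h, w):
    return [row[left:left + w] for row in image[top:top + h]]


def qts(a, b):
    """Fraction of matching signs between two images of the same size."""
    matched = total = 0
    for qa, qb in zip(qt(a), qt(b)):
        for ra, rb in zip(qa, qb):
            for sa, sb in zip(ra, rb):
                matched += sa == sb
                total += 1
    return matched / total


def correlation(a, b):
    """Standard normalized correlation coefficient (0.0 for a flat patch)."""
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def two_stage_match(target, template, threshold):
    h, w = len(template), len(template[0])
    positions = [(t, l) for t in range(len(target) - h + 1)
                 for l in range(len(target[0]) - w + 1)]
    # Stage 1: inexpensive QT screening of every position.
    candidates = [p for p in positions
                  if qts(crop(target, *p, h, w), template) >= threshold]
    # Stage 2: correlation only at the surviving positions.
    best = max(candidates,
               key=lambda p: correlation(crop(target, *p, h, w), template),
               default=None)
    return best, candidates


template = [[0, 3],
            [2, 1]]
target = [[1, 1, 1, 1, 1],
          [1, 1, 0, 3, 1],
          [1, 1, 2, 1, 1],
          [1, 1, 1, 1, 1]]
best, candidates = two_stage_match(target, template, threshold=0.5)
```

In this example the correlation is evaluated at 4 of the 12 possible positions.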
【０１０１】
次に、本発明の第3の実施の形態の構成について図12,13を参照して説明する。
【０１０２】
第3の実施の形態の特徴は、動画像などの画像が複数個並んだ画像列に対して、あるテンプレート画像の対象領域の追跡（トラッキング）を行い、その動画像からテンプレート画像の位置を検出したことである。
【０１０３】
図12は、撮像カメラによって撮像された対象画像であり、図中上から下に時間軸を持つ。なお、左側は所定間隔で撮像された対象画像であり、右側はその対象画像中のテンプレート画像を白枠で示した画像である。
【０１０４】
ここで、テンプレート画像は手の領域であり、このテンプレート画像が時間と共に画像内を右から左へと移動することで、テンプレート画像が検出される位置が変わっていく。また、時間の経過と共に手の位置、形状が変わるため対象画像も変わる。
【０１０５】
図13の第3の実施の形態のフローチャートを参照して、動作を説明する。
【０１０６】
なお、テンプレート画像は、予め記憶装置10もしくはメモリ11内に記憶されている。また、対象画像は、撮像カメラ等で撮像された画像であり、この画像は所定間隔で、記憶装置10もしくはメモリ11内に送られ、記憶される。本実施の形態の対象画像は、図12に示すように5つとする。
【０１０７】
(1)まず、記憶装置10もしくはメモリ11からテンプレート画像を読み出し、CPU12によってテンプレート画像のQT表現を求める(S21)。
【０１０８】
(2)次に、記憶装置10もしくはメモリ11から対象画像(例えば図12の左側最上段の画像)を読み出す(S22)。
【０１０９】
(3)次に、読み出された対象画像の縮尺を変化させた複数の対象画像をCPU12によって求め、これら対象画像を記憶装置10もしくはメモリ11に記憶する(S23)。
【０１１０】
(4)次に、対象画像ごとにQT表現を求め、対象画像に対応させてQT表現を、記憶装置10もしくはメモリ11に記憶する(S24)。
【０１１１】
(5)次に、テンプレート画像のQT表現の3値画像と、ある倍率の対象画像のQT表現の3値画像中のテンプレート画像と同じ大きさの任意の領域との間で、それぞれに対応する位置での同一符号の個数を求め、そして類似度をCPU12によって求め、記憶装置10もしくはメモリ11に一時的に記憶する(S25)。
【０１１２】
(6)次に、この対象画像のQT表現の3値画像内全てに対して、テンプレート画像のQT表現の3値画像の類似度を計算したか否かがCPU12によって判断される(S26)。全ての計算が終了していればS30へ進み、未終了であればS27へ進む。
【０１１３】
(7)未終了である場合には、対象画像中のこの任意の領域の位置を1画素分だけ行方向もしくは列方向にずらした位置での3値画像と、テンプレート画像の3値画像と、がそれぞれ対応する位置での類似度をCPU12によって求め、記憶装置10もしくはメモリ11に一時的に記憶する(S27)。なお、計算は任意の領域の行方向を全て行った後、列方向を行う。
【０１１４】
(8)次に、(7)で記憶した類似度と(5)で記憶した類似度とをCPU12によって比較する(S28)。比較された結果、(5)よりも(7)の類似度が大きければ、(9)へ進み、小さければ(6)へ進む。
【０１１５】
(9)類似度が大きい場合には、(5)の類似度を記憶装置10もしくはメモリ11から削除する(S29)。（6）へ進む。
【０１１６】
(10)ある倍率の対象画像中における全ての計算が終了した場合には、他の倍率の対象画像の3値画像と、テンプレート画像の3値画像と、がそれぞれ対応する位置での類似度を計算したか否かがCPU12によって判断される(S30)。全ての他の倍率の対象画像の計算が終わっていれば、（12）へ進み、未終了であれば（11）に進む。
【０１１７】
(11)未終了である場合には、記憶装置10もしくはメモリ11に記憶される倍率の異なる対象画像をCPU12によって読み出す(S31)。（3）へ進む。
【０１１８】
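(12) If the calculation is finished, the similarity stored in the storage device 10 or the memory 11 (together with the magnification of the target image and the position within the target image) is regarded as the maximum similarity (S32). The position stored with this maximum similarity is judged to be the position of the template image.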
【０１１９】
(13)次に、対象画像中で最大類似度とされた領域を、新たなテンプレート画像としてCPU12によって取り出し、記憶装置10もしくはメモリ11に記憶する。そして、この新たなテンプレート画像のQT表現を求め、テンプレート画像に対応させて記憶する(S33)。
【０１２０】
(14)次に、テンプレートマッチングすべき対象画像が存在するか否かがCPU12によって判断される(S34)。存在する場合には、（2）に進み、存在しなければ処理を終了する。
【０１２１】
なお、本実施の形態では、さらにマッチングすべき対象画像が存在するため、（2）へ進み、対象画像を図12左側の上から2番目の画像として、上述したような（2）〜（13）のマッチングを行う。この2番目の画像を処理した後は、（14）にて対象画像を図12左側の上から3番目、4番目、5番目と変更しながら同様のマッチングを行っていく。
【０１２２】
以上述べたような第3の実施の形態では、テンプレート画像が対象画像中で移動し、対象画像、テンプレート画像が時々刻々変化していく場合であっても、テンプレート画像を逐次更新し追跡することで、対象画像が動画像においてもテンプレートマッチングを行うことができる。
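The tracking loop of this embodiment, in which the best-matching region of each frame becomes the template for the next frame, can be sketched in Python as follows. The sketch makes simplifying assumptions: the search over magnifications is omitted, the QT representation is recomputed for every window for brevity, and all names and frames are hypothetical.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def qt(image):
    rows, cols = len(image), len(image[0])
    q_h = [[sign(image[r][c], image[r][c + 1]) for c in range(cols - 1)]
           for r in range(rows)]
    q_v = [[sign(image[r][c], image[r + 1][c]) for c in range(cols)]
           for r in range(rows - 1)]
    return q_v, q_h


def crop(image, top, left, h, w):
    return [row[left:left + w] for row in image[top:top + h]]


def qts(a, b):
    matched = total = 0
    for qa, qb in zip(qt(a), qt(b)):
        for ra, rb in zip(qa, qb):
            for sa, sb in zip(ra, rb):
                matched += sa == sb
                total += 1
    return matched / total


def best_position(frame, template):
    h, w = len(template), len(template[0])
    positions = [(t, l) for t in range(len(frame) - h + 1)
                 for l in range(len(frame[0]) - w + 1)]
    return max(positions, key=lambda p: qts(crop(frame, *p, h, w), template))


def track(frames, template):
    """Return the detected position in each frame, updating the template."""
    path = []
    for frame in frames:
        pos = best_position(frame, template)
        path.append(pos)
        # The matched region becomes the template for the next frame (S33).
        template = crop(frame, *pos, len(template), len(template[0]))
    return path


template = [[0, 3],
            [2, 1]]
frame1 = [[1, 1, 1, 1, 1],
          [1, 1, 0, 3, 1],
          [1, 1, 2, 1, 1],
          [1, 1, 1, 1, 1]]
frame2 = [[1, 1, 1, 1, 1],
          [1, 0, 3, 1, 1],
          [1, 2, 1, 1, 1],
          [1, 1, 1, 1, 1]]
path = track([frame1, frame2], template)
```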
【０１２３】
次に、本発明の第4の実施の形態の構成について図14，15を参照して説明する。
【０１２４】
第4の実施の形態の特徴は、QT表現を4分木(Quadtree)法と組み合わせて適用させたことである。
【０１２５】
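FIGS. 14(a) and 14(b) illustrate the quadtree data structure. There are three layers, and one image unit in each layer consists of 2 × 2 pixels. The figures show a hierarchical structure in which four branches extend from each node. Layer 1 is the top layer and layer 3 is the bottom layer (see FIG. 14(a)).
[0126]
The image of layer 1 consists of 2 × 2 pixels. The image of layer 2, the layer below layer 1, consists of 4 × 4 pixels obtained by dividing each pixel of layer 1 into four, and the mean density of those four pixels is the density of the corresponding pixel of layer 1. Likewise, the image of layer 3, the layer below layer 2, consists of 8 × 8 pixels obtained by dividing each pixel of layer 2 into four, and the mean density of those four pixels is the density of the corresponding pixel of layer 2 (see FIG. 14(b)). In this embodiment the layers are linked by using the mean of the lower-layer pixel values as the upper-layer density, but there is no restriction on how the layers are related; for example, the average of the maximum and minimum, or the median, of the lower-layer pixel values may be used instead.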
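The layer construction described above can be sketched in Python as repeated 2 × 2 averaging. The sketch assumes a square image whose side is a power of two; the function names and the sample image are hypothetical.

```python
def coarser(layer):
    """Halve the resolution: each output pixel is the mean of a 2x2 block."""
    size = len(layer) // 2
    return [[(layer[2 * r][2 * c] + layer[2 * r][2 * c + 1]
              + layer[2 * r + 1][2 * c] + layer[2 * r + 1][2 * c + 1]) / 4
             for c in range(size)]
            for r in range(size)]


def build_layers(image):
    """Return [layer 1, layer 2, ...], from 2x2 at the top to the full image."""
    layers = [image]
    while len(layers[0]) > 2:
        layers.insert(0, coarser(layers[0]))
    return layers


# Hypothetical 8x8 image whose pixel value encodes its position.
image = [[8 * r + c for c in range(8)] for r in range(8)]
layer1, layer2, layer3 = build_layers(image)
```

Replacing the mean in `coarser` with a median or with the average of the maximum and minimum gives the alternative linkages mentioned in the text.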
【０１２７】
図15(a)は、各階層の画像であり、(b)は各階層のQT表現であり、(c)は3値表現の大小、等号関係を所定の関係を持って数値化したものである。
【０１２８】
各階層の最小単位を2×2画素とすれば、階層1のQT表現は1組、階層2のQT表現は4組、階層3のQT表現は8組となる。1組のQT表現は4つの符号値から構成される。大小関係、等号関係の3値表現の符号値の組み合わせは3^４＝81種類あるが、そのうち実際にはありえない組み合わせを削除すると57種類となる。
【０１２９】
(b)に示される各組のQT表現を、この57種類の組み合わせを数値化した番号(0〜56)をつけると、(c)のように表せる。この数値化した番号をQT-ID番号と称する。また、各QT表現間の関係は、2×2画素からなるため4段階で表され、これは対応する画素の符号が同一である個数を用いて求められる。全ての符号が同一であれば、4として、全ての符号が異なっていれば、0とする。
【０１３０】
このように、ある対象画像を得た後、この対象画像を4分木構造で表現し、各層のQT表現をQT-ID番号(0〜4で表される類似関係を含んでいる)で表現し直す。
【０１３１】
2つの画像をそれぞれ図15（c）のような木構造で表現した際の、2つの画像の類似性の判断法について述べる。
【０１３２】
すべての階層における各ノードのQT-ID番号を用いて類似性を判断する場合、それぞれの各階層のノードの対応するID番号が同じかどうか、もしくは符号の対応する数がいくらであるかを求め、構造間の類似性を、その数によって判断する。
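One way to compare two such tree structures is sketched below in Python. It assumes that each layer is split into non-overlapping 2 × 2 blocks, that each block is described by its four signs, and that the overall similarity is the fraction of matching signs over all blocks of all layers. The normalization to a fraction, the function names, and the sample layers are assumptions made for illustration; the QT-ID numbering itself is not reproduced.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def block_signs(layer, r, c):
    """Four signs of the 2x2 block whose top-left pixel is (r, c)."""
    a, b = layer[r][c], layer[r][c + 1]
    d, e = layer[r + 1][c], layer[r + 1][c + 1]
    return (sign(a, b), sign(d, e), sign(a, d), sign(b, e))


def block_agreement(s1, s2):
    """Number of identical signs between two blocks, from 0 to 4."""
    return sum(x == y for x, y in zip(s1, s2))


def tree_similarity(layers1, layers2):
    """Fraction of matching signs over all blocks of all layers."""
    matched = total = 0
    for l1, l2 in zip(layers1, layers2):
        for r in range(0, len(l1), 2):
            for c in range(0, len(l1[0]), 2):
                matched += block_agreement(block_signs(l1, r, c),
                                           block_signs(l2, r, c))
                total += 4
    return matched / total


lower = [[0, 1, 2, 3],
         [1, 1, 2, 2],
         [3, 2, 1, 0],
         [0, 0, 1, 1]]
tree_a = [[[1, 2], [3, 4]], lower]
tree_b = [[[2, 1], [3, 4]], lower]   # differs only in the top layer

identical = tree_similarity(tree_a, tree_a)
close = tree_similarity(tree_a, tree_b)
```

Restricting the loop to a single layer gives the single-layer comparison, with upper layers giving a coarser comparison than lower ones.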
【０１３３】
各画像を階層的に表現することにより、上位階層では画像の大域的な性質を表現することになり、第1の実施の形態で述べたような、隣接関係に基づいただけのマッチングに加え、大域的な性質の類似性も比較することになる。
【０１３４】
また、ある階層にのみ着目し、ある階層のQT-ID番号同士の比較を行って、類似性を判断してもよい。
【０１３５】
この方法は、例えば、類似画像を検索する際に用いることができる。階層が上位であるほど抽象化された画像になるため、大まかな検索を行う際には上位の階層で行い、より精度の高い検索を行う際には下位の階層でマッチングを行う。
【０１３６】
また、画像の検索を行う際には、検索のための索引（インデックス）が必要となることがあり、このQT-ID番号を索引の情報として使用することも可能である。
【０１３７】
以上述べた様な第4の実施の形態では、計算コストをより抑制するためには上位の階層でマッチングを行い、検出精度をより高めるためには、より下位の階層でマッチングを行うことで、必要に応じてマッチングを使い分けることができる。
【０１３８】
また、各画像が階層的な構造を有することで、各画像から求められたQT表現同士のマッチングだけでなく、精緻なマッチングを行うことも可能である。
【０１３９】
次に、本発明の第5の実施の形態の構成について図16を参照して説明する。
【０１４０】
第5の実施の形態の特徴は、テンプレート画像中でマッチングの計算に関与しない画素を設定したことである。
【０１４１】
図16(a),(b)は第5の実施の形態の説明図であり、図中「φ」の画素がテンプレートマッチングを行う際に無視する部分である。
【０１４２】
(a)に示すように、無視する画素はテンプレート画像の四隅であり、この四隅の画素は、予めマッチングする画素からはずす。そして、テンプレート画像の形状を略十字形に変形する。
【０１４３】
また、(b)に示すように、テンプレート画像の中央部の複数の画素を無視する画素とする。これは、例えば、テンプレート画像がドーナツ形状であれば、空洞部のマッチングは不要であるため、予め無視する領域とすることができる。また、テンプレート画像が手であれば、指と指との隙間に当たる画素は、この無視する領域として設定することができる。
【０１４４】
このように、予め所定の画素を無視することにより、計算コストを低減し、かつ検出時間を短時間にすることができる。また、無視する領域の類似度の計算を行わないことにより、無意味な領域の影響を受けない。
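Ignored pixels can be handled with a mask, as in the following Python sketch. The text does not spell out how a comparison involving an ignored pixel is treated; the sketch assumes that any comparison in which either pixel is ignored is skipped entirely, so that it contributes to neither the numerator nor the denominator. All names and sample data are hypothetical.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def masked_qts(img1, img2, ignore):
    """Return (similarity, number of comparisons actually evaluated)."""
    rows, cols = len(img1), len(img1[0])
    matched = total = 0
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):      # right and lower neighbour
                r2, c2 = r + dr, c + dc
                if r2 >= rows or c2 >= cols:
                    continue
                if ignore[r][c] or ignore[r2][c2]:
                    continue                     # skipped: no cost, no influence
                matched += (sign(img1[r][c], img1[r2][c2])
                            == sign(img2[r][c], img2[r2][c2]))
                total += 1
    return matched / total, total


# Four corners ignored, leaving a roughly cross-shaped template.
corners = [[True, False, True],
           [False, False, False],
           [True, False, True]]
nothing = [[False] * 3 for _ in range(3)]

template = [[9, 1, 9],
            [2, 3, 4],
            [9, 5, 9]]
window = [[0, 1, 0],
          [2, 3, 4],
          [0, 5, 0]]     # differs from the template only at the corners

masked = masked_qts(template, window, corners)
unmasked = masked_qts(template, window, nothing)
```

With the mask, only 4 of the 12 comparisons are evaluated and the differing corners no longer lower the similarity.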
【０１４５】
なお、この無視する画素を一部に含む領域の位置の指定は、任意に設定することが可能である。
【０１４６】
次に、本発明の第6の実施の形態について図17を参照して説明する。
【０１４７】
第6の実施の形態の特徴は、複数の対象画像に対してマッチングを行う際に、複数のテンプレート画像ごとに分類したことである。
【０１４８】
図17(a)〜(c)はその説明図であり、(a)は3種類のテンプレート画像であり、(b)は複数の対象画像であり、(c)は分類結果である。
【０１４９】
テンプレート画像は、(a)に示す通り、3種類あり、A、B、Cとする。ある記憶領域に複数の対象画像が混在した状態で保持されている((b)参照)。テンプレート画像、対象画像、それぞれのQT表現を求めて、各テンプレート画像を各対象画像とマッチングさせていく。各対象画像は、3つのテンプレート画像のうちいずれかを画像中に有する。マッチングを行い、その結果類似度が最大となるテンプレート画像ごとに対象画像を振り分けていき分類を行う((c)参照)。また、この後、分類された対象画像ごとに更に詳細なマッチングを行っても良い。この方法は、例えばクラスタリングの前処理として使用したり、パターン認識のためのマルチカテゴリ辞書の選択のために用いたりすることができる。
【０１５０】
なお、本発明は上記各実施の形態には限定されず、その主旨を逸脱しない範囲で種々変形して実施できることは言うまでもない。例えば、テンプレート画像は、複数の画像の平均画像を用いて作成したものであっても構わない。この場合に、平均画像は各画素の濃淡値から平均画素値を作ってQT表現を求めてもよいし、それぞれの画像のQT表現を求め、各画像の対応する画素位置の最も多い値を用いてテンプレート画像としても良い。また、変形する物体を検出するために、一つのテンプレート画像に対して、種々変形した形状を考慮したテンプレート画像を複数設定することも可能である。
【０１５１】
また、3値画像を求めるためには隣り合う画素間で計算を行っていたが、斜め方向に配置された画素間や、所定距離はなれた画素間で3値画像を求めても良い。例えば、図18のように隣接していない画素間での大小関係、同値関係を用いてもよい。
【０１５２】
また、大小関係に段階を持たせてn（ｎは自然数）段階として大小関係を表すことで、より検出精度を向上させることもできる。
【０１５３】
また、テンプレートマッチングやトラッキング（動画像中からテンプレート画像を抽出）は、テンプレート画像を対象画像中で平行移動させたり、対象画像を拡大、縮小したりしたが、対象画像を所望角度回転させた各画像をマッチングの対象としても良い。
【０１５４】
また、QT表現で用いられる画素値は、実施例では4段階でなくとも、64段階(6bit)、128段階(7bit)に設定することも可能である。また、得られる画像が256段階(8bit)の画像の場合に、濃淡方向の解像度を落として例えば64段階にし、同様のQT表現を求めるといった処理を行っても良い。256段階の場合には、大小関係が成立していた部分が、64段階に解像度が落ちたために同値関係になるといった別の側面からの情報が得られるようになる。もちろん、複数の解像度を使って多重解像度のQT表現を用いても良い。
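The effect of lowering the gray-level resolution can be shown with two pixel values. The Python sketch below assumes requantization by integer scaling; the function name and the sample values are hypothetical.

```python
def sign(a, b):
    return "<" if a < b else (">" if a > b else "=")


def requantize(value, levels_in=256, levels_out=64):
    """Map a gray value from levels_in steps to levels_out steps."""
    return value * levels_out // levels_in


a, b = 100, 102
fine = sign(a, b)                             # relation at 256 levels
coarse = sign(requantize(a), requantize(b))   # relation at 64 levels
```

The two values differ at 256 levels but fall into the same step at 64 levels, so the inequality becomes an equality.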
【０１５５】
また、Quadtreeを用いた場合には、2×2画素からなる画像を一つの単位としているが、この画像単位を構成する画素数は、適宜設定可能である。
【０１５６】
また、カラー画像に対してマッチングを行うことも可能である。その場合には、カラー画像の各R,G,Bの濃淡に対して、同様の処理を行うことによってマッチングを行うことができる。また、別の色表現に変換し、この色表現から大小関係、等号関係を求めることもできる。
【０１５７】
また、対象画像としては可視光の下で撮像された画像でなく、赤外線を照射して撮像(撮影)された赤外線画像を用いることもできる。一般に、赤外線画像は、被撮像物によっては画像全体が白っぽくなり、飽和したような画像が得られる。この場合に、従来の正規化相関法を用いてマッチングを行うと、明るさに比例して相関値が大きくなるため、誤認識をおこしやすい。しかしながら、QT表現を用いたマッチングは、撮像される画像の性質(波長特有の画像の写り方)に左右されず、誤認識をせずにマッチングを行うことができる。また、コントラストの悪い超音波画像や医用画像、レーダ画像等の各種画像に対しても誤認識を低減したマッチングを行うことができる。
【０１５８】
また、QT表現を求める際に対象画像及びテンプレート画像の解像度レベルが大きい（例えば128、256段階）場合には、同一符号に任意の範囲を持たせることも可能である。例えば、256段階で濃度を表現する場合には、等号範囲を±5以内とし、比較した結果この範囲内であれば同一であるとみなして、等号とする等である。
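An equality sign with a tolerance band can be sketched in Python as follows. The default tolerance of 5 follows the example in the text; treating a difference of exactly 5 as equal is an assumption, and the function name is hypothetical.

```python
def sign_with_tolerance(a, b, tol=5):
    """Ternary sign in which differences of at most tol count as equal."""
    if abs(a - b) <= tol:
        return "="
    return "<" if a < b else ">"
```

For example, gray values 100 and 104 are treated as equal, whereas 100 and 106 are not.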
【０１５９】
また、本発明の実施の形態における処理をコンピュータで実行可能なプログラムで実現し、このプログラムをコンピュータで読み取り可能な記憶媒体として実現することも可能である。
【０１６０】
なお、本発明における記憶媒体としては、磁気ディスク、フロッピーディスク、ハードディスク、光ディスク(CD-ROM，CD-R，DVD等)、光磁気ディスク(MO等)、半導体メモリ等、プログラムを記憶でき、かつコンピュータが読み取り可能な記憶媒体であれば、その記憶形式は何れの形態であってもよい。
【０１６１】
また、記憶媒体からコンピュータにインストールされたプログラムの指示に基づきコンピュータ上で稼動しているOS(オペレーションシステム)や、データベース管理ソフト、ネットワーク等のMW(ミドルウェア)等が本実施の形態を実現するための各処理の一部を実行してもよい。
【０１６２】
さらに、本発明における記憶媒体は、コンピュータと独立した媒体に限らず、LANやインターネット等により伝送されたプログラムをダウンロードして記憶または一時記憶した記憶媒体も含まれる。
【０１６３】
また、記憶媒体は1つに限らず、複数の媒体から本実施形態における処理が実行される場合も、本発明における記憶媒体に含まれ、媒体の構成は何れの構成であってもよい。
【０１６４】
なお、本発明におけるコンピュータは、記憶媒体に記憶されたプログラムに基づき、本実施の形態における各処理を実行するものであって、パソコン等の1つからなる装置、複数の装置がネットワーク接続されたシステム等の何れの構成であってもよい。
【０１６５】
また、本発明におけるコンピュータとは、パソコンに限らず、情報処理機器に含まれる演算処理装置、マイコン等も含み、プログラムによって本発明の機能を実現することが可能な機器、装置を総称している。
【０１６６】
【発明の効果】
以上述べた様な本発明によれば、変動に対して強い類似性測度を用いることで、計算コストを抑えつつ、検出精度をより高くすることができる。
【図面の簡単な説明】
【図１】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)は対象画像、(b)はテンプレート画像、(c)は対象画像中を移動するテンプレート画像の説明図、(d)はマッチング結果の説明図。
【図２】本発明のパターン認識方法の第1の実施の形態の対象画像の説明図。
【図３】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)は横方向のQT表現の説明図、(b)は縦方向のQT表現の説明図。
【図４】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)は各画像のQT表現の説明図、(b)は各画像から得られたQT表現を比較した結果の説明図。
【図５】本発明のパターン認識方法の第1の実施の形態の類似度の説明図。
【図６】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)はQT表現による類似度と相関法による類似度との関係を示すグラフ、(b)は複数の対象画像。
【図７】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)は従来のマッチングの説明図、(b)はQT表現を用いたマッチングの説明図。
【図８】本発明のパターン認識方法の第1の実施の形態の電子機器の構成を示すブロック図。
【図９】本発明のパターン認識方法の第1の実施の形態の動作を示すフローチャート。
【図１０】本発明のパターン認識方法の第1の実施の形態を説明するもので、(a)は階層構造を有するテンプレート画像、(b)は各テンプレート画像の配置関係と検出領域の説明図、(c)は対象画像、(d)はテンプレート画像、(e)は対象画像中から検出された複数のテンプレート画像(上位階層)の候補の位置の説明図、(f)は対象画像から切り取られたテンプレート画像の候補の説明図、(g)テンプレート画像(上位階層)の候補に下位階層のテンプレート画像をマッチングさせた時の説明図、(h)はテンプレート画像が検出された結果を表示する対象画像の説明図。
【図１１】本発明のパターン認識方法の第2の実施の形態を説明するもので、(a)は対象画像とテンプレート画像、(b)は各閾値ごとのマッチング結果の説明図、(c)はQT表現によるマッチング結果と相関法によるマッチング結果との説明図、(d)はQT表現と相関法とを用いたマッチング結果の説明図、(e)は対象画像中で検出されたテンプレート画像。
【図１２】本発明のパターン認識方法の第3の実施の形態の説明図。
【図１３】本発明のパターン認識方法の第3の実施の形態の動作を示すフローチャート。
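[FIG. 14] Illustrates the fourth embodiment of the pattern recognition method of the present invention; (a) is an explanatory diagram of the quadtree structure, and (b) is an explanatory diagram of the pixels in each layer.
[FIG. 15] Illustrates the fourth embodiment of the pattern recognition method of the present invention; (a) is an explanatory diagram of the pixels in each layer, (b) is an explanatory diagram of the QT representation of each layer, and (c) is an explanatory diagram of each QT representation expressed as a QT-ID number.
[FIG. 16] An explanatory diagram of setting, in the target image, a region whose QT representation is ignored in the pattern recognition method of the present invention.
[FIG. 17] Illustrates the sixth embodiment of the pattern recognition method of the present invention; (a) is an explanatory diagram of a plurality of template images, (b) is an explanatory diagram of a plurality of target images, and (c) is an explanatory diagram of the target images classified by template image.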
【図１８】本発明のパターン認識方法の第1の実施の形態とは異なるQT表現の説明図。
【図１９】従来のパターン認識方法を説明するもので、(a)は対象画像の説明図、(b)は対象画像に光が差し込んだ場合の説明図、(c)は(b)の画像に背景処理を施した対象画像の説明図、(d)は(b)の画像にエッジ処理を施した対象画像の説明図、(e)は撮像カメラで撮像された実際の対象画像、(f)は(e)からテンプレート画像を検出する説明図。
【符号の説明】
1 電子機器
2 モニタ
3 本体
4 キーボード
5 マウス
10 記憶装置
11 メモリ
12 CPU
[0001]
[Technical field of the invention]
The present invention relates to a pattern recognition method and to a computer-readable storage medium storing a program for performing pattern recognition.
[0002]
[Prior art]
Methods used to align multiple images with one another, or to detect a specific object within an image, are called image matching.
[0003]
One approach to image matching, described for example in L. G. Brown, "A Survey of Image Registration Techniques", ACM Computing Surveys, Vol. 24, No. 4, pp. 325-376 (Japanese translation by Shirai, "Overview of Image Registration Techniques", Computer Science: acm computing surveys '92, bit special issue, pp. 77-120), defines a per-pixel similarity measure between multiple images and determines the location at which that measure is highest.
[0004]
Image matching methods that use per-pixel gray-level information have relied on statistical similarity measures such as SSD (sum of squared differences), SAD (sum of absolute differences), and the cross-correlation coefficient.
[0005]
For example, in the case of the cross-correlation coefficient C, the similarity measure is expressed as the following equation (1).
[Expression 1]

where I(k, l) is the pixel value at pixel position (k, l), m₁ and m₂ are the mean pixel values of I₁ and I₂, and σ₁ and σ₂ are the variances of the pixel values of I₁ and I₂.
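For orientation, the three conventional measures named above can be sketched in Python. This is a minimal sketch: the correlation function uses the standard normalized form, which is an assumption because the exact expression (1) is not reproduced here, and the function names and sample patches are hypothetical. The example shows that a uniform brightness offset changes SSD and SAD while leaving the correlation coefficient at 1.

```python
from math import sqrt


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def sad(a, b):
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def correlation(a, b):
    """Standard normalized correlation coefficient (0.0 for a flat patch)."""
    xs = [p for row in a for p in row]
    ys = [q for row in b for q in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


patch = [[1, 2],
         [3, 4]]
brighter = [[p + 10 for p in row] for row in patch]   # same pattern, brighter
```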
[0006]
In template matching, which uses such a cross-correlation coefficient C to detect the position of a model image (hereinafter referred to as a template image) within an image, the detection cost grows as the number of geometric transformation parameters of the template image, such as translation and rotation, increases.
[0007]
In addition, to control changes in pixel brightness, image preprocessing such as histogram equalization, edge-enhancement filtering, or edge extraction was often required before the similarity measure could be computed.
[0008]
Furthermore, these preprocessing steps had compatibility problems with the similarity measure used (SSD, SAD, or cross-correlation coefficient) and with the features of the template.
[0009]
For example, when SSD defines the similarity measure directly from gray-level information, the magnitude of each pixel's gray value affects the measure directly, which leads to the following properties.
[0010]
FIGS. 19(a) to 19(e) illustrate template matching when gray values are used for the similarity measure: (a) is the target image, (b) is an image in which part of (a) is lit, (c) is an edge-processed version of (b), (d) is a captured image different from (a), and (e) is an image in which the extraction region (a hand) has been extracted from (d).
[0011]
(a) is an image of an object that, for explanation, is displayed with a single gray value (region 1). (b) is the image when light is cast on the object from the upper right; the lit region is region 2, and its gray value differs from that of region 1. The difference in gray value therefore affects the similarity. (c) is an edge-processed version of (b). Because only the pixels on the boundary between regions 1 and 2 are affected, the effect on the similarity is smaller than in (b), but template matching that relies on edge information alone is not general-purpose.
[0012]
Further, in (e), obtained by extracting the template (a hand) from an image with a complex background such as (d), the gray values of the complex background strongly affect template matching.
[0013]
Therefore, when gray values were used directly as the similarity measure for template matching, the gray-value changes described above made it impossible to keep the detection accuracy above a given level.
[0014]
Next, an example of template matching that does not use gray values directly as the similarity measure is A. Venot, J. F. Lebruchec, J. C. Roucayrol, "A New Class of Similarity Measures for Robust Image Registration", Computer Vision, Graphics, Image Processing, 28, pp. 176-184 (1984). In this method, two images captured under the same imaging conditions are superimposed, the difference between the pixel values (gray values) of corresponding pixels is computed, and the number of times this difference turns negative (changes sign) is used as the feature (similarity measure) for matching.
[0015]
For example, in the correlation method using the cross-correlation coefficient, the gray values of the pixels at positions where the images differ affect the similarity measure. The sign changes, by contrast, depend only on the number of such positions, so the measure is less affected by gray-value variation.
[0016]
However, although this method has been applied to medical image processing, it requires the features of the two images to be computed at every trial, just as the correlation method does, so the detection cost is high.
[0017]
Next, methods that use the magnitude relationship between pixels as the similarity measure include P. Lipson, E. Grimson, P. Sinha, "Configuration Based Scene Classification and Image Indexing", CVPR'97, pp. 1007-1013, and P. Sinha, "Object Recognition via Image Invariants: A Case Study", Investigative Ophthalmology Visual Science, vol. 35, pp. 1735-1740, May 1994.
[0018]
P. Lipson et al., aiming at similar-image retrieval, divide an image into several blocks, perform a search that focuses on the magnitude relationships between the image features of those blocks, and thereby structure the image.
[0019]
Sinha's paper, on which this structuring is based, describes a method that extracts the magnitude relationships between pixels and uses the extraction result as a template, for example to detect a face region in an image. Sinha's method extracts qualitative relationships from a large number of stored images and holds the result as a template. However, when the target region of a template is tracked through an image sequence, that is, a time series of images such as a moving image, the template must be updated successively, and the detection cost of these updates becomes large.
[0020]
Next, template matching that uses the incremental sign of gray values between pixels as the similarity measure is described in Shunichi Kaneko, Ichiro Murase, Takaaki Fukushima, Satoru Igarashi, "Natural image matching by incremental sign correlation", IEEJ (Institute of Electrical Engineers of Japan) Technical Meeting Materials IIS-98-58, pp. 31-35, 1998. Kaneko et al. focus on the incremental sign of the gray values between pixels and introduce the idea of using the number of signs as the similarity measure. However, when the similarity of two images is judged, a pair of pixels with the same gray value, whose sign would be an equality, is treated in the same way as a pair whose gray value increases. From the standpoint of precise matching and image classification, this lowers the matching accuracy.
[0021]
[Problems to be solved by the invention]
As described above, conventional matching incurs a high computational cost as preprocessing and the number of comparisons of features and the like increase. In addition, even when the captured images show the same region, it cannot cope with gray-value variation caused by changes in the imaging environment, and the detection accuracy drops.
[0022]
The present invention has been made in view of these conventional problems. Its object is to provide a pattern recognition method, and a computer-readable storage medium storing a program for performing pattern recognition, that improve the detection accuracy of template matching without greatly increasing the computational cost, by using a similarity measure that is not easily affected by variation in the image.
[0023]
[Means for Solving the Problems]
To achieve the above object, the pattern recognition method of the present invention comprises: a step of comparing the gray values of two different pixels adjacent to each other in the column direction and in the row direction in a first image, in which a plurality of pixels are arranged in each of the column and row directions, and obtaining a reference ternary comparison image in which the magnitude and equality relationships are represented by three kinds of signs; a step of comparing the gray values of two different pixels adjacent to each other in the column direction and in the row direction in a second image, in which a plurality of pixels are arranged in each of the column and row directions, and obtaining a target ternary comparison image in which the magnitude and equality relationships are represented by three kinds of signs; a step of obtaining the number of corresponding positions at which the signs of the reference ternary comparison image and the target ternary comparison image match; and a step of determining, from that number, the similarity between the target ternary comparison image and the reference ternary comparison image.
[0024]
Moreover, in the pattern recognition method of the present invention, a target image is expressed by a hierarchical structure of a plurality of layers in which the pixel value of one pixel in each layer is obtained from the pixel values of 2 × 2 pixels in the layer one level below, these 2 × 2 pixels being arranged at the positions obtained by dividing that pixel in two in the column direction and in the row direction, and a reference image is expressed by a hierarchical structure of a plurality of layers similar to that of the target image. The method includes: a step of comparing, for each layer of the target image, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtaining a target ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; a step of comparing, for each layer of the reference image, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtaining a reference ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; a step of obtaining, for at least one layer, the number of codes that coincide at mutually corresponding positions of the target ternary comparison image of that layer and the corresponding reference ternary comparison image; and a step of determining, from that number, the similarity between the target ternary comparison image and the reference ternary comparison image.
[0025]
The storage medium of the present invention is a storage medium in which a program for performing pattern recognition is stored so as to be readable by a computer. The program causes the computer to: compare, in a first image in which a plurality of pixels are arranged in the column direction and the row direction, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a reference ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; compare, in a second image in which a plurality of pixels are arranged in the column direction and the row direction, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a target ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; obtain the number of codes that coincide at mutually corresponding positions of the reference ternary comparison image and the target ternary comparison image; and determine, from that number, the similarity between the target ternary comparison image and the reference ternary comparison image.
[0026]
The storage medium of the present invention is a storage medium in which a program for pattern recognition is stored so as to be readable by a computer. A target image is expressed by a hierarchical structure of a plurality of layers in which the pixel value of one pixel in each layer is obtained from the pixel values of 2 × 2 pixels in the layer one level below, these 2 × 2 pixels being arranged at the positions obtained by dividing that pixel in two in the column direction and in the row direction, and a reference image is expressed by a hierarchical structure of a plurality of layers similar to that of the target image. The program causes the computer to: compare, for each layer of the target image, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a target ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; compare, for each layer of the reference image, the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a reference ternary comparison image that represents the magnitude relationship and the equivalence relationship with three types of codes; obtain, for at least one layer, the number of codes that coincide at mutually corresponding positions of the target ternary comparison image of that layer and the corresponding reference ternary comparison image; and determine, from that number, the similarity between the target ternary comparison image and the reference ternary comparison image.
[0029]
With such a configuration, qualitative relationships within an image, namely the magnitude relationship and the equivalence relationship between the gray values of adjacent pixels, are used as feature amounts of the image, so that matching can be performed with higher accuracy at a reduced calculation cost.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0031]
FIGS. 1 to 10 show a first embodiment.
[0032]
FIGS. 1(a) to 1(d) illustrate the template matching operation: (a) is a plan view of a target image, (b) is a plan view of a template image, (c) is a plan view explaining how the template image is moved over the target image, and (d) is a plan view explaining that the template image has been detected in the target image.
[0033]
Template matching, which is a pattern recognition method, detects, from a target image composed of an s × t matrix of pixels, the position (pattern region) that substantially matches a template image composed of a v × w matrix of pixels. For the sake of explanation, it is assumed that the area of the target image is larger than the area of the template image. Further, each image is represented by multi-valued shades of two or more values, and the density resolution of each image is the same.
[0034]
For example, the target image in (a) is a topographic map, and the template image in (b) is a pond. More specifically, the template image (thick frame) is moved over the entire target image as shown by the arrow in (c), the similarity is compared at each position, and the position with the highest similarity is taken as the position of the template image; (d) shows the part of the target image that is detected as the template image. Here, the horizontal direction in the figure is the row direction, and the vertical direction in the figure is the column direction.
[0035]
In the case of (c), the following feature amount is used as a measure for determining the similarity of the template image at each position in the target image.
[0036]
First, for an image I composed of m × n pixels, the pixel value at position (x, y) in the image I is denoted I(x, y), where 1 ≦ x ≦ m and 1 ≦ y ≦ n. Here the density resolution is 2 bits, so each pixel value is a numerical value expressed in four levels, 0 to 3.
[0037]
FIG. 2 shows a 4 × 4 pixel image I_S with four levels of density resolution.
[0038]
From this arbitrary image I_S, the following two ternary images Q_h and Q_v are obtained from the pixel values of adjacent pixels and used as feature amounts.
[0039]
That is, the magnitude relationship and the equivalence relationship of pixel values between pixels adjacent in the column direction and the row direction are examined.
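[Expression 2]
$$
Q_v(x, y) = \begin{cases} \text{``$>$''} & I(x, y) > I(x+1, y) \\ \text{``$=$''} & I(x, y) = I(x+1, y) \\ \text{``$<$''} & I(x, y) < I(x+1, y) \end{cases} \quad (1 \le x \le m-1,\ 1 \le y \le n)
$$
$$
Q_h(x, y) = \begin{cases} \text{``$>$''} & I(x, y) > I(x, y+1) \\ \text{``$=$''} & I(x, y) = I(x, y+1) \\ \text{``$<$''} & I(x, y) < I(x, y+1) \end{cases} \quad (1 \le x \le m,\ 1 \le y \le n-1)
$$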
FIG. 3(a) shows the Q_h image (the row-direction reference or target ternary comparison pixel matrix) calculated from image I_S, and (b) shows the Q_v image (the column-direction reference or target ternary comparison pixel matrix). In (a), the upper-left pixel 00 of image I_S and the pixel 01 to its right have densities 2 and 4, respectively; the difference calculated for Q_h is "−2", so the symbol "<" is displayed. Similarly, in (b), pixel 00 and the pixel 10 below it have densities 2 and 3, respectively; the difference calculated for Q_v is "−1", so the symbol "<" is displayed.
[0040]
Thereafter, the same calculation is performed over the whole of image I_S. As a result, Q_v is an image of (m−1) × n pixels, that is, a ternary image of (3 × 4) pixels, and Q_h is an image of m × (n−1) pixels, that is, a ternary image of (4 × 3) pixels.
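The construction of the two ternary images can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `qt_representation`, the encoding of the three codes as the strings ">", "=", "<", and the pixel values of `example` are all hypothetical (the values are not those of FIG. 2). The first list index plays the role of x (vertical position) and the second the role of y.

```python
def qt_representation(image):
    """Return (q_v, q_h) for a 2-D list of gray values.

    q_v compares each pixel with the pixel below it (column direction),
    q_h compares each pixel with the pixel to its right (row direction).
    The codes ">", "=", "<" are an assumed encoding of the three codes.
    """
    def code(a, b):
        if a > b:
            return ">"
        if a < b:
            return "<"
        return "="

    m = len(image)       # number of rows (index x)
    n = len(image[0])    # number of columns (index y)
    q_v = [[code(image[x][y], image[x + 1][y]) for y in range(n)]
           for x in range(m - 1)]
    q_h = [[code(image[x][y], image[x][y + 1]) for y in range(n - 1)]
           for x in range(m)]
    return q_v, q_h


# Hypothetical 4 x 4 image with four density levels (0 to 3).
example = [
    [1, 3, 3, 0],
    [2, 2, 0, 1],
    [0, 1, 1, 3],
    [3, 2, 0, 0],
]
q_v, q_h = qt_representation(example)
```

For a 4 × 4 input, `q_v` has 3 rows of 4 codes and `q_h` has 4 rows of 3 codes, which agrees with the (m−1) × n and m × (n−1) sizes stated above.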
[0041]
The two ternary images thus obtained are written as follows and are hereinafter referred to as the QT (Qualitative Trinary Representation) expression.
QT(I) = (Q_v, Q_h)
For an image (aI + b) in which each pixel value p is transformed to (ap + b) with a > 0,
QT(I) = QT(aI + b)
holds.
[0042]
Even if each pixel value p of the image is transformed to (ap + b), the magnitude relationships between pixels do not change, so the same QT expression is obtained. This shows that the feature amount (QT expression) is unchanged by norm normalization applied to the entire image or by normalization using the mean and variance.
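The invariance can be checked in one line for any two adjacent pixel values p_1 and p_2. The sketch below assumes a > 0, which the text does not state explicitly; for a < 0 the ">" and "<" codes would be exchanged.

```latex
(a p_1 + b) - (a p_2 + b) = a\,(p_1 - p_2),
\qquad
\operatorname{sgn}\bigl(a\,(p_1 - p_2)\bigr) = \operatorname{sgn}(p_1 - p_2)
\quad \text{for } a > 0 .
```

Since each code depends only on the sign of the difference between adjacent pixel values, every code, and hence the whole QT expression, is the same for I and aI + b.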
[0043]
In addition, for an image HistNorm(I) obtained by applying histogram flattening before the ternary images are computed,
QT(I) = QT(HistNorm(I))
holds.
[0044]
Next, the similarity between QT expressions will be described with reference to FIG.
[0045]
FIG. 4(a) is an explanatory diagram showing each image and the ternary images (QT expression) obtained from it, and (b) is an explanatory diagram showing the result of comparing the ternary images of the two images.
[0046]
In (a), image 1 on the left is the target image, and its QT expression is displayed below it (on the left, the reference ternary comparison image in the row direction; on the right, the reference ternary comparison image in the column direction). Image 2 on the right is the template image, and its QT expression is displayed below it (on the left, the target ternary comparison image in the row direction; on the right, the target ternary comparison image in the column direction). Let the QT expression of image 1 be QT(I₁) and the QT expression of image 2 be QT(I₂).
[0047]
Each QT expression can be written as follows.
QT(I₁) = (Q_v1, Q_h1)
QT(I₂) = (Q_v2, Q_h2)
Between (Q_v1, Q_h1) of QT(I₁) and (Q_v2, Q_h2) of QT(I₂), the numbers D_v and D_h (number A and number B) of pixels having the same code at corresponding positions are obtained as follows.
[Equation 3]
$$
D_v(I_1, I_2) = \sum_{x=1}^{m-1} \sum_{y=1}^{n} \delta\bigl(Q_{v1}(x, y),\, Q_{v2}(x, y)\bigr), \qquad
D_h(I_1, I_2) = \sum_{x=1}^{m} \sum_{y=1}^{n-1} \delta\bigl(Q_{h1}(x, y),\, Q_{h2}(x, y)\bigr)
$$
Here δ(a, b) is 1 when the codes a and b are the same and 0 otherwise. That is, the number of pixels having the same code is obtained by scoring 1 where the codes of corresponding pixels of the two ternary images are the same and 0 otherwise, and summing over all pixels.
[0048]
The counts take their maximum values when an image is compared with an identical image, in which case
D_v(I, I) = n(m−1)
D_h(I, I) = m(n−1)
hold.
[0049]
In this way, the number of pixels having the same code is calculated from (a). In (b), only the pixels having the same code are shown shaded. In this example, the following numbers are obtained.
D_v(I₁, I₂) = 1
D_h(I₁, I₂) = 6
Next, the similarity QTS between image 1 and image 2 is obtained. QTS is given by the formula below, and its range is 0.0 to 1.0.
[Expression 4]
$$
\mathrm{QTS}(I_1, I_2) = \frac{D_v(I_1, I_2) + D_h(I_1, I_2)}{n(m-1) + m(n-1)}
$$
Substituting the values of m, n, D_v, and D_h given above (m = n = 4) yields the similarity QTS = 7/24 ≈ 0.29.
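The arithmetic can be reproduced with a short Python sketch. It assumes that QTS is the sum of the two match counts divided by their maximum possible sum, n(m−1) + m(n−1); this form is inferred from the stated range of 0.0 to 1.0 and from the worked values in the text (1 and 6 giving 0.29 for 4 × 4 images). The function names are hypothetical.

```python
def count_matches(q1, q2):
    """Count positions at which two equally sized code matrices hold the same code."""
    return sum(a == b
               for row1, row2 in zip(q1, q2)
               for a, b in zip(row1, row2))


def qts(d_v, d_h, m, n):
    """Similarity from match counts d_v, d_h for images of m x n pixels.

    Assumed form: matches divided by the maximum possible number of matches.
    """
    return (d_v + d_h) / (n * (m - 1) + m * (n - 1))


# Values quoted in the text for 4 x 4 images.
qts_12 = qts(1, 6, 4, 4)     # images 1 and 2: 7 / 24
qts_23 = qts(12, 8, 4, 4)    # images 2 and 3: 20 / 24
```

Only comparisons, additions, and one final division are needed; when all candidates share the same m and n, the division can be skipped and the numerators compared directly.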
[0050]
Further, the similarity of images having a QT expression as shown in FIG. 5 will be described.
[0051]
Images 1, 2, and 3 are different images composed of 4 × 4 pixels, and the density levels of these images 1, 2, and 3 are set in four stages.
[0052]
Using the method described above, the QT expressions of images 1, 2, and 3 are obtained, the number of identical codes between the QT expressions is counted, and the similarity is calculated from the result.
[0053]
The numbers of identical codes between images 2 and 3 are
D_v(I₂, I₃) = 12
D_h(I₂, I₃) = 8
and the similarity QTS is then 0.83. As a result, it is determined that image 3 is more similar to image 2 than image 1 is.
[0054]
If the template image and the target image have the same number of pixels, the denominator of the similarity QTS is common, so similarities may be compared using only the numerators.
[0055]
In this way, when the similarity is calculated using the QT expression, it is obtained using only comparison and summation operations, without multiplication, so the calculation cost can be reduced compared with obtaining the same accuracy by the conventional method. In addition, since no statistical mean or variance is used, these statistics do not have to be computed each time the template image is moved during matching, which also contributes to a reduction in calculation cost.
[0056]
FIG. 6(a) is a graph showing the relationship between the QT expression and the cross-correlation, and (b) shows a plurality of target images.
[0057]
Each point in the graph represents a correlation value or a QT expression result (QTS) between a plurality of target images and one template image as shown in (b).
[0058]
The graph shows that, even within the range surrounded by the dotted line, that is, among images judged to be substantially the same by the cross-correlation, the degree of similarity differs when the QT expression is used. The target images in this case included images of various shapes, as shown in (b).
[0059]
Conventionally, there has been an image matching method by Kaneko et al. that uses the increment sign of the gray value between pixels as a similarity measure. In that method, however, the case where the gray values are equal is given the same code as an increase, and only horizontal comparisons are made. That is, image matching is performed using a binary expression that represents only the magnitude relationship, and the matching result is derived from the calculation in the horizontal direction alone.
[0060]
The differences between the conventional image matching method of Kaneko et al. and the present invention will be described with reference to FIG. 7, in which (a) is an explanatory diagram of the conventional image matching and (b) is an explanatory diagram of the present invention.
[0061]
In (a) and (b), each image is composed of (4 × 4) pixels, and the gradation value has four levels. The numerical values described between the images indicate the similarity between the images, and the closer to 1, the more identical the images are.
[0062]
Here, the left image in (a) and (b) is image 1, the right image is image 2, and the center image is image 3.
[0063]
In the prior art shown in (a), the codes obtained from adjacent pixels in image 1 are compared with the codes obtained from adjacent pixels in image 2; all the codes at corresponding positions are the same, so the similarity is 1.0 and image 1 and image 2 are determined to be the same image. Between image 1 and image 3, pixels (3, 1) to (3, 4) have the same gray value and should therefore receive an equality code, but because equality is handled in the same way as the increase code, the similarity is again 1.0 and the two are regarded as the same image. Similarly, between image 2 and image 3, pixels (2, 1) to (2, 4) have the same gray value, yet the similarity is judged to be 1.0 and the two are regarded as the same image.
[0064]
On the other hand, in the present invention shown in (b), the code for equal pixels is defined separately from the magnitude relationship, so the above problem is eliminated: pixels having the same density value are identified as equal, and the similarity is calculated as 0.5.
[0065]
Thus, by expressing the magnitude relationship and the equivalence relationship with a ternary expression, the similarity can be calculated with higher accuracy than in the past.
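The effect of a separate equality code can be illustrated on a single row of pixels. The sketch below is a toy comparison under stated assumptions: `binary_codes` stands in for a binary scheme in which "equal" is given the same code as "increase" (as the text describes for the conventional method), the function names are hypothetical, and the pixel rows are invented rather than taken from FIG. 7.

```python
def ternary_codes(row):
    """Codes between horizontal neighbours: '>' , '=' or '<' (left vs. right)."""
    return ["=" if a == b else (">" if a > b else "<")
            for a, b in zip(row, row[1:])]


def binary_codes(row):
    """Assumed binary scheme: '+' for equal or increasing, '-' for decreasing."""
    return ["+" if b >= a else "-" for a, b in zip(row, row[1:])]


def agreement(c1, c2):
    """Fraction of positions at which two code sequences agree."""
    return sum(a == b for a, b in zip(c1, c2)) / len(c1)


ramp = [0, 1, 2, 3]   # steadily increasing gray values
flat = [2, 2, 2, 2]   # constant gray values

binary_score = agreement(binary_codes(ramp), binary_codes(flat))
ternary_score = agreement(ternary_codes(ramp), ternary_codes(flat))
```

With the binary codes, the ramp and the flat row are indistinguishable (agreement 1.0), whereas the ternary codes separate them completely (agreement 0.0).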
[0066]
Next, the similarity calculation operation using the QT expression as described above will be described with reference to the block diagram of the electronic device in FIG. 8 and the flowchart in FIG. Note that the template image and the target image are captured in advance by an imaging camera or the like.
[0067]
The electronic device 1 has a monitor 2, a main body 3, a keyboard 4, and a mouse 5. The main body 3 comprises a storage device 10, such as a hard disk, that stores images captured by an imaging camera such as a digital camera; a memory 11 composed of ROM and RAM; and a CPU 12 that is connected to the storage device 10, the memory 11, and each element (monitor 2, keyboard 4, and mouse 5) and performs computation and control of each element.
[0068]
An image can be stored in the storage device 10 by connecting the imaging camera directly to the main body 3, or by operating the keyboard 4 and the mouse 5 to obtain it via a line such as the Internet. The monitor 2 is used to display an image.
[0069]
(1) First, a template image is read from the storage device 10 into the memory 11, and the CPU 12 obtains a QT expression of the template image (S1).
[0070]
(2) Next, the CPU 12 reads the target image from the storage device 10 into the memory 11, generates a plurality of enlarged and reduced versions of the target image, and stores them in the storage device 10 or the memory 11 (S2).
[0071]
(3) Next, the QT representation of the original target image and a plurality of target images enlarged and reduced is obtained by the CPU 12, and stored in the storage device 10 or the memory 11 in association with the target image (S3).
[0072]
(4) Next, between the ternary images of the QT expression of the template image and an arbitrary region, of the same size as the template image, in the ternary images of the QT expression of the target image at a certain magnification, the number of identical codes at corresponding positions is obtained; the CPU 12 calculates the similarity from this number and temporarily stores it in the storage device 10 or the memory 11 (S4).
[0073]
(5) Next, the CPU 12 determines whether the similarity to the ternary images of the QT expression of the template image has been calculated over all of the ternary images of the QT expression of the target image (S5). If all calculations have been completed, the process proceeds to S9; if not, it proceeds to S6.
[0074]
(6) If the calculations have not been completed, the position of this arbitrary region in the target image is shifted by one pixel in the row or column direction, and the CPU 12 calculates the similarity between the ternary images of the template image and the region at the corresponding positions and temporarily stores it in the storage device 10 or the memory 11 (S6). In calculating the similarity, all calculations along the row direction at a given position are performed, the region is then shifted by one pixel in the column direction, and all calculations along the row direction are performed at the shifted position.
[0075]
(7) Next, the CPU 12 compares the similarity stored in (6) with the similarity stored in (4) (S7). As a result of the comparison, if the similarity in (6) is greater than (4), the process proceeds to (8), and if it is smaller, the process proceeds to (5).
[0076]
(8) Next, when the similarity in (6) is higher, the similarity in (4) is deleted from the storage device 10 or the memory 11 (S8). The process proceeds to S5.
[0077]
(9) When all the calculations for the target image at a certain magnification have been completed, the CPU 12 determines whether the similarity has been calculated between the ternary images of the target images at the other magnifications and the ternary images of the template image (S9). If the calculations for the target images at all other magnifications have been completed, the process proceeds to (11); if not, it proceeds to (10).
[0078]
(10) If the processing has not been completed, the CPU 12 reads out target images having different magnifications stored in the storage device 10 or the memory 11 (S10). Go to (4).
[0079]
(11) If completed, the similarity stored in the storage device 10 or the memory 11 (together with the magnification of the target image and the position in the target image) is regarded as the maximum similarity (S11). The position stored with the maximum similarity is determined to be the position of the template image.
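The search in steps (1) to (11) can be sketched for a single magnification as follows. This is a simplified illustration under stated assumptions, not the procedure of FIG. 9 itself: the loop over enlarged and reduced target images (S2, S9, S10) is omitted, the QT expression of the target is computed once and then windowed, the template is assumed to contain at least two pixels, and all names and pixel values are hypothetical.

```python
def code(a, b):
    """Ternary code for a pair of gray values."""
    return ">" if a > b else ("<" if a < b else "=")


def qt(image):
    """Return (q_v, q_h): codes for vertical and horizontal neighbour pairs."""
    m, n = len(image), len(image[0])
    q_v = [[code(image[x][y], image[x + 1][y]) for y in range(n)]
           for x in range(m - 1)]
    q_h = [[code(image[x][y], image[x][y + 1]) for y in range(n - 1)]
           for x in range(m)]
    return q_v, q_h


def window_matches(q_target, q_template, top, left):
    """Count equal codes between q_template and the window of q_target at (top, left)."""
    return sum(q_target[top + i][left + j] == c
               for i, row in enumerate(q_template)
               for j, c in enumerate(row))


def best_match(target, template):
    """Return (similarity, top, left) of the best-matching region of the target."""
    tv, th = qt(target)
    pv, ph = qt(template)
    v, w = len(template), len(template[0])
    denom = w * (v - 1) + v * (w - 1)          # maximum number of matching codes
    best = (-1.0, 0, 0)
    for top in range(len(target) - v + 1):      # shift in the column direction
        for left in range(len(target[0]) - w + 1):  # scan along the row direction
            d = (window_matches(tv, pv, top, left)
                 + window_matches(th, ph, top, left))
            score = d / denom
            if score > best[0]:                 # keep only the larger similarity
                best = (score, top, left)
    return best


target = [
    [0, 0, 0, 0, 0],
    [0, 1, 3, 0, 0],
    [0, 2, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# Template with a different brightness scale (2p + 10 applied to [[1, 3], [2, 0]]).
template = [[12, 16], [14, 10]]
result = best_match(target, template)
```

In this toy case the region at row 1, column 1 is found with similarity 1.0 even though the template's gray values differ from those in the target, because only the qualitative relationships between neighbours are compared.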
[0080]
In the first embodiment as described above, using the ternary QT expression provides a similarity measure that is robust to variation, raises the accuracy of the similarity calculation, and makes it possible to search for a template image while suppressing the increase in calculation cost that accompanies the improvement in accuracy.
[0081]
Note that the template image may have a hierarchical structure. For example, when the template image (the structure of the object shown in the image) is complicated, it can be composed of a plurality of template images divided into the overall structure and the structure of each part.
[0082]
Here, FIGS. 10A to 10I are explanatory diagrams when the template image has a hierarchical structure. As an example, consider a human face.
[0083]
As shown in (a), the template image is divided into five template images of the entire face, right eye, left eye, nose and mouth. In the hierarchical structure, the template image of the entire face is in the upper stage, and the template images of the right eye, left eye, nose, and mouth are in parallel in the lower stage of the entire face. Between the layers, the resolution of the template image may be the same or different.
[0084]
(b) shows the layout of each template image and the search range (black frame). The outermost frame indicates the range of the template image of the entire face. The search range for the right eye, left eye, nose, and mouth is displayed in a frame surrounding each template image in the entire face frame. Each template image is matched within this frame. Here, a part thereof may overlap, such as the range of the right eye and nose and the range of the left eye and nose.
[0085]
The target image is (c), and each template image described above is detected from the target image. The template image of the entire face is (d). First, this template image is detected from the target image using the QT expression. The detection results are frames 1 and 2 in (e).
[0086]
Next, frames 1 and 2 are cut out from the target image, and the sizes of these frames are made the same (see (f)). Treating these frames 1 and 2 as template images of the entire face, matching with each template image (right eye, left eye, nose, mouth) in (b) is performed using the QT expression.
[0087]
(g) is an image obtained by superimposing (b) and (f). According to the matching result, the image on the right side has a low similarity (the parts are absent) and is therefore excluded from the detection targets. The image on the left side is determined to be the template image because its similarity is high (exceeds the threshold).
[0088]
Accordingly, a template image is detected from the target image as indicated by a frame in (h).
[0089]
In this way, the template image can be held as a hierarchical structure and matched. Although matching is performed layer by layer in FIG. 10, matching for the layers may instead be performed in parallel, and the position where the template images match best overall may be determined to be the position of the template image. For example, FIG. 10 has five template images; the total of the similarities of the five template images at each position can be defined as the similarity of that position, and the position with the highest similarity can be taken as the position of the template image. This can also be applied to the third embodiment described below.
[0090]
Next, the configuration of the second exemplary embodiment of the present invention will be described with reference to FIG.
[0091]
In the following embodiments, the same components are denoted by the same reference numerals, and redundant description is omitted.
[0092]
The feature of the second embodiment is that, in addition to the QT expression, the conventional quantitative similarity (for example, the similarity by the correlation method) is used in combination.
[0093]
FIGS. 11(a) to 11(e) are explanatory diagrams of the operation of the second embodiment: (a) shows a target image and a template image, (b) shows the matching results of the QT expression, (c) shows the matching results of the QT expression and of the correlation method, (d) shows the matching result obtained by combining the QT expression and the correlation method, and (e) is an image showing the place where the template image matched.
[0094]
As shown in (a), the template image is a monitor, and the target image is an image of a room in which the monitor is placed.
[0095]
The result of matching the template image against the target image using the QT expression is shown in (b). In (b), four thresholds are set for QTS, increasing from t1 to t4. Therefore, the matching result for the uppermost threshold t1 contains many candidates that match the template image, whereas the matching result for the lowermost threshold t4 contains few.
[0096]
From among these thresholds, for example, the matching result for t3 is taken as the matching result by the QT expression (upper row of (c)). The result of matching by the correlation method using the same target image and template image is shown in the lower row of (c).
[0097]
The matching result based on the QT expression and the matching result based on the correlation method are compared, and a location detected in both results is taken as the position of the template image. When a plurality of locations are detected in both, the location having the maximum similarity among the detection results of the correlation method is taken as the true template image position. As a result, the solid line in (d) indicates the position of the true template image, and the broken lines indicate the other positions detected by both methods.
[0098]
In this way, as shown in (e), it is determined that the area surrounded by the white frame in the target image is the template image.
[0099]
In the second embodiment as described above, the detection accuracy can be further improved by performing template matching by combining the QT expression and the correlation method.
[0100]
Alternatively, the similarity over the target image may first be obtained using the QT expression, whose similarity calculation cost is lower than that of the correlation method, and matching by the correlation method may then be performed only at places where the similarity is equal to or higher than a desired threshold. This gives highly accurate matching at a calculation cost lower than when only the correlation method is used.
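This two-stage scheme can be sketched generically in Python. The sketch assumes that the two similarity measures are supplied as functions of a candidate position; `two_stage_match`, the score tables, and the threshold value are hypothetical and stand in for the QTS calculation and the correlation method, respectively.

```python
def two_stage_match(positions, cheap_score, costly_score, threshold):
    """Filter candidates with a cheap measure, then rank survivors with a costly one.

    Returns (position, costly score) of the best survivor, or None if no
    candidate reaches the threshold on the cheap measure.
    """
    survivors = [p for p in positions if cheap_score(p) >= threshold]
    if not survivors:
        return None
    best = max(survivors, key=costly_score)
    return best, costly_score(best)


# Hypothetical scores per candidate position (row, column).
qts_scores = {(0, 0): 0.40, (0, 1): 0.85, (3, 2): 0.90, (5, 5): 0.20}
corr_scores = {(0, 0): 0.99, (0, 1): 0.95, (3, 2): 0.70, (5, 5): 0.10}

result = two_stage_match(qts_scores, qts_scores.get, corr_scores.get, 0.8)
```

Here only the two positions with QTS of at least 0.8 are passed to the costly measure, and the one with the higher correlation score among them is returned; the position (0, 0) is never evaluated by the costly measure despite its high correlation score.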
[0101]
Next, the configuration of the third exemplary embodiment of the present invention will be described with reference to FIGS.
[0102]
A feature of the third embodiment is that, for an image sequence in which a plurality of images are arranged, such as a moving image, the target region of the template image is tracked and the position of the template image is detected from the moving image.
[0103]
FIG. 12 shows a target image captured by the imaging camera, and has a time axis from the top to the bottom in the figure. The left side is a target image captured at a predetermined interval, and the right side is an image showing a template image in the target image with a white frame.
[0104]
Here, the template image is a hand region, and the position where the template image is detected changes as the template image moves from right to left in the image with time. In addition, since the position and shape of the hand changes with the passage of time, the target image also changes.
[0105]
The operation will be described with reference to the flowchart of the third embodiment in FIG.
[0106]
Note that the template image is stored in the storage device 10 or the memory 11 in advance. The target image is an image captured by an imaging camera or the like, and this image is sent to the storage device 10 or the memory 11 at a predetermined interval and stored. There are five target images in the present embodiment as shown in FIG.
[0107]
(1) First, a template image is read from the storage device 10 or the memory 11, and the CPU 12 obtains a QT expression of the template image (S21).
[0108]
(2) Next, the target image (for example, the upper left image in FIG. 12) is read from the storage device 10 or the memory 11 (S22).
[0109]
(3) Next, a plurality of target images obtained by changing the scales of the read target images are obtained by the CPU 12, and these target images are stored in the storage device 10 or the memory 11 (S23).
[0110]
(4) Next, a QT expression is obtained for each target image, and the QT expression corresponding to the target image is stored in the storage device 10 or the memory 11 (S24).
[0111]
(5) Next, between the ternary images of the QT expression of the template image and an arbitrary region, of the same size as the template image, in the ternary images of the QT expression of the target image at a certain magnification, the number of identical codes at corresponding positions is obtained; the CPU 12 calculates the similarity and temporarily stores it in the storage device 10 or the memory 11 (S25).
[0112]
(6) Next, the CPU 12 determines whether or not the similarity between the ternary images in the QT representation of the template image has been calculated for all the ternary images in the QT representation of the target image (S26). If all the calculations are completed, the process proceeds to S30, and if not completed, the process proceeds to S27.
[0113]
(7) If the calculations have not been completed, the position of this arbitrary region in the target image is shifted by one pixel in the row direction or the column direction, and the CPU 12 obtains the similarity between the ternary images of the template image and the region at the corresponding positions and temporarily stores it in the storage device 10 or the memory 11 (S27). Note that the region is shifted in the column direction only after all calculations along the row direction have been performed.
[0114]
(8) Next, the CPU 12 compares the similarity stored in (7) with the similarity stored in (5) (S28). As a result of the comparison, if the degree of similarity of (7) is larger than (5), the process proceeds to (9), and if smaller, the process proceeds to (6).
[0115]
(9) If the similarity is high, the similarity in (5) is deleted from the storage device 10 or the memory 11 (S29). Go to (6).
[0116]
(10) When all the calculations for the target image at a certain magnification have been completed, the CPU 12 determines whether the similarity has been calculated between the ternary images of the target images at the other magnifications and the ternary images of the template image (S30). If the calculations for the target images at all other magnifications have been completed, the process proceeds to (12); if not, it proceeds to (11).
[0117]
(11) If not completed, the CPU 12 reads out target images having different magnifications stored in the storage device 10 or the memory 11 (S31). Go to (3).
[0118]
(12) If completed, the similarity stored in the storage device 10 or the memory 11 (together with the magnification of the target image and the position in the target image) is regarded as the maximum similarity (S32). The position stored with the maximum similarity is determined to be the position of the template image.
[0119]
(13) Next, the area having the maximum similarity in the target image is extracted by the CPU 12 as a new template image and stored in the storage device 10 or the memory 11. Then, the QT expression of this new template image is obtained and stored in correspondence with the template image (S33).
[0120]
(14) Next, the CPU 12 determines whether or not there is a target image to be template matched (S34). If it exists, the process proceeds to (2). If it does not exist, the process ends.
[0121]
In this embodiment, further target images remain to be matched, so the process returns to (2), and matching is performed with the second image from the upper left of FIG. 12 as the target image. After the second image is processed, the same matching is repeated, with the determination in (14) changing the target image to the third, fourth, and fifth images from the upper left of FIG. 12.
[0122]
In the third embodiment as described above, even when the template image moves within the target image and the target image and the template image change from moment to moment, template matching can be performed on a target image that is a moving image by sequentially updating the template image and tracking it.
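The tracking loop of this embodiment can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: images are assumed to be lists of rows of gray values, the function names (`sign`, `qt_codes`, `crop`, `best_match`, `track`) are hypothetical, and the search over magnifications in steps (10) and (11) is omitted for brevity.

```python
def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def qt_codes(img):
    """Codes between horizontally adjacent pixels, then vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    codes = [sign(img[r][c], img[r][c + 1]) for r in range(h) for c in range(w - 1)]
    codes += [sign(img[r][c], img[r + 1][c]) for r in range(h - 1) for c in range(w)]
    return codes

def crop(img, top, left, h, w):
    """Cut an h x w area out of img with its upper-left corner at (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def best_match(frame, template):
    """Shift the template area one pixel at a time; return (similarity, top, left)."""
    th, tw = len(template), len(template[0])
    t_codes = qt_codes(template)
    best = (-1, 0, 0)
    for top in range(len(frame) - th + 1):
        for left in range(len(frame[0]) - tw + 1):
            w_codes = qt_codes(crop(frame, top, left, th, tw))
            s = sum(x == y for x, y in zip(t_codes, w_codes))
            if s > best[0]:  # keep only the larger similarity, as in steps (8) and (9)
                best = (s, top, left)
    return best

def track(frames, template):
    """Match each frame in turn, replacing the template with the best area (step (13))."""
    positions = []
    for frame in frames:
        _, top, left = best_match(frame, template)
        positions.append((top, left))
        template = crop(frame, top, left, len(template), len(template[0]))
    return positions

frame1 = [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]]
frame2 = [[0, 0, 0, 0], [0, 0, 0, 0], [10, 20, 0, 0], [30, 40, 0, 0]]
print(track([frame1, frame2], [[1, 2], [3, 4]]))
```

In the second frame the object is ten times brighter, yet the ternary codes are unchanged, so the count of matching codes still peaks at the object's new position.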
[0123]
Next, the configuration of the fourth exemplary embodiment of the present invention will be described with reference to FIGS.
[0124]
The feature of the fourth embodiment is that the QT expression is applied in combination with the quadtree method.
[0125]
FIGS. 14A and 14B are explanatory diagrams of a quadtree data structure. There are three layers, and one image in each layer is composed of 2 × 2 pixels. A hierarchical structure with four branches from a node is shown. The hierarchy 1 is the uppermost layer and the hierarchy 3 is the lowest layer (see FIG. 14 (a)).
[0126]
The image of layer 1 consists of 2 × 2 pixels. The image of layer 2, the layer below layer 1, consists of 4 × 4 pixels obtained by dividing each pixel of layer 1 into four, and the average density of the four divided pixels is the density of the corresponding pixel of layer 1. Similarly, the image of layer 3, the layer below layer 2, consists of 8 × 8 pixels obtained by dividing each pixel of layer 2 into four, and the average density of the four divided pixels is the density of the corresponding pixel of layer 2 (see FIG. 14B). In this embodiment, the layers are related by taking the average of the lower-layer pixel values as the upper-layer density, but the way of association is not restricted; for example, the average of the maximum and minimum of the lower-layer pixel values, or their median, may be used instead.
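The layer construction described above can be sketched as follows. This is a minimal illustration that assumes a square image whose side is the top-layer size times a power of two; the names `mean`, `upper_layer`, and `build_quadtree` are hypothetical. The combining rule is a parameter because the text allows associations other than the average.

```python
def mean(values):
    return sum(values) / len(values)

def upper_layer(img, combine=mean):
    """Merge each 2x2 block of a layer into one pixel of the layer above."""
    return [[combine([img[r][c], img[r][c + 1], img[r + 1][c], img[r + 1][c + 1]])
             for c in range(0, len(img[0]), 2)]
            for r in range(0, len(img), 2)]

def build_quadtree(img, combine=mean, top_size=2):
    """Return [layer 1, layer 2, ...]; the last layer is the input image itself."""
    layers = [img]
    while len(layers[-1]) > top_size:
        layers.append(upper_layer(layers[-1], combine))
    return layers[::-1]

image = [[1, 1, 2, 2],
         [1, 1, 2, 2],
         [3, 3, 4, 4],
         [3, 3, 4, 4]]
for layer in build_quadtree(image):
    print(layer)
```

Passing `combine=max` (or any other function of the four lower-layer values) swaps in a different association without changing the structure.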
[0127]
FIG. 15 (a) shows the image of each layer, (b) shows the QT representation of each layer, and (c) shows the ternary representation of the magnitude and equality relationships digitized according to a predetermined rule.
[0128]
If the minimum unit of each layer is 2 × 2 pixels, the QT representation of layer 1 is one set, that of layer 2 is four sets, and that of layer 3 is eight sets. A set of QT expressions consists of four code values. Because the magnitude and equality relationships are expressed in three values, there are 3⁴ = 81 combinations of code values, but removing the combinations that cannot occur leaves 57.
[0129]
The QT expression of each set shown in (b) can be expressed as in (c) by assigning to it a number (0 to 56) that identifies one of these 57 combinations. This number is referred to as a QT-ID number. In addition, since each QT expression is built from 2 × 2 pixels, the relationship between two QT expressions is expressed in four stages, obtained from the number of corresponding positions having the same code: 4 if all codes are the same, and 0 if all codes are different.
[0130]
In this way, after a target image is obtained, it is represented by a quadtree structure, and the QT representation of each layer is in turn represented by QT-ID numbers (including the similarity relationship represented by 0 to 4).
[0131]
A method of judging the similarity between two images when both are represented by a tree structure as shown in the figure is described next.
[0132]
When the similarity is judged using the QT-ID numbers of the nodes in all layers, the similarity between the structures is judged by checking whether the corresponding QT-ID numbers of the nodes in each layer are the same, or by how many of the corresponding codes match.
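A node-by-node comparison of this kind might look like the sketch below. The assumption here, which the text does not spell out, is that the four code values of a 2 × 2 node are the two horizontal and the two vertical comparisons inside the block; the function names are hypothetical, and raw code tuples stand in for QT-ID numbers.

```python
def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def block_codes(img, r, c):
    """Four codes of the 2x2 block at (r, c): two horizontal, two vertical (assumed layout)."""
    a, b = img[r][c], img[r][c + 1]
    d, e = img[r + 1][c], img[r + 1][c + 1]
    return (sign(a, b), sign(d, e), sign(a, d), sign(b, e))

def layer_codes(img):
    """Code tuples of every 2x2 node in one layer, in row-major order."""
    return [block_codes(img, r, c)
            for r in range(0, len(img), 2)
            for c in range(0, len(img[0]), 2)]

def layer_similarity(img_a, img_b):
    """Per-node count (0 to 4) of codes that match between corresponding nodes."""
    return [sum(x == y for x, y in zip(na, nb))
            for na, nb in zip(layer_codes(img_a), layer_codes(img_b))]

def tree_similarity(layers_a, layers_b):
    """Total matching codes per layer, from the top layer down."""
    return [sum(layer_similarity(a, b)) for a, b in zip(layers_a, layers_b)]

print(layer_similarity([[1, 2], [3, 4]], [[2, 1], [3, 4]]))
```

Comparing only `tree_similarity(...)[0]` corresponds to a coarse comparison in the top layer, while the later entries compare finer detail.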
[0133]
Because each image is represented hierarchically, the global properties of the image are represented in the upper layers; in addition to matching based only on adjacency as described in the first embodiment, the similarity of global properties is therefore also compared.
[0134]
Alternatively, attention may be paid only to a certain hierarchy, and the similarity may be determined by comparing QT-ID numbers of a certain hierarchy.
[0135]
This method can be used, for example, when searching for similar images. The higher the hierarchy is, the more abstract the image is. Therefore, when performing a rough search, the higher hierarchy is used, and when performing a more accurate search, the lower hierarchy is used for matching.
[0136]
Further, when searching for images, an index for the search may be required, and the QT-ID numbers can also be used as this index information.
[0137]
In the fourth embodiment as described above, matching can be used as needed: in the upper layers to further reduce the calculation cost, and in the lower layers to further increase the detection accuracy.
[0138]
In addition, since each image has a hierarchical structure, it is possible to perform not only matching between QT expressions obtained from each image but also precise matching.
[0139]
Next, the configuration of the fifth exemplary embodiment of the present invention will be described with reference to FIG.
[0140]
A feature of the fifth embodiment is that pixels that are not involved in matching calculation are set in the template image.
[0141]
FIGS. 16 (a) and 16 (b) are explanatory views of the fifth embodiment. In FIG. 16, the pixel “φ” is a portion to be ignored when performing template matching.
[0142]
As shown in (a), the pixels to be ignored are the four corners of the template image, and the pixels at the four corners are excluded from the matching pixels in advance. Then, the template image is deformed into a substantially cross shape.
[0143]
Also, as shown in (b), a plurality of pixels at the center of the template image are ignored. For example, if the template image is a donut shape, the matching of the hollow portion is unnecessary, and therefore, it can be an area to be ignored in advance. Further, if the template image is a hand, a pixel corresponding to the gap between the fingers can be set as the region to be ignored.
[0144]
Thus, by ignoring predetermined pixels in advance, the calculation cost can be reduced and the detection time can be shortened. Further, because the similarity is not calculated for the area to be ignored, the result is not affected by meaningless areas.
[0145]
The position of the region containing the ignored pixels can be set arbitrarily.
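One way to realize ignored pixels is sketched below. It is an illustration under assumptions: ignored template pixels are marked with `None` (standing in for "φ"), any comparison that touches an ignored pixel is skipped, and `masked_similarity` is a hypothetical name.

```python
IGNORE = None  # marker for a template pixel excluded from matching

def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def masked_similarity(template, window):
    """Return (matching codes, compared codes), skipping pairs with an ignored pixel."""
    h, w = len(template), len(template[0])
    matches = compared = 0
    for r in range(h):
        for c in range(w):
            for dr, dc in ((0, 1), (1, 0)):  # right neighbor, lower neighbor
                r2, c2 = r + dr, c + dc
                if r2 >= h or c2 >= w:
                    continue
                if template[r][c] is IGNORE or template[r2][c2] is IGNORE:
                    continue
                compared += 1
                if sign(template[r][c], template[r2][c2]) == sign(window[r][c], window[r2][c2]):
                    matches += 1
    return matches, compared

# Cross-shaped template: the four corners are ignored.
template = [[IGNORE, 1, IGNORE],
            [2, 3, 4],
            [IGNORE, 5, IGNORE]]
window = [[99, 10, 0],
          [20, 30, 40],
          [7, 50, -3]]
print(masked_similarity(template, window))
```

Only four of the twelve adjacent-pixel comparisons are evaluated here, and the arbitrary corner values of the window have no effect on the result.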
[0146]
Next, a sixth embodiment of the present invention will be described with reference to FIG.
[0147]
A feature of the sixth embodiment is that when matching is performed on a plurality of target images, classification is performed for each of a plurality of template images.
[0148]
FIGS. 17A to 17C are explanatory diagrams, FIG. 17A shows three types of template images, FIG. 17B shows a plurality of target images, and FIG. 17C shows classification results.
[0149]
There are three types of template images, A, B, and C, as shown in (a). A plurality of target images are held in a certain storage area (see (b)). The QT expressions of the template images and the target images are obtained, and each template image is matched against each target image. Each target image contains one of the three template images. As a result of the matching, the target images are sorted and classified according to the template image having the maximum similarity (see (c)). After that, more detailed matching may be performed for each classified target image. This method can be used, for example, as preprocessing for clustering or for selecting a multi-category dictionary for pattern recognition.
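The classification can be sketched as below, assuming images are lists of rows of gray values and each template is no larger than any target. The names `best_similarity` and `classify` are hypothetical, and ties are resolved in favor of the template listed first.

```python
def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def qt_codes(img):
    """Codes between horizontally adjacent pixels, then vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    codes = [sign(img[r][c], img[r][c + 1]) for r in range(h) for c in range(w - 1)]
    codes += [sign(img[r][c], img[r + 1][c]) for r in range(h - 1) for c in range(w)]
    return codes

def best_similarity(target, template):
    """Largest number of matching codes over all template positions in the target."""
    th, tw = len(template), len(template[0])
    t_codes = qt_codes(template)
    best = 0
    for top in range(len(target) - th + 1):
        for left in range(len(target[0]) - tw + 1):
            window = [row[left:left + tw] for row in target[top:top + th]]
            best = max(best, sum(x == y for x, y in zip(t_codes, qt_codes(window))))
    return best

def classify(targets, templates):
    """Group target indices under the template name with the maximum similarity."""
    groups = {name: [] for name in templates}
    for i, target in enumerate(targets):
        winner = max(templates, key=lambda name: best_similarity(target, templates[name]))
        groups[winner].append(i)
    return groups

templates = {"A": [[1, 2], [3, 4]], "B": [[4, 3], [2, 1]], "C": [[1, 2], [2, 1]]}
targets = [
    [[0, 0, 0], [0, 10, 20], [0, 30, 40]],
    [[40, 30, 0], [20, 10, 0], [0, 0, 0]],
    [[1, 2], [2, 1]],
]
print(classify(targets, templates))
```

Each resulting group could then be passed to a more detailed matching stage, as the paragraph suggests.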
[0150]
Needless to say, the present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the spirit of the present invention. For example, the template image may be created from a plurality of images. In this case, the template may be obtained by averaging the gray values of each pixel over the images and taking the QT expression of this average image, or by obtaining the QT expression of each image and using, at each corresponding position, the code value that occurs most often. In order to detect a deformable object, a plurality of template images reflecting various deformed shapes may be set for one template image.
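The two ways of building a template from several images can be sketched as follows. This is an illustrative sketch: the function names are hypothetical, all images are assumed to have the same size, and a tie in the vote is resolved in favor of the code seen first.

```python
from collections import Counter

def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def qt_codes(img):
    """Codes between horizontally adjacent pixels, then vertically adjacent pixels."""
    h, w = len(img), len(img[0])
    codes = [sign(img[r][c], img[r][c + 1]) for r in range(h) for c in range(w - 1)]
    codes += [sign(img[r][c], img[r + 1][c]) for r in range(h - 1) for c in range(w)]
    return codes

def codes_of_average_image(images):
    """Method 1: average the gray values per pixel, then take the QT expression."""
    n = len(images)
    avg = [[sum(img[r][c] for img in images) / n for c in range(len(images[0][0]))]
           for r in range(len(images[0]))]
    return qt_codes(avg)

def majority_codes(images):
    """Method 2: take the QT expression of each image, then the most frequent code per position."""
    per_image = [qt_codes(img) for img in images]
    return [Counter(column).most_common(1)[0][0] for column in zip(*per_image)]

samples = [[[1, 2], [3, 4]], [[5, 9], [6, 7]], [[2, 1], [3, 8]]]
print(codes_of_average_image(samples))
print(majority_codes(samples))
```

The two methods need not agree in general: averaging is sensitive to the gray values of an outlier image, whereas the vote only sees each image's codes.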
[0151]
Further, in order to obtain a ternary image, calculation is performed between adjacent pixels, but a ternary image may be obtained between pixels arranged in an oblique direction or between pixels separated by a predetermined distance. For example, as shown in FIG. 18, a magnitude relationship or an equivalence relationship between non-adjacent pixels may be used.
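Comparing pixels in an oblique direction or at a fixed distance only changes which pixel pairs are compared. A minimal sketch, with a hypothetical function `qt_codes_offset` whose offset `(dr, dc)` is the row and column displacement of the second pixel:

```python
def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def qt_codes_offset(img, dr, dc):
    """Ternary image comparing each pixel with the pixel at offset (dr, dc)."""
    h, w = len(img), len(img[0])
    rows = range(max(0, -dr), min(h, h - dr))  # rows where both pixels exist
    cols = range(max(0, -dc), min(w, w - dc))  # columns where both pixels exist
    return [[sign(img[r][c], img[r + dr][c + dc]) for c in cols] for r in rows]

image = [[5, 1, 3],
         [2, 5, 0],
         [9, 9, 5]]
print(qt_codes_offset(image, 1, 1))   # oblique (lower-right) neighbor
print(qt_codes_offset(image, 0, 2))   # pixel two columns to the right
```

With offsets `(0, 1)` and `(1, 0)` the function reduces to the adjacent-pixel comparison used in the embodiments.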
[0152]
In addition, the detection accuracy can be further improved by grading the magnitude relationship into n stages (n is a natural number).
[0153]
In template matching and tracking (extracting a template image from a moving image), the template image is translated within the target image or the target image is enlarged or reduced, and the resulting image may be used as the target of matching.
[0154]
In addition, the pixel values used in the QT expression may have 64 levels (6 bits) or 128 levels (7 bits) instead of the 4 levels used in the embodiment. Further, when the obtained image is a 256-level (8-bit) image, the QT expression may be obtained after reducing the grayscale resolution to, for example, 64 levels. In that case, information from another aspect can be obtained: a portion where a magnitude relationship held at 256 levels becomes an equality relationship because the resolution has dropped to 64 levels. Of course, a multi-resolution QT representation using a plurality of resolutions may also be used.
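The effect of reducing the grayscale resolution can be illustrated with a small sketch. It assumes integer gray values in the range 0 to 255 and requantizes by integer division, which is one possible choice; the names `requantize` and `horizontal_codes` are hypothetical.

```python
def sign(a, b):
    """Ternary code: -1 if a < b, 0 if a == b, 1 if a > b."""
    return (a > b) - (a < b)

def requantize(img, levels_in=256, levels_out=64):
    """Reduce the number of gray levels by integer division (assumed scheme)."""
    step = levels_in // levels_out
    return [[v // step for v in row] for row in img]

def horizontal_codes(img):
    """Ternary codes between horizontally adjacent pixels, row by row."""
    return [[sign(row[c], row[c + 1]) for c in range(len(row) - 1)] for row in img]

image = [[100, 102, 104, 99]]
print(horizontal_codes(image))              # codes at 256 levels
print(horizontal_codes(requantize(image)))  # codes at 64 levels
```

The pair (100, 102) is a magnitude relationship at 256 levels but falls into the same bin at 64 levels, so its code becomes the equality code, while the larger differences keep their codes.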
[0155]
When Quadtree is used, an image composed of 2 × 2 pixels is used as one unit, but the number of pixels constituting this image unit can be set as appropriate.
[0156]
It is also possible to perform matching on a color image. In that case, matching can be performed by performing the same processing on the shades of R, G, and B of the color image. It is also possible to convert to another color representation and obtain the magnitude relationship and the equality relationship from this color representation.
[0157]
In addition, as a target image, an infrared image captured by irradiating infrared rays can be used instead of an image captured under visible light. In general, depending on the object being imaged, an infrared image becomes whitish and saturated. In this case, if matching is performed using the conventional normalized correlation method, the correlation value increases in proportion to the brightness, so erroneous recognition is likely to occur. Matching using the QT expression, however, is not influenced by the properties of the captured image (how an image appears at a particular wavelength) and can be performed without erroneous recognition. Matching with reduced misrecognition can also be performed on various low-contrast images such as ultrasound images, medical images, and radar images.
[0158]
Further, when the number of gray levels of the target image and the template image is large (for example, 128 or 256 levels), an arbitrary range may be assigned to the equality code when obtaining the QT expression. For example, when the density is expressed in 256 levels, the equality range is set to within ±5; if the difference found by the comparison is within this range, the two values are regarded as the same and the equality code is used.
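A tolerance on the equality code can be sketched as a small change to the comparison. The function name `sign_tol` and the parameter `tol` are hypothetical; the default of 5 follows the ±5 example in the text, and a difference of exactly 5 is treated here as equal.

```python
def sign_tol(a, b, tol=5):
    """Ternary code with an equality range: 0 if |a - b| <= tol, else -1 or 1."""
    if abs(a - b) <= tol:
        return 0
    return 1 if a > b else -1

def horizontal_codes(img, tol=5):
    """Ternary codes between horizontally adjacent pixels, using the tolerance."""
    return [[sign_tol(row[c], row[c + 1], tol) for c in range(len(row) - 1)] for row in img]

image = [[100, 104, 110, 100]]
print(horizontal_codes(image))         # small differences count as equal
print(horizontal_codes(image, tol=0))  # exact comparison
```

Setting `tol=0` recovers the exact comparison, so the tolerance can be tuned to the noise level of the images.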
[0159]
Further, the processing in the embodiment of the present invention can be realized by a computer-executable program, and this program can be provided on a computer-readable storage medium.
[0160]
The storage medium in the present invention may be a magnetic disk, floppy disk, hard disk, optical disk (CD-ROM, CD-R, DVD, etc.), magneto-optical disk (MO, etc.), semiconductor memory, or the like; as long as it can store the program and is computer-readable, its storage format may take any form.
[0161]
In addition, an OS (operating system) running on the computer, or MW (middleware) such as database management software or network software, may execute a part of each process for realizing the present embodiment based on instructions from a program installed in the computer from the storage medium.
[0162]
Furthermore, the storage medium in the present invention is not limited to a medium independent of a computer, but also includes a storage medium in which a program transmitted via a LAN or the Internet is downloaded and stored or temporarily stored.
[0163]
Further, the number of storage media is not limited to one, and the case where the processing in the present embodiment is executed from a plurality of media is also included in the storage medium in the present invention, and the configuration of the media may be any configuration.
[0164]
The computer according to the present invention executes each process of the present embodiment based on a program stored in a storage medium, and may have any configuration, such as a single device like a personal computer or a system in which a plurality of devices are connected via a network.
[0165]
In addition, the computer in the present invention is not limited to a personal computer, but includes a processing unit, a microcomputer, and the like included in an information processing device, and is a generic term for equipment and devices that can realize the functions of the present invention by a program.
[0166]
[Effect of the invention]
According to the present invention as described above, using a similarity measure that is robust to variations makes it possible to increase the detection accuracy while suppressing the calculation cost.
[Brief description of the drawings]
FIGS. 1A to 1D illustrate a first embodiment of a pattern recognition method according to the present invention, where FIG. 1A is a target image, FIG. 1B is a template image, FIG. 1C is an explanatory diagram of the template image moving within the target image, and FIG. 1D is an explanatory diagram of a matching result.
FIG. 2 is an explanatory diagram of a target image according to the first embodiment of the pattern recognition method of the present invention.
FIGS. 3A and 3B are diagrams for explaining a first embodiment of a pattern recognition method according to the present invention, where FIG. 3A is an explanatory diagram of horizontal QT representation, and FIG. 3B is an explanatory diagram of vertical QT representation.
FIGS. 4A and 4B illustrate a first embodiment of a pattern recognition method according to the present invention, where FIG. 4A is an explanatory diagram of QT expressions of images, and FIG. 4B is an explanatory diagram of the result of comparing the QT expressions obtained from the images.
FIG. 5 is an explanatory diagram of the similarity according to the first embodiment of the pattern recognition method of the present invention.
FIGS. 6A and 6B are diagrams for explaining a first embodiment of a pattern recognition method according to the present invention, where FIG. 6A is a graph showing the relationship between the similarity based on the QT expression and the similarity based on the correlation method, and FIG. 6B is the target image.
FIGS. 7A and 7B are diagrams for explaining a first embodiment of a pattern recognition method of the present invention, where FIG. 7A is an explanatory diagram of conventional matching, and FIG. 7B is an explanatory diagram of matching using a QT expression.
FIG. 8 is a block diagram showing the configuration of the electronic apparatus according to the first embodiment of the pattern recognition method of the present invention.
FIG. 9 is a flowchart showing the operation of the first embodiment of the pattern recognition method of the present invention.
FIGS. 10A to 10H are diagrams for explaining a first embodiment of a pattern recognition method according to the present invention, where FIG. 10A is a template image having a hierarchical structure, FIG. 10B is an explanatory diagram of the arrangement relationship and detection areas of each template image, FIG. 10C is a target image, FIG. 10D is a template image, FIG. 10E is an explanatory diagram of candidate positions of a plurality of template images (upper layer) detected in the target image, FIG. 10F is an explanatory diagram of the template image candidates cut out from the target image, FIG. 10G is an explanatory diagram of matching the lower-layer template images against the template image (upper layer) candidates, and FIG. 10H is an explanatory diagram of the target image displaying the result of detecting the template image.
FIGS. 11A to 11E illustrate a second embodiment of the pattern recognition method of the present invention, where FIG. 11A is a target image and a template image, FIG. 11B is an explanatory diagram of matching results for each threshold, FIG. 11C is an explanatory diagram of the matching result by the QT expression and the matching result by the correlation method, FIG. 11D is an explanatory diagram of the matching result using both the QT expression and the correlation method, and FIG. 11E is the template image detected in the target image.
FIG. 12 is an explanatory diagram of a third embodiment of the pattern recognition method of the present invention.
FIG. 13 is a flowchart showing the operation of the third embodiment of the pattern recognition method of the present invention.
FIGS. 14A and 14B are diagrams for explaining a fifth embodiment of the pattern recognition method of the present invention, where FIG. 14A is an explanatory diagram of a structure of a quadtree, and FIG. 14B is an explanatory diagram of pixels in each layer.
FIGS. 15A to 15C illustrate a fifth embodiment of the pattern recognition method of the present invention, where FIG. 15A is an explanatory diagram of pixels in each layer, FIG. 15B is an explanatory diagram of QT representation in each layer, and FIG. 15C is an explanatory diagram when each QT expression is indicated by a QT-ID number.
FIG. 16 is an explanatory diagram when setting a region for ignoring the QT expression from the target image of the pattern recognition method of the present invention.
FIGS. 17A to 17C illustrate a fourth embodiment of the pattern recognition method of the present invention, where FIG. 17A is an explanatory diagram of a plurality of template images, FIG. 17B is an explanatory diagram of a plurality of target images, and FIG. 17C is an explanatory diagram of target images classified for each template image.
FIG. 18 is an explanatory diagram of QT expression different from that of the first embodiment of the pattern recognition method of the present invention.
FIGS. 19A to 19E illustrate a conventional pattern recognition method, where FIG. 19A is an explanatory diagram of a target image, FIG. 19B is an explanatory diagram when light strikes the target image, FIG. 19C is an explanatory diagram of a target image obtained by performing edge processing on the image of FIG. 19B, FIG. 19D is an actual target image captured by an imaging camera, and FIG. 19E is an explanatory diagram for detecting a template image from FIG. 19D.
[Explanation of symbols]
1 Electronic equipment
2 Monitor
3 Body
4 Keyboard
5 Mouse
10 Storage device
11 Memory
12 CPU

Claims

A pattern recognition method comprising:
a step of comparing the gray values of two different pixels adjacent to each other in the column direction and the row direction in a first image, in which a plurality of pixels are arranged in each of the column direction and the row direction, and obtaining a reference ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
a step of comparing the gray values of two different pixels adjacent to each other in the column direction and the row direction in a second image, in which a plurality of pixels are arranged in each of the column direction and the row direction, and obtaining a target ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
a step of obtaining, at corresponding positions of the reference ternary comparison image and the target ternary comparison image, the number of positions at which the codes match; and
a step of determining the similarity between the target ternary comparison image and the reference ternary comparison image from the number.

A pattern recognition method characterized by:
expressing a target image by a hierarchical structure of a plurality of layers, in which the pixel value of one pixel in each layer is obtained using the pixel values of 2 × 2 pixels in the layer one level lower, and the 2 × 2 pixels in the layer one level lower are arranged at positions obtained by dividing each pixel of the layer into two in each of the column direction and the row direction;
expressing a reference image by a hierarchical structure of a plurality of layers similar to the hierarchical structure of the target image;
for each layer of the target image, comparing the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtaining a target ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
for each layer of the reference image, comparing the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtaining a reference ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
obtaining, at corresponding positions of at least one target ternary comparison image in the layers of the target image and the reference ternary comparison image corresponding to this target ternary comparison image, the number of positions at which the codes match; and
determining the similarity between the target ternary comparison image and the reference ternary comparison image from the number.

A computer-readable storage medium storing a program for performing pattern recognition, the program causing a computer to:
compare the gray values of two different pixels adjacent to each other in the column direction and the row direction in a first image, in which a plurality of pixels are arranged in each of the column direction and the row direction, and obtain a reference ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
compare the gray values of two different pixels adjacent to each other in the column direction and the row direction in a second image, in which a plurality of pixels are arranged in each of the column direction and the row direction, and obtain a target ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
obtain, at corresponding positions of the reference ternary comparison image and the target ternary comparison image, the number of positions at which the codes match; and
determine the similarity between the target ternary comparison image and the reference ternary comparison image from the number.

A computer-readable storage medium storing a program for performing pattern recognition, the program causing a computer to:
express a target image by a hierarchical structure of a plurality of layers, in which the pixel value of one pixel in each layer is obtained using the pixel values of 2 × 2 pixels in the layer one level lower, and the 2 × 2 pixels in the layer one level lower are arranged at positions obtained by dividing each pixel of the layer into two in each of the column direction and the row direction;
express a reference image by a hierarchical structure of a plurality of layers similar to the hierarchical structure of the target image;
for each layer of the target image, compare the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a target ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
for each layer of the reference image, compare the gray values of two different pixels adjacent to each other in the column direction and the row direction, and obtain a reference ternary comparison image in which the magnitude relationship and the equivalence relationship are represented by three types of codes;
obtain, at corresponding positions of at least one target ternary comparison image in the layers of the target image and the reference ternary comparison image corresponding to this target ternary comparison image, the number of positions at which the codes match; and
determine the similarity between the target ternary comparison image and the reference ternary comparison image from the number.