JP2021033426A

JP2021033426A - Image detection device, image detection method and image detection program

Info

Publication number: JP2021033426A
Application number: JP2019149940A
Authority: JP
Inventors: Junya Takahashi (高橋 純也)
Original assignee: Toshiba Information Systems Japan Corp
Current assignee: Toshiba Information Systems Japan Corp
Priority date: 2019-08-19
Filing date: 2019-08-19
Publication date: 2021-03-01
Anticipated expiration: 2039-08-19
Also published as: JP6893751B2

Abstract

PROBLEM TO BE SOLVED: To meet various demands, such as a demand that detection capability not be significantly reduced and a demand that processing time not be increased. SOLUTION: An image detection device includes image pyramid generating means 31 for generating an image pyramid for target image data; first image search means 32 for detecting a detection target image, by an image search using a feature amount, for a template image containing the detection target and the image data of the plurality of layers constituting the image pyramid; and second image search means 33 for detecting the detection target image, by an image search using similarity information, for a layer image that the first image search means searched without detecting the detection target image and a contrast image, which is the corresponding layer image in which the detection target image was detected in a previous search. SELECTED DRAWING: Figure 4

Description

The present invention relates to an image detection device, an image detection method, and an image detection program.

In a conventional, general object detection device, detection has been performed by, for example, the following method. For the original image shown in FIG. 1, in a system that performs object detection at the distance granularities (1) to (6) with a fixed template size, six layer images P1 to P6 of different sizes and resolutions are created as shown in FIG. 2, and object detection is performed on them. The template image TEMP is shown at the left edge of the figure.

The layer images P1 to P6 increase in size in the order P1, P2, ..., P5, P6. When they are stacked in six tiers from bottom to top in the order P6, P5, ..., P2, P1 at a suitable spacing, the result resembles a pyramid, which is why the set is called an image pyramid.

In the object detection described above, a detection method that uses feature amounts (for example, HOG-feature-based object detection) has high detection accuracy but can be vulnerable to changes in size relative to the comparison template. When such a method is used to detect objects that appear at different sizes on the screen, pyramid layers such as those in FIG. 2 can be used.

Increasing the number of pyramid layers makes it possible to handle changes in object size at a finer granularity, but processing time increases in both pyramid image generation and scanning. Conversely, if the number of pyramid layers is reduced to prioritize processing time, a person whose size falls in a gap between pyramid layers may go undetected.

Patent Document 1 discloses a pyramid image configuration in which an image used for detection processing can be reused when detecting a face of a different size, together with a method of creating the pyramid image that makes effective use of a reduction ratio β = 1/K (K is an integer of 2 or more). In other words, it discloses a technique for speeding up processing by reusing pyramid images that have already been created.

Patent Document 2 discloses an image processing method that reads the hierarchical images with the smallest image size from a reference pyramid py1 and a tracking pyramid py2, and performs iterative detection of tracking points using a gradient method. In this method, a higher-level hierarchical image is read according to a set convergence criterion and tracking point detection is performed; the iterative detection of tracking points and the changing of the convergence criterion are then repeated. This limits the number of iterations of tracking point detection. It is a technique that narrows the search range by computing the search area in advance.

Patent Document 3 discloses an image processing apparatus for object detection, tracking, and similar processing that is more effective than earlier approaches. The apparatus forms an image pyramid consisting of layer image data at a plurality of different scales from the input image data, and prepares a template with a fixed scale for the object. The position within the image pyramid to which the template corresponds is then obtained.

Further, for each of the upper-layer image data and the lower-layer image data identified from this position in the image pyramid, a likelihood is obtained by matching against particles. Interpolation is then performed using these likelihoods to obtain the true likelihood. In this way, the technique interpolates the gap between pyramid layers to obtain the likelihood of a target that lies between them.

Patent Document 4 discloses an image processing method that can shorten processing time while preventing recognition errors, even when the edge direction in an image of an object is partially reversed because of adhering dust or dirt, changes in lighting, differences in surface condition between individual objects, and the like. The method uses a first pyramid, in which a plurality of template images of different resolutions are layered in order of resolution, and a second pyramid, in which a plurality of search-target images having the same resolutions as the template images of the first pyramid are layered in order of resolution. Through image processing performed by an image processing apparatus, a feature image contained in a template image of the first pyramid is searched for in the search-target image of the second pyramid at the same resolution level by evaluating a similarity, proceeding from the lowest resolution upward.

The method further includes a local similarity calculation step, which calculates, for every point of the feature image, the local similarity between that point and the point at the corresponding position in the search-target image, and a similarity calculation step, which calculates the similarity by summing and normalizing the local similarities calculated in the local similarity calculation step.

In the local similarity calculation step, a provisional local similarity between a point of the feature image and the point at the corresponding position in the search-target image is calculated, and it is determined whether the provisional local similarity falls below a predetermined value. If the provisional local similarity is equal to or greater than the predetermined value, it is used as the local similarity as is. If it falls below the predetermined value, the provisional local similarity is changed so that its absolute value becomes smaller, and the result is used as the local similarity.
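The local similarity adjustment attributed to Patent Document 4 can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `damped_similarity`, the `damping` factor used to shrink the absolute value, and normalization by the number of points are all hypothetical choices, since the text above does not say how the value is reduced or how the sum is normalized.

```python
def damped_similarity(provisional_scores, threshold=0.0, damping=0.5):
    """Sum and normalize local similarities after damping low ones.

    provisional_scores: provisional local similarity for each feature point.
    threshold: the "predetermined value" (assumed here to be 0.0).
    damping: assumed factor in (0, 1) that shrinks the absolute value.
    """
    local_scores = []
    for score in provisional_scores:
        if score < threshold:
            # Below the predetermined value: reduce the absolute value.
            score = score * damping
        # Otherwise the provisional value is used unchanged.
        local_scores.append(score)
    # Normalize the sum by the number of points (an assumption).
    return sum(local_scores) / len(local_scores)
```

With these assumed parameters, a point whose edge direction is reversed (a strongly negative local similarity) pulls the overall similarity down less than it otherwise would.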

Patent Document 5 discloses an image processing method that speeds up pattern matching while preventing object detection failures. Specifically, in each search-target image below the topmost one, for each point at a position corresponding to a candidate point extracted from the search-target image one level above, a region containing that point and a region-limited threshold are set. During the similarity calculation, the intermediate result is used to predict whether the similarity will fall below the region-limited threshold; if so, the calculation is aborted. If the similarity is predicted to be equal to or greater than the region-limited threshold, the calculation continues, and if the completed similarity is higher than the region-limited threshold, the threshold used at the next scanning position is updated to that higher similarity.

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-202184
Patent Document 2: Japanese Unexamined Patent Publication No. 2010-009375
Patent Document 3: Japanese Unexamined Patent Publication No. 2010-113513
Patent Document 4: Japanese Unexamined Patent Publication No. 2016-018422
Patent Document 5: Japanese Unexamined Patent Publication No. 2017-111638

However, none of the above techniques addresses the problem of meeting diverse demands, such as a demand that detection capability not be significantly reduced and a demand that processing time not be increased. The present invention was made in view of this state of image detection devices that use image pyramids. Its object is to satisfy such diverse demands, even if not all of them at once, and to provide image detection devices in various modes that respond to them. That is, it provides an embodiment for the case where high detection capability is preferred and somewhat longer processing time is acceptable, and an embodiment for the case where moderate detection capability suffices and shorter processing time is preferred.

The image detection device according to the present embodiment comprises: image pyramid generation means for generating an image pyramid for target image data; first image search means for detecting a detection target image, by an image search using a feature amount, for a template image containing the detection target and the image data of the plurality of layers constituting the image pyramid; and second image search means for detecting the detection target image, by an image search using similarity information, for a layer image in which the search by the first image search means failed to detect the detection target image and a contrast image, which is the corresponding layer image in which the detection target image was detected in a previous search.

FIG. 1 is a diagram showing an example of an original image processed by the image detection device according to the present embodiment.
FIG. 2 is a diagram showing an example of a pyramid image generated from an original image processed by the image detection device according to the present embodiment.
FIG. 3 is a block diagram showing the configuration of the image detection device according to the present embodiment.
FIG. 4 is a block diagram showing the configuration of the main part of the image detection device according to the present embodiment.
FIG. 5 is a flowchart showing the operation of the image detection device according to the present embodiment.
FIG. 6 is a diagram showing the procedure for processing a pyramid image generated from an original image processed by the image detection device according to the present embodiment.

Embodiments of the image detection device, image detection method, and image detection program according to the present invention are described below with reference to the accompanying drawings. In the drawings, identical components are given identical reference numerals, and duplicate description is omitted. The image detection device according to the present embodiment is realized by a computer of any of various kinds, such as a workstation, a personal computer, or a server computer. The computer can have the configuration shown in FIG. 3.

That is, the CPU 10 performs the processing of the image detection device using the programs and data in the main memory 11. In addition to the main memory 11, an external storage interface 13, an input interface 14, a display interface 15, and an image input interface 16 are connected to the CPU 10 via a bus 12.

An external storage device 23 is connected to the external storage interface 13. A pointing device 22 such as a mouse and an input device 24 such as a keyboard input device or a touch panel input device are connected to the input interface 14. A display device 25 of any of various kinds, such as an LED or LCD display, is connected to the display interface 15.

Further, a camera 26 serving as an image input device is connected to the image input interface 16 and supplies image data. Although a camera 26 is assumed to be connected here, a plurality of cameras may be connected. The device connected to the image input interface 16 is not limited to the camera 26 as long as it supplies images; it may be, for example, a server or terminal that sends image data, or a medium on which image data is stored.

The external storage device 23 holds the programs that realize the image detection device according to the present embodiment. That is, as shown in FIG. 4, the external storage device 23 is provided with image pyramid generation means 31, first image search means 32, second image search means 33, and search target extraction means 34. The image pyramid generation means 31 generates an image pyramid for the target image data. Specifically, it generates image data for six layers from one frame of image data captured by the camera 26 and taken in through the image input interface 16. As already described, the system performs object detection on an original image such as that in FIG. 1 at the distance granularities (1) to (6) with a fixed template size, creating six images P1 to P6 of different sizes and resolutions as shown in FIG. 2. Among the layer images P1 to P6 in FIG. 2, layer image P6 is the original image, and layer images P1 to P5 can be created by reducing layer image P6 by predetermined ratios. The number of layer images generated is not limited to six; it may be fewer or more.
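As a rough illustration of the layer sizes that image pyramid generation means 31 might produce, the following Python sketch computes the dimensions of P1 to P6 from the original image P6. The function name `pyramid_sizes` and the reduction ratio of 0.8 per layer are assumptions for illustration; the description says only that the layers are reduced by a predetermined ratio.

```python
def pyramid_sizes(width, height, num_layers=6, ratio=0.8):
    """Return [(w, h), ...] for layers P1..PN, smallest first.

    The last entry (PN) is the original image size; each earlier layer is
    the original reduced by one more factor of `ratio` (an assumed value).
    """
    sizes = []
    for k in range(num_layers):
        # P1 is reduced the most, PN (k = num_layers - 1) not at all.
        scale = ratio ** (num_layers - 1 - k)
        w = max(1, round(width * scale))
        h = max(1, round(height * scale))
        sizes.append((w, h))
    return sizes


if __name__ == "__main__":
    for index, size in enumerate(pyramid_sizes(640, 480), start=1):
        print(f"P{index}: {size[0]} x {size[1]}")
```

A smaller ratio spreads the layers over a wider range of object sizes but leaves larger gaps between neighboring layers, which is the trade-off the description is concerned with.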

The first image search means 32 detects the detection target image, by an image search using a feature amount, for a template image containing the detection target and the image data of the plurality of layers constituting the image pyramid. For example, it detects the detection target image by a feature-based image search over all six layer images.

The search target extraction means 34 selects some of the layers from the image data of the plurality of layers constituting the image pyramid and extracts them as search target layer images. For example, from the N layers generated for one frame, one layer is extracted at intervals of S layers (that is, S layers are skipped between extracted layers) to serve as a search target layer image. The number of search target layer images obtained by extraction in one frame is L. N is an integer multiple of (S + 1), and L, S, and N are related by L = N / (S + 1).

When extraction by the search target extraction means 34 is performed, the first image search means 32 detects the detection target image, by an image search using a feature amount, in the image data of the layers extracted by the search target extraction means 34. Feature amounts that are weak against scale changes include CoHOG (Co-Occurrence Histogram Of Gradient), HOG (Histogram Of Gradient), LBP (Local Binary Pattern), Haar-Like, ICF (Integral Channel Features), and ACF (Aggregate Channel Features) feature amounts. Feature amounts that are robust to scale changes include SIFT (Scale Invariant Feature Transform) and SURF (Speeded Up Robust Features) feature amounts.

The following may also be used as feature amounts: the pattern, color or gradation values, and spatial frequency of an object within one frame (one screen) or of a part (all or a portion) of a detection target such as a person or a car; and the mean, histogram, standard deviation, or feature vector of various feature amounts.

The second image search means 33 detects the detection target image, by an image search using similarity information, for a layer image in which the search by the first image search means 32 failed to detect the detection target image and a contrast image, which is the corresponding layer image in which the detection target image was detected in a previous search. As image search techniques that use similarity information in template matching, SAD (Sum of Absolute Difference), SSD (Sum of Squared Difference), NCC (Normalized Cross Correlation), and the like can be used.
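For reference, the three similarity measures named above can be written in a few lines of Python. This is a minimal sketch, assuming two equally sized patches given as flat sequences of pixel intensities; the function names are illustrative, and the NCC shown is the plain form without mean subtraction, which may differ from the variant a particular implementation uses.

```python
import math


def sad(patch_a, patch_b):
    """Sum of absolute differences: 0 means identical, larger is less similar."""
    return sum(abs(a - b) for a, b in zip(patch_a, patch_b))


def ssd(patch_a, patch_b):
    """Sum of squared differences: 0 means identical, larger is less similar."""
    return sum((a - b) ** 2 for a, b in zip(patch_a, patch_b))


def ncc(patch_a, patch_b):
    """Normalized cross-correlation (no mean subtraction).

    For non-negative intensities the value lies in [0, 1], and 1 means the
    patches are identical up to a positive brightness scale factor.
    """
    dot = sum(a * b for a, b in zip(patch_a, patch_b))
    norm = math.sqrt(sum(a * a for a in patch_a) * sum(b * b for b in patch_b))
    return dot / norm if norm else 0.0
```

SAD and SSD are cheap but sensitive to overall brightness changes between frames, while NCC tolerates a uniform gain change at the cost of more arithmetic.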

In the above, the "layer image in which the search by the first image search means 32 failed to detect the detection target image" refers to the M-th layer image for which detection failed in the image search of the m-th frame. There may be more than one such M; in practice it could be, for example, the M-th layer image and the (M+2)-th layer image. The "contrast image, which is the corresponding layer image in which the detection target image was detected in a previous search" refers to a search in a frame earlier than the m-th frame: it is a layer image at the same layer position, or a neighboring layer position, in a search of the (m-1)-th frame or earlier, in which the detection target image was detected. The same layer position means the exactly corresponding M-th layer image. As noted above, there may be more than one M, in which case the neighboring layer positions are included. "The (m-1)-th frame or earlier" means the frame at or before the (m-1)-th frame that is closest to m.

In the present embodiment, because extraction is performed at intervals of a predetermined number of layers, the contrast image includes not only the exactly corresponding M-th layer image but also layer images at neighboring layer positions. For example, it may be a layer close in size to the M-th layer, such as the adjacent (M-1)-th or (M+1)-th layer image. An adjacent layer position may be as far from M as the system determines; that is, M plus or minus a predetermined number of layers is allowed.

In the above, the search target extraction means 34 may determine the value of S according to the type of feature amount used by the first image search means 32. As described earlier, some feature amounts are weak against scale changes and others are robust to them. When a feature amount that is robust to scale changes, such as SIFT (Scale Invariant Feature Transform) or SURF (Speeded Up Robust Features), is used, S may be increased so that the image search is performed on layer images that differ more in scale. However, it was found that even for the same feature amount (for example, CoHOG), the degree of robustness to scale changes can vary with the size of the dictionary itself (that is, its resolution). For example, a higher-resolution dictionary was found to lose less detection capability as the scale change grows. S may therefore also be changed according to the size (resolution) of the dictionary.

The image detection device configured as described above performs its processing when the CPU 10 executes each of the means, which are programs in the external storage device 23. This processing is shown in the flowchart of FIG. 5. In the flowchart, N is the number of layers in the image pyramid generated for one frame, S is the number of layers skipped between extracted layers when extracting from the N layers, and L is the number of search target layer images obtained by extraction in one frame. As already explained, N is an integer multiple of (S + 1), and L, S, and N are related by L = N / (S + 1). Layer processing is therefore performed on L layers in each frame. Further, with t as the frame number and n as the starting index of the search layer set, n = t mod (S + 1) holds. Thus, when S = 1, n repeats as 0, 1, 0, 1, 0, 1, .... The loop that processes one frame runs under the following condition.

For (i = n; i < N; i += (S + 1))
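The loop condition above can be checked with a short Python sketch. The function name `search_layer_indices` is an assumption for illustration, and layer indices are taken to be zero-based, so index 0 corresponds to P1 and index N - 1 to PN.

```python
def search_layer_indices(t, num_layers, skip):
    """Return the zero-based layer indices searched in frame t.

    Mirrors `For (i = n; i < N; i += (S + 1))` with n = t mod (S + 1),
    where num_layers is N and skip is S.
    """
    n = t % (skip + 1)
    return list(range(n, num_layers, skip + 1))


if __name__ == "__main__":
    # With N = 6 and S = 1 the frames alternate between two layer sets.
    for t in range(4):
        names = [f"P{i + 1}" for i in search_layer_indices(t, 6, 1)]
        print(f"frame {t}: {names}")
```

Because the starting index rotates with the frame number, every layer of the pyramid is visited once in any run of S + 1 consecutive frames, while each individual frame searches only L = N / (S + 1) layers.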

When the flowchart of FIG. 5 starts, one frame of the image is captured (S11). At this point the frame number t is updated. Next, it is determined whether an image has been detected within an arbitrarily specified number of past frames (S12). If Yes, processing proceeds to step S13, and image detection is performed by image processing using similarity information (S13). If step S12 results in No, or when the processing of step S13 ends, an image pyramid consisting of the predetermined number (N) of layers is generated and layers are extracted at intervals of S layers, starting from n (S14). Next, a feature-based image search is performed on the images of the extracted layers (S15). In step S15, loop processing is performed within one frame. When step S15 is complete, processing returns to step S11 and continues.

A specific example of the processing in the flowchart of FIG. 5 is described with reference to FIG. 6. Here, the processing is for the (t-1)-th frame (iteration); it proceeds from step S11 to S12 and branches to No in step S12. Since no image has been detected when processing first begins in the present embodiment, this outcome is to be expected. In the present embodiment, a pyramid image is generated as layer images P1 to P6 in FIG. 6(A), layer images P2, P4, and P6 are extracted by extracting layers at intervals of S layers, and detection of the image of the template image TEMP is performed on layer images P2, P4, and P6 using the feature amount (S14, S15). Assume that the image could not be detected in layer images P2 and P4 of FIG. 6(A) but was detected in layer image P6.

Next, t is incremented by 1 (S11), and processing proceeds to step S12. Step S12 exists for the following reason. In the present embodiment, image detection by image processing using similarity information is performed when the target object image is not found by the feature-based image search. This requires that an image of a previously detected object, present in a past (or immediately preceding) frame, be available for comparison with images of highly similar objects, and step S12 exists to check that such an image is available. Consequently, the first frame branches to No and object detection using similarity information is not performed; step S12 also branches to No when no object has been detected for a long period.

When the feature-amount image search of step S14 is finished and the process returns to step S11, the image of the next frame is captured (S11), and it is determined whether an image has been detected within an arbitrarily specified number of frames (S12). Here, this is the t-th frame, and because an image was detected in layer image P6 in the preceding (t-1)-th frame, the process branches to Yes in step S12. Image detection by image processing using similarity information could be performed on all of the layer images P1, P3, and P5 extracted at intervals of S layers in the t-th frame shown in FIG. 6(B); in the present embodiment, however, the image search using the feature amount is executed first. That is, for the layer images P1, P3, and P5 of FIG. 6(B) extracted at intervals of S layers in the current frame (t-th frame), detection of the image of the template image TEMP is executed using the feature amount (S14, S15). Assume that this search detects the image in layer images P1 and P3 of FIG. 6(B) but not in layer image P5. In the present embodiment, image processing using similarity information is performed only on layer images for which the feature-amount search failed; it is therefore checked whether any layer failed detection, and the process of step S13 is performed on layer image P5, the layer image in which the image could not be detected (S13).
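The order of operations described above (feature-amount search first, similarity search only on the layers where it failed) might be sketched like this. The callables `feature_search` and `similarity_search`, the dictionary layout, and the name `process_frame` are hypothetical; the description does not prescribe any particular interface.

```python
def process_frame(layers, selected, feature_search, similarity_search, contrast):
    """Run the two-stage search on one frame.

    layers:   dict mapping a layer name (e.g. "P1") to its image data.
    selected: names of the layers extracted for this frame.
    contrast: contrast image data from an earlier detection, or None
              when step S12 would branch to "No" (assumption).
    Returns a dict mapping each selected layer name to a result or None.
    """
    results = {}
    # S14/S15: feature-amount search on every extracted layer.
    for name in selected:
        results[name] = feature_search(layers[name])
    # S13: similarity search only where the feature search failed,
    # and only if a contrast image is available.
    if contrast is not None:
        for name in selected:
            if results[name] is None:
                results[name] = similarity_search(layers[name], contrast)
    return results
```

In the example of FIG. 6(B), `selected` would be P1, P3, and P5, and only P5 would reach the similarity search.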

Specifically, layer image P6 is available as a contrast image in which the image was detected in the frame (t-1) preceding the current frame (t). Using layer image P6 as the contrast image, the detection target image is detected in layer image P5, where detection failed in the current frame, by an image search using similarity information. In this case, rather than comparing the entire layer image, the region (position and size) of the contrast image may be retained as data, finely adjusted by slight reduction or enlargement, and applied to the detection target layer. For layers in which the image was detected, data usable for further image processing is created, for example by tagging the detected image. Once all layer images in which detection failed this time have been processed, the process proceeds to step S11.
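One way to picture the "retain the region, then fine-tune by slight reduction or enlargement" idea is the sketch below. It assumes each layer has a known scale factor relative to the original image and that a few scale tweaks around the mapped region are tried; the function name `candidate_regions`, the `tweaks` values, and the scale convention are illustrative assumptions, not taken from the description.

```python
def candidate_regions(region, src_scale, dst_scale, tweaks=(0.9, 1.0, 1.1)):
    """Map a stored contrast-image region onto another layer.

    region:    (x, y, w, h) of the detection in the source layer.
    src_scale: scale of the source layer relative to the original image.
    dst_scale: scale of the target layer relative to the original image.
    tweaks:    hypothetical fine-adjustment factors (slight shrink/enlarge).
    Returns a list of (x, y, w, h) candidate regions in the target layer.
    """
    x, y, w, h = region
    ratio = dst_scale / src_scale
    # Keep the region centre fixed while the size is adjusted.
    cx, cy = (x + w / 2) * ratio, (y + h / 2) * ratio
    candidates = []
    for k in tweaks:
        nw, nh = w * ratio * k, h * ratio * k
        candidates.append(
            (round(cx - nw / 2), round(cy - nh / 2), round(nw), round(nh))
        )
    return candidates
```

Only these candidate windows would then be compared against the contrast image, instead of scanning the whole layer.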

On proceeding to step S11, t is updated and processing is performed for the (t+1)-th frame. Because this is the (t+1)-th frame, the process branches to Yes in step S12, but, as described above, steps S14 and S15 are executed first. The image of this frame is captured, and an image pyramid with the predetermined layers is generated. From the image data of the plurality of layers constituting the image pyramid, layers are extracted at intervals of S layers to obtain the search target layer images. In the present embodiment, image processing using similarity information is performed only on layer images for which the feature-amount search failed, so the template image TEMP is first searched for using the feature amount in layer images P2, P4, and P6 of FIG. 6(c). Assume that image detection fails in layer images P2 and P4 of FIG. 6(c) and succeeds in layer image P6.
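The alternating layer extraction in the example (P1, P3, P5 in one frame, then P2, P4, P6 in the next) can be illustrated with a short sketch. The parameter `stride` and the rule of shifting the starting layer by the frame index are assumptions chosen to reproduce the example; how `stride` relates to the value S is not stated in this passage.

```python
def select_layers(num_layers, stride, frame_index):
    """Return the 1-based layer numbers extracted for a given frame.

    num_layers:  number of layers in the image pyramid (6 in the example).
    stride:      hypothetical step between extracted layers (2 in the example).
    frame_index: frame counter; the starting layer shifts with it so that
                 consecutive frames cover different layers.
    """
    offset = frame_index % stride
    return [i + 1 for i in range(offset, num_layers, stride)]
```

With six layers and a stride of 2, an even frame index gives layers 1, 3, 5 and an odd one gives layers 2, 4, 6, so every layer is searched once every two frames.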

The process then returns to step S13, where the layer images P2 and P4, in which image detection failed, are processed. That is, the detection target image is detected in layer images P2 and P4, where detection failed in the current frame, by an image search using similarity information. As noted above, rather than comparing the entire layer image, the region (position and size) of the contrast image may be retained as data, finely adjusted by slight reduction or enlargement, and applied to the detection target layer. For layers in which the image was detected, data usable for further image processing is created, for example by tagging the detected image. It is checked whether all layers in which detection failed this time have been processed, and the process returns to step S11 and proceeds to the next frame, frame t+2.

According to the above embodiment, when detecting pedestrians or the like, a pedestrian does not teleport and suddenly appear on the screen, so detection by the feature-amount image search is likely to have succeeded in a past frame. Therefore, even if the layer images of one frame are extracted only at predetermined intervals, as in the present embodiment, higher speed can be achieved without reducing detection capability.

In the above embodiment, the template image TEMP is searched for using the feature amount in layer images P2, P4, and P6 in the (t-1)-th frame, in layer images P1, P3, and P5 in the t-th frame, and in layer images P2, P4, and P6 in the (t+1)-th frame. Conversely, no image detection is performed in layer images P1, P3, and P5 in the (t-1)-th frame, in layer images P2, P4, and P6 in the t-th frame, or in layer images P1, P3, and P5 in the (t+1)-th frame.

In addition to the processing of the first embodiment, image detection by an image search using similarity information may be performed in the non-extracted layer images P1, P3, and P5 in the (t-1)-th frame, in the non-extracted layer images P2, P4, and P6 in the t-th frame, and in the non-extracted layer images P1, P3, and P5 in the (t+1)-th frame.
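For this variant, every layer is handled in each frame: the extracted layers by the feature-amount search and the remaining layers by the similarity search. A minimal sketch of the partitioning follows; the name `split_layers` and the `stride` parameter are hypothetical, and the alternation rule is the same assumption as in the example of six layers with every other layer extracted.

```python
def split_layers(num_layers, stride, frame_index):
    """Partition 1-based layer numbers into extracted and non-extracted sets.

    Returns (extracted, skipped): 'extracted' would go to the feature-amount
    search and 'skipped' to the similarity search in this variant.
    """
    offset = frame_index % stride
    all_layers = range(1, num_layers + 1)
    extracted = [i for i in all_layers if (i - 1) % stride == offset]
    skipped = [i for i in all_layers if (i - 1) % stride != offset]
    return extracted, skipped
```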

The image detection device may also be configured to comprise:
image pyramid generation means for generating an image pyramid for target image data;
first image search means for detecting a detection target image by an image search using a feature amount, for a template image including the image to be detected and image data of a plurality of layers constituting the image pyramid; and
second image search means for detecting the detection target image by an image search using similarity information, for a layer image in which the detection target image could not be detected by the search performed by the first image search means and a contrast image that is the corresponding layer image in which the detection target image was detected in a previous search, and additionally for a layer image that the first image search means did not search.

Alternatively, the second image search means need not perform the image search using similarity information on the layer image in which the first image search means failed to detect the detection target image and the contrast image that is the corresponding layer image in which the detection target image was detected in a previous search.

That is, the image detection device may comprise:
image pyramid generation means for generating an image pyramid for target image data;
search target extraction means for extracting search target layer images at intervals of several layers from the image data of a plurality of layers constituting the image pyramid;
first image search means for detecting a detection target image by an image search using a feature amount, for a template image including the image to be detected and the image data of the layers of the image pyramid extracted by the search target extraction means; and
second image search means for detecting the detection target image by an image search using similarity information, for the image data of the layers that were not extracted by the search target extraction means, that is, the layer images that the first image search means did not search.

Furthermore, in each of the cases above, the processing by the second image search means, which detects the detection target image by an image search using similarity information, targets the layers; however, it is not limited to layers and may instead target the original image from which the pyramid images were generated. Because detection using similarity information is possible at any size, the image from which the pyramid images were generated (hereinafter, the original image) may, for example, be used every time. For instance, the image of an object detected in layer P2 and the image of an object detected in layer P6 differ in size, but both appear in the original image and can therefore be detected there using similarity information.
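If the similarity search is run on the original image rather than on a layer, a region detected in some layer has to be expressed in original-image coordinates. A small sketch of that conversion is shown below, assuming each layer is a uniformly scaled copy of the original with a known scale factor; the function name `to_original_coords` and the scale convention are assumptions for illustration.

```python
def to_original_coords(box, layer_scale):
    """Convert a (x, y, w, h) box from layer coordinates to the original image.

    layer_scale: size of the layer relative to the original image
                 (e.g. 0.5 for a half-size layer); hypothetical convention.
    """
    return tuple(round(v / layer_scale) for v in box)
```

Regions found in layers of different sizes then share one coordinate system, so the same original image can serve as the search target every time.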

10 CPU
11 Main memory
12 Bus
13 External storage interface
14 Input interface
15 Display interface
16 Image input interface
22 Pointing device
23 External storage device
24 Input device
25 Display device
26 Camera
31 Image pyramid generation means
32 First image search means
33 Second image search means
34 Search target extraction means

Claims

An image detection device comprising:
image pyramid generation means for generating an image pyramid for target image data;
first image search means for detecting a detection target image by an image search using a feature amount, for a template image including the image to be detected and image data of a plurality of layers constituting the image pyramid; and
second image search means for detecting the detection target image by an image search using similarity information, for a layer image in which the detection target image could not be detected by the search performed by the first image search means and a contrast image that is the corresponding layer image in which the detection target image was detected in a previous search.

The image detection device according to claim 1, further comprising search target extraction means for extracting images of several layers from the image data of the plurality of layers constituting the image pyramid as search target layer images,
wherein the first image search means detects the detection target image by an image search using a feature amount in the image data of the layers extracted by the search target extraction means.

The image detection device according to claim 2, wherein the search target extraction means extracts the search target layer images by extracting layers of one image pyramid at intervals of S layers.

The image detection device according to claim 3, wherein the search target extraction means determines the value of S according to the type of feature amount employed by the first image search means.

An image detection method comprising:
an image pyramid generation step of generating an image pyramid for target image data;
a first image search step of detecting a detection target image by an image search using a feature amount, for a template image including the image to be detected and image data of a plurality of layers constituting the image pyramid; and
a second image search step of detecting the detection target image by an image search using similarity information, for a layer image in which the detection target image could not be detected by the search performed in the first image search step and a contrast image that is the corresponding layer image in which the detection target image was detected in a previous search.

The image detection method according to claim 5, further comprising a search target extraction step of extracting images of several layers from the image data of the plurality of layers constituting the image pyramid as search target layer images,
wherein, in the first image search step, the detection target image is detected by an image search using a feature amount in the image data of the layers extracted in the search target extraction step.

The image detection method according to claim 6, wherein, in the search target extraction step, the search target layer images are extracted by extracting layers of one image pyramid at intervals of S layers.

The image detection method according to claim 7, wherein, in the search target extraction step, the value of S is determined according to the type of feature amount employed in the first image search step.

An image detection program causing a computer to function as:
image pyramid generation means for generating an image pyramid for target image data;
first image search means for detecting a detection target image by an image search using a feature amount, for a template image including the image to be detected and image data of a plurality of layers constituting the image pyramid; and
second image search means for detecting the detection target image by an image search using similarity information, for a layer image in which the detection target image could not be detected by the search performed by the first image search means and the corresponding layer image in which the detection target image was detected in a previous search.

The image detection program according to claim 9, further causing the computer to function as search target extraction means for extracting images of several layers from the image data of the plurality of layers constituting the image pyramid as search target layer images,
and causing the computer, as the first image search means, to detect the detection target image by an image search using a feature amount in the image data of the layers extracted by the search target extraction means.

The image detection program according to claim 10, causing the computer, as the search target extraction means, to extract the search target layer images by extracting layers of one image pyramid at intervals of S layers.

The image detection program according to claim 11, causing the computer, as the search target extraction means, to determine the value of S according to the type of feature amount employed by the first image search means.