JP5100688B2 - Object detection apparatus and program

Info

Publication number: JP5100688B2
Application number: JP2009056244A
Authority: JP
Inventors: 美也子馬場; 歳康勝野; 展彦井上
Original assignee: Denso Corp; Toyota Central R&D Labs Inc
Current assignee: Denso Corp; Toyota Central R&D Labs Inc
Priority date: 2009-03-10
Filing date: 2009-03-10
Publication date: 2012-12-19
Anticipated expiration: 2029-03-10
Also published as: JP2010211460A

Description

The present invention relates to an object detection apparatus and program, and more particularly to an object detection apparatus and program for detecting an object from a captured image.

In recent years, an increasing number of vehicles have been equipped with an object detection apparatus that processes images of the vehicle's surroundings captured by an in-vehicle camera, detects objects such as pedestrians, and presents the detection results to the driver.

A common method of detecting an object is to generate in advance an identification model that has learned object patterns, and then to detect the object by determining whether an input image is close to the data registered in the identification model. Specifically, various techniques exist, such as that of Viola & Jones.

When an object is detected by comparing the input image with the identification model in this way, the object cannot be detected accurately if there is a positional or size deviation between the input image and the identification model. To address this, an image recognition device has been proposed that, for a partial region of an input image and a partial region of a teaching image stored in advance, takes the absolute values of the luminance differences between a coordinate in the partial region of the input image and a plurality of coordinate points of the teaching image including that same coordinate, creates a first output image using the smallest of these values as the luminance value at that coordinate, and determines that the partial region of the input image and the partial region of the teaching image are identical when the first output image is close to an image with zero luminance (see, for example, Patent Document 1).

To extract partial regions of an input image, a search window of a predetermined size is scanned across the entire input image at a predetermined search step, and a region is extracted at each position.

Patent Document 1: JP 2001-22926 A

However, in the image recognition device of Patent Document 1, improving the accuracy of object detection requires increasing the number of coordinates taken from the partial region of the teaching image for each partial region of the object image, and the more coordinates are compared, the longer identification takes. In addition, if a dense search is performed when extracting partial regions, for example by increasing the number of search window sizes or reducing the search step, the number of searches increases and identification takes longer.

The present invention has been made to solve the above problems, and its object is to provide an identification model generation device, an object detection apparatus, and an identification model generation program that can improve the accuracy of object detection even when a coarse search is performed to keep the time required for identification from increasing.

To achieve the above object, the object detection apparatus of the present invention includes: extraction means that extracts a plurality of window images from an input image, each being the image inside a window frame of a predetermined size, while moving the window frame over the input image by a predetermined amount at a time; and detection means that detects the identification target from the input image by comparing each window image extracted by the extraction means with a second identification model generated by an identification model generation device, thereby identifying whether the window image is the identification target. The identification model generation device includes: first identification model generation means that generates a first identification model by learning with a plurality of first learning images, each containing a different identification target within the window, and a plurality of second learning images, each containing a different object other than the identification target within the window; shifted-image generation means that generates, for each first learning image, a plurality of shifted images in which at least one of the position and the size of the identification target in the first learning image is shifted within the window; calculation means that calculates an evaluation value indicating how easy or difficult each shifted image is to identify by comparing each shifted image generated by the shifted-image generation means with the first identification model; and second identification model generation means that, based on the evaluation values calculated by the calculation means, selects for each first learning image a hard-to-identify shifted image as a third learning image corresponding to that first learning image, and generates the second identification model by learning with the plurality of third learning images selected for the respective first learning images and the plurality of second learning images.

The object detection program of the present invention causes a computer to function as: extraction means that extracts a plurality of window images from an input image, each being the image inside a window frame of a predetermined size, while moving the window frame over the input image by a predetermined amount at a time; and detection means that detects the identification target from the input image by comparing each window image extracted by the extraction means with a second identification model generated by an identification model generation device, thereby identifying whether the window image is the identification target. The identification model generation device includes: first identification model generation means that generates a first identification model by learning with a plurality of first learning images, each containing a different identification target within the window, and a plurality of second learning images, each containing a different object other than the identification target within the window; shifted-image generation means that generates, for each first learning image, a plurality of shifted images in which at least one of the position and the size of the identification target in the first learning image is shifted within the window; calculation means that calculates an evaluation value indicating how easy or difficult each shifted image is to identify by comparing each shifted image generated by the shifted-image generation means with the first identification model; and second identification model generation means that, based on the evaluation values calculated by the calculation means, selects for each first learning image a hard-to-identify shifted image as a third learning image corresponding to that first learning image, and generates the second identification model by learning with the plurality of third learning images selected for the respective first learning images and the plurality of second learning images.

According to the identification model generation device that generates the second identification model used in the object detection apparatus and program of the present invention, the first identification model generation means generates a first identification model by learning with a plurality of first learning images, each containing a different identification target within the window, and a plurality of second learning images, each containing a different object other than the identification target within the window. For each first learning image used to generate the first identification model, the shifted-image generation means generates a plurality of shifted images in which at least one of the position and the size of the identification target is shifted within the window, and the calculation means calculates an evaluation value indicating how easy or difficult each shifted image is to identify by comparing each generated shifted image with the first identification model. Based on the calculated evaluation values, the second identification model generation means then selects, for each first learning image, a hard-to-identify shifted image as the third learning image corresponding to that first learning image, and generates a second identification model by learning with the plurality of third learning images thus selected and the plurality of second learning images.

Because the second identification model is generated from shifted images that were evaluated as difficult to identify when compared with the first identification model, the accuracy of object detection can be improved when the second identification model is used to detect the identification target, even if a coarse search is performed to keep identification time from increasing and the images extracted from the input image consequently contain positional or size deviations.

The shifted image that is most difficult to identify among the shifted images may be selected as the third learning image.

The shifted-image generation means may generate shifted images, for each first learning image, by shifting the position of the identification target in the vertical and horizontal directions of the window by an amount of 5% or less of the window's vertical and horizontal lengths, or by scaling the identification target within the window at a reduction ratio of 95% or more or an enlargement ratio of 105% or less. A shifted image whose positional or size deviation exceeds 5% is more likely not to be identified as the identification target, so it is not realistic to use it as a learning image for generating the second identification model. Limiting the positional and size deviation to 5% or less when generating shifted images therefore yields appropriate shifted images.

According to the object detection apparatus of the present invention, the object is detected using the second identification model generated by the identification model generation device described above, so the accuracy of object detection can be improved even when a coarse search is performed to keep the time required for identification from increasing.

As described above, the present invention has the effect that the accuracy of object detection can be improved even when a coarse search is performed to keep the time required for identification from increasing.

A block diagram showing the schematic configuration of the pedestrian detection device according to the present embodiment.

A flowchart showing the identification model generation processing routine in the present embodiment.

A diagram for explaining (A) learning images of object images and (B) learning images of non-object images.

A diagram for explaining the flow of identification model generation in the present embodiment.

A diagram for explaining the generation of (A) position-shifted images and (B) size-shifted images.

A flowchart showing the pedestrian detection processing routine in the present embodiment.

ROC curves showing pedestrian detection performance for the present embodiment, for a reference identification model, and for the first identification model.

A graph showing the number of features against the number of false detections for the present embodiment, for a reference identification model, and for the first identification model.

ROC curves showing pedestrian detection performance for a coarse search with the present embodiment, a coarse search with the first identification model, a 4x dense search with the first identification model, and an 8x dense search with the first identification model.

A graph showing the number of features against the number of false detections for a coarse search with the present embodiment, a coarse search with the first identification model, a 4x dense search with the first identification model, and an 8x dense search with the first identification model.

A table comparing computational costs.

Embodiments of the present invention are described in detail below with reference to the drawings. The present embodiment describes a case in which the object detection apparatus of the present invention is applied to a pedestrian detection device that detects pedestrians as the object.

As shown in FIG. 1, the pedestrian detection device 10 according to the present embodiment includes an imaging device 12 that captures a range including the identification target region, a computer 16 that executes a pedestrian detection processing routine for detecting pedestrians based on the captured image output from the imaging device 12, and a display device 18 for displaying the processing results of the computer 16.

The imaging device 12 includes an imaging unit (not shown) that captures a range including the identification target region and generates an image signal, an A/D conversion unit (not shown) that converts the analog image signal generated by the imaging unit into a digital signal, and an image memory (not shown) for temporarily storing the A/D-converted image signal.

The computer 16 includes a CPU that controls the entire pedestrian detection device 10, a ROM serving as a storage medium that stores, among other things, the program for the pedestrian detection processing described later, a RAM that temporarily stores data as a work area, and a bus connecting them. In this configuration, a program for realizing the function of each component is stored in a storage medium such as a ROM or HDD, and each function is realized when the CPU executes the program.

Described in terms of functional blocks divided by function-realizing means determined by hardware and software, the computer 16 can be represented, as shown in FIG. 1, as including a window image extraction unit 22 that extracts predetermined regions from the input image captured by the imaging device 12 and input to the computer 16, a detection unit 28 that detects pedestrians from the input image by comparing the window images extracted by the window image extraction unit 22 with the identification model, and a display control unit 30 that controls the display device 18 to display the image captured by the imaging device 12 with the detection results of the detection unit 28 superimposed on it.

The window image extraction unit 22 cuts out images from the input image while moving a window of a predetermined size (referred to as the search window) by a predetermined amount (referred to as the search step) per step. The cut-out image is referred to as a window image, and the size of the window image (that is, the size of the search window) is referred to as the window size. Multiple window sizes are set so that pedestrians of various sizes can be detected, and the window image extraction unit 22 extracts window images using search windows of every window size that has been set. The window image extraction unit 22 also converts each extracted window image into an image with a preset number of pixels (for example, 16 pixels wide by 32 pixels high).
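The scanning performed by the window image extraction unit 22 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names `window_positions` and `extract_windows`, the representation of an image as a list of pixel rows, and the use of a single search step for both directions are all assumptions, and the final resizing to 16 × 32 pixels is omitted.

```python
def window_positions(image_w, image_h, win_w, win_h, step):
    """Yield the top-left (x, y) of every search-window placement that fits inside the image."""
    for y in range(0, image_h - win_h + 1, step):
        for x in range(0, image_w - win_w + 1, step):
            yield x, y


def extract_windows(image, window_sizes, step):
    """Cut out window images for every configured window size.

    image: list of rows, each row a list of pixel values (assumed layout).
    window_sizes: iterable of (width, height) pairs, one per window size.
    step: search step in pixels (assumed equal horizontally and vertically).
    Returns a list of (x, y, width, height, crop) tuples.
    """
    image_h = len(image)
    image_w = len(image[0]) if image_h else 0
    windows = []
    for win_w, win_h in window_sizes:
        for x, y in window_positions(image_w, image_h, win_w, win_h, step):
            crop = [row[x:x + win_w] for row in image[y:y + win_h]]
            windows.append((x, y, win_w, win_h, crop))
    return windows
```

A larger `step` or fewer entries in `window_sizes` corresponds to the coarse search discussed in the text: fewer window images are produced, so each object is more likely to appear off-center or off-scale within its nearest window.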

The detection unit 28 compares each window image extracted by the window image extraction unit 22 with the second identification model generated by the identification model generation device 50 described later, and identifies whether the window image is the identification target. When a window image is identified as the identification target, the detection unit 28 detects that window image within the input image as a pedestrian.

The identification model generation device 50 can be configured as a computer including a CPU, a ROM, a RAM, a built-in HDD, and the like. In this configuration, a program for realizing the function of each component is stored in a storage medium such as a ROM or HDD, and each function is realized when the CPU executes the program. The identification model generation device 50 may be configured as a microcomputer independent of the pedestrian detection device 10, or the two may be configured on the same computer.

Described in terms of functional blocks divided by function-realizing means determined by hardware and software, the identification model generation device 50 can be represented, as shown in FIG. 1, as including a first identification model generation unit 52 that learns from the input learning images to generate a first identification model, a shifted-image generation and evaluation unit 54 that generates shifted images in which at least one of the position and the size is shifted from the input learning images and evaluates how easy or difficult each generated shifted image is to identify, a second identification model generation unit 56 that generates a second identification model by learning with, as learning images, the shifted images found by the shifted-image generation and evaluation unit 54 to have the smallest evaluation value, and an identification model storage unit 58 that stores the second identification model generated by the second identification model generation unit 56.

When a predetermined number (for example, 1000 each) of learning images of object images, in which pedestrians with variations in clothing, background, size, and so on are photographed, and learning images of non-object images, in which things other than pedestrians such as signs, automobiles, and backgrounds are photographed, are input, the first identification model generation unit 52 learns from these learning images using a boosting algorithm and generates a cascade classifier as the first identification model.

For each learning image of an object image, the shifted-image generation and evaluation unit 54 generates shifted images in which the position of the identification target is shifted vertically and horizontally, shifted images in which the size of the identification target is shifted by scaling the identification target, and shifted images in which both the position and the size of the identification target are shifted. It then compares each generated shifted image with the first identification model and calculates, for each shifted image, an evaluation value indicating how easy it is to identify.

For each learning image of an object image used to generate the first identification model, the second identification model generation unit 56 selects the shifted image that is most difficult to identify (the one with the smallest evaluation value). Using the selected shifted images and the learning images of non-object images used to generate the first identification model, it generates the second identification model by the same method used to generate the first identification model.
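The selection step can be sketched as a simple minimum search per positive learning image. The names `select_hardest` and `build_positive_data`, and the `evaluate` callback standing in for the evaluation value computed against the first identification model, are hypothetical and used only for illustration.

```python
def select_hardest(shifted_images, evaluate):
    """Return the shifted image with the smallest evaluation value.

    evaluate: hypothetical callable mapping a shifted image to its
    evaluation value (smaller means harder to identify).
    """
    return min(shifted_images, key=evaluate)


def build_positive_data(shifted_sets, evaluate):
    """Pick one hardest shifted image for each original positive learning image.

    shifted_sets: one collection of shifted images per positive learning image.
    The result has the same length as shifted_sets and would serve as the
    positive learning data for training the second identification model.
    """
    return [select_hardest(shifted, evaluate) for shifted in shifted_sets]
```

Because exactly one shifted image is kept per original, the positive set used for the second model stays the same size as the one used for the first model; only its content moves toward harder examples.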

The identification model storage unit 58 is a built-in or external storage means, such as a hard disk drive (HDD) or CD-ROM, and consists of a medium capable of storing the second identification model. The second identification model stored in the identification model storage unit 58 is used by the detection unit 28 of the pedestrian detection device 10 to detect pedestrians.

The identification model generation processing routine is now described with reference to FIG. 2.

In step 100, one learning image is input. Next, in step 102, if the input learning image is a learning image 60 of an object image, as shown in FIG. 3(A), the region of the learning image 60 that contains the pedestrian 62 is cut out at a predetermined aspect ratio (for example, width 1 : height 2). If the input learning image is a learning image 66 of a non-object image, as shown in FIG. 3(B), a predetermined region of the learning image 66 is cut out at the predetermined aspect ratio (for example, width 1 : height 2).

Next, in step 104, the cut-out image is normalized to a predetermined size (for example, 16 × 32 pixels) by bilinear interpolation or a similar method. The normalized learning images of object images are denoted Pos_i 64 (i = 1, 2, ..., n, where n is the number of prepared learning images 60 of object images, for example 1000), and the normalized learning images of non-object images are denoted Neg_i 68 (i = 1, 2, ..., m, where m is the number of prepared learning images 66 of non-object images, for example 1000). Pos_1 to Pos_n are referred to as Positive data (1), and Neg_1 to Neg_m as Negative data (1).

Next, in step 106, it is determined whether the processing of steps 102 and 104 has been completed for all of the prepared learning images. If it has, the routine proceeds to step 108; if not, it returns to step 100, the next learning image is input, and the processing is repeated.

In step 108, as shown in FIG. 4(A), learning is performed by the boosting algorithm using Positive data (1) and Negative data (1), and a cascade classifier is generated as the first identification model. The generated first identification model is temporarily stored in a predetermined storage area.

Next, in step 110, shifted images are generated for each item of Positive data (1) used to generate the first identification model, using the learning image 60 of the object image. As shown in FIG. 5(A), a position-shifted image 70a, in which the position of the identification target is shifted, is generated by cutting out a region translated in at least one of the horizontal and vertical directions from the region of the learning image 60 from which Pos_i 64 was cut out, and normalizing it to the predetermined size (for example, 16 × 32 pixels). Translating the cut-out region shifts the position of the identification target within the shifted image. The movement amount 72 for horizontal translation is 2.5% and 5% of the horizontal size of Pos_i 64. Likewise, the movement amount for vertical translation is 2.5% and 5% of the vertical size of Pos_i 64.

As shown in FIG. 5(B), a size-shifted image 70b, in which the size of the identification target is shifted, is generated by cutting out a region that shares its center with the region of the learning image 60 from which Pos_i 64 was cut out but whose size is scaled relative to Pos_i 64, and normalizing it to the predetermined size (for example, 16 × 32 pixels). The size deviation from scaling is an enlargement ratio of 105% and a reduction ratio of 95%. Scaling the size of the cut-out region scales the size of the identification target within the shifted image to 95% or 105%.

As described above, a total of 75 kinds of shifted images are generated for Pos_i 64 by combining five horizontal positions (shifted 2.5% and 5% to the left and to the right, plus no shift), five vertical positions (shifted 2.5% and 5% upward and downward, plus no shift), and three sizes (enlarged to 105%, reduced to 95%, plus no scaling): 5 × 5 × 3 = 75.
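The enumeration of the 75 cut-out regions can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `shifted_crop_boxes`, the box representation `(left, top, width, height)`, and the choice to apply the scale factor to the cut-out region around the shifted center are all assumptions made for the example.

```python
from itertools import product


def shifted_crop_boxes(x, y, w, h,
                       shifts=(-0.05, -0.025, 0.0, 0.025, 0.05),
                       scales=(0.95, 1.0, 1.05)):
    """Return the crop boxes (left, top, width, height) for one positive sample.

    (x, y, w, h) is the original cut-out region of Pos_i in the learning image.
    Shifts are fractions of the original width/height; scales resize the
    cut-out region about its (shifted) center. Hypothetical helper.
    """
    cx, cy = x + w / 2.0, y + h / 2.0
    boxes = []
    for dx, dy, s in product(shifts, shifts, scales):
        new_w, new_h = w * s, h * s
        # Move the center by a fraction of the original size, then place
        # the scaled box so that it stays centered on the moved center.
        left = cx + dx * w - new_w / 2.0
        top = cy + dy * h - new_h / 2.0
        boxes.append((left, top, new_w, new_h))
    return boxes


boxes = shifted_crop_boxes(100, 50, 40, 80)
print(len(boxes))  # 5 horizontal x 5 vertical x 3 sizes
```

Each box would then be cut out of the learning image and resized to the normalized size (for example, 16 × 32 pixels) before evaluation.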

The movement amounts used to generate the position-shifted image 70a are set to 2.5% and 5% of the horizontal or vertical size of Pos_i 64, and the size shift amounts used to generate the size-shifted image 70b are set to an enlargement ratio of 105% and a reduction ratio of 95%, because a shifted image whose position or size shift exceeds 5% is increasingly likely not to be identified as a pedestrian, and is therefore not realistic as a learning image for generating the second identification model.

Next, in step 112, each of the shifted images generated for Pos_i 64 is compared with the first identification model generated in step 108, and an evaluation value indicating the ease of identification is calculated for each shifted image. Because a cascade classifier based on a boosting algorithm is used here as the first identification model, E in equation (1) below is used as the evaluation value.

Here, S is the number of stages of the cascade classifier (the number of weak classifiers), V_i is the stage evaluation value of stage i, and Thr_i is the stage threshold of stage i. The cascade classifier is, for example, a classifier in which i weak classifiers, each composed of a set of Haar-like features, are connected in parallel. If the stage evaluation value V_1 at stage 1 is greater than or equal to the stage threshold Thr_1, processing proceeds to stage 2; if the stage evaluation value V_2 at stage 2 is greater than or equal to the stage threshold Thr_2, processing proceeds to stage 3; and finally, if the stage evaluation value V_S at stage S is greater than or equal to the stage threshold Thr_S, the input data (the shifted image) is identified as a pedestrian. The stage evaluation value V_i indicates how strongly the features represented by the weak classifier of stage i are extracted from the shifted image; the higher V_i is, the more likely the image is a pedestrian. Accordingly, the larger the value of E, the easier the shifted image is to identify as a pedestrian, and the smaller the value of E, the harder it is to identify as a pedestrian.
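The stage-by-stage decision described here can be sketched as below. Equation (1) is not reproduced in this text, so `margin_sum` is only a hypothetical stand-in for E, chosen because it grows as the stage evaluation values exceed their thresholds; it should not be read as the formula of the patent. Both function names are assumptions for illustration.

```python
def cascade_accepts(stage_values, stage_thresholds):
    """Return True only if every stage value V_i reaches its threshold Thr_i."""
    for v, thr in zip(stage_values, stage_thresholds):
        if v < thr:
            return False  # rejected at this stage; later stages are not reached
    return True


def margin_sum(stage_values, stage_thresholds):
    """ASSUMPTION: stand-in evaluation value, not equation (1) of the source.

    Sums V_i - Thr_i over all S stages, so a larger result means the image
    clears the thresholds more comfortably (easier to identify).
    """
    return sum(v - thr for v, thr in zip(stage_values, stage_thresholds))


print(cascade_accepts([1.0, 2.0], [0.5, 2.0]))
print(margin_sum([1.0, 2.0], [0.5, 1.0]))
```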

Next, in step 114, the shifted image with the smallest evaluation value, that is, the shifted image that is hardest to identify, is selected from the 75 kinds of shifted images generated for Pos_i 64. Next, in step 116, it is determined whether the shifted image with the smallest evaluation value has been selected for every Pos_i 64 included in Positive data (1). If so, processing proceeds to step 118; if not, processing returns to step 110 and is repeated. As shown in FIG. 4(B), the shifted images with the smallest evaluation value for all Pos_i 64 (Pos_imin 64) are collected as Positive data (2).
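Steps 110 to 116 amount to taking, for every positive sample, the variant with the minimum evaluation value. A minimal sketch follows, assuming the shifted images are already available as lists and that `evaluate` is a caller-supplied scoring function (both names are hypothetical):

```python
def select_hardest(variants_per_positive, evaluate):
    """Build Positive data (2): one hardest-to-identify variant per positive.

    variants_per_positive: iterable of lists, one list of shifted images
        per sample in Positive data (1).
    evaluate: function returning the evaluation value of a shifted image
        against the first identification model (larger = easier to identify).
    """
    return [min(variants, key=evaluate) for variants in variants_per_positive]


# Toy usage: plain numbers stand in for images and are scored by themselves.
positive_data_2 = select_hardest([[0.8, 0.3, 0.5], [1.2, 0.9]], evaluate=lambda v: v)
print(positive_data_2)
```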

Next, in step 118, as shown in FIG. 4(C), the second identification model is generated from Positive data (2) and Negative data (1) by the same method used to generate the first identification model in step 108. The generated second identification model is stored in the identification model storage unit 58, and the processing ends.

Next, the pedestrian detection processing routine of the present embodiment will be described with reference to FIG. 6.

In step 200, an image captured by the imaging device 12 is input. Next, in step 202, a search window of, for example, 16 × 32 pixels is set at a predetermined region of the input image (for example, the left corner region), and a 16 × 32 pixel window image is extracted from the input image using the set search window.

Next, in step 204, the window image is compared with the second identification model to identify whether or not the window image is a pedestrian, the identification target. If the window image is identified as a pedestrian, processing proceeds to step 206, where information such as the position and size of the search window is saved to a list in the RAM, and then proceeds to step 208. If it is not identified as a pedestrian, processing proceeds directly to step 208.

In step 208, it is determined whether the search window has been scanned over the entire input image and the search has finished. If it has not finished, processing proceeds to step 210, where the position of the search window is moved by a predetermined search step, then returns to step 202, and the processing of steps 202 to 208 is repeated. When the search of the entire image with the search window of the current size has finished, processing proceeds to step 212.

In step 212, it is determined whether the search has finished for search windows of all sizes. The search window serves as a frame for extracting window images in which pedestrians are detected, and using search windows of different sizes makes it possible to detect pedestrians of various sizes (pedestrians nearby, pedestrians far away, and so on). In the present embodiment, search windows of various sizes are prepared in advance, and the entire image must be searched with each of them. If the search has finished, processing proceeds to step 216; if not, processing proceeds to step 214.

In step 214, the size of the search window is enlarged by one step (for example, to 1.2 times its size), processing returns to step 202, and the processing of steps 202 to 212 is repeated. If the set search window is larger than 16 × 32 pixels, the extracted window image is converted to 16 × 32 pixels. When the search has finished for search windows of all sizes, processing proceeds to step 216.
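The scan of steps 202 to 214 can be sketched as a generator of window rectangles. This is a simplified illustration under stated assumptions: the function name `search_windows`, the fixed pixel `step`, and the `max_scales` limit are hypothetical, and the source does not say whether the search step changes with the window size.

```python
def search_windows(img_w, img_h, base=(16, 32), step=4, scale=1.2, max_scales=10):
    """Yield (left, top, width, height) for every search-window placement.

    Starts from the base window, scans the whole image with a fixed step,
    then enlarges the window by `scale` and scans again.
    """
    win_w, win_h = float(base[0]), float(base[1])
    for _ in range(max_scales):
        if win_w > img_w or win_h > img_h:
            break  # the window no longer fits inside the image
        w, h = int(round(win_w)), int(round(win_h))
        for top in range(0, img_h - h + 1, step):
            for left in range(0, img_w - w + 1, step):
                yield (left, top, w, h)
        win_w *= scale
        win_h *= scale


for box in search_windows(32, 32, step=16, max_scales=1):
    print(box)
```

In a full detector, each yielded rectangle would be cut out, resized to 16 × 32 pixels if larger, passed to the second identification model, and appended to the result list when identified as a pedestrian.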

In step 216, based on the information saved in the list, the display device 18 is controlled so that each detected pedestrian is displayed on the input image surrounded by a window.

FIGS. 7 to 11 show the results of evaluating the detection performance of the pedestrian detection device 10 of the present embodiment.

FIG. 7 is an ROC curve (receiver operating characteristic curve) plotting the pedestrian detection rate against the number of false detections per frame. The detection rate and the number of false detections are defined as follows.

Detection rate = number of detected pedestrians / number of pedestrians to be detected
Number of false detections = number of detections on anything other than a detection target / number of evaluation frames
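The two definitions translate directly into code. The sketch below uses hypothetical function names and made-up counts purely to show the arithmetic; the numbers are not results from the source.

```python
def detection_rate(num_detected_pedestrians, num_target_pedestrians):
    """Fraction of the pedestrians to be detected that were actually detected."""
    return num_detected_pedestrians / num_target_pedestrians


def false_detections_per_frame(num_false_detections, num_frames):
    """Average number of detections on non-targets per evaluation frame."""
    return num_false_detections / num_frames


# Illustrative counts only (not taken from the patent's evaluation).
print(detection_rate(45, 50))
print(false_detections_per_frame(120, 60))
```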

On the ROC curve, performance is higher the closer a point is plotted to the upper left of the graph. That is, for the same x value (equal number of false detections), a larger y value means a higher detection rate, and for the same y value (equal detection rate), a smaller x value means fewer false detections.

In FIG. 7, ▲ marks the present embodiment (the second identification model), □ marks a reference identification model generated using images selected at random from the shifted images, and ◆ marks the conventional approach using the first identification model. To limit computation time, the search window sizes and the search step were set coarsely (hereinafter also called a coarse search). As FIG. 7 shows, when a coarse search is performed, the identification model of the present embodiment gives the highest detection performance.

Next, FIG. 9 shows ROC curves for four cases: a coarse search with the identification model of the present embodiment (▲); a coarse search with the conventional first identification model (□); a search with the first identification model at four times the fineness of the coarse search, that is, with a search step 1/4 that of the coarse search (◆); and a search with the first identification model at eight times the fineness, that is, with a search step 1/8 that of the coarse search (×). The 4× and 8× searches are hereinafter also called dense searches. In the present embodiment, even a coarse search achieved detection performance comparable to that of a dense search using the conventional identification model.

The computation time required for pedestrian detection is affected not only by the search window sizes and the search step but also by the number of features, which represents the size of the identification model, so the number of features was also evaluated. FIGS. 8 and 10 show the number of features against the number of false detections. To compare the calculation cost of pedestrian detection in each case, the calculation cost is defined as follows.

Calculation cost = number of features × number of searches

Here, the number of searches expresses the fineness of the search relative to the coarse search, which is taken as 1: it is 4 for a search four times as fine as the coarse search and 8 for a search eight times as fine.
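As a worked illustration of this definition, the sketch below compares a coarse search with a larger model against a dense search with a smaller one. The function name and the feature counts are hypothetical and are not values from FIG. 11.

```python
def calculation_cost(num_features, search_fineness):
    """Calculation cost = number of features x number of searches.

    search_fineness is 1 for the coarse search, 4 or 8 for the dense searches.
    """
    return num_features * search_fineness


# Hypothetical feature counts, for illustration only.
coarse_cost = calculation_cost(num_features=300, search_fineness=1)
dense_cost = calculation_cost(num_features=200, search_fineness=4)
print(coarse_cost, dense_cost)
```

With these made-up numbers the coarse search remains cheaper even though its model uses more features, which is the kind of trade-off the cost definition is meant to expose.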

FIG. 11 is a comparison table of the calculation costs, in which the number of features at 2 and at 6 false detections per frame was obtained by interpolation. Compared with a dense search using the conventional identification model, which gives comparable detection performance, the present embodiment has the lowest calculation cost.

As described above, according to the pedestrian detection device of the present embodiment, a plurality of shifted images, in which at least one of the position and the size of the identification target is shifted, are generated for Positive data (1) used to generate the first identification model, and the second identification model is generated using, from among the generated shifted images, the shifted image with the smallest evaluation value indicating ease of identification. As a result, the second identification model is more robust than the first identification model to position and size shifts of the identification target within the window image extracted from the input image, and the accuracy of pedestrian detection can be improved even when a coarse search is used to keep the calculation cost low.

Although the present embodiment has been described for the case where a cascade classifier using a boosting algorithm is generated as the identification model, the identification model may be generated by other well-known methods, such as a linear SVM (Support Vector Machine) or a nonlinear SVM. In that case, the evaluation value indicating how easy or hard a shifted image is to identify should be a value that can evaluate ease or difficulty of identification for the method used to generate the identification model. For example, when a linear SVM is used, g(x) or |g(x)| in equation (2) can be used as the evaluation value.

Here, x is the input vector, and w and b are the parameters that determine the discriminant function.
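Equation (2) itself is not reproduced in this text. Assuming it is the standard linear SVM discriminant function, which is consistent with the symbols x, w, and b defined here, it would take the following form:

```latex
g(\mathbf{x}) = \mathbf{w}^{\mathsf{T}} \mathbf{x} + b
```

Under this assumption, g(x) is the signed distance-like score of the input from the separating hyperplane, so a small g(x) or |g(x)| marks a shifted image that lies close to, or on the wrong side of, the decision boundary.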

When a nonlinear SVM is used, g(x) or |g(x)| in equation (3) can be used as the evaluation value.

Here, x is the input vector, w_k and b are the parameters that determine the discriminant function, x̃_k are the support vectors (SVs), m is the number of SVs, and K(x_1, x_2) is a kernel function taking x_1 and x_2 as arguments.
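Equation (3) is likewise not reproduced in this text. Assuming the usual kernel SVM discriminant function, written with the symbols defined here, it would take the following form (the order of the kernel arguments is an assumption):

```latex
g(\mathbf{x}) = \sum_{k=1}^{m} w_k \, K(\tilde{\mathbf{x}}_k, \mathbf{x}) + b
```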

DESCRIPTION OF SYMBOLS
10 Pedestrian detection device
22 Window image extraction unit
28 Detection unit
30 Display control unit
50 Identification model generation device
52 First identification model generation unit
54 Shifted-image generation and evaluation unit
56 Second identification model generation unit
58 Identification model storage unit

Claims

An object detection apparatus comprising:
extraction means for extracting a plurality of window images from an input image, each being the image within a window frame of a predetermined size, while moving the window frame over the input image by a predetermined amount at a time; and
detection means for detecting an identification target from the input image by comparing each of the window images extracted by the extraction means with a second identification model and identifying whether or not the window image is the identification target, the second identification model being generated by an identification model generation device that includes: first identification model generation means for generating a first identification model by learning using a plurality of first learning images, each including a different identification target within a window, and a plurality of second learning images, each including a different object other than the identification target within a window; shifted-image generation means for generating, for each of the first learning images, a plurality of shifted images in which at least one of the position and the size of the identification target of the first learning image is shifted within the window; calculation means for calculating an evaluation value indicating the ease or difficulty of identifying each shifted image by comparing each of the plurality of shifted images generated by the shifted-image generation means with the first identification model; and second identification model generation means for selecting, for each of the first learning images and based on the evaluation values calculated by the calculation means, a shifted image that is difficult to identify as a third learning image corresponding to that first learning image, and generating the second identification model by learning using the plurality of third learning images selected for the respective first learning images and the plurality of second learning images.

The object detection apparatus according to claim 1, wherein the second identification model is generated by the second identification model generation means selecting, as the third learning image, the shifted image that is hardest to identify among the shifted images.

The object detection apparatus according to claim 1 or claim 2, wherein the second identification model is generated using shifted images that the shifted-image generation means generated, for each of the first learning images, by shifting the position of the identification target in the vertical and horizontal directions of the window by an amount of 5% or less of the vertical and horizontal lengths of the window.

The object detection apparatus according to any one of claims 1 to 3, wherein the second identification model is generated using shifted images that the shifted-image generation means generated, for each of the first learning images, by scaling the identification target within the window at a reduction ratio of 95% or more or an enlargement ratio of 105% or less.

An object detection program for causing a computer to function as:
extraction means for extracting a plurality of window images from an input image, each being the image within a window frame of a predetermined size, while moving the window frame over the input image by a predetermined amount at a time; and
detection means for detecting an identification target from the input image by comparing each of the window images extracted by the extraction means with a second identification model and identifying whether or not the window image is the identification target, the second identification model being generated by an identification model generation device that includes: first identification model generation means for generating a first identification model by learning using a plurality of first learning images, each including a different identification target within a window, and a plurality of second learning images, each including a different object other than the identification target within a window; shifted-image generation means for generating, for each of the first learning images, a plurality of shifted images in which at least one of the position and the size of the identification target of the first learning image is shifted within the window; calculation means for calculating an evaluation value indicating the ease or difficulty of identifying each shifted image by comparing each of the plurality of shifted images generated by the shifted-image generation means with the first identification model; and second identification model generation means for selecting, for each of the first learning images and based on the evaluation values calculated by the calculation means, a shifted image that is difficult to identify as a third learning image corresponding to that first learning image, and generating the second identification model by learning using the plurality of third learning images selected for the respective first learning images and the plurality of second learning images.

An object detection program for causing a computer to function as each of the means constituting the object detection apparatus according to any one of claims 1 to 4.