JP2014191400A

JP2014191400A - Image detection apparatus, control program, and image detection method

Info

Publication number: JP2014191400A
Application number: JP2013063856A
Authority: JP
Inventors: Takahiro Aoki; 隆裕青木
Original assignee: MegaChips Corp
Current assignee: MegaChips Corp
Priority date: 2013-03-26
Filing date: 2013-03-26
Publication date: 2014-10-06
Anticipated expiration: 2033-03-26
Also published as: JP6110174B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique capable of improving detection accuracy of a detection target image. SOLUTION: A grouping processing part 5 groups a plurality of detection result areas detected by a detection part 3 on the basis of positions of the detection result areas. A map generation part 4 generates a map indicating the distribution, in a processing target image, of accuracy values indicating the likelihood of being the detection target image. An independent area specification part 8 specifies independent areas included in a high accuracy area in a binarized map. A threshold adjustment part 6 adjusts a threshold so that the number of independent areas specified by the independent area specification part 8 coincides with the number of groups obtained by the grouping processing part 5. A detection target image specification part 9 specifies the detection target image on the basis of the independent areas included in the high accuracy area of the binarized map generated by using the threshold adjusted by the threshold adjustment part 6.

Description

The present invention relates to a technique for detecting a detection target image from a processing target image.

Patent Documents 1 to 3 disclose techniques for detecting a detection target image from a processing target image.

JP 2008-27058 A
JP 2009-175821 A
JP 2011-221791 A

When a detection target image is detected from a processing target image, improved detection accuracy is desired.

The present invention has been made in view of the above, and an object thereof is to provide a technique capable of improving the detection accuracy for a detection target image.

To solve the above problem, one aspect of an image detection apparatus according to the present invention is an image detection apparatus that detects a detection target image from a processing target image, the apparatus comprising: a detection unit that performs, using a detection frame, detection processing for detecting, in the processing target image, a region that is highly likely to be the detection target image of the same size as the detection frame, as a detection result region; a grouping processing unit that groups a plurality of detection result regions detected by the detection unit on the basis of the positions of the detection result regions; a map generation unit that generates, on the basis of the detection results of the detection unit, a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image; a binarization processing unit that binarizes the map using a threshold to generate a binarized map; an independent region specifying unit that specifies independent regions included in a high-accuracy region of the binarized map generated using the threshold, the high-accuracy region corresponding to a region of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold; a threshold adjustment unit that adjusts the threshold so that the number of independent regions specified by the independent region specifying unit matches the number of groups obtained by the grouping processing unit; and a detection target image specifying unit that specifies the detection target image in the processing target image on the basis of the independent regions included in the high-accuracy region of the binarized map generated using the threshold adjusted by the threshold adjustment unit.

In one aspect of the image detection apparatus according to the present invention, the grouping processing unit groups the plurality of detection result regions detected by the detection unit also on the basis of the likelihood that each detection result region is the detection target image.

In one aspect of the image detection apparatus according to the present invention, the detection unit performs the detection processing using a plurality of types of detection frames of mutually different sizes, and the grouping processing unit groups the plurality of detection result regions detected by the detection unit also on the basis of the sizes of the detection result regions.

In one aspect of the image detection apparatus according to the present invention, the grouping processing unit includes: a processing unit that, for each of a plurality of different numbers of groups, groups the plurality of detection result regions detected by the detection unit into that number of groups; a separation degree acquisition unit that, for each of the plurality of different numbers of groups, obtains a degree of separation indicating how well separated from one another the groups are that result when the processing unit groups the plurality of detection result regions into that number of groups; and a use-group-number determination unit that determines, as the number of groups that the threshold adjustment unit uses in adjusting the threshold, the number of groups, among the plurality of different numbers of groups, for which the resulting groups have the highest degree of separation.

In one aspect of the image detection apparatus according to the present invention, the detection target image is a human face image.

One aspect of a control program according to the present invention is a control program for controlling an image detection apparatus that detects a detection target image from a processing target image, the control program causing the image detection apparatus to execute the steps of: (a) performing, using a detection frame, detection processing for detecting, in the processing target image, a region that is highly likely to be the detection target image of the same size as the detection frame, as a detection result region; (b) grouping a plurality of detection result regions detected in step (a) on the basis of the positions of the detection result regions; (c) generating, on the basis of the detection results of step (a), a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image; (d) binarizing the map using a threshold to generate a binarized map; (e) specifying independent regions included in a high-accuracy region of the binarized map generated using the threshold, the high-accuracy region corresponding to a region of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold; (f) adjusting the threshold so that the number of independent regions specified in step (e) matches the number of groups obtained in step (b); and (g) specifying the detection target image in the processing target image on the basis of the independent regions included in the high-accuracy region of the binarized map generated using the threshold adjusted in step (f).

One aspect of an image detection method according to the present invention is an image detection method for detecting a detection target image from a processing target image, the method comprising the steps of: (a) performing, using a detection frame, detection processing for detecting, in the processing target image, a region that is highly likely to be the detection target image of the same size as the detection frame, as a detection result region; (b) grouping a plurality of detection result regions detected in step (a) on the basis of the positions of the detection result regions; (c) generating, on the basis of the detection results of step (a), a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image; (d) binarizing the map using a threshold to generate a binarized map; (e) specifying independent regions included in a high-accuracy region of the binarized map generated using the threshold, the high-accuracy region corresponding to a region of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold; (f) adjusting the threshold so that the number of independent regions specified in step (e) matches the number of groups obtained in step (b); and (g) specifying the detection target image in the processing target image on the basis of the independent regions included in the high-accuracy region of the binarized map generated using the threshold adjusted in step (f).

According to the present invention, the detection accuracy for a detection target image can be improved.

A diagram showing the configuration of the image detection apparatus.
A diagram showing the configuration of the functional blocks of the image detection apparatus.
A diagram showing the configuration of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram for explaining the operation of the detection unit.
A diagram showing detection result frames superimposed on the processing target image.
A diagram for explaining a method of generating the output value map.
A diagram for explaining a method of generating the output value map.
A diagram showing an example of the output value map.
A diagram schematically showing an example of the processing target image.
A diagram showing an example of the output value map.
A diagram showing an example of the binarized map.
A diagram showing circumscribed rectangles set on the high-accuracy region of the binarized map.
A diagram showing the circumscribed rectangles of the binarized map set on the processing target image.
A diagram showing an example of the binarized map.
A diagram showing circumscribed rectangles set on the high-accuracy region of the binarized map.
A diagram showing the circumscribed rectangles of the binarized map set on the processing target image.
A diagram showing an example of the binarized map.
A diagram showing circumscribed rectangles set on the high-accuracy region of the binarized map.
A diagram showing the circumscribed rectangles of the binarized map set on the processing target image.
A flowchart showing the operation of the image detection apparatus.
A diagram showing the configuration of the grouping processing unit.
A flowchart showing clustering processing using the K-means method.
A diagram showing an example of feature amounts representing the characteristics of a detection result region.
A diagram showing a plurality of detection result regions superimposed on the processing target image.
A diagram showing how a plurality of detection result regions are grouped.
A diagram showing the configuration of the functional blocks of an image detection apparatus according to a modification.
A diagram for explaining a method of specifying the detection target image in the image detection apparatus according to the modification.

FIG. 1 is a diagram showing the configuration of an image detection apparatus 1 according to an embodiment. The image detection apparatus 1 according to the present embodiment detects a detection target image from the image represented by input image data. The image detection apparatus 1 is used, for example, in a surveillance camera system or a digital camera system. In the present embodiment, the detection target image is, for example, a human face image. Hereinafter, "face image" by itself means a human face image. An image that is subjected to the detection processing for the detection target image is called a "processing target image".

The image detection apparatus 1 is a kind of computer and, as shown in FIG. 1, includes a CPU (Central Processing Unit) 10 and a storage unit 11. The storage unit 11 is composed of non-transitory recording media readable by the image detection apparatus 1 (CPU 10), such as a ROM (Read Only Memory) and a RAM (Random Access Memory). The storage unit 11 stores, among other things, a control program 12 for controlling the operation of the image detection apparatus 1. The storage unit 11 may include a computer-readable non-transitory recording medium other than the ROM and RAM, for example a hard disk drive, an SSD (Solid State Drive), or a USB (Universal Serial Bus) memory.

The various functions of the image detection apparatus 1 are realized by the CPU 10 executing the control program 12 in the storage unit 11. Executing the control program 12 forms a plurality of functional blocks in the image detection apparatus 1, as shown in FIG. 2.

As shown in FIG. 2, the image detection apparatus 1 includes, as functional blocks, an image input unit 2, a detection unit 3, a map generation unit 4, a grouping processing unit 5, a threshold adjustment unit 6, a binarization processing unit 7, an independent region specifying unit 8, and a detection target image specifying unit 9. The various functions of the image detection apparatus 1 may be realized by hardware circuits instead of functional blocks.

A plurality of pieces of image data, each representing one of a plurality of images sequentially captured by an imaging unit (camera) of a surveillance camera system or the like, are sequentially input to the image input unit 2. The image input unit 2 outputs image data representing the processing target image. The image input unit 2 may treat every image obtained by the imaging unit as a processing target image, or may treat only an image obtained every few seconds as a processing target image. The imaging unit captures, for example, L images (L ≥ 2) per second. That is, the imaging frame rate of the imaging unit is L fps (frames per second). In an image captured by the imaging unit, M pixels (M ≥ 2) are arranged in the row direction and N pixels (N ≥ 2) are arranged in the column direction. The resolution of the image captured by the imaging unit is, for example, VGA (Video Graphics Array), with M = 640 and N = 480.

Hereinafter, in an image captured by the imaging unit, the size of a region in which m pixels (m ≥ 1) are arranged in the row direction and n pixels (n ≥ 1) are arranged in the column direction is written as mp × np (p stands for pixel). In addition, among a plurality of values arranged in a matrix, the value located in the m-th row and the n-th column, counted from the upper left, may be called the m × n-th value.

The detection unit 3 uses the image data output from the image input unit 2 to detect face images in the processing target image. Specifically, the detection unit 3 performs, using a detection frame, detection processing for detecting, in the processing target image, a region that is highly likely to be a face image of the same size as the detection frame, as a detection result region.

The map generation unit 4 generates, on the basis of the detection results of the detection unit 3, an output value map indicating the distribution, in the processing target image, of detection accuracy values each indicating the likelihood of being a face image.

The grouping processing unit 5 groups the plurality of detection result regions detected by the detection unit 3 on the basis of the positions of the detection result regions.

The binarization processing unit 7 binarizes the output value map generated by the map generation unit 4 using a threshold to generate a binarized map.

The independent region specifying unit 8 specifies the independent regions included in the region of the binarized map generated using the threshold that corresponds to the region of the output value map in which the detection accuracy value is equal to or greater than the threshold, or is greater than the threshold (this region of the binarized map is hereinafter called the "high-accuracy region").

The threshold adjustment unit 6 adjusts the threshold used for binarization in the binarization processing unit 7 so that the number of independent regions specified by the independent region specifying unit 8 matches the number of groups obtained by the grouping processing unit 5.

The detection target image specifying unit 9 specifies face images in the processing target image on the basis of the independent regions included in the high-accuracy region of the binarized map that the binarization processing unit 7 generates using the threshold adjusted by the threshold adjustment unit 6. In this way, the image detection apparatus 1 detects face images from the processing target image.
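The interplay of the binarization processing unit 7, the independent region specifying unit 8, and the threshold adjustment unit 6 can be illustrated with a minimal Python sketch. It rests on several assumptions that the text above does not state: an "independent region" is taken to be a 4-connected component of the high-accuracy region, the "equal to or greater than" variant of the comparison is used, and the threshold is adjusted by simply trying a caller-supplied list of candidate values in order. The function names are hypothetical.

```python
from collections import deque

def binarize(score_map, threshold):
    """Return a 0/1 map; 1 marks the high-accuracy region (value >= threshold)."""
    return [[1 if v >= threshold else 0 for v in row] for row in score_map]

def count_independent_regions(binary_map):
    """Count 4-connected groups of 1-cells (assumed meaning of 'independent region')."""
    rows, cols = len(binary_map), len(binary_map[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary_map[r][c] != 1 or seen[r][c]:
                continue
            count += 1                      # found a new, unvisited region
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:                    # breadth-first flood fill
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary_map[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count

def adjust_threshold(score_map, num_groups, candidates):
    """Return the first candidate threshold whose binarized map has exactly
    num_groups independent regions, or None if no candidate matches.
    The linear search over candidates is an assumption of this sketch."""
    for t in candidates:
        if count_independent_regions(binarize(score_map, t)) == num_groups:
            return t
    return None

# Toy output value map: two high-scoring blobs joined by a low-scoring bridge.
toy_map = [
    [5, 5, 2, 6, 6],
    [5, 5, 2, 6, 6],
    [1, 1, 1, 1, 1],
]
chosen = adjust_threshold(toy_map, num_groups=2, candidates=[1, 2, 3, 4])
```

In this toy map, low thresholds merge the two blobs through the bridge column, so the search moves on until the blobs separate into two regions, matching the assumed group count of 2.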

Next, the operation of each block of the image detection apparatus 1 will be described in detail.

<Detection processing>
FIG. 3 is a diagram showing the configuration of the detection unit 3. As shown in FIG. 3, the detection unit 3 includes a feature amount extraction unit 30 and a discriminator 31. The detection unit 3 performs, using a detection frame, detection processing for detecting, in the processing target image, a region that is highly likely to be a face image of the same size as the detection frame, as a detection result region. Hereinafter, "detection processing" by itself means this detection processing in the detection unit 3. To detect face images of various sizes in the processing target image, the detection unit 3 uses a plurality of types of detection frames of mutually different sizes. The detection unit 3 uses, for example, 30 types of detection frames. Each detection frame is, for example, a square.

In the present embodiment, as described later, the feature amount extraction unit 30 extracts feature amounts from an image. The image from which the feature amount extraction unit 30 extracts feature amounts must be an image of a reference size (normalized size).

On the other hand, in the present embodiment, the plurality of types of detection frames of mutually different sizes include a detection frame of the same size as the reference size and detection frames of sizes different from the reference size. Hereinafter, the detection frame of the same size as the reference size is called the "reference detection frame", and a detection frame of a size different from the reference size is called a "non-reference detection frame". In the present embodiment, the smallest of the plurality of types of detection frames is the reference detection frame. The size of every non-reference detection frame is therefore larger than the reference size. The size of the reference detection frame is, for example, 16p × 16p. The plurality of types of detection frames also include, for example, a non-reference detection frame of size 18p × 18p and a non-reference detection frame of size 20p × 20p.

In the present embodiment, when the detection unit 3 performs the detection processing on the processing target image using the reference detection frame, it moves the reference detection frame over the processing target image and, at each position, performs face image detection on the image within the reference detection frame to determine whether that image is highly likely to be a face image. The detection unit 3 then takes each region of the processing target image (image within the reference detection frame) that it has determined to be highly likely to be a face image as a detection result region.
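The scan with the reference detection frame described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the step by which the frame is moved (`step`) is a hypothetical parameter, and `score_fn` is a hypothetical stand-in for the feature amount extraction and discriminator, which are described separately.

```python
def scan_with_reference_frame(image, frame_size, score_fn, threshold, step=1):
    """Slide a square frame over a 2-D image (list of rows) and return
    (top, left, score) for every position whose score is >= threshold.
    step and score_fn are assumptions of this sketch, not taken from the text."""
    rows, cols = len(image), len(image[0])
    results = []
    for top in range(0, rows - frame_size + 1, step):
        for left in range(0, cols - frame_size + 1, step):
            # Cut out the image inside the frame at this position.
            patch = [row[left:left + frame_size]
                     for row in image[top:top + frame_size]]
            score = score_fn(patch)
            if score >= threshold:
                results.append((top, left, score))
    return results

# Toy usage: a 3x3 image, a 2x2 frame, and the pixel sum as a dummy score.
toy_image = [
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
]
hits = scan_with_reference_frame(
    toy_image, frame_size=2,
    score_fn=lambda patch: sum(sum(row) for row in patch),
    threshold=3,
)
```

Each returned tuple corresponds to one detection result region; a real score function would replace the pixel sum used here.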

On the other hand, when the detection unit 3 performs the detection processing on the processing target image using a non-reference detection frame, it resizes the non-reference detection frame so that its size matches the reference size. The detection unit 3 then resizes the processing target image in accordance with the resizing of the non-reference detection frame. The detection unit 3 moves the resized non-reference detection frame over the resized processing target image and, at each position, performs face image detection on the image within that frame to determine whether the image is highly likely to be a face image. Then, on the basis of each region of the resized processing target image (image within the resized non-reference detection frame) that it has determined to be highly likely to be a face image, the detection unit 3 specifies the region that is highly likely to be a face image in the processing target image at its original, unresized size, and takes that region as a detection result region.
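The coordinate bookkeeping for a non-reference detection frame can be sketched as below. The sketch assumes that the processing target image is scaled by the same ratio as the frame (reference size divided by frame size) and that coordinates are rounded to the nearest pixel; neither detail is spelled out in the text above, and the function names are hypothetical.

```python
def resized_image_size(width, height, reference_size, frame_size):
    """Size of the processing target image after it is scaled by the same
    ratio that shrinks the non-reference frame to the reference size."""
    return (round(width * reference_size / frame_size),
            round(height * reference_size / frame_size))

def to_original_region(top, left, reference_size, frame_size):
    """Map the top-left corner of a hit found in the resized image back to the
    original image. Returns (top, left, size) of the detection result region,
    whose side equals the original non-reference frame size."""
    return (round(top * frame_size / reference_size),
            round(left * frame_size / reference_size),
            frame_size)

# A 20p x 20p frame with a 16p x 16p reference size on a VGA image.
small = resized_image_size(640, 480, reference_size=16, frame_size=20)
region = to_original_region(8, 16, reference_size=16, frame_size=20)
```

With these example numbers the VGA image is scanned at 512 × 384, and a hit whose top-left corner is at row 8, column 16 of the resized image maps back to a 20p × 20p region at row 10, column 20 of the original image.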

Hereinafter, the resized processing target image used when the detection processing is performed on the processing target image with a non-reference detection frame is called a "resized image". Likewise, the resized non-reference detection frame used in that detection processing is called a "resized detection frame".

Thus, in the present embodiment, the operation of the detection unit 3 when it performs the detection processing on the processing target image using the reference detection frame differs from its operation when it performs the detection processing using a non-reference detection frame. The operation of the detection unit 3 is described in detail below.

In the detection unit 3, when the reference detection frame is used for the detection processing, the feature amount extraction unit 30 sets the reference detection frame on the processing target image and extracts a plurality of feature amounts from the image within the reference detection frame in the processing target image. When a non-reference detection frame is used for the detection processing, the feature amount extraction unit 30 sets the resized detection frame, obtained by resizing the non-reference detection frame, on the resized image, obtained by resizing the processing target image, and extracts a plurality of feature amounts from the image within the resized detection frame in the resized image. Hereinafter, the image within the reference detection frame and the image within the resized detection frame, from which feature amounts are extracted, may be collectively called the "in-frame image".

Since the size of the reference detection frame matches the reference size, the image within the reference detection frame in the processing target image has the reference size. Likewise, since the size of the resized detection frame matches the reference size, the image within the resized detection frame in the resized image has the reference size. The feature amount extraction unit 30 can therefore always extract feature amounts from an image of the reference size. The feature amount extraction unit 30 extracts from the in-frame image feature amounts such as Haar-like feature amounts or LBP (Local Binary Pattern) feature amounts.

The discriminator 31 calculates a detection accuracy value indicating the likelihood that the in-frame image is a face image, on the basis of a feature vector composed of the plurality of feature amounts that the feature amount extraction unit 30 extracted from the in-frame image and a weight vector composed of a plurality of weight coefficients generated from learning samples (sample images for learning). Specifically, the feature amount extraction unit 30 obtains the inner product of the feature vector for the in-frame image and the weight vector, and the real value obtained by adding a predetermined bias value to that inner product is used as the detection accuracy value indicating the likelihood that the in-frame image is a face image. The detection accuracy value calculated by the discriminator 31 indicates how face-like the image within the reference detection frame or the image within the resized detection frame is. The discriminator 31 uses, for example, an SVM (Support Vector Machine) or AdaBoost.

If the calculated detection accuracy value is equal to or greater than a threshold value, the discriminator 31 determines that the in-frame image is highly likely to be a face image. That is, when the reference detection frame is used, the discriminator 31 determines that the image within the reference detection frame in the processing target image is a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the discriminator 31 determines that the image within the size change detection frame in the size-changed image is a region highly likely to be a face image of the same size as the size change detection frame.

On the other hand, if the calculated detection accuracy value is less than the threshold value, the discriminator 31 determines that the in-frame image is highly likely not to be a face image. That is, when the reference detection frame is used, the discriminator 31 determines that the image within the reference detection frame in the processing target image is not a region highly likely to be a face image of the same size as the reference detection frame. When a non-reference detection frame is used, the discriminator 31 determines that the image within the size change detection frame in the size-changed image is not a region highly likely to be a face image of the same size as the size change detection frame.

When the discriminator 31 determines that the image within the reference detection frame in the processing target image is a region highly likely to be a face image of the same size as the reference detection frame, it sets that image as a detection result region and sets that reference detection frame as a detection result frame.

When the discriminator 31 determines that the image within the size change detection frame in the size-changed image is a region highly likely to be a face image of the same size as the size change detection frame, it sets the outer frame of that region as a temporary detection result frame. Then, based on the temporary detection result frame, the discriminator 31 specifies, in the processing target image from which the size-changed image was generated, a region highly likely to be a face image of the same size as the non-reference detection frame. It sets that region as a detection result region and sets the outer frame of the detection result region as the final detection result frame.

<Detection process using reference detection frame>
Next, the series of operations that the detection unit 3 performs when it moves the reference detection frame over the processing target image and determines whether the image within the reference detection frame is highly likely to be a face image will be described. FIGS. 4 to 7 are diagrams for explaining this operation of the detection unit 3. The detection unit 3 performs face image detection on the image within the reference detection frame while raster-scanning the reference detection frame.

As shown in FIG. 4, the feature amount extraction unit 30 first sets the reference detection frame 100 at the upper left of the processing target image 20 and extracts a plurality of feature amounts from the image within the reference detection frame 100. The discriminator 31 obtains a detection accuracy value for the image within the reference detection frame 100 based on a feature vector composed of the plurality of feature amounts extracted by the feature amount extraction unit 30 and a weight vector composed of a plurality of weight coefficients. If the calculated detection accuracy value is equal to or greater than the threshold value, the discriminator 31 determines that the region within the upper-left reference detection frame 100 in the processing target image 20 is highly likely to be a face image, sets that region as a detection result region, and sets the reference detection frame 100, which is the outer frame of that region, as a detection result frame.

Next, the feature amount extraction unit 30 moves the reference detection frame 100 slightly to the right in the processing target image 20, for example by one pixel or by several pixels. The feature amount extraction unit 30 then extracts a plurality of feature amounts from the image within the moved reference detection frame 100 in the processing target image 20.

After that, the discriminator 31 obtains a detection accuracy value for the image within the moved reference detection frame 100 based on a feature vector composed of the plurality of feature amounts extracted by the feature amount extraction unit 30 and the weight vector composed of a plurality of weight coefficients. If the calculated detection accuracy value is equal to or greater than the threshold value, the discriminator 31 determines that the image within the moved reference detection frame 100 is highly likely to be a face image, sets that image as a detection result region, and sets the moved reference detection frame 100, which is the outer frame of that image, as a detection result frame.

The detection unit 3 then continues in the same manner. When the reference detection frame 100 has moved to the right end of the processing target image 20, as shown in FIG. 5, the detection unit 3 obtains a detection accuracy value for the image within the reference detection frame 100 at the right end. If the obtained detection accuracy value is equal to or greater than the threshold value, the detection unit 3 sets the image within the right-end reference detection frame 100 as a detection result region and sets the right-end reference detection frame 100 as a detection result frame.

Next, as shown in FIG. 6, the feature amount extraction unit 30 moves the reference detection frame 100 to the left end of the processing target image 20 while lowering it slightly, and then extracts a plurality of feature amounts from the image within the reference detection frame 100. The feature amount extraction unit 30 moves the reference detection frame 100 downward in the vertical direction (column direction) by, for example, one pixel or several pixels. After that, the discriminator 31 obtains and outputs a detection accuracy value for the image within the current reference detection frame 100 based on a feature vector composed of the plurality of feature amounts extracted by the feature amount extraction unit 30 and the weight vector composed of a plurality of weight coefficients. If the calculated detection accuracy value is equal to or greater than the threshold value, the discriminator 31 determines that the image within the current reference detection frame 100 is highly likely to be a face image, sets that image as a detection result region, and sets the reference detection frame 100 as a detection result frame.

The detection unit 3 then continues in the same manner. When the reference detection frame 100 has moved to the lower right of the processing target image 20, as shown in FIG. 7, the detection unit 3 obtains a detection accuracy value for the image within the lower-right reference detection frame 100. If the obtained detection accuracy value is equal to or greater than the threshold value, the detection unit 3 sets the image within the lower-right reference detection frame 100 as a detection result region and sets the lower-right reference detection frame 100 as a detection result frame.

In this way, the detection unit 3 uses the reference detection frame to detect, as detection result regions, regions in the processing target image that are highly likely to be face images of the same size as the reference detection frame. In other words, the detection unit 3 uses the reference detection frame to specify face images of the same size as the reference detection frame in the processing target image.
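
The raster scan and threshold decision described above can be sketched as follows. This is an illustrative sketch under stated assumptions: the names `raster_scan_positions`, `detect`, and `score_fn` are hypothetical, and `score_fn` stands in for the combination of the feature amount extraction unit 30 and the discriminator 31.

```python
def raster_scan_positions(image_w, image_h, frame=16, step=1):
    """Yield the top-left (x, y) of each frame position, row by row."""
    for y in range(0, image_h - frame + 1, step):
        for x in range(0, image_w - frame + 1, step):
            yield x, y


def detect(image_w, image_h, score_fn, threshold, frame=16, step=1):
    """Collect (x, y, size, score) for every frame whose score >= threshold.

    score_fn(x, y, size) is a hypothetical callback returning the
    detection accuracy value for the image inside the given frame.
    """
    results = []
    for x, y in raster_scan_positions(image_w, image_h, frame, step):
        score = score_fn(x, y, frame)
        if score >= threshold:
            results.append((x, y, frame, score))
    return results
```

With a step larger than one pixel, the last frame in a row may stop short of the right edge; how the embodiment treats that case is not stated, so the sketch simply skips positions that would leave the image.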

<Detection process using non-reference detection frame>
When the detection unit 3 performs the detection process using a non-reference detection frame, the feature amount extraction unit 30 resizes the non-reference detection frame so that its size matches the reference size (the size of the reference detection frame). The feature amount extraction unit 30 then resizes the processing target image by the same ratio as that used for the non-reference detection frame.

In the present embodiment, the reference size is 16p × 16p. Therefore, when a non-reference detection frame of size Rp × Rp (R > 16) is used, for example, the feature amount extraction unit 30 reduces the non-reference detection frame by multiplying its vertical width and its horizontal width each by (16/R), thereby generating a size change detection frame. The feature amount extraction unit 30 also reduces the processing target image by multiplying its vertical width (number of pixels) and its horizontal width (number of pixels) each by (16/R), thereby generating a size-changed image. After that, in the same manner as the process described with reference to FIGS. 4 to 7, the detection unit 3 moves the size change detection frame over the size-changed image, extracts feature amounts from the image within the size change detection frame, and determines from those feature amounts whether the image within the size change detection frame is highly likely to be a face image of the same size as the size change detection frame. That is, the detection unit 3 uses the size change detection frame to detect regions in the size-changed image that are highly likely to be face images of the same size as the size change detection frame. Hereinafter, this process is referred to as the "size-changed version detection process".

In the size-changed version detection process, the detection unit 3 sets the size change detection frame on the size-changed image. When it determines that the image within the size change detection frame is highly likely to be a face image of the same size as the size change detection frame, it sets the size change detection frame, which is the outer frame of that image, as a temporary detection result frame.

In the detection unit 3, when at least one temporary detection result frame is obtained for the size-changed image, the discriminator 31 converts the at least one temporary detection result frame into a detection result frame that corresponds to the processing target image from which the size-changed image was generated.

Specifically, the discriminator 31 first sets the obtained temporary detection result frame or frames on the size-changed image. FIG. 8 is a diagram showing temporary detection result frames 130 set on a size-changed image 120. In the example of FIG. 8, a plurality of temporary detection result frames 130 are set on the size-changed image 120.

Next, as shown in FIG. 9, the discriminator 31 enlarges (resizes) the size-changed image 120 on which the temporary detection result frames 130 are set back to its original size, thereby converting the size-changed image 120 into the processing target image 20. The temporary detection result frames 130 set on the size-changed image 120 are enlarged along with it, so that each temporary detection result frame 130 is converted into a detection result frame 150 corresponding to the processing target image 20, as shown in FIG. 9. The region within a detection result frame 150 in the processing target image 20 is a detection result region that is highly likely to be a face image of the same size as the non-reference detection frame. In this way, the detection unit 3 specifies, based on the temporary detection result frames 130 obtained by the size-changed version detection process, detection result regions in the processing target image that are highly likely to be face images of the same size as the non-reference detection frame.
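
The scaling arithmetic for a non-reference detection frame can be sketched as follows. This is a hedged illustration: the function names are hypothetical, and rounding to the nearest integer pixel is an assumption, since the text does not say how fractional sizes are handled.

```python
REFERENCE_SIZE = 16  # reference size of 16p x 16p, as in the embodiment


def resized_dimensions(width, height, r):
    """Scale the processing target image by 16/R for an R x R frame."""
    scale = REFERENCE_SIZE / r
    return round(width * scale), round(height * scale)


def to_detection_result_frame(temp_x, temp_y, r):
    """Map a temporary detection result frame back to the original image.

    (temp_x, temp_y) is the top-left corner of a 16 x 16 temporary frame
    in the size-changed image. Enlarging by R/16 gives an R x R frame
    in the processing target image.
    """
    scale = r / REFERENCE_SIZE
    return round(temp_x * scale), round(temp_y * scale), r


# Hypothetical example: a 640 x 480 image scanned with a 32 x 32 frame.
small_w, small_h = resized_dimensions(640, 480, 32)
frame = to_detection_result_frame(10, 20, 32)
```

Here the image is halved to 320 × 240 before scanning, and a temporary frame found at (10, 20) corresponds to a 32 × 32 detection result frame at (20, 40) in the original image.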

As described above, when the detection unit 3 performs the detection process on the processing target image using a non-reference detection frame, it performs the size-changed version detection process using the non-reference detection frame resized to match the reference size and the processing target image resized by the same ratio. As a result, even when a detection frame whose size differs from the reference size is used, the feature amount extraction unit 30 can extract feature amounts from an image of the reference size. Based on the result of the size-changed version detection process, the detection unit 3 then specifies detection result regions in the processing target image that are highly likely to be face images of the same size as the non-reference detection frame. The detection unit 3 thereby performs the detection process using the non-reference detection frame.

The detection unit 3 performs the detection process described above using each of a plurality of types of detection frames. As a result, when the processing target image contains a face image, detection result regions (regions highly likely to be face images) and detection result frames (outer frames of those regions) are obtained, together with a detection accuracy value corresponding to each detection result frame. The detection accuracy value corresponding to a detection result frame obtained for the processing target image indicates the likelihood that the image within that detection result frame in the processing target image is a face image.

FIG. 10 is a diagram showing the detection result frames 150 obtained for the processing target image 20 superimposed on the processing target image 20. As shown in FIG. 10, performing the detection process with a plurality of types of detection frames of different sizes yields detection result frames 150 of various sizes. In other words, detection result regions of various sizes are obtained. This means that face images of various sizes contained in the processing target image 20 are being detected. A plurality of detection result frames 150 are located near each face image contained in the processing target image 20, and these detection result frames 150 may overlap. That is, a plurality of detection result regions are located near each face image contained in the processing target image 20, and these detection result regions may overlap.

<Output value map generation process>
Based on the detection results of the detection unit 3, the map generation unit 4 generates an output value map indicating the distribution, over the processing target image, of the detection accuracy values that indicate likelihood as a face image (face-image likeness).

Specifically, the map generation unit 4 considers a map 200 that, like the processing target image, consists of a total of (M × N) values, with M values arranged in the row direction and N values arranged in the column direction. The map generation unit 4 takes one detection result frame for the processing target image as a target detection result frame and sets, on the map 200, a frame 210 of the same size as the target detection result frame at the same position as the target detection result frame. FIG. 11 is a diagram showing the frame 210 set on the map 200.

Next, the map generation unit 4 sets each value of the map 200 outside the frame 210 to "0" and determines each value inside the frame 210 using the detection accuracy value corresponding to the target detection result frame (the detection accuracy value obtained when face image detection was performed on the image within the detection frame that became the target detection result frame). If the size of the target detection result frame is, for example, 16p × 16p, the frame 210 contains 16 values in the row direction and 16 values in the column direction, for a total of 256 values. If the size of the target detection result frame is, for example, 20p × 20p, the frame 210 contains 20 values in the row direction and 20 values in the column direction, for a total of 400 values. FIG. 12 is a diagram for explaining the method of determining each value within the frame 210.

The map generation unit 4 sets the value at the center 211 of the frame 210 to the detection accuracy value, obtained by the detection unit 3, that corresponds to the target detection result frame. The map generation unit 4 then sets the other values within the frame 210 so that they decrease gradually from the center 211 toward the outside, following a normal distribution curve whose maximum is the value at the center 211 of the frame 210. Each of the values constituting the map 200 is thereby determined, and the map 200 corresponding to the target detection result frame is complete.
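
One way to build such a per-frame map is sketched below. It is only an illustration: the function name `frame_map`, the choice of standard deviation (`size / 4` by default), and the use of a two-dimensional Gaussian of the distance from the frame center are all assumptions, since the text specifies only that the values follow a normal distribution curve peaking at the center. The sketch also assumes the frame's top-left corner lies inside the map.

```python
import math


def frame_map(map_w, map_h, x0, y0, size, score, sigma=None):
    """Return a map_h x map_w grid that is 0 outside the frame and falls
    off from `score` at the frame center following a Gaussian curve.

    (x0, y0) is the top-left corner of the size x size frame.
    sigma is a hypothetical spread parameter (default: size / 4).
    """
    if sigma is None:
        sigma = size / 4.0
    cx = x0 + (size - 1) / 2.0  # frame center in pixel coordinates
    cy = y0 + (size - 1) / 2.0
    grid = [[0.0] * map_w for _ in range(map_h)]
    for y in range(y0, min(y0 + size, map_h)):
        for x in range(x0, min(x0 + size, map_w)):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            grid[y][x] = score * math.exp(-d2 / (2.0 * sigma ** 2))
    return grid
```

For an odd frame size the center falls on a pixel, which then holds the detection accuracy value itself. For an even size such as 16 the center lies between pixels, so the four innermost values are slightly below it. How the embodiment places the center 211 in that case is not stated.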

In this way, the map generation unit 4 generates a plurality of maps 200, one for each of the plurality of detection result frames for the processing target image. The map generation unit 4 then combines the generated maps 200 to generate the output value map.

Specifically, the map generation unit 4 adds together the m × n-th values of the generated maps 200 and uses the resulting sum as the m × n-th detection accuracy value of the output value map. The map generation unit 4 obtains each detection accuracy value constituting the output value map in this way. An output value map indicating the distribution of detection accuracy values over the processing target image is thereby completed. In the output value map, as in the processing target image, M detection accuracy values are arranged in the row direction and N detection accuracy values are arranged in the column direction, so the output value map consists of (M × N) detection accuracy values. By referring to the output value map, regions with high face-image likeness in the processing target image can be specified. That is, face images in the processing target image can be specified by referring to the output value map.
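
The combination step is an element-wise sum, which can be sketched as follows. The function name `combine_maps` and the nested-list representation are illustrative assumptions.

```python
def combine_maps(maps):
    """Add maps of identical shape element by element.

    maps: non-empty list of grids (lists of rows) with the same dimensions.
    Returns the output value map as a new grid.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    output = [[0.0] * cols for _ in range(rows)]
    for grid in maps:
        for y in range(rows):
            for x in range(cols):
                output[y][x] += grid[y][x]
    return output
```

Because overlapping detection result frames contribute to the same positions, values accumulate where many frames agree, which is why the output value map peaks near face centers.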

FIG. 13 is a diagram showing the output value map for the processing target image 20 superimposed on the processing target image 20. For ease of understanding, FIG. 13 shows the output value map with the magnitude of the detection accuracy value divided into, for example, five levels from a first level to a fifth level. In the output value maps shown in FIG. 13 and in FIG. 15 described later, regions whose detection accuracy values belong to the fifth level (the largest) are shown with vertical-line hatching, and regions belonging to the fourth level (the second largest) are shown with stippled hatching. Regions belonging to the third level (the third largest) are shown with hatching that rises to the right, and regions belonging to the second level (the fourth largest) are shown with hatching that rises to the left. Regions belonging to the first level (the smallest) are shown without hatching.

In the output value map shown in FIG. 13, the detection accuracy values are high in the regions corresponding to the face images in the processing target image 20 (the regions at the same positions as the face images). This means that the face images contained in the processing target image 20 are being detected appropriately. Within each region of the output value map that corresponds to a face image in the processing target image 20, the detection accuracy value is largest at the position corresponding to the vicinity of the center of the face image and decreases toward the outside.

<Binarization process>
The binarization processing unit 7 binarizes the output value map generated by the map generation unit 4 using a threshold value to generate a binarized map. Specifically, in the output value map, the binarization processing unit 7 changes each value in a region where the detection accuracy value is equal to or greater than the threshold value (or greater than the threshold value) to, for example, "1", and changes each value in a region where the detection accuracy value is less than the threshold value (or equal to or less than the threshold value) to, for example, "0". This generates a binarized map composed of a high accuracy region, in which every value is "1" and which corresponds to the region of the output value map where the detection accuracy value is equal to or greater than (or greater than) the threshold value, and a low accuracy region, in which every value is "0" and which corresponds to the region of the output value map where the detection accuracy value is less than (or equal to or less than) the threshold value.
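
A minimal sketch of this thresholding follows. The function name `binarize` is hypothetical, and the sketch picks the "equal to or greater than" variant of the comparison, which is one of the two options the text allows.

```python
def binarize(output_map, threshold):
    """Return a grid with 1 where value >= threshold and 0 elsewhere."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in output_map]


# Hypothetical 2 x 3 output value map.
binary = binarize([[0.2, 1.5, 3.0], [2.0, 0.0, 1.9]], threshold=2.0)
```

In this toy example only the two values at or above 2.0 fall in the high accuracy region.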

FIG. 14 is a diagram schematically showing an example of the processing target image 20. FIG. 15 is a diagram showing the output value map 40 for the processing target image 20 shown in FIG. 14. FIG. 16 is a diagram showing the binarized map 50 generated by binarizing the output value map 40 shown in FIG. 15 using a predetermined threshold value.

As shown in FIG. 15, in the output value map 40, the detection accuracy values are large in the region 40a corresponding to the face image 20a contained in the processing target image 20 and in the region 40b corresponding to the face image 20b contained in the processing target image 20. On the other hand, the detection accuracy values are small in the region 40c corresponding to the face image 20c contained in the processing target image 20.

When the output value map 40 shown in FIG. 15 is binarized using, as the threshold value, the boundary value between the second level (hatching rising to the left) and the third level (hatching rising to the right) of the detection accuracy value, for example, the binarized map 50 shown in FIG. 16 is obtained. In FIG. 16, the high accuracy region 51 is shown with diagonal hatching and the low accuracy region 52 is shown without hatching. In the output value map 40, the detection accuracy values in the region 40c corresponding to the face image 20c are smaller overall than those in the regions 40a and 40b corresponding to the face images 20a and 20b. Consequently, in the high accuracy region 51 of the binarized map 50, the region 51c corresponding to the face image 20c is smaller than the regions 51a and 51b corresponding to the face images 20a and 20b, respectively.

When the threshold value used to generate the binarized map 50 is adjusted appropriately, the high accuracy region 51 of the binarized map 50 comes to contain a plurality of mutually independent (separated) regions 51a to 51c that correspond respectively to the plurality of face images 20a to 20c contained in the processing target image 20, as shown in FIG. 16. Each of the face images 20a to 20c contained in the processing target image 20 can therefore be specified individually from the regions 51a to 51c. The threshold value used to generate the binarized map 50 is adjusted appropriately by the threshold adjustment unit 6, as described later.

<Detection target image specifying process>
The independent region specifying unit 8 specifies the independent regions (island regions) contained in the high accuracy region of the binarized map that the binarization processing unit 7 generated using the threshold value adjusted by the threshold adjustment unit 6. In the example of FIG. 16, each of the regions 51a to 51c is specified as an independent region. Hereinafter, a binarized map generated using the threshold value adjusted by the threshold adjustment unit 6 is referred to in particular as a "specifying binarized map". The independent regions contained in the high accuracy region of the specifying binarized map can be specified by labeling the specifying binarized map using, for example, 4-connectivity.
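
Labeling with 4-connectivity can be sketched as a breadth-first flood fill, as below. This is one common way to do it and is not claimed to be the method of the embodiment; the function name `label_independent_regions` is hypothetical.

```python
from collections import deque


def label_independent_regions(binary_map):
    """Label 4-connected regions of 1s.

    Returns (labels, count): labels has the same shape as binary_map,
    with 0 for the low accuracy region and 1..count for each island.
    """
    rows, cols = len(binary_map), len(binary_map[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for sy in range(rows):
        for sx in range(cols):
            if binary_map[sy][sx] != 1 or labels[sy][sx] != 0:
                continue
            count += 1  # found an unlabeled island; start a new label
            labels[sy][sx] = count
            queue = deque([(sx, sy)])
            while queue:
                x, y = queue.popleft()
                # 4-connectivity: right, left, down, up neighbors only
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < cols and 0 <= ny < rows
                            and binary_map[ny][nx] == 1
                            and labels[ny][nx] == 0):
                        labels[ny][nx] = count
                        queue.append((nx, ny))
    return labels, count
```

With 4-connectivity, cells that touch only diagonally belong to different independent regions; 8-connectivity would merge them.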

検出対象画像特定部９は、独立領域特定部８で特定された独立領域に基づいて、処理対象画像において顔画像を特定する。具体的には、まず検出対象画像特定部９は、独立領域特定部８で特定された各独立領域について、当該独立領域に外接する外接矩形を求める。図１７は、例えば図１６に示される２値化マップ５０が特定用２値化マップとして使用され、検出対象画像特定部９が、当該２値化マップ５０の高確度領域５１に含まれる独立領域５１ａ〜５１ｃについての外接矩形を求めた際の当該外接矩形を示す図である。図１７に示される外接矩形３００ａ〜３００ｃは、それぞれ、図１６に示される２値化マップ５０の高確度領域５１に含まれる独立領域５１ａ〜５１ｃの外接矩形である。 The detection target image specifying unit 9 specifies face images in the processing target image based on the independent regions specified by the independent region specifying unit 8. Specifically, the detection target image specifying unit 9 first obtains, for each independent region specified by the independent region specifying unit 8, a circumscribed rectangle that circumscribes the independent region. FIG. 17 shows the circumscribed rectangles obtained when, for example, the binarized map 50 shown in FIG. 16 is used as the specifying binarized map and the detection target image specifying unit 9 obtains circumscribed rectangles for the independent regions 51a to 51c included in the high accuracy region 51 of that binarized map 50. The circumscribed rectangles 300a to 300c shown in FIG. 17 are the circumscribed rectangles of the independent regions 51a to 51c, respectively, included in the high accuracy region 51 of the binarized map 50 shown in FIG. 16.

検出対象画像特定部９は、特定用２値化マップの高確度領域の各独立領域についての外接矩形を求めると、当該外接矩形を処理対象画像に設定する。図１８は、図１７に示される外接矩形３００ａ〜３００ｃを図１４に示される処理対象画像２０に設定した様子を示す図である。検出対象画像特定部９は、処理対象画像に設定された各外接矩形について、当該外接矩形内の画像が一つの顔画像であると判断する。これにより、処理対象画像２０において顔画像が特定される。 After obtaining a circumscribed rectangle for each independent region in the high accuracy region of the specifying binarized map, the detection target image specifying unit 9 sets the circumscribed rectangles on the processing target image. FIG. 18 shows the circumscribed rectangles 300a to 300c of FIG. 17 set on the processing target image 20 shown in FIG. 14. For each circumscribed rectangle set on the processing target image, the detection target image specifying unit 9 determines that the image within the circumscribed rectangle is one face image. The face images are thereby specified in the processing target image 20.
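Obtaining a circumscribed rectangle for each independent region can be sketched as a single pass over a labeled map. The function name `circumscribed_rectangles`, the `(x, y, w, h)` return format, and the assumption that the map has the same resolution as the processing target image (so that no coordinate scaling is needed) are all illustrative choices, not details taken from the description.

```python
def circumscribed_rectangles(label_map):
    """Return {label: (x, y, w, h)} for every non-zero label.

    (x, y) is the upper-left corner and (w, h) the width and height of the
    smallest axis-aligned rectangle containing all cells with that label.
    """
    bounds = {}  # label -> [min_x, min_y, max_x, max_y]
    for y, row in enumerate(label_map):
        for x, label in enumerate(row):
            if label == 0:
                continue
            if label not in bounds:
                bounds[label] = [x, y, x, y]
            else:
                b = bounds[label]
                b[0], b[1] = min(b[0], x), min(b[1], y)
                b[2], b[3] = max(b[2], x), max(b[3], y)
    return {label: (x0, y0, x1 - x0 + 1, y1 - y0 + 1)
            for label, (x0, y0, x1, y1) in bounds.items()}


rects = circumscribed_rectangles([[1, 1, 0, 0],
                                  [0, 1, 0, 2],
                                  [0, 0, 0, 2]])
```

Each resulting rectangle is then treated as containing exactly one face image, which is why the number of independent regions matters so much in the threshold adjustment.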

画像検出装置１は、処理対象画像を表示装置に表示する際には、図１８に示されるように、検出対象画像特定部９で求められた外接矩形を処理対象画像に重ねて表示する。 When the processing target image is displayed on the display device, the image detection device 1 displays the circumscribed rectangle obtained by the detection target image specifying unit 9 so as to overlap the processing target image, as shown in FIG.

また、画像検出装置１は、予め登録された顔画像と、処理対象画像において特定した顔画像（外接矩形内の画像）とを比較し、両者が一致するか否かを判定しても良い。そして、画像検出装置１は、予め登録された顔画像と、処理対象画像において特定した顔画像とが一致しない場合には、処理対象画像での当該顔画像に対してモザイク処理を行った上で、当該処理対象画像を表示装置に表示しても良い。これにより、本実施の形態に係る画像検出装置１を監視カメラシステムに使用した場合において、監視カメラによって隣家の人の顔画像が撮影された場合であっても、当該顔画像を認識できないようにすることができる。つまり、プライバシーマスクを実現することができる。 The image detection apparatus 1 may also compare a face image registered in advance with a face image specified in the processing target image (the image within a circumscribed rectangle) and determine whether the two match. If the registered face image and the face image specified in the processing target image do not match, the image detection apparatus 1 may apply mosaic processing to that face image in the processing target image before displaying the processing target image on the display device. As a result, when the image detection apparatus 1 according to the present embodiment is used in a surveillance camera system, even if the surveillance camera captures the face image of a neighbor, that face image can be made unrecognizable. In other words, a privacy mask can be realized.
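The mosaic processing mentioned above could look like the following sketch, which pixelates a rectangle of a grayscale image by replacing each tile with its mean value. The function name `mosaic`, the `block` tile size, and the grayscale list-of-lists image format are assumptions for illustration; the description does not specify how the mosaic is computed.

```python
def mosaic(image, rect, block):
    """Return a copy of a grayscale image with rect = (x, y, w, h) pixelated.

    The rectangle is split into block-by-block tiles (clipped at the
    rectangle edge) and every pixel in a tile is set to the tile's integer
    mean. The input image is left unmodified.
    """
    x0, y0, w, h = rect
    out = [row[:] for row in image]
    for by in range(y0, y0 + h, block):
        for bx in range(x0, x0 + w, block):
            ys = range(by, min(by + block, y0 + h))
            xs = range(bx, min(bx + block, x0 + w))
            mean = sum(image[y][x] for y in ys for x in xs) // (len(ys) * len(xs))
            for y in ys:
                for x in xs:
                    out[y][x] = mean
    return out


image = [[0, 10, 20, 30],
         [40, 50, 60, 70],
         [80, 90, 100, 110],
         [120, 130, 140, 150]]
masked = mosaic(image, (0, 0, 2, 2), 2)
```

Here `rect` would be a circumscribed rectangle whose face image did not match any registered face.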

＜しきい値調整処理＞ <Threshold adjustment process>
２値化処理部７が出力値マップを２値化する際に使用するしきい値が適切に設定されないと、画像検出装置１は処理対象画像から顔画像を正しく検出できない可能性がある。以下にこの点について説明する。 If the threshold used when the binarization processing unit 7 binarizes the output value map is not set appropriately, the image detection apparatus 1 may be unable to detect face images correctly from the processing target image. This point is described below.

図１９は、図１５に示される出力値マップ４０を、図１６に示される２値化マップ５０の生成で使用されたしきい値よりも小さいしきい値で２値化して得られる２値化マップ５０を示す図である。 FIG. 19 shows the binarized map 50 obtained by binarizing the output value map 40 shown in FIG. 15 with a threshold smaller than the one used to generate the binarized map 50 shown in FIG. 16.

出力値マップ４０が２値化される際のしきい値が小さい場合には、出力値マップ４０において検出確度値があまり大きくない領域についても高確度領域５１となる。したがって、図１９に示されるように、高確度領域５１では、距離が近い顔画像２０ａ，２０ｂに対応する領域５１ａ，５１ｂが連結して一つの独立領域となることがある。この場合には、図１９に示される２値化マップ５０の高確度領域５１に含まれる各独立領域についての外接矩形が求められると、図２０に示されるように、領域５１ａ，５１ｂから成る独立領域に外接する外接矩形３００ｄと、領域５１ｃに外接する外接矩形３００ｃとが生成される。 When the threshold used to binarize the output value map 40 is small, even regions of the output value map 40 whose detection accuracy values are not very large become part of the high accuracy region 51. Accordingly, as shown in FIG. 19, the regions 51a and 51b corresponding to the face images 20a and 20b, which are close to each other, may be connected in the high accuracy region 51 to form a single independent region. In this case, when a circumscribed rectangle is obtained for each independent region included in the high accuracy region 51 of the binarized map 50 shown in FIG. 19, a circumscribed rectangle 300d circumscribing the independent region made up of the regions 51a and 51b and a circumscribed rectangle 300c circumscribing the region 51c are generated, as shown in FIG. 20.

外接矩形３００ｃ，３００ｄが処理対象画像２０に設定されると、図２１に示されるように、二つの顔画像２０ａ，２０ｂに対して一つの外接矩形３００ｄが設定され、顔画像２０ｃに対して一つの外接矩形３００ｃが設定される。検出対象画像特定部９は、処理対象画像での一つの外接矩形内の画像を一つの顔画像とすることから、処理対象画像２０から顔画像２０ｃについては適切に検出することができるものの、顔画像２０ａ，２０ｂについては一つの顔画像として特定され、顔画像２０ａ，２０ｂのそれぞれを個別に検出することが困難となる。 When the circumscribed rectangles 300c and 300d are set on the processing target image 20, one circumscribed rectangle 300d is set for the two face images 20a and 20b, and one circumscribed rectangle 300c is set for the face image 20c, as shown in FIG. 21. Because the detection target image specifying unit 9 treats the image within one circumscribed rectangle in the processing target image as one face image, the face image 20c can be detected appropriately from the processing target image 20, but the face images 20a and 20b are specified as a single face image, making it difficult to detect each of them individually.

図２２は、図１５に示される出力値マップ４０を、図１６に示される２値化マップ５０の生成で使用されたしきい値よりも大きいしきい値で２値化して得られる２値化マップ５０を示す図である。 FIG. 22 shows the binarized map 50 obtained by binarizing the output value map 40 shown in FIG. 15 with a threshold larger than the one used to generate the binarized map 50 shown in FIG. 16.

出力値マップ４０が２値化される際のしきい値が大きい場合には、出力値マップ４０において検出確度値があまり大きくない領域については高確度領域５１とならない。したがって、図２２に示されるように、出力値マップ４０での対応する領域での検出確度値が小さい顔画像２０ｃについては、当該顔画像２０ｃに対応する領域が高確度領域５１に含まれないことがある。この場合には、図２２に示される２値化マップ５０の高確度領域５１に含まれる各独立領域についての外接矩形が求められると、図２３に示されるように、領域５１ａに外接する外接矩形３００ａと、領域５１ｂに外接する外接矩形３００ｂとが生成される。 When the threshold used to binarize the output value map 40 is large, regions of the output value map 40 whose detection accuracy values are not very large do not become part of the high accuracy region 51. Accordingly, as shown in FIG. 22, for the face image 20c, whose corresponding region in the output value map 40 has small detection accuracy values, the region corresponding to the face image 20c may not be included in the high accuracy region 51. In this case, when a circumscribed rectangle is obtained for each independent region included in the high accuracy region 51 of the binarized map 50 shown in FIG. 22, a circumscribed rectangle 300a circumscribing the region 51a and a circumscribed rectangle 300b circumscribing the region 51b are generated, as shown in FIG. 23.

外接矩形３００ａ，３００ｂが処理対象画像２０に設定されると、図２４に示されるように、顔画像２０ａ，２０ｂに対して外接矩形３００ａ，３００ｂがそれぞれ設定されるものの、顔画像２０ｃには外接矩形が設定されない。したがって、顔画像２０ａ，２０ｂについては検出できるものの、顔画像２０ｃについては検出することが困難となる。 When the circumscribed rectangles 300a and 300b are set on the processing target image 20, the circumscribed rectangles 300a and 300b are set for the face images 20a and 20b, respectively, but no circumscribed rectangle is set for the face image 20c, as shown in FIG. 24. Therefore, the face images 20a and 20b can be detected, but the face image 20c is difficult to detect.

このように、２値化マップの生成で使用されるしきい値が小さい場合には、近い距離にある複数の顔画像を適切に検出することが困難となる。 As described above, when the threshold value used for generating the binarized map is small, it is difficult to appropriately detect a plurality of face images at close distances.

一方で、２値化マップの生成で使用されるしきい値が大きい場合には、出力値マップでの対応する領域の検出確度値が小さい顔画像を適切に検出することが困難となる。 On the other hand, when the threshold value used in generating the binarized map is large, it is difficult to appropriately detect a face image having a small detection accuracy value of the corresponding region in the output value map.

そこで、本実施の形態では、検出対象画像特定部９が、処理対象画像において、出力値マップでの対応する領域の検出確度値が小さい顔画像を特定することができるとともに、距離が近い複数の顔画像のそれぞれを個別に特定することができるように、しきい値調整部６が２値化マップの生成で用いられるしきい値を適切に調整する。以下にしきい値調整部６がしきい値を調整する際の画像検出装置１の動作について詳細に説明する。 Therefore, in the present embodiment, the threshold adjustment unit 6 appropriately adjusts the threshold used in generating the binarized map so that the detection target image specifying unit 9 can specify, in the processing target image, a face image whose corresponding region in the output value map has small detection accuracy values, and can also individually specify each of a plurality of face images that are close to each other. The operation of the image detection apparatus 1 when the threshold adjustment unit 6 adjusts the threshold is described in detail below.

図２５は画像検出装置１でのしきい値調整処理を示すフローチャートである。しきい値調整処理では、図２５に示されるように、まずステップｓ１において、グループ分け処理部５が、検出部３で検出された複数の検出結果領域（検出結果枠）をグループ分けする。後述するように、グループ分け処理部５は、異なる顔画像に対応する検出結果領域は異なるグループとなり、かつ同じ顔画像に対応する検出結果領域は同じグループとなる可能性が高くなるように、検出部３で検出された複数の検出結果領域（検出結果枠）をグループ分けするため、グループ分け処理部５で得られたグループの数は、処理対象画像に含まれる顔画像の数と一致する可能性が高くなる。よって、ステップｓ１においては、検出部３で検出された複数の検出結果領域がグループ分けされることによって、処理対象画像に含まれる顔画像の数が推定されると言える。グループ分け処理部５の動作については後で詳細に説明する。 FIG. 25 is a flowchart showing the threshold adjustment process in the image detection apparatus 1. In the threshold adjustment process, as shown in FIG. 25, first, in step s1, the grouping processing unit 5 groups the plurality of detection result regions (detection result frames) detected by the detection unit 3. As described later, the grouping processing unit 5 groups the detection result regions so that detection result regions corresponding to different face images are likely to fall into different groups and detection result regions corresponding to the same face image are likely to fall into the same group. The number of groups obtained by the grouping processing unit 5 is therefore likely to match the number of face images included in the processing target image. In other words, step s1 can be said to estimate the number of face images included in the processing target image by grouping the detection result regions detected by the detection unit 3. The operation of the grouping processing unit 5 is described in detail later.

次にステップｓ２において、しきい値調整部６は、２値化マップの生成で使用されるしきい値を２値化処理部７に仮設定する。ここでは、できるだけ小さい値のしきい値が２値化処理部７に仮設定される。そして、ステップｓ３において、２値化処理部７は、ステップｓ２で仮設定されたしきい値を用いてマップ生成部４で生成された出力値マップを２値化し、２値化マップを生成する。 Next, in step s2, the threshold adjustment unit 6 provisionally sets a threshold used for generating the binarization map in the binarization processing unit 7. Here, a threshold value having a value as small as possible is temporarily set in the binarization processing unit 7. In step s3, the binarization processing unit 7 binarizes the output value map generated by the map generation unit 4 using the threshold value temporarily set in step s2, and generates a binarized map. .

次にステップｓ４において、独立領域特定部８は、ステップｓ３で生成された２値化マップの高確度領域に含まれる独立領域を特定する。 Next, in step s4, the independent area specifying unit 8 specifies an independent area included in the high accuracy area of the binarized map generated in step s3.

次にステップｓ５において、しきい値調整部６は、ステップｓ４で特定された独立領域の数が、ステップｓ１でのグループ分け処理部５のグループ分けで得られたグループの数と一致するかを判断する。つまり、しきい値調整部６は、ステップｓ４で特定された独立領域の数が、ステップｓ１で推定された、処理対象画像に含まれる顔画像の数と一致するかを判断する。 Next, in step s5, the threshold adjustment unit 6 determines whether the number of independent regions specified in step s4 matches the number of groups obtained by the grouping performed by the grouping processing unit 5 in step s1. That is, the threshold adjustment unit 6 determines whether the number of independent regions specified in step s4 matches the number of face images included in the processing target image as estimated in step s1.

ステップｓ５において、ステップｓ４で特定された独立領域の数が、ステップｓ１のグループ分け処理で得られたグループの数と一致しないと判断されると、上述のステップｓ２が再度実行されて、２値化マップの生成で使用されるしきい値が２値化処理部７に仮設定される。ここでのステップｓ２においては、前回の値よりも所定量だけ大きいしきい値が２値化処理部７に仮設定される。その後、ステップｓ３が実行され、以後、画像検出装置１は同様に動作する。 If it is determined in step s5 that the number of independent regions specified in step s4 does not match the number of groups obtained in the grouping process of step s1, step s2 described above is executed again, and a threshold to be used in generating the binarized map is provisionally set in the binarization processing unit 7. In this execution of step s2, a threshold larger than the previous value by a predetermined amount is provisionally set in the binarization processing unit 7. Step s3 is then executed, and the image detection apparatus 1 thereafter operates in the same manner.

ステップｓ５において、ステップｓ４で特定された独立領域の数が、ステップｓ１のグループ分け処理で得られたグループの数と一致すると判断されると、ステップｓ６において、しきい値調整部６は、ステップｓ２で仮設定したしきい値を適切なしきい値として決定する。これにより、しきい値調整処理が終了する。 If it is determined in step s5 that the number of independent areas specified in step s4 matches the number of groups obtained in the grouping process in step s1, the threshold adjustment unit 6 in step s6 The threshold value temporarily set in s2 is determined as an appropriate threshold value. As a result, the threshold value adjustment process ends.

このように、本実施の形態では、しきい値調整部６は、独立領域特定部８で特定される独立領域の数が、グループ分け処理部５で得られたグループの数（処理対象画像に含まれる顔画像の数）と一致するように、２値化マップの生成で使用されるしきい値を調整している。 As described above, in the present embodiment, the threshold adjustment unit 6 adjusts the threshold used in generating the binarized map so that the number of independent regions specified by the independent region specifying unit 8 matches the number of groups obtained by the grouping processing unit 5 (that is, the estimated number of face images included in the processing target image).
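Steps s2 to s6 can be sketched as a simple search loop. This is an illustration under stated assumptions rather than the actual implementation: the names `count_regions` and `adjust_threshold`, the `start`, `step`, and `limit` parameters, and the behavior of returning `None` when no threshold yields a match are all hypothetical, since the description does not say what happens if the counts never agree.

```python
def count_regions(output_map, threshold):
    """Steps s3 and s4: binarize at threshold and count 4-connected regions."""
    rows, cols = len(output_map), len(output_map[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if output_map[r][c] < threshold or seen[r][c]:
                continue
            count += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols and not seen[ny][nx]
                            and output_map[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def adjust_threshold(output_map, group_count, start, step, limit):
    """Raise the threshold from start by step until the region count matches.

    Returns the first matching threshold, or None if none is found up to
    limit (the None fallback is an assumption of this sketch).
    """
    threshold = start                      # s2: provisional (small) threshold
    while threshold <= limit:
        if count_regions(output_map, threshold) == group_count:  # s5
            return threshold               # s6: accept this threshold
        threshold += step                  # s2 again: slightly larger value
    return None


# Toy map (made-up values): two strong peaks joined by a weaker bridge,
# plus one moderately strong peak on the right.
output_map = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 5, 3, 5, 0, 0, 4, 0],
    [0, 5, 3, 5, 0, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
print(adjust_threshold(output_map, 3, start=1, step=1, limit=5))  # prints 4
```

In this toy map, thresholds 1 to 3 give two regions because the bridge joins the strong peaks, threshold 4 gives three, and threshold 5 gives two again because the weaker peak drops out.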

なお、本例では、しきい値を少しずつ増加させることによって、独立領域特定部８で特定される独立領域の数が、グループ分け処理部５で得られたグループの数と一致するようなしきい値を求めているが、最初に仮設定するしきい値をできるだけ大きくしておいて、しきい値を少しずつ減少させることによって、独立領域特定部８で特定される独立領域の数が、グループ分け処理部５で得られたグループの数と一致するようなしきい値を求めても良い。 In this example, the threshold is increased little by little to find a threshold at which the number of independent regions specified by the independent region specifying unit 8 matches the number of groups obtained by the grouping processing unit 5. Alternatively, the initial provisional threshold may be made as large as possible and then decreased little by little to find a threshold at which the two numbers match.

ステップｓ６において適切なしきい値が決定されると、検出対象画像特定部９は、当該適切なしきい値、つまりしきい値調整部６で調整されたしきい値が用いられて生成された特定用２値化マップの高確度領域に含まれる独立領域に基づいて、上述のように処理対象画像において顔画像を特定する。特定用２値化マップの高確度領域に含まれる独立領域の数は、処理対象画像に含まれる顔画像の数と一致する可能性が高いことから、当該独立領域に基づいて処理対象画像において顔画像を特定することによって、処理対象画像に含まれる各顔画像を個別に特定できる可能性が向上する。よって、顔画像についての検出精度が向上する。 When an appropriate threshold value is determined in step s6, the detection target image specifying unit 9 uses the appropriate threshold value, that is, the threshold value adjusted by the threshold value adjusting unit 6, to generate the specifying value. Based on the independent area included in the high accuracy area of the binarized map, the face image is specified in the processing target image as described above. Since there is a high possibility that the number of independent regions included in the high-accuracy region of the binarization map for identification matches the number of face images included in the processing target image, the face in the processing target image is based on the independent region. By specifying the image, the possibility that each face image included in the processing target image can be specified individually is improved. Therefore, the detection accuracy for the face image is improved.

なお、ステップｓ６の後に、２値化処理部７が調整後のしきい値を用いて特定用２値化マップを生成しても良いし、ステップｓ６の直近に実行されたステップｓ３で生成された２値化マップ（ステップｓ６で適切であると判断されたしきい値が用いられて生成された２値化マップ）が特定用２値化マップとされても良い。 After step s6, the binarization processing unit 7 may generate the specifying binarized map using the adjusted threshold. Alternatively, the binarized map generated in the step s3 executed most recently before step s6 (that is, the binarized map generated using the threshold judged appropriate in step s6) may be used as the specifying binarized map.

＜グループ分け処理について＞ <Grouping process>
次にステップｓ１のグループ分け処理について詳細に説明する。図２６はグループ分け処理部５の構成を示す図である。図２６に示されるように、グループ分け処理部５は、クラスタリング処理部５００と、分離度取得部５１０と、使用グループ数決定部５２０とを備えている。 Next, the grouping process in step s1 is described in detail. FIG. 26 shows the configuration of the grouping processing unit 5. As shown in FIG. 26, the grouping processing unit 5 includes a clustering processing unit 500, a separation degree acquisition unit 510, and a use group number determination unit 520.

クラスタリング処理部５００は、予め定められた複数種類のグループ数のそれぞれについて、検出部３で検出された複数の検出結果領域を当該グループ数でグループ分けする。本実施の形態では、クラスタリング処理部５００は、検出結果領域の位置、大きさ及びそれに対応する検出確度値に基づいて、検出部３で検出された複数の検出結果領域をグループ分けする。つまり、クラスタリング処理部５００は、検出結果領域の位置、大きさ及びそれに対応する検出確度値が近いもの同士ができるだけ同じグループとなるように、検出部３で検出された複数の検出結果領域をグループ分けする。 For each of a plurality of predetermined group counts, the clustering processing unit 500 divides the plurality of detection result regions detected by the detection unit 3 into that number of groups. In the present embodiment, the clustering processing unit 500 groups the detection result regions detected by the detection unit 3 based on the position and size of each detection result region and its corresponding detection accuracy value. That is, the clustering processing unit 500 groups the detection result regions so that, as far as possible, regions whose positions, sizes, and corresponding detection accuracy values are close to one another fall into the same group.

分離度取得部５１０は、複数種類のグループ数のそれぞれについて、クラスタリング処理部５００において検出部３で検出された複数の検出結果領域が当該グループ数でグループ分けされた結果得られる複数のグループの間での分離の程度を示す分離度を求める。分離度は、複数のグループが相違している程度を示す相違度とも言える。本実施の形態では、分離度取得部５１０は、クラスタリング処理部５００で得られた複数のグループの間において、検出結果領域の位置、大きさ及びそれに対応する検出確度値が分離（相違）している程度を示す分離度を求める。したがって、分離度は、複数のグループの間において、検出結果領域の位置、大きさ及びそれに対応する検出確度値が離れているほど高くなる。 For each of the plurality of group counts, the separation degree acquisition unit 510 obtains a degree of separation indicating how well separated the groups are that result when the clustering processing unit 500 divides the detection result regions detected by the detection unit 3 into that number of groups. The degree of separation can also be regarded as a degree of difference indicating how much the groups differ from one another. In the present embodiment, the separation degree acquisition unit 510 obtains a degree of separation indicating how far apart (how different) the positions, sizes, and corresponding detection accuracy values of the detection result regions are between the groups obtained by the clustering processing unit 500. The degree of separation therefore becomes higher the further apart these positions, sizes, and detection accuracy values are between the groups.

使用グループ数決定部５２０は、複数種類のグループ数のうち、検出部３で検出された複数の検出結果領域がそのグループ数でグループ分けされた結果得られる複数のグループの間での分離度が最も高いグループ数を適切なグループ数として、当該適切なグループ数を、しきい値調整部６がしきい値の調整で使用する使用グループ数として決定する。 The use group number determination unit 520 has a degree of separation between a plurality of groups obtained as a result of grouping a plurality of detection result areas detected by the detection unit 3 out of a plurality of types of groups. The highest number of groups is determined as the appropriate number of groups, and the appropriate number of groups is determined as the number of groups used by the threshold adjustment unit 6 for adjusting the threshold.

後述のように、クラスタリング処理部５００において、検出部３で検出された複数の検出結果領域が使用グループ数（適切なグループ数）でグループ分けされることによって得られる複数のグループは、処理対象画像に含まれる複数の顔画像にそれぞれ対応する可能性が高くなる。したがって、しきい値調整部６が、独立領域特定部８で特定される独立領域の数が使用グループ数（適切なグループ数）と一致するようにしきい値を調整することによって、調整後のしきい値が使用された生成された２値化マップの高確度領域に含まれる独立領域の数は、処理対象画像に含まれる顔画像の数と一致する可能性が高くなる。よって、処理対象画像に含まれる各顔画像を個別に特定できる可能性が高くなる。 As described later, the groups obtained when the clustering processing unit 500 divides the detection result regions detected by the detection unit 3 into the use group number (the appropriate number of groups) are likely to correspond respectively to the face images included in the processing target image. Therefore, when the threshold adjustment unit 6 adjusts the threshold so that the number of independent regions specified by the independent region specifying unit 8 matches the use group number (the appropriate number of groups), the number of independent regions included in the high accuracy region of the binarized map generated using the adjusted threshold is likely to match the number of face images included in the processing target image. This increases the likelihood that each face image included in the processing target image can be specified individually.

以下に、クラスタリング処理部５００、分離度取得部５１０及び使用グループ数決定部５２０の動作について詳細に説明する。 Hereinafter, the operations of the clustering processing unit 500, the separation degree acquiring unit 510, and the used group number determining unit 520 will be described in detail.

＜クラスタリング処理＞ <Clustering process>
本実施の形態では、クラスタリング処理部５００は、例えば、K-means法と呼ばれるクラスタリング手法を使用して、検出部３で検出された複数の検出結果領域をグループ分け（クラスタリング）する。クラスタリングにおいては、グループ分けを行う際の各グループがクラスタと呼ばれている。以下では、まずK-means法について説明し、その後、クラスタリング処理部５００でのK-means法を用いたグループ分けについて説明する。 In the present embodiment, the clustering processing unit 500 groups (clusters) the plurality of detection result regions detected by the detection unit 3 using, for example, a clustering technique called the K-means method. In clustering, each group produced by the grouping is called a cluster. The K-means method is described first below, followed by the grouping performed with the K-means method in the clustering processing unit 500.

K-means法では、グループ分けを行う際のグループ数ｋが予め決まっている。そして、K-means法では、クラスタごとに設定されるクラスタ中心と呼ばれる基準点が使用されて、グループ分け対象のデータ群（以後、「対象データ群」と呼ぶ）がグループ数ｋでグループ分けされる。対象データ群を構成する各データを「対象データ」と呼ぶ。 In the K-means method, the number of groups k used for grouping is determined in advance. The K-means method uses a reference point called a cluster center, which is set for each cluster, to divide the data group to be grouped (hereinafter referred to as the “target data group”) into k groups. Each item of data constituting the target data group is referred to as “target data”.

K-means法では、以下の式（１）で表される評価関数φを最小化する各クラスタのクラスタ中心を見付けることによって、対象データ群Ｔがｋ個のクラスタにグループ分けされる。 In the K-means method, the target data group T is grouped into k clusters by finding the cluster center of each cluster that minimizes the evaluation function φ expressed by the following equation (1).
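Equation (1) itself does not appear in this text. Based on the verbal description that follows it (a sum, over all target data in T, of the squared distance from each target datum to its nearest cluster center), it presumably takes the following standard form. This is a reconstruction, not a copy of the published equation.

```latex
\phi = \sum_{t_j \in T} \min_{1 \le i \le k} \lVert t_j - c_i \rVert^{2} \tag{1}
```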

式（１）において、ｔｊは番号ｊの対象データを意味している。対象データ群Ｔを構成する対象データの総数をｒ（ｒ≧２）とすると１≦ｊ≦ｒとなる。また式（１）において、ｃｉ（１≦ｉ≦ｋ）は、番号ｉのクラスタについてのクラスタ中心を意味している。 In formula (1), tj means the target data of number j. When the total number of target data constituting the target data group T is r (r ≧ 2), 1 ≦ j ≦ r. In the formula (1), ci (1 ≦ i ≦ k) means the cluster center for the cluster of number i.

式（１）の右辺の「ｍｉｎ||ｔｊ−ｃｉ||^２」は、対象データを表す特徴量（例えばＸ座標及びＹ座標）についての特徴量空間（例えばＸＹ平面）において、対象データｔｊの位置（詳細には、対象データｘｉを表す特徴量で特定される当該対象データｔｊの位置）に対して最も距離が近いクラスタ中心と、当該対象データｔｊの位置との間の距離の２乗を意味している。つまり、「ｍｉｎ||ｔｊ−ｃｉ||^２」は、特徴量空間においての対象データｔｊの位置についてクラスタ中心との最短距離の２乗を意味している。そして、式（１）の右辺のシグマは、対象データ群Ｔを構成する複数の対象データの位置についてのクラスタ中心との最短距離の２乗の総和を意味している。評価関数φを最小化する各クラスタのクラスタ中心については、図２７に示される処理を行うことによって見付けることができる。 The term “min||tj−ci||^2” on the right side of equation (1) denotes, in the feature space (for example, the XY plane) of the feature amounts (for example, the X and Y coordinates) representing the target data, the square of the distance between the position of the target data tj (more precisely, the position specified by the feature amounts representing the target data tj) and the cluster center closest to that position. In other words, “min||tj−ci||^2” is the square of the shortest distance from the position of the target data tj in the feature space to a cluster center. The sigma on the right side of equation (1) denotes the sum of these squared shortest distances over all the target data constituting the target data group T. The cluster centers that minimize the evaluation function φ can be found by performing the process shown in FIG. 27.

図２７はK-means法を説明するためのフローチャートである。K-means法では、まずステップｓ１１において、対象データ群を構成する各データに対してｋ個のクラスタの中からランダムに一つの初期のクラスタが割り当てられる。 FIG. 27 is a flowchart for explaining the K-means method. In the K-means method, first, in step s11, one initial cluster is randomly assigned from the k clusters to each data constituting the target data group.

次にステップｓ１２において、各クラスタについての初期のクラスタ中心が求められる。番目ｉのクラスタのクラスタ中心ｃｉについては、例えば、以下の式（２）で求められる。 Next, in step s12, an initial cluster center for each cluster is determined. For example, the cluster center ci of the i-th cluster is obtained by the following equation (2).
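Equation (2) does not appear in this text either. From the explanation that accompanies it (the cluster center is the mean position, in the feature space, of the target data assigned to cluster i, where |Ci| is the number of such data), it presumably reads as follows. This is a reconstruction from that description.

```latex
c_i = \frac{1}{\lvert C_i \rvert} \sum_{t_j \in C_i} t_j \tag{2}
```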

式（２）において、Ｃｉは、番号ｉのクラスタ（グループ）が割り当てられた複数の対象データから成るデータ群を意味しており、｜Ｃｉ｜は当該データ群での対象データの総数を意味している。そして、式（２）の右辺は、番号ｉのクラスタが割り当てられた複数の対象データについての特徴量空間での位置の平均値を示している。したがって、本実施の形態では、番目ｉのクラスタのクラスタ中心ｃｉは、番号ｉのクラスタが割り当てられた複数の対象データの位置の平均値となっている。なお、クラスタ中心については他の方法で決定して良い。 In Equation (2), Ci means a data group composed of a plurality of target data to which a cluster (group) of number i is assigned, and | Ci | means the total number of target data in the data group. ing. The right side of Expression (2) indicates an average value of positions in the feature amount space for a plurality of target data to which the cluster of number i is assigned. Therefore, in the present embodiment, the cluster center ci of the i-th cluster is the average value of the positions of a plurality of target data to which the cluster with the number i is assigned. The cluster center may be determined by other methods.

次にステップｓ１３において、対象データ群Ｔを構成する複数の対象データのそれぞれに対して、ｋ個のクラスタ中心のうち特徴量空間において当該対象データについての位置に最も近いクラスタ中心を有するクラスタが割り当てられる。これにより、対象データ群Ｔを構成する複数の対象データのそれぞれに対してクラスタが割り当て直される。 Next, in step s13, a cluster having a cluster center closest to the position of the target data in the feature amount space among the k cluster centers is assigned to each of the plurality of target data constituting the target data group T. It is done. As a result, the cluster is reassigned to each of the plurality of target data constituting the target data group T.

次にステップｓ１４において、上述の式（２）が用いられてｋ個のクラスタのクラスタ中心が求め直される。 Next, in step s14, the cluster center of k clusters is re-determined using the above equation (2).

次にステップｓ１５において、各クラスタのクラスタ中心が動かなくなったか判断される。あるクラスタについて、ステップｓ１４で求められたクラスタ中心と、それよりも一つ前に求められたクラスタ中心との差分の絶対値がしきい値よりも小さい場合には、当該あるクラスタのクラスタ中心は動かなくなったと判断される。一方で、当該絶対値がしきい値以上の場合には、当該あるクラスタのクラスタ中心は動いていると判断される。 Next, in step s15, it is determined whether the cluster center of each cluster has stopped moving. When the absolute value of the difference between the cluster center obtained in step s14 and the cluster center obtained immediately before it is smaller than the threshold for a certain cluster, the cluster center of the given cluster is Judged to stop moving. On the other hand, if the absolute value is greater than or equal to the threshold value, it is determined that the cluster center of the certain cluster is moving.

ステップｓ１５において、ｋ個のクラスタのうちの一つでもそのクラスタ中心が動いていると判断されると、ステップｓ１３が再度実行されて、対象データ群Ｔを構成する複数の対象データのそれぞれに対してクラスタが割り当て直される。以後、同様にしてステップｓ１４，ｓ１５が実行される。 If it is determined in step s15 that the cluster center is moving even in one of the k clusters, step s13 is executed again for each of a plurality of target data constituting the target data group T. The cluster is reassigned. Thereafter, steps s14 and s15 are executed in the same manner.

一方で、ステップｓ１５において、ｋ個のクラスタのクラスタ中心がすべて動かなくなったと判断されると、クラスタリングが終了する。ｋ個のクラスタのクラスタ中心がすべて動かなくなったということは、上記の評価関数φが最小値となったことを意味している。クラスタリングが終了した際に、対象データ群Ｔを構成する各対象データに割り当てられているクラスタが、当該対象データが最終的に属するクラスタとなり、対象データ群Ｔが複数のグループに分けられることになる。 On the other hand, if it is determined in step s15 that all the cluster centers of the k clusters have stopped moving, clustering ends. The fact that all the cluster centers of k clusters have stopped moving means that the evaluation function φ has become the minimum value. When clustering is completed, the cluster assigned to each target data constituting the target data group T becomes the cluster to which the target data finally belongs, and the target data group T is divided into a plurality of groups. .
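Steps s11 to s15 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of the clustering processing unit 500. The names `kmeans` and `sq_dist`, the `tol`, `max_iter`, and `seed` parameters, the use of Euclidean distance to measure how far a center moved, and the re-seeding of an empty cluster at a random data point are all assumptions of this sketch; the description does not say how an empty cluster is handled.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, tol=1e-9, max_iter=100, seed=0):
    """Group points into k clusters; return (assignments, centers)."""
    rng = random.Random(seed)
    dim = len(points[0])

    def centers_for(assign):
        centers = []
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if not members:
                # Assumption: an empty cluster is re-seeded at a random point.
                members = [rng.choice(points)]
            centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dim)))
        return centers

    assign = [rng.randrange(k) for _ in points]      # s11: random clusters
    centers = centers_for(assign)                    # s12: initial centers
    for _ in range(max_iter):                        # safety cap (assumption)
        # s13: reassign every point to the cluster with the nearest center.
        assign = [min(range(k), key=lambda i: sq_dist(p, centers[i]))
                  for p in points]
        new_centers = centers_for(assign)            # s14: recompute centers
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers)) ** 0.5
        centers = new_centers
        if shift < tol:                              # s15: centers stopped moving
            break
    return assign, centers


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
assign, centers = kmeans(points, 2)
```

For the two well-separated clumps in this toy data, the first three points end up in one cluster and the last three in the other; which cluster receives which label depends on the random initial assignment.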

以上のようなK-means法を用いて、クラスタリング処理部５００は、検出部３で検出された複数の検出結果領域をグループ分けする。本実施の形態では、検出結果領域が対象データとなり、検出部３で検出された複数の検出結果領域が対象データ群Ｔとなる。また、本実施の形態では、対象データである検出結果領域の特徴を表す特徴量として、例えば、処理対象画像での検出結果領域の位置、大きさ及びそれに対応する検出確度値ｖが使用される。 Using the K-means method described above, the clustering processing unit 500 groups the plurality of detection result regions detected by the detection unit 3. In the present embodiment, each detection result region is an item of target data, and the plurality of detection result regions detected by the detection unit 3 form the target data group T. Also in the present embodiment, the feature amounts representing the characteristics of a detection result region, which is the target data, are, for example, the position and size of the detection result region in the processing target image and its corresponding detection accuracy value v.

ここで、本実施の形態では、処理対象画像の左上の角を原点とし、行方向をｘ軸方向とし、列方向をｙ軸方向とするＸＹ平面が処理対象画像に定められている。本実施の形態では、検出結果領域を表す特徴量の一つである検出結果領域の位置が、例えば、ＸＹ平面上での検出結果領域の左上の角のＸ座標ｘ及びＹ座標ｙで表される。また、検出結果領域を表す特徴量の一つである検出結果領域の大きさが、検出結果領域の幅ｗ（Ｘ軸方向の長さ）及び高さｈ（Ｙ軸方向の長さ）で表される。したがって、クラスタリング処理部５００は、K-means法を用いて検出部３で検出された複数の検出結果領域をグループ分けする際には、各検出結果領域を、当該検出結果領域の左上の角についてのＸ座標ｘ及びＹ座標ｙと、当該検出結果領域の幅ｗ及び高さｈと、当該検出結果領域に対応する検出確度値ｖとから成る５次元の特徴量で表現する。よって、クラスタリング処理部５００では、当該５次元の特徴量についての特徴量空間（以後、「５次元特徴量空間」と呼ぶ）がクラスタリングで使用される。 Here, in the present embodiment, an XY plane is defined on the processing target image, with the upper left corner of the processing target image as the origin, the row direction as the X-axis direction, and the column direction as the Y-axis direction. In the present embodiment, the position of a detection result region, which is one of the feature amounts representing the detection result region, is expressed by, for example, the X coordinate x and Y coordinate y of the upper left corner of the detection result region on the XY plane. The size of a detection result region, which is another of the feature amounts, is expressed by the width w (length in the X-axis direction) and height h (length in the Y-axis direction) of the detection result region. Accordingly, when the clustering processing unit 500 groups the detection result regions detected by the detection unit 3 using the K-means method, it expresses each detection result region as a five-dimensional feature amount consisting of the X coordinate x and Y coordinate y of its upper left corner, its width w and height h, and its corresponding detection accuracy value v. The clustering processing unit 500 therefore performs clustering in the feature space of this five-dimensional feature amount (hereinafter referred to as the “five-dimensional feature space”).

Hereinafter, the X coordinate x and Y coordinate y of the upper-left corner of a detection result region on the XY plane may be referred to as the "upper-left X coordinate x" and the "upper-left Y coordinate y", respectively. A detection result region is represented by a five-dimensional feature amount (five parameters): the upper-left X coordinate x, the upper-left Y coordinate y, the width w, the height h, and the detection accuracy value v. FIG. 28 shows this five-dimensional feature amount for a detection result region 160 (detection result frame 150).
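As a minimal sketch of this representation, a detection result region can be modeled as a small record that exposes its five feature amounts as a tuple. The class name `DetectionResultRegion` and the method `features` are hypothetical names chosen for illustration, and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DetectionResultRegion:
    """Hypothetical container for one detection result region."""
    x: float  # upper-left X coordinate
    y: float  # upper-left Y coordinate
    w: float  # width (length in the X-axis direction)
    h: float  # height (length in the Y-axis direction)
    v: float  # detection accuracy value

    def features(self) -> Tuple[float, float, float, float, float]:
        """Return the five-dimensional feature amount (x, y, w, h, v)."""
        return (self.x, self.y, self.w, self.h, self.v)


# Illustrative values only.
region = DetectionResultRegion(x=40, y=25, w=16, h=16, v=0.8)
feature_vector = region.features()
```

Each such tuple is one point in the five-dimensional feature space used for clustering.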

For each of a plurality of predetermined numbers of groups, the clustering processing unit 500 groups the detection result regions detected by the detection unit 3 into that number of groups. In the present embodiment, the number of groups (number of clusters) is, for example, k = 2, 3, 4, 5. The clustering processing unit 500 therefore performs four processes on the detection result regions detected by the detection unit 3: dividing them into two groups, into three groups, into four groups, and into five groups.

The set of numbers of groups to be tried is determined according to the number of face images to be detected from the processing target image. For example, when the imaging unit is installed in front of the entrance of a house, the number of face images in a processing target image is unlikely to exceed five, so the numbers of groups are set to k = 2, 3, 4, 5 as in this example.

FIG. 29 shows a plurality of detection result regions 160 (detection result frames 150) detected from the processing target image 20 shown in FIG. 14, superimposed on that processing target image 20. FIG. 30 shows the detection result regions 160 of FIG. 29 divided into, for example, three groups 600.

Next, the operation of the clustering processing unit 500 when it groups the detection result regions detected by the detection unit 3 into k groups is described with reference to FIG. 27. Hereinafter, the plurality of detection result regions detected by the detection unit 3 are referred to as the "target detection result region group".

First, in step s11, the clustering processing unit 500 randomly assigns one of the k clusters, as an initial cluster, to each detection result region in the target detection result region group.

Next, in step s12, the clustering processing unit 500 obtains the initial cluster center of each cluster using Equation (2) above.

Next, in step s13, the clustering processing unit 500 assigns to each detection result region in the target detection result region group the cluster whose center, among the k cluster centers, is nearest in the five-dimensional feature space to the position of that detection result region (specifically, the position determined by its five feature amounts: the upper-left X coordinate x, the upper-left Y coordinate y, the width w, the height h, and the detection accuracy value v). A cluster is thereby reassigned to each detection result region in the target detection result region group.

Next, in step s14, the clustering processing unit 500 recalculates the cluster centers of the k clusters using Equation (2) above.

Next, in step s15, the clustering processing unit 500 determines, as described above, whether the cluster center of every cluster has stopped moving.

If it is determined in step s15 that the cluster center of even one of the k clusters has moved, step s13 is executed again and a cluster is reassigned to each detection result region in the target detection result region group. Steps s14 and s15 are then executed in the same manner.

If, on the other hand, it is determined in step s15 that the cluster centers of all k clusters have stopped moving, the clustering processing unit 500 ends the clustering. The cluster assigned to each detection result region at the end of clustering is the cluster to which that region finally belongs. The target detection result region group is thus divided into k groups.
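Steps s11 to s15 can be sketched as follows. This is an illustrative sketch, not the patented implementation: it assumes that Equation (2), which is not reproduced in this section, is the usual K-means cluster center (the per-dimension mean of the members), and the handling of empty clusters, the `seed` parameter, and the `max_iter` cap are assumptions added so that the example is safe to run. The function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Feature = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def cluster_centers(points: List[Feature], labels: List[int], k: int,
                    previous: Optional[List[Feature]],
                    rng: random.Random) -> List[Feature]:
    """Per-dimension mean of each cluster's members (assumed form of Equation (2))."""
    dim = len(points[0])
    centers: List[Feature] = []
    for c in range(k):
        members = [p for p, label in zip(points, labels) if label == c]
        if members:
            centers.append(tuple(sum(m[d] for m in members) / len(members)
                                 for d in range(dim)))
        elif previous is not None:
            centers.append(previous[c])  # assumption: an empty cluster keeps its old center
        else:
            centers.append(tuple(rng.choice(points)))  # assumption: seed with a random point
    return centers


def kmeans(points: List[Feature], k: int, seed: int = 0,
           max_iter: int = 100) -> Tuple[List[int], List[Feature]]:
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]               # step s11: random initial clusters
    centers = cluster_centers(points, labels, k, None, rng)   # step s12: initial centers
    for _ in range(max_iter):                                 # safety cap (assumption)
        labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                  for p in points]                            # step s13: nearest center
        new_centers = cluster_centers(points, labels, k, centers, rng)  # step s14
        if new_centers == centers:                            # step s15: no center moved
            break
        centers = new_centers
    return labels, centers


# Four made-up regions (x, y, w, h, v) forming two obvious groups.
regions = [(0, 0, 10, 10, 0.9), (1, 1, 10, 10, 0.8),
           (100, 100, 10, 10, 0.9), (101, 99, 10, 10, 0.7)]
labels, centers = kmeans(regions, k=2)
```

Because the five feature amounts have different scales (pixels versus accuracy values), a practical implementation might normalize them before computing distances; the text does not say whether this is done.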

In this way, the clustering processing unit 500 divides the target detection result region group into k groups using the K-means method, based on the feature amounts that characterize the detection result regions. Detection result regions with similar feature amounts are therefore likely to be gathered into the same group.

<Separation degree calculation process and used group number determination process>
When the clustering processing unit 500 groups the target detection result region group into k groups, a small value of k makes it more likely that a single group contains detection result regions whose feature amounts are not very similar. Conversely, a large value of k makes it more likely that detection result regions with similar feature amounts are placed in different groups rather than in one group.

Thus, depending on the value of the number of groups k used when grouping the target detection result region group, a group may contain detection result regions whose feature amounts are not very similar, or detection result regions with similar feature amounts may fall into different groups.

In the present embodiment, therefore, the separation degree acquisition unit 510 obtains a separation degree that indicates how well separated the groups obtained by grouping the target detection result region group are, and the used group number determination unit 520 determines an appropriate number of groups based on that separation degree. In other words, the used group number determination unit 520 uses the separation degree to find a number of groups such that, when the target detection result region group is grouped into that number of groups, detection result regions with dissimilar feature amounts are likely to fall into different groups and detection result regions with similar feature amounts are likely to fall into the same group. The operations of the separation degree acquisition unit 510 and the used group number determination unit 520 are described below.

The separation degree acquisition unit 510 uses the following Equation (3) to obtain the separation degree α, which indicates the degree of separation among the k groups obtained when the clustering processing unit 500 groups the target detection result region group into k groups.

The intra-group covariance I and the inter-group covariance D in Equation (3) are given by the following Equations (4) and (5).

In Equations (4) and (5), r_p (1 ≤ p ≤ k) denotes the total number of detection result regions in group (cluster) number p, and r_q (1 ≤ q ≤ k) denotes the total number of detection result regions in group number q.

In Equation (4), σ_xp, σ_yp, σ_wp, σ_hp, and σ_vp denote the standard deviations of the upper-left X coordinate x, the upper-left Y coordinate y, the width w, the height h, and the detection accuracy value v, respectively, over the detection result regions in group number p. In Equation (5), μ_xp, μ_yp, μ_wp, μ_hp, and μ_vp denote the mean values of the upper-left X coordinate x, the upper-left Y coordinate y, the width w, the height h, and the detection accuracy value v, respectively, over the detection result regions in group number p. Also in Equation (5), μ_xa, μ_ya, μ_wa, μ_ha, and μ_va denote the mean values of the same five quantities over the entire target detection result region group.

The intra-group covariance I of Equation (4) indicates the variation of the feature amounts of the detection result regions within each group. A small intra-group covariance I means that this variation is small within each group. When the variation is small in each of the k groups, the feature amounts can be regarded as separated among the k groups. In the present embodiment, therefore, as can be seen from Equation (3), the separation degree α increases as the intra-group covariance I decreases.

The inter-group covariance D of Equation (5), on the other hand, indicates the variation of the feature amounts of the detection result regions among the k groups. When this variation among the groups is large, the feature amounts can be regarded as separated among the groups. In the present embodiment, therefore, as can be seen from Equation (3), the separation degree α increases as the inter-group covariance D increases.

Thus, the separation degree α for a number of groups k indicates the degree to which the feature amounts are separated among the k groups.
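Equations (3) to (5) are not reproduced in this text, so the following is only one plausible reading that agrees with the symbol definitions and with the stated behavior (α grows as I shrinks and as D grows). It assumes that I is the size-weighted sum of per-feature variances within each group, that D is the size-weighted sum of squared differences between each group mean and the overall mean, and that α = D / I. All three forms, the function name, and the treatment of I = 0 are assumptions.

```python
from statistics import fmean, pvariance
from typing import List, Sequence


def separation_degree(groups: List[List[Sequence[float]]]) -> float:
    """Assumed separation degree alpha = D / I for non-empty groups of feature vectors."""
    all_points = [p for group in groups for p in group]
    total = len(all_points)
    dims = range(len(all_points[0]))
    overall_mean = [fmean(p[d] for p in all_points) for d in dims]

    within = 0.0   # assumed form of the intra-group covariance I
    between = 0.0  # assumed form of the inter-group covariance D
    for group in groups:
        weight = len(group) / total  # r_p divided by the sum of all r_q
        group_mean = [fmean(p[d] for p in group) for d in dims]
        within += weight * sum(pvariance([p[d] for p in group]) for d in dims)
        between += weight * sum((group_mean[d] - overall_mean[d]) ** 2 for d in dims)

    if within == 0.0:
        return float("inf")  # assumption: zero spread inside every group counts as fully separated
    return between / within


# Made-up regions (x, y, w, h, v): the same four points grouped well and badly.
good_grouping = [[(0, 0, 10, 10, 0.9), (1, 0, 10, 10, 0.9)],
                 [(10, 0, 10, 10, 0.9), (11, 0, 10, 10, 0.9)]]
bad_grouping = [[(0, 0, 10, 10, 0.9), (10, 0, 10, 10, 0.9)],
                [(1, 0, 10, 10, 0.9), (11, 0, 10, 10, 0.9)]]
alpha_good = separation_degree(good_grouping)
alpha_bad = separation_degree(bad_grouping)
```

Under these assumptions the grouping that keeps nearby regions together receives the larger α, which matches the qualitative behavior described above.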

For each of the numbers of groups used by the clustering processing unit 500 (k = 2, 3, 4, 5), the separation degree acquisition unit 510 obtains the separation degree α using Equations (3) to (5) above.

From among the numbers of groups used by the clustering processing unit 500, the used group number determination unit 520 takes the one with the highest corresponding separation degree α as the appropriate number of groups. For example, suppose that, for the processing target image 20 shown in FIG. 14, the separation degree α is 1.4 for k = 2, 5.3 for k = 3, 0.5 for k = 4, and 2.2 for k = 5. The number of groups k = 3, which corresponds to α = 5.3, is then determined to be the appropriate number of groups. A high separation degree α for a given number of groups means that the feature amounts are appropriately separated among the k groups obtained by grouping into that number of groups. Grouping with a number of groups that has a high separation degree α therefore makes it likely that detection result regions with dissimilar feature amounts fall into different groups and that detection result regions with similar feature amounts fall into the same group.
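The selection rule amounts to taking the k with the largest α. A minimal sketch using the example values from the text (the function name is hypothetical):

```python
from typing import Dict


def choose_group_count(separation_by_k: Dict[int, float]) -> int:
    """Return the number of groups whose separation degree is highest."""
    return max(separation_by_k, key=separation_by_k.get)


# Separation degrees from the example for the processing target image 20.
separation_by_k = {2: 1.4, 3: 5.3, 4: 0.5, 5: 2.2}
appropriate_k = choose_group_count(separation_by_k)  # 3
```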

Here, in the target detection result region group, detection result regions that correspond to the same face image in the processing target image are likely to have similar feature amounts (x, y, w, h, and v). In other words, detection result regions that correspond to different face images in the processing target image are likely to have different feature amounts. Therefore, grouping the target detection result region group so that regions with dissimilar feature amounts are likely to fall into different groups and regions with similar feature amounts into the same group makes it likely that regions corresponding to different face images fall into different groups and that regions corresponding to the same face image fall into the same group. That is, when the target detection result region group is grouped into the appropriate number of groups determined by the used group number determination unit 520, regions corresponding to different face images are likely to fall into different groups and regions corresponding to the same face image into the same group. The appropriate number of groups, that is, the number of groups obtained by appropriately grouping the target detection result region group, is therefore likely to match the number of face images in the processing target image.

Having determined the appropriate number of groups from among those used by the clustering processing unit 500, the used group number determination unit 520 sets it as the used group number, that is, the number of groups that the threshold adjustment unit 6 uses in threshold adjustment. The threshold adjustment unit 6 adjusts the threshold used to generate the binarized map so that the number of independent regions that the independent region specifying unit 8 specifies in the high-accuracy region of the binarized map matches the used group number determined by the used group number determination unit 520.
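The matching condition can be illustrated with a small sketch. How the threshold is actually stepped is described elsewhere in the document and is not reproduced here; this sketch simply tries a caller-supplied list of candidate thresholds in order, treats cells with a value greater than or equal to the threshold as the high-accuracy region, and counts independent regions as 4-connected components. The candidate list, the 4-connectivity, and the function names are assumptions.

```python
from typing import List, Optional, Sequence


def count_independent_regions(binary: List[List[bool]]) -> int:
    """Count 4-connected components of True cells (assumed meaning of 'independent region')."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                stack = [(r, c)]
                while stack:  # flood fill one component
                    cr, cc = stack.pop()
                    for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                        if 0 <= nr < rows and 0 <= nc < cols and binary[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            stack.append((nr, nc))
    return count


def adjust_threshold(value_map: Sequence[Sequence[float]], used_group_number: int,
                     candidates: Sequence[float]) -> Optional[float]:
    """Return the first candidate threshold whose binarized map has the wanted region count."""
    for threshold in candidates:
        binary = [[value >= threshold for value in row] for row in value_map]
        if count_independent_regions(binary) == used_group_number:
            return threshold
    return None  # no candidate produced the wanted count


# Made-up map: two high-value blocks joined by a low-value column.
value_map = [[9, 9, 2, 7, 7],
             [9, 9, 2, 7, 7]]
threshold = adjust_threshold(value_map, used_group_number=2, candidates=[1, 3, 8])
```

With the lowest candidate the two blocks merge into one region, and with the highest only one block survives, so only the middle candidate yields two independent regions.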

As described above, in the present embodiment the grouping processing unit 5 appropriately groups the target detection result region group based on the feature amounts of the detection result regions, so the number of groups obtained is likely to match the number of face images in the processing target image. In the above example for the processing target image 20 shown in FIG. 14, the appropriate number of groups, that is, the number of groups obtained by appropriately grouping the target detection result region group, is 3, which matches the number of face images 20a to 20c in that processing target image 20. Therefore, when the threshold adjustment unit 6 adjusts the threshold used to generate the binarized map so that the number of independent regions in the high-accuracy region of the binarized map matches the number of groups obtained by the grouping processing unit 5, the high-accuracy region is likely to contain a plurality of independent regions that correspond respectively to the face images in the processing target image. Specifying the detection target image based on the independent regions in the high-accuracy region of the binarized map thus makes it more likely that each face image in the processing target image can be specified individually. As a result, the detection accuracy for face images improves.

In the above example, the grouping processing unit 5 performs the grouping process using all of the following as feature amounts of a detection result region: its position (x and y), its size (w and h), and its corresponding detection accuracy value. However, the grouping process may be performed using at least the position of the detection result region among these.

In the present embodiment, the detection process uses a plurality of types of detection frames of different sizes, but it may use only one type of detection frame. In that case, the size of the detection result region need not be used as a feature amount. When the detection process uses a plurality of types of detection frames of different sizes, as in the present embodiment, using the size of the detection result region as a feature amount makes it more likely, compared with not using it, that the number of groups obtained by the grouping processing unit 5 matches the number of face images in the processing target image. Specifying the detection target image based on the independent regions in the high-accuracy region of the binarized map is then more likely to specify each face image in the processing target image individually, which improves the detection accuracy for face images.

Likewise, when the detection accuracy value is used as a feature amount of the detection result region, as in the present embodiment, the number of groups obtained by the grouping processing unit 5 is more likely to match the number of face images in the processing target image than when it is not used. Each face image in the processing target image is then more likely to be specified individually, which improves the detection accuracy for face images.

In the above example, the K-means method is used to group the target detection result region group, but the grouping processing unit 5 may group it by another method. For example, the grouping processing unit 5 may group (cluster) the target detection result region group using the Mean-Shift method.

<Modification>
In the above example, the face images in the processing target image are specified using the output value map, but they may be specified without using it. An example of a method of specifying face images without the output value map is described below.

FIG. 31 shows the functional blocks of the image detection apparatus 1 according to this modification. In this modification, a detection target image specifying unit 90 specifies the face images in the processing target image based on the plurality of groups obtained when the grouping processing unit 5 appropriately groups the target detection result region group.

For each of the plurality of groups obtained by grouping with an appropriate threshold in the grouping processing unit 5, the detection target image specifying unit 90 specifies the face image corresponding to that group in the processing target image. Hereinafter, each of these groups is referred to as a "final group", and the final group under discussion is referred to as the "target group".

FIG. 32 illustrates a method of specifying the face image 20z corresponding to a target group 600 in the processing target image. In the example of FIG. 32, the target group 600 contains detection result regions 160a to 160f.

To specify the face image 20z corresponding to the target group 600 in the processing target image, the detection target image specifying unit 90 obtains the center value xc of the upper-left X coordinates x and the center value yc of the upper-left Y coordinates y of the detection result regions 160a to 160f in the target group 600. The center value xc can be obtained using the following Equation (6), based on the maximum value xmax and the minimum value xmin of the upper-left X coordinates x of the detection result regions 160a to 160f in the target group 600.

The center value yc can likewise be obtained using the following Equation (7), based on the maximum value ymax and the minimum value ymin of the upper-left Y coordinates y of the detection result regions 160a to 160f in the target group 600.
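The equation images for Equations (6) and (7) are not reproduced in this text. Reading "center value" as the midpoint between the maximum and the minimum, which is an assumption, they would presumably take the following form:

```latex
\[
x_c = \frac{x_{\max} + x_{\min}}{2} \tag{6}
\]
\[
y_c = \frac{y_{\max} + y_{\min}}{2} \tag{7}
\]
```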

Next, the detection target image specifying unit 90 obtains the average of the distances between the center value xc and the upper-left X coordinates x of the detection result regions 160d, 160e, and 160f in the target group 600 whose upper-left X coordinate x is greater than or equal to the center value xc. This average is called the right-side average distance.

The detection target image specifying unit 90 also obtains the average of the distances between the center value xc and the upper-left X coordinates x of the detection result regions 160a, 160b, and 160c whose upper-left X coordinate x is less than the center value xc. This average is called the left-side average distance.

The detection target image specifying unit 90 also obtains the average of the distances between the center value yc and the upper-left Y coordinates y of the detection result regions 160a, 160b, 160c, and 160f whose upper-left Y coordinate y is greater than or equal to the center value yc. This average is called the upper-side average distance.

Finally, the detection target image specifying unit 90 obtains the average of the distances between the center value yc and the upper-left Y coordinates y of the detection result regions 160d and 160e whose upper-left Y coordinate y is less than the center value yc. This average is called the lower-side average distance.

The detection target image specifying unit 90 takes the X coordinate obtained by adding the right-side average distance to the center value xc as the X coordinate of the right edge of the rectangular region that is regarded as the face image 20z corresponding to the target group 600 in the processing target image (hereinafter referred to as the "detected image region").

The detection target image specifying unit 90 takes the X coordinate obtained by subtracting the left-side average distance from the center value xc as the X coordinate of the left edge of the detected image region.

The detection target image specifying unit 90 takes the Y coordinate obtained by adding the upper-side average distance to the center value yc as the Y coordinate of the upper edge of the detected image region.

The detection target image specifying unit 90 takes the Y coordinate obtained by subtracting the lower-side average distance from the center value yc as the Y coordinate of the lower edge of the detected image region.

Once the positions of the right, left, upper, and lower edges of the detected image region corresponding to the target group 600 have been determined in this way, the face image 20z corresponding to the target group 600 is specified in the processing target image.
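The edge computation described above can be sketched as follows. The sketch follows the wording of the text literally, including the labels "upper" and "lower", and assumes that the center values are the midpoints of the maximum and minimum coordinates, since Equations (6) and (7) are not reproduced here. Treating an empty side as an average distance of zero is also an assumption, and the function names are hypothetical.

```python
from statistics import fmean
from typing import Dict, List, Sequence, Tuple


def mean_or_zero(values: List[float]) -> float:
    """Average of the values, or 0.0 when a side has no regions (assumption)."""
    return fmean(values) if values else 0.0


def detected_image_region(upper_left_corners: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Edges of the detected image region from the upper-left corners of one group."""
    xs = [x for x, _ in upper_left_corners]
    ys = [y for _, y in upper_left_corners]
    xc = (max(xs) + min(xs)) / 2  # assumed form of Equation (6)
    yc = (max(ys) + min(ys)) / 2  # assumed form of Equation (7)

    right_avg = mean_or_zero([x - xc for x in xs if x >= xc])  # right-side average distance
    left_avg = mean_or_zero([xc - x for x in xs if x < xc])    # left-side average distance
    upper_avg = mean_or_zero([y - yc for y in ys if y >= yc])  # upper-side average distance
    lower_avg = mean_or_zero([yc - y for y in ys if y < yc])   # lower-side average distance

    return {
        "right": xc + right_avg,
        "left": xc - left_avg,
        "upper": yc + upper_avg,
        "lower": yc - lower_avg,
    }


# Made-up upper-left corners of the regions in one target group.
edges = detected_image_region([(0, 0), (2, 4), (4, 8), (6, 12)])
```

Only the upper-left corners enter this computation, which is consistent with the later remark that this modification tends to specify a face image as smaller than it actually is.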

In this way, for each of the final groups obtained by the grouping processing unit 5, the detection target image specifying unit 90 specifies the face image corresponding to that final group in the processing target image. Because the number of final groups is likely to match the number of face images in the processing target image, this modification also makes it more likely that each face image in the processing target image can be specified individually. The detection accuracy for face images therefore improves.

When, as in this modification, face images are specified in the processing target image based on the positions of the detection result regions in each group without using the output value map, a face image is more likely to be specified as smaller than it actually is than when the output value map is used, but the amount of processing required for face detection can be reduced.

Although the image detection apparatus 1 has been described in detail above, the description is illustrative in all aspects and the invention is not limited to it. For example, the detection target image may be an image other than a human face image. The various examples described above may be applied in combination as long as they do not contradict one another. It is understood that countless modifications not illustrated here can be devised without departing from the scope of the invention.

DESCRIPTION OF SYMBOLS
1 Image detection apparatus
3 Detection unit
4 Map generation unit
5 Grouping processing unit
6 Threshold adjustment unit
7 Binarization processing unit
8 Independent region specifying unit
9, 90 Detection target image specifying unit

Claims

An image detection apparatus that detects a detection target image from a processing target image, comprising:
a detection unit that performs, on the processing target image and using a detection frame, a detection process of detecting, as a detection result area, an area that is highly likely to be the detection target image of the same size as the detection frame;
a grouping processing unit that groups a plurality of detection result areas detected by the detection unit based on the positions of the detection result areas;
a map generation unit that generates, based on the detection results of the detection unit, a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image;
a binarization processing unit that binarizes the map using a threshold to generate a binarized map;
an independent area specifying unit that specifies independent areas included in a high-accuracy area of the binarized map generated using the threshold, the high-accuracy area corresponding to an area of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold;
a threshold adjustment unit that adjusts the threshold so that the number of independent areas specified by the independent area specifying unit matches the number of groups obtained by the grouping processing unit; and
a detection target image specifying unit that specifies the detection target image in the processing target image based on the independent areas included in the high-accuracy area of the binarized map generated using the threshold adjusted by the threshold adjustment unit.

The image detection apparatus according to claim 1, wherein
the grouping processing unit groups the plurality of detection result areas detected by the detection unit based also on the likelihood that each detection result area is the detection target image.

The image detection apparatus according to claim 1 or claim 2, wherein
the detection unit performs the detection process using a plurality of types of detection frames of mutually different sizes, and
the grouping processing unit groups the plurality of detection result areas detected by the detection unit based also on the sizes of the detection result areas.

The image detection apparatus according to any one of claims 1 to 3, wherein
the grouping processing unit includes:
a processing unit that, for each of a plurality of different group counts, groups the plurality of detection result areas detected by the detection unit into that number of groups;
a separability acquisition unit that obtains, for each of the plurality of different group counts, a separability indicating the degree of separation among the plurality of groups obtained when the processing unit groups the plurality of detection result areas into that number of groups; and
a use group count determination unit that determines, as the use group count used by the threshold adjustment unit in adjusting the threshold, the group count, among the plurality of different group counts, for which the separability among the resulting groups is highest.
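The selection of the group count by separability can be sketched as follows. This is only an illustration under stated assumptions: the claim does not fix the grouping algorithm or the separability measure, so the sketch substitutes k-means on the centre positions of the detection result areas and the mean silhouette coefficient. The names `group_into`, `separability`, `choose_group_count`, and `centres` are hypothetical.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def group_into(points, k, iterations=20):
    """Assumed grouping: k-means with farthest-point seeding; returns labels."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(dist(p, c) for c in centres)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: dist(p, centres[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = (sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members))
    return labels


def separability(points, labels):
    """Assumed separability: mean silhouette coefficient over all points."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return 0.0
    scores = []
    for p, lab in zip(points, labels):
        own = groups[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for a single-member group
            continue
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for m, other in groups.items() if m != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_group_count(points, candidate_counts):
    """Return the candidate group count whose grouping is best separated."""
    return max(candidate_counts,
               key=lambda k: separability(points, group_into(points, k)))


# Hypothetical centres of detection result areas clustered around three faces.
centres = [(10, 10), (11, 10), (10, 12),
           (50, 10), (51, 11), (49, 12),
           (30, 40), (31, 41), (29, 42)]
best = choose_group_count(centres, [2, 3, 4])
```

With these sample centres the three-group split is the best separated of the candidates, so `best` is 3, and that value would serve as the use group count passed to the threshold adjustment.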

The image detection apparatus according to any one of claims 1 to 4, wherein
the detection target image is a human face image.

A control program for controlling an image detection apparatus that detects a detection target image from a processing target image, the control program causing the image detection apparatus to execute:
(a) a step of performing, on the processing target image and using a detection frame, a detection process of detecting, as a detection result area, an area that is highly likely to be the detection target image of the same size as the detection frame;
(b) a step of grouping a plurality of detection result areas detected in step (a) based on the positions of the detection result areas;
(c) a step of generating, based on the detection results of step (a), a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image;
(d) a step of binarizing the map using a threshold to generate a binarized map;
(e) a step of specifying independent areas included in a high-accuracy area of the binarized map generated using the threshold, the high-accuracy area corresponding to an area of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold;
(f) a step of adjusting the threshold so that the number of independent areas specified in step (e) matches the number of groups obtained in step (b); and
(g) a step of specifying the detection target image in the processing target image based on the independent areas included in the high-accuracy area of the binarized map generated using the threshold adjusted in step (f).

An image detection method for detecting a detection target image from a processing target image, comprising:
(a) a step of performing, on the processing target image and using a detection frame, a detection process of detecting, as a detection result area, an area that is highly likely to be the detection target image of the same size as the detection frame;
(b) a step of grouping a plurality of detection result areas detected in step (a) based on the positions of the detection result areas;
(c) a step of generating, based on the detection results of step (a), a map indicating the distribution, in the processing target image, of accuracy values each indicating the likelihood of being the detection target image;
(d) a step of binarizing the map using a threshold to generate a binarized map;
(e) a step of specifying independent areas included in a high-accuracy area of the binarized map generated using the threshold, the high-accuracy area corresponding to an area of the map in which the accuracy value is equal to or greater than the threshold, or is greater than the threshold;
(f) a step of adjusting the threshold so that the number of independent areas specified in step (e) matches the number of groups obtained in step (b); and
(g) a step of specifying the detection target image in the processing target image based on the independent areas included in the high-accuracy area of the binarized map generated using the threshold adjusted in step (f).
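
Steps (d) to (g) can be sketched in a few lines. The sketch makes several assumptions that the claims leave open: the map is a small 2-D list of accuracy values, independent areas are taken to be 4-connected components, the threshold is adjusted by scanning the distinct map values in ascending order, and each detection target image is reported as the bounding box of one independent area. The function names and the sample `accuracy_map` are hypothetical.

```python
from collections import deque


def binarize(accuracy_map, threshold):
    """Step (d): True marks the high-accuracy area (value >= threshold)."""
    return [[v >= threshold for v in row] for row in accuracy_map]


def independent_regions(binary):
    """Step (e): 4-connected components of True cells, as (row, col) lists."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or seen[r][c]:
                continue
            queue, region = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def adjust_threshold(accuracy_map, group_count):
    """Step (f): assumed search, lowest map value yielding group_count areas."""
    for t in sorted({v for row in accuracy_map for v in row}):
        regions = independent_regions(binarize(accuracy_map, t))
        if len(regions) == group_count:
            return t, regions
    return None, []  # no threshold produced the requested number of areas


def bounding_box(region):
    """Step (g), assumed output form: (left, top, right, bottom) of one area."""
    ys = [y for y, _ in region]
    xs = [x for _, x in region]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical map with two peaks joined by a low ridge (value 3).
accuracy_map = [
    [0, 1, 1, 1, 1, 1, 0],
    [1, 5, 6, 2, 6, 5, 1],
    [1, 6, 9, 3, 9, 6, 1],
    [1, 5, 6, 2, 6, 5, 1],
    [0, 1, 1, 1, 1, 1, 0],
]
threshold, regions = adjust_threshold(accuracy_map, group_count=2)
boxes = sorted(bounding_box(r) for r in regions)
```

In this sample, thresholds up to 3 leave the two peaks merged into a single independent area through the ridge; raising the threshold to 5 separates them, so the number of independent areas matches the two groups assumed to come from step (b).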