
JP6383639B2 - Image processing apparatus and program

Info

Publication number: JP6383639B2
Application number: JP2014220616A
Authority: JP
Inventors: 崇之梅田; 豪入江; 新井　啓之; 啓之新井; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-10-29
Filing date: 2014-10-29
Publication date: 2018-08-29
Anticipated expiration: 2034-10-29
Also published as: JP2016091051A

Description

The present invention relates to an image processing apparatus and an image processing program, and more particularly to an image processing apparatus and an image processing program that detect a detection target from an image.

In recent years, techniques for automatically detecting and identifying objects in images and video have enabled web image search systems and systems that search for products using a real-world object as the query. These systems aim to automatically detect objects shown in an image, such as clothes or animals, and to present the names of the corresponding products or objects. To meet users' diverse search needs, a technique is required that automatically detects not only the object itself but also the parts and elements that make up the object (for example, the collar of a garment and its shape). Such parts and elements are called attributes, and attribute detection has been an active area of research and development in recent years. For example, the technique described in Non-Patent Document 1 treats the parts and textures that make up an object, such as a dog's tail, legs, and fur or an airplane's wings, windows, and metal, as attributes. The technique described in Non-Patent Document 2 treats the collar shape, sleeve length, pattern, and the like of clothing as attributes.

A simple way to realize attribute detection is to regard each attribute as an object to be detected and apply a conventional object detection method. Several inventions relating to object detection methods have been made and disclosed.

For example, in the invention described in Patent Document 1, an object to be detected in an input image is detected by matching a template of the object, prepared in advance, against the input image. Furthermore, the technique described in Patent Document 1 estimates the scene structure of the input image and scans the template only over the foreground, thereby achieving accurate detection.

In the technique described in Non-Patent Document 3, a detector is trained from learning images in which the region showing the detection target is given as a rectangular region, and the detector is used to scan the entire image and detect the detection target. In this method, feature quantities such as HOG (Histograms of Oriented Gradients) are extracted from the pixel values of the pixels in the rectangular region of each learning image (Non-Patent Document 4), and a detector for the detection target is trained using a classifier such as an SVM (support vector machine). At detection time, rectangles of arbitrary size are applied to the image while being shifted, the feature quantities extracted from the region inside each rectangle are input to the trained detector, and the probability (score) that the detection target exists in that rectangle is obtained. The positions of the rectangles whose scores exceed a threshold are taken as the final detection result.
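The scan-and-threshold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of Non-Patent Document 3: the names `sliding_window_detect`, `score_fn`, and `mean_intensity` are hypothetical, the image is a plain list of rows, and the scoring function is a toy stand-in for "extract HOG features and apply the trained SVM".

```python
from typing import Callable, List, Tuple

Image = List[List[float]]
Box = Tuple[int, int, int, int]  # (top, left, height, width), 0-indexed


def sliding_window_detect(image: Image, win_h: int, win_w: int, stride: int,
                          score_fn: Callable[[Image], float],
                          threshold: float) -> List[Tuple[Box, float]]:
    """Shift a win_h x win_w rectangle over the image and keep the
    rectangles whose score exceeds the threshold."""
    rows, cols = len(image), len(image[0])
    detections = []
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            # Cut out the region inside the current rectangle.
            patch = [row[left:left + win_w] for row in image[top:top + win_h]]
            score = score_fn(patch)
            if score > threshold:
                detections.append(((top, left, win_h, win_w), score))
    return detections


def mean_intensity(patch: Image) -> float:
    """Toy scoring function (assumption): the mean pixel value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)
```

In a real detector the scoring function would wrap feature extraction and the trained classifier, and the scan would usually be repeated for several rectangle sizes.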

Patent Document 1: JP 2012-123567 A

Non-Patent Document 1: A. Farhadi, I. Endres, D. Hoiem, and D. Forsyth, "Describing objects by their Attributes", In CVPR, pp. 1778-1785, 2009.
Non-Patent Document 2: H. Chen, A. Gallagher, B. Girod, "Describing Clothing by Semantic Attributes", In ECCV, pp. 609-623, 2012.
Non-Patent Document 3: T. Malisiewicz, A. Gupta, A. A. Efros, "Ensemble of Exemplar-SVMs for Object Detection and Beyond", In ICCV, 2011.
Non-Patent Document 4: N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", In CVPR, 2005.

However, with the technique described in Non-Patent Document 3, a region other than the region showing the detection target may be falsely detected as the detection target if it has features similar to the learned features of the detection target. In general, attributes are parts that make up an object, so regions smaller than the objects targeted by conventional object detection methods must be detected. Because features obtained from a small region carry less information, false detections occur more readily when detecting attributes from an image than when detecting whole objects.

The present invention has been made in view of this problem, and an object thereof is to provide an image processing apparatus and a program capable of accurately detecting a detection target even when the detection target occupies a small region of the image, as with the parts and elements of an object.

To achieve the above object, an image processing apparatus according to a first invention includes: detection means for detecting, for each type, candidates for the detection target from an input image, based on features extracted from the input image and on a detector trained by associating the per-type features of detection targets, extracted from each of a plurality of learning images containing a plurality of types of detection targets, with the types of the detection targets; and specifying means for specifying the detection target for each type from among the candidates, based on the result of reflecting, on the appearance position of each candidate in the input image, a prior distribution learned from the appearance positions of each type of detection target in each of the plurality of learning images and a prior distribution learned from the relative positional relationships between detection targets of different types.

According to the image processing apparatus of the first invention, the following are prepared in advance: a detector trained by associating the per-type features of detection targets, extracted from each of a plurality of learning images containing a plurality of types of detection targets, with the types of the detection targets; a prior distribution learned from the appearance positions of each type of detection target in each of the learning images; and a prior distribution learned from the relative positional relationships between detection targets of different types. The detection means detects candidates for the detection target, for each type, from the input image based on the features extracted from the input image and the detector. The specifying means then specifies the detection target from among the candidates based on the result of reflecting, on the appearance position of each candidate in the input image, the prior distribution of appearance positions for each type and the prior distribution of relative positional relationships between detection targets of different types.

In this way, when the detection target is detected for each type from the input image using the detector, the prior distribution of the appearance positions of each type of detection target on the image and the prior distribution of the relative positional relationships between detection targets of different types are reflected. The detection target can therefore be detected accurately even when it occupies a small region of the image, as with the parts and elements of an object.

An image processing apparatus according to a second invention can be configured to include: specifying means for specifying, for each type, the range of an input image in which the detection target is to be detected, based on a prior distribution learned from the appearance positions of each type of detection target in each of a plurality of learning images containing a plurality of types of detection targets and on a prior distribution learned from the relative positional relationships between detection targets of different types; and detection means for detecting the detection target for each type from the input image, based on features extracted from the range specified by the specifying means and on a detector trained by associating the per-type features of detection targets, extracted from each of the plurality of learning images, with the types of the detection targets.

According to the image processing apparatus of the second invention, the following are prepared in advance: a detector trained by associating the per-type features of detection targets, extracted from each of a plurality of learning images containing a plurality of types of detection targets, with the types of the detection targets; a prior distribution learned from the appearance positions of each type of detection target in each of the learning images; and a prior distribution learned from the relative positional relationships between detection targets of different types. The specifying means specifies, for each type, the range of the input image in which the detection target is to be detected, based on the prior distribution of appearance positions and the prior distribution of relative positional relationships between detection targets of different types. The detection means then detects the detection target for each type from the input image, based on the features extracted from the range specified by the specifying means and the detector.

In this way, the detection target is detected for each type from a range that reflects the prior distribution of the appearance positions of the detection target on the image and the prior distribution of the relative positional relationships between detection targets of different types, so that false detections can be reduced and the detection process can be sped up.

In the image processing apparatus according to the first or second invention, the prior distributions can be represented by matrices obtained by applying blurring with a Gaussian filter to a matrix whose values are the appearance probabilities of each type of detection target at each position of the learning image, and to a matrix whose values are the appearance probabilities of another type of detection target as seen from one type of detection target. This reduces the fluctuation in the appearance position of the detection target in the learning images and further improves detection accuracy.
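As a rough illustration of the blurring step, the sketch below applies a separable Gaussian filter to a probability matrix stored as a list of rows. The function names, the default `sigma` and `radius`, and the zero padding at the borders are all assumptions made for this example; the text does not specify the filter parameters or border handling.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """1-D Gaussian weights over [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(matrix: Matrix, kernel: List[float]) -> Matrix:
    """Convolve every row with the kernel; values outside the row count as 0."""
    radius = len(kernel) // 2
    blurred = []
    for row in matrix:
        new_row = []
        for c in range(len(row)):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = c + k - radius
                if 0 <= src < len(row):
                    acc += w * row[src]
            new_row.append(acc)
        blurred.append(new_row)
    return blurred


def gaussian_blur(matrix: Matrix, sigma: float = 1.0, radius: int = 2) -> Matrix:
    """Blur along rows, then along columns (by transposing, blurring, and
    transposing back)."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = blur_rows(matrix, kernel)
    transposed = [list(col) for col in zip(*horizontal)]
    vertical = blur_rows(transposed, kernel)
    return [list(col) for col in zip(*vertical)]
```

Spreading each probability peak over its neighbourhood is what makes the prior tolerant of small shifts in where an attribute appears.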

The image processing apparatus according to the first or second invention can be configured to include learning means for learning the detector and the prior distributions using the plurality of learning images.

An image processing program according to a third invention is a program for causing a computer to function as each of the means constituting the above image processing apparatus.

As described above, according to the image processing apparatus and program of the present invention, when a detection target is detected from an input image using a detector, the prior distribution of the appearance positions of the detection target on the image and the prior distribution of the relative positional relationships between detection targets of different types are reflected. This provides the effect that the detection target can be detected accurately even when it occupies a small region of the image, as with the parts and elements of an object.

A functional block diagram of the learning processing apparatus according to the first embodiment.
A flowchart showing an example of the learning process.
A flowchart showing an example of the prior distribution learning process in the first embodiment.
A diagram for explaining learning of the prior distribution of the appearance position of a single attribute.
A diagram for explaining learning of the prior distribution of the relative positional relationship between attributes.
A diagram showing an example of normalization of the matrix representing the prior distribution of the appearance position of a single attribute.
A diagram showing an example of normalization of the matrix representing the prior distribution of the relative positional relationship between attributes.
A flowchart showing an example of the detection process in the first embodiment.
A flowchart showing an example of the prior distribution reflection process in the first embodiment.
A flowchart showing an example of the prior distribution reflection process in the first embodiment.
A diagram for explaining reflection of the prior distribution of the appearance position of a single attribute.
A diagram for explaining reflection of the prior distribution of the appearance position of a single attribute.
A diagram for explaining reflection of the prior distribution of the relative positional relationship between attributes.
A functional block diagram of the learning processing apparatus according to the second embodiment.
A flowchart showing an example of the detection process in the second embodiment.
A flowchart showing an example of the prior distribution reflection process in the second embodiment.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. Each of the following embodiments describes an image processing apparatus that detects attributes, such as the parts and elements of an object, as detection targets.

<First Embodiment>
The image processing apparatus 10 according to the first embodiment is configured as a computer including a CPU, a RAM, and a ROM that stores programs for executing a learning process routine and a detection process routine described later. Functionally, as shown in FIG. 1, the image processing apparatus 10 can be represented as a configuration including a learning unit 20 and a detection unit 40. The learning unit 20 includes a learning data input unit 21, a feature extraction unit 22, a detector learning unit 23, and a prior distribution learning unit 24. The detection unit 40 includes an image input unit 41, a feature extraction unit 42, a detection processing unit 43, a prior distribution reflection unit 44, and a detection result output unit 45. The feature extraction unit 42 and the detection processing unit 43 are an example of the detection means of the present invention, and the prior distribution reflection unit 44 is an example of the specifying means of the present invention.

First, each part of the learning unit 20 will be described in detail.

The learning data input unit 21 acquires learning data from a learning data database (DB) 31 in which a plurality of items of learning data are stored, and outputs the learning data to each of the feature extraction unit 22 and the prior distribution learning unit 24.

Here, each item of learning data consists of a pair of a learning image containing a plurality of types of attributes to be detected and the position information of the detection targets in that learning image. The position information of a detection target is information that specifies the region showing the detection target, for example by pixel positions (coordinates). For example, when the region showing the detection target is represented by a rectangle described by four points in the learning image, the position information of the detection target can be represented by the pixel positions of those four points. Alternatively, the position information of the detection target may be represented by a mask image in which the regions other than the region showing the detection target are masked. In the mask image, for example, the pixel values of the masked pixels can be set to 0. All learning images have the same size. If the sizes differ, images resized to a common size using a method such as linear interpolation or nearest neighbor are used as the learning images.
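The two position-information formats mentioned above (a rectangle and a mask image) are easy to convert between. The following minimal sketch builds a mask image from a rectangle; the function name `rectangle_to_mask` and the 0-indexed, inclusive `(top, left, bottom, right)` convention are assumptions chosen for the example.

```python
from typing import List

Image = List[List[int]]


def rectangle_to_mask(image: Image, top: int, left: int,
                      bottom: int, right: int) -> Image:
    """Return a copy of the image in which every pixel outside the
    inclusive rectangle is masked, i.e. set to 0."""
    return [[value if top <= r <= bottom and left <= c <= right else 0
             for c, value in enumerate(row)]
            for r, row in enumerate(image)]
```

Note that a mask built this way cannot distinguish a masked pixel from a genuine 0-valued pixel inside the region, so a separate binary mask may be preferable in practice.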

The feature extraction unit 22 identifies the regions of the learning image in which attributes exist, based on the position information of the detection targets included in the learning data output from the learning data input unit 21. The feature extraction unit 22 then extracts feature quantities from the identified regions. For example, a feature quantity such as the HOG (Histograms of Oriented Gradients) feature described in Non-Patent Document 4 can be extracted. The feature extraction unit 22 associates each extracted feature quantity with the type of the attribute present in the region from which it was extracted, and outputs them to the detector learning unit 23.

The detector learning unit 23 learns a detector for each attribute type using the feature quantities output from the feature extraction unit 22. For learning the detectors, a technique such as the Exemplar-SVM (support vector machine) described in Non-Patent Document 3 can be used, for example. The detector learning unit 23 associates the learned detector for each attribute type with the name of that attribute type and stores them in the detector DB 32.

The prior distribution learning unit 24 learns, for each attribute type, a prior distribution of the appearance positions of the attribute in the plurality of learning images, based on the position information of the detection targets included in the learning data output from the learning data input unit 21. The prior distribution learning unit 24 learns a prior distribution of the appearance position of each attribute on its own in the learning images and a prior distribution of the relative positional relationship between attributes, specifically a prior distribution of the appearance position of an attribute relative to the appearance position of an attribute of another type. The prior distribution learning unit 24 associates the learned prior distribution for each attribute type with the name of that attribute type and stores them in the prior distribution DB 33.
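One plausible way to estimate the single-attribute prior is to count, for every pixel, how often the attribute's rectangle covers it and divide by the number of learning images. The sketch below does exactly that. It is an assumption made for illustration (the counting rule, the function name `learn_single_prior`, and the 0-indexed inclusive rectangles are not taken from the passage above), and it does not cover the relative-position prior.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive, 0-indexed


def learn_single_prior(rects: List[Rect], height: int, width: int) -> List[List[float]]:
    """Estimate a per-pixel appearance probability for one attribute type.

    rects holds one rectangle per learning image; all learning images are
    assumed to share the size height x width.
    """
    counts = [[0] * width for _ in range(height)]
    for top, left, bottom, right in rects:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                counts[r][c] += 1
    n = len(rects)
    # Divide by the number of images so each entry lies in [0, 1].
    return [[value / n for value in row] for row in counts]
```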

Next, each part of the detection unit 40 will be described in detail.

The image input unit 41 accepts an input image 36, which may be any image. When the size of the input image 36 differs from the size of the learning images in the learning data, the image input unit 41 resizes it to the same size as the learning images using a method such as linear interpolation or nearest neighbor. The image input unit 41 outputs the input image 36 to the feature extraction unit 42.
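Nearest-neighbor resizing, one of the two methods named above, can be sketched in a few lines. The function name `resize_nearest` and the particular index mapping (scaling each output index by the size ratio and truncating) are assumptions; other libraries use slightly different pixel-center conventions.

```python
from typing import List

Image = List[List[int]]


def resize_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize an image to out_h x out_w by copying, for each output pixel,
    the input pixel at the proportionally scaled position."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]
```

Because every output index is strictly smaller than the output size, the scaled index always stays inside the input image, so no clamping is needed.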

The feature extraction unit 42 applies rectangles of arbitrary size to the input image 36 output from the image input unit 41 while shifting them, and extracts from the region inside each rectangle the same kind of feature quantity as that extracted by the feature extraction unit 22 of the learning unit 20. The feature extraction unit 42 associates each extracted feature quantity with the position of the region from which it was extracted, that is, the position of the rectangle applied to the input image 36, and outputs them to the detection processing unit 43.

The detection processing unit 43 acquires the feature quantities and rectangle positions output from the feature extraction unit 42. The detection processing unit 43 also acquires the detector for each attribute type from the detector DB 32. It then inputs the feature quantities extracted by the feature extraction unit 42 from the region inside each rectangle to each of the per-type detectors and obtains, as the detector output, a detection score for each attribute type. The more likely it is that the image inside the rectangle is the attribute to be detected, the higher the detection score.

The detection processing unit 43 detects a predetermined number of detection candidates for each attribute type based on the detection scores. For example, the detection processing unit 43 can take as detection candidates the images inside rectangles whose detection score is equal to or greater than a predetermined threshold, or the images inside the rectangles with the highest predetermined number of detection scores. The detection processing unit 43 associates the detection candidates for each attribute type with their position information (the position information of the rectangles) and their detection scores, and outputs them to the prior distribution reflection unit 44.
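Both selection rules described above (a score threshold and a top-k cut) can be expressed in one small helper. This is a sketch for a single attribute type; the name `select_candidates` and the `(box, score)` tuple layout are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width)
Scored = Tuple[Box, float]


def select_candidates(scored_boxes: List[Scored],
                      threshold: Optional[float] = None,
                      top_k: Optional[int] = None) -> List[Scored]:
    """Keep rectangles whose score is at least the threshold (if given),
    sort them by descending score, and keep at most top_k (if given)."""
    kept = [(box, score) for box, score in scored_boxes
            if threshold is None or score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept if top_k is None else kept[:top_k]
```

Running the helper once per attribute type, for example over a dictionary keyed by attribute name, yields the per-type candidate lists that are passed on to the next stage.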

The prior distribution reflection unit 44 acquires the detection candidates for each attribute type, the positions of the detection candidates, and the detection scores output from the detection processing unit 43. The prior distribution reflection unit 44 also acquires the prior distribution for each attribute type from the prior distribution DB 33. It then reflects, on each detection candidate, the prior distribution corresponding to the attribute type of that candidate, and updates the detection score. Based on the updated detection scores, the prior distribution reflection unit 44 specifies the detection target from among the detection candidates for each attribute type and outputs the position information of that detection target (the position information of the rectangle) to the detection result output unit 45.
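The passage above does not say how the prior is combined with the detection score. As one possible reading, the sketch below multiplies each candidate's score by the prior probability at the center of its rectangle and then picks the candidate with the highest updated score. The multiplication rule, the use of the rectangle center, and the names `reflect_prior` and `pick_best` are all assumptions for illustration; only the single-attribute prior is handled.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, height, width), 0-indexed
Scored = Tuple[Box, float]


def reflect_prior(candidates: List[Scored],
                  prior: List[List[float]]) -> List[Scored]:
    """Update each detection score with the prior probability at the
    center pixel of its rectangle (assumed rule: multiplication)."""
    updated = []
    for (top, left, height, width), score in candidates:
        center_r = top + height // 2
        center_c = left + width // 2
        updated.append(((top, left, height, width),
                        score * prior[center_r][center_c]))
    return updated


def pick_best(updated: List[Scored]) -> Scored:
    """Return the candidate with the highest updated score."""
    return max(updated, key=lambda item: item[1])
```

With such a rule, a candidate with a high raw score in an unlikely position can lose to a candidate with a lower raw score in a position where the attribute usually appears.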

The detection result output unit 45 links the position information of the detection target for each attribute type, output from the prior distribution reflection unit 44, to the input image 36 and outputs it as a detection result 37.

Next, the operation of the image processing apparatus 10 according to the first embodiment will be described. The image processing apparatus 10 executes a learning process that learns the detector and the prior distribution for each attribute type, and a detection process that detects the attributes to be detected from the input image 36. Each process is described below.

First, with a plurality of items of learning data stored in the learning data DB 31, each consisting of a pair of a learning image containing a plurality of types of attributes to be detected and the position information of the detection targets in that learning image, the image processing apparatus 10 executes the learning process shown in FIG. 2.

In step S10 of the learning process shown in FIG. 2, the learning data input unit 21 acquires learning data from the learning data DB 31 and outputs it to each of the feature extraction unit 22 and the prior distribution learning unit 24.

Next, in step S20, the feature extraction unit 22 acquires the learning data output from the learning data input unit 21 and, based on the position information of the detection targets included in the learning data, identifies the regions of the learning image in which attributes exist. The feature extraction unit 22 then extracts feature quantities from the identified regions. The feature extraction unit 22 associates each extracted feature quantity with the type of the attribute present in the region from which it was extracted, and outputs them to the detector learning unit 23.

Next, in step S30, the detector learning unit 23 learns a detector for each attribute type using the feature quantities output from the feature extraction unit 22. The detector learning unit 23 associates the learned detector for each attribute type with the name of that attribute type and stores them in the detector DB 32.

Next, in step S40, the prior distribution learning unit 24 executes the prior distribution learning process, shown in detail in FIG. 3, and learns the prior distribution for each attribute type. The prior distribution learning unit 24 associates the learned prior distribution for each attribute type with the name of that attribute type and stores them in the prior distribution DB 33, and the learning process ends.

Note that the prior distribution learning process of step S40 may be executed before step S20, or the processes of steps S20 and S30 may be executed in parallel with the process of step S40.

ここで、図３を参照して、事前分布学習処理について詳述する。 Here, the prior distribution learning process will be described in detail with reference to FIG. 3.

ステップＳ４１で、事前分布学習部２４が、学習データ入力部２１から出力された学習データを取得する。ここでは、学習データには、Ｎ枚の学習画像、及び各学習画像に含まれるＭ種類のアトリビュートを示す領域の位置情報が含まれるものとする。また、ここでは、説明の簡易化のため、アトリビュートを示す領域を矩形（長方形）とし、その位置情報を、矩形の対角２点（左上角と右下角）の画素位置で表す場合について説明する。なお、画素位置は、行方向の位置をｘ、列方向の位置をｙ、学習画像の左上角の画素を原点（［１，１］）とし、学習画像の下方向をｘのプラス方向、右方向をｙのプラス方向とする画像座標系における座標［ｘ，ｙ］で表す。 In step S41, the prior distribution learning unit 24 acquires the learning data output from the learning data input unit 21. Here, the learning data is assumed to include N learning images and position information of the regions indicating the M types of attributes contained in each learning image. To simplify the explanation, the region indicating an attribute is taken to be a rectangle, and its position information is represented by the pixel positions of two diagonal corners (upper left and lower right). A pixel position is expressed as coordinates [x, y] in an image coordinate system in which x is the position in the row direction, y is the position in the column direction, the upper-left pixel of the learning image is the origin ([1, 1]), the downward direction is the positive x direction, and the rightward direction is the positive y direction.

以下、ｎ枚目の学習画像に含まれる種類ｍのアトリビュートを示す矩形領域を、「矩形Ｒ＿ｎｍ」と表記する。ここでは、ｎ＝［１，２，・・・，Ｎ］、及びｍ＝［１，２，・・・，Ｍ］である。また、ｎ枚目の学習画像を「学習画像ｎ」、種類ｍのアトリビュートを「アトリビュートｍ」と表記する。 Hereinafter, the rectangular region indicating the attribute of type m contained in the n-th learning image is denoted "rectangle R_nm", where n = [1, 2, ..., N] and m = [1, 2, ..., M]. The n-th learning image is denoted "learning image n", and the attribute of type m is denoted "attribute m".

次に、ステップＳ４２で、学習画像と同様のサイズの零行列Ｐ＿ｉ、及び学習画像の倍のサイズの零行列Ｐ＿ｉｊ（ｉ，ｊ＝［１，２，・・・，Ｍ］）を作成する。本実施の形態では、以下で詳述するように、アトリビュートｉ単体の出現位置についての事前分布を表す行列Ｐ＿ｉ（ｉ＝［１，２，・・・，Ｍ］）と、アトリビュート間の相対位置関係についての事前分布を表す行列Ｐ＿ｉｊ（ｉ，ｊ＝［１，２，…，Ｍ］，ｉ≠ｊ）とを学習する。零行列Ｐ＿ｉ及びＰ＿ｉｊは、これらの事前分布を表す行列Ｐ＿ｉ及びＰ＿ｉｊを初期化したものである。 Next, in step S42, a zero matrix P_i having the same size as the learning image and a zero matrix P_ij (i, j = [1, 2,..., M]) having a size twice that of the learning image are created. In this embodiment, as will be described in detail below, a matrix P_i (i = [1, 2,..., M]) representing a prior distribution of the appearance position of the attribute i alone and the relative position between the attributes. A matrix P_ij (i, j = [1, 2,..., M], i ≠ j) representing a prior distribution regarding the relationship is learned. The zero matrices P_i and P_ij are obtained by initializing the matrices P_i and P_ij representing these prior distributions.

なお、学習画像と同様のサイズの行列とは、学習画像の縦及び横の画素数と行数及び列数が同じ行列である。また、学習画像の倍のサイズの行列とは、ここでは、学習画像の重心画素から上、下、右、左の各方向の画素数を倍にしたサイズである。例えば、学習画像が縦５画素×横５画素のサイズの場合、倍のサイズは、上、下、右、左の各方向へ２画素ずつ拡張した９×９画素である。従って、学習画像の倍のサイズの零行列Ｐ＿ｉｊは、各要素が０の９行９列の行列となる。 A matrix of the same size as the learning image is a matrix whose numbers of rows and columns equal the numbers of vertical and horizontal pixels of the learning image. A matrix of double the size of the learning image is, here, one in which the number of pixels in each of the up, down, right, and left directions from the centroid pixel of the learning image is doubled. For example, when the learning image is 5 pixels high by 5 pixels wide, the doubled size is 9 × 9 pixels, obtained by extending the image by 2 pixels in each of the up, down, right, and left directions. The zero matrix P_ij of double the size is therefore a 9-row, 9-column matrix in which every element is 0.
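The initialization in step S42 can be sketched as follows. This is a minimal illustration only: the helper names `zero_matrix` and `doubled_shape` are hypothetical, and the formula 2H - 1 for the doubled size is an assumption that reproduces the 5 × 5 to 9 × 9 example for odd image sizes.

```python
def zero_matrix(rows, cols):
    """Return a rows x cols matrix (list of lists) filled with zeros."""
    return [[0] * cols for _ in range(rows)]


def doubled_shape(height, width):
    """Shape after doubling the pixel count in each direction from the center pixel.

    Assumes odd height and width, so that a unique center pixel exists.
    """
    return 2 * height - 1, 2 * width - 1


H, W = 5, 5                                # learning-image size used in the text
P_i = zero_matrix(H, W)                    # same size as the learning image
P_ij = zero_matrix(*doubled_shape(H, W))   # doubled size
print(len(P_i), len(P_i[0]), len(P_ij), len(P_ij[0]))
```

For an image with an even number of rows or columns the text does not say how the center pixel is chosen, so the sketch leaves that case out.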

次に、ステップＳ４３で、事前分布学習部２４が、学習画像ｎを特定するためのループ変数ｎを１に初期化する。次に、ステップＳ４４で、事前分布学習部２４が、学習画像ｎに含まれるアトリビュートのうち、処理対象のアトリビュートｉを特定するためのループ変数ｉを１に初期化する。次に、ステップＳ４５で、事前分布学習部２４が、学習画像ｎに含まれるアトリビュートのうち、他のアトリビュートｊを特定するためのループ変数ｊを１に初期化する。 Next, in step S43, the prior distribution learning unit 24 initializes to 1 the loop variable n that identifies the learning image n. Next, in step S44, the prior distribution learning unit 24 initializes to 1 the loop variable i that identifies the attribute i to be processed among the attributes contained in learning image n. Next, in step S45, the prior distribution learning unit 24 initializes to 1 the loop variable j that identifies another attribute j among the attributes contained in learning image n.

次に、ステップＳ４６で、事前分布学習部２４が、ｉとｊとが同値か否かを判定する。同値の場合は、ステップＳ４７へ移行し、同値ではない場合には、ステップＳ４８へ移行する。 Next, in step S46, the prior distribution learning unit 24 determines whether i and j are the same value. If it is the same value, the process proceeds to step S47, and if it is not the same value, the process proceeds to step S48.

ステップＳ４７では、事前分布学習部２４が、アトリビュートｉ単体の出現位置についての事前分布を表す行列Ｐ＿ｉを更新する。具体的には、事前分布学習部２４は、矩形Ｒ＿ｎｉ内の画素に対応する行列Ｐ＿ｉの要素を１インクリメントする。 In step S47, the prior distribution learning unit 24 updates the matrix P_i representing the prior distribution for the appearance position of the attribute i alone. Specifically, the prior distribution learning unit 24 increments the element of the matrix P_i corresponding to the pixel in the rectangle R_ni by 1.

一方、ステップＳ４８では、事前分布学習部２４が、アトリビュート間の相対位置関係についての事前分布を表すＰ＿ｉｊを更新する。具体的には、事前分布学習部２４は、アトリビュートｉの出現位置を基準とした他の種類のアトリビュートｊの相対的な出現位置についての事前分布を表す行列Ｐ＿ｉｊを求めるために、矩形Ｒ＿ｎｉの重心画素Ｇ（［ｘ＿ｉＧ，ｙ＿ｉＧ］）から、学習画像ｎの中心画素Ｃ（［ｘ＿ｉＣ，ｙ＿ｉＣ］）までの移動量Ｄを算出する。移動量Ｄは、［（ｘ＿ｉＣ−ｘ＿ｉＧ），（ｙ＿ｉＣ−ｙ＿ｉＧ）］である。 On the other hand, in step S48, the prior distribution learning unit 24 updates P_ij representing the prior distribution regarding the relative positional relationship between attributes. Specifically, the prior distribution learning unit 24 calculates the center of gravity of the rectangle R_ni in order to obtain a matrix P_ij representing the prior distribution of the relative appearance positions of other types of attributes j with reference to the appearance position of the attribute i. A movement amount D from the pixel G ([x_iG, y_iG]) to the center pixel C ([x_iC, y_iC]) of the learning image n is calculated. The movement amount D is [(x_iC-x_iG), (y_iC-y_iG)].

次に、ステップＳ４９で、事前分布学習部２４が、矩形Ｒ＿ｎｊ内の各画素から移動量Ｄだけ移動した位置の画素に対応する行列Ｐ＿ｉｊの要素を１インクリメントする。 Next, in step S49, the prior distribution learning unit 24 increments the element of the matrix P_ij corresponding to the pixel at the position moved by the movement amount D from each pixel in the rectangle R_nj by one.

次に、ステップＳ５０で、事前分布学習部２４が、ループ変数ｊがＭと同値であるか否かを判定する。すなわち、学習画像ｎに含まれるアトリビュートｉについて、アトリビュートｉ単体の出現位置についての事前分布を表す行列Ｐ＿ｉ、及びアトリビュートｊとの相対位置関係についての事前分布を表す行列Ｐ＿ｉｊの更新が終了したか否かを判定する。ｊとＭとが同値ではない場合には、ステップＳ５１へ移行し、ループ変数ｊを１インクリメントして、ステップＳ４６に戻り、ステップＳ４６以降の処理を繰り返す。ｊとＭとが同値の場合には、ステップＳ５２へ移行する。 Next, in step S50, the prior distribution learning unit 24 determines whether or not the loop variable j is the same value as M. That is, for the attribute i included in the learning image n, whether or not the update of the matrix P_i representing the prior distribution for the appearance position of the attribute i alone and the matrix P_ij representing the prior distribution for the relative positional relationship with the attribute j has been completed. Determine whether. If j and M are not the same value, the process proceeds to step S51, the loop variable j is incremented by 1, the process returns to step S46, and the processes after step S46 are repeated. If j and M are the same value, the process proceeds to step S52.

ステップＳ５２では、事前分布学習部２４が、ループ変数ｉがＭと同値であるか否かを判定する。すなわち、学習画像ｎに含まれる全てのアトリビュートについてＰ＿ｉ及びＰ＿ｉｊの更新を終了したか否かを判定する。ｉとＭとが同値ではない場合には、ステップＳ５３へ移行し、ループ変数ｉを１インクリメントして、ステップＳ４５に戻り、ステップＳ４５以降の処理を繰り返す。ｉとＭとが同値の場合には、ステップＳ５４へ移行する。 In step S52, the prior distribution learning unit 24 determines whether or not the loop variable i is the same value as M. That is, it is determined whether or not the updating of P_i and P_ij has been completed for all attributes included in the learning image n. If i and M are not the same value, the process proceeds to step S53, the loop variable i is incremented by 1, the process returns to step S45, and the processes after step S45 are repeated. If i and M are the same value, the process proceeds to step S54.

ステップＳ５４では、事前分布学習部２４が、ループ変数ｎがＮと同値であるか否かを判定する。すなわち、全ての学習画像に対してＰ＿ｉ及びＰ＿ｉｊの更新を終了したか否かを判定する。ｎとＮとが同値ではない場合には、ステップＳ５５へ移行し、ループ変数ｎを１インクリメントして、ステップＳ４４に戻り、ステップＳ４４以降の処理を繰り返す。ｎとＮとが同値の場合には、ステップＳ５６へ移行する。 In step S54, the prior distribution learning unit 24 determines whether or not the loop variable n is equal to N. That is, it is determined whether or not the update of P_i and P_ij has been completed for all learning images. If n and N are not the same value, the process proceeds to step S55, the loop variable n is incremented by 1, the process returns to step S44, and the processes after step S44 are repeated. If n and N are the same value, the process proceeds to step S56.

上記ステップＳ４３〜Ｓ５５の処理の具体例を、図４及び図５を参照して説明する。ここでは、ｎ＝１，２，３、学習画像のサイズ５×５の場合を例に説明する。まず、アトリビュート単体の出現位置についての事前分布を表す行列Ｐ＿ｉの更新について説明する。 A specific example of the processing in steps S43 to S55 will be described with reference to FIGS. 4 and 5. Here, the case where n = 1, 2, 3 and the learning image size is 5 × 5 is used as an example. First, the update of the matrix P_i representing the prior distribution of the appearance position of a single attribute is described.

図４の左上の図に示すように、学習画像１（ｎ＝１）の画素［１，２］と画素［３，４］とを対角２点とする矩形領域が、矩形Ｒ＿１１として与えられたとする。なお、図４では、画像上の矩形内に含まれる画素を「１」、それ以外の画素を「０」として矩形を表している。以下、図５、図１１〜図１３についても同様である。この場合、ｎ＝１、ｉ＝１、ｊ＝１のループのステップＳ４７において、図４の左下の図に示すように、行列Ｐ＿１の要素［１，２］と要素［３，４］とを対角２点とする範囲に含まれる要素の各々を、１インクリメントする。 As shown in the upper-left diagram of FIG. 4, suppose that the rectangular region whose two diagonal corners are pixel [1, 2] and pixel [3, 4] of learning image 1 (n = 1) is given as rectangle R_11. In FIG. 4, a rectangle is represented by marking the pixels inside it as "1" and all other pixels as "0"; the same applies to FIG. 5 and FIGS. 11 to 13. In this case, in step S47 of the loop with n = 1, i = 1, j = 1, each element of the matrix P_1 within the range whose two diagonal corners are element [1, 2] and element [3, 4] is incremented by 1, as shown in the lower-left diagram of FIG. 4.

次に、図４の中央上の図に示すように、学習画像２（ｎ＝２）の画素［２，３］と画素［４，５］とを対角２点とする矩形領域が、矩形Ｒ＿２１として与えられたとする。この場合、ｎ＝２、ｉ＝１、ｊ＝１のループのステップＳ４７において、行列Ｐ＿１の要素［２，３］と要素［４，５］とを対角２点とする範囲に含まれる要素の各々を、１インクリメントする。従って、ｎ＝１の段階で値が１となっている要素の値は２になる。これにより、行列Ｐ＿１は、図４の中央下の図に示すように更新される。 Next, as shown in the upper center diagram of FIG. 4, a rectangular area having two diagonal points of the pixel [2, 3] and the pixel [4, 5] of the learning image 2 (n = 2) is a rectangle. Assume that R_21 is given. In this case, in step S47 of the loop of n = 2, i = 1, j = 1, elements included in a range in which element [2,3] and element [4,5] of matrix P_1 are two diagonal points Is incremented by one. Therefore, the value of the element whose value is 1 at the stage of n = 1 is 2. As a result, the matrix P_1 is updated as shown in the lower center diagram of FIG.

次に、図４の右上の図に示すように、学習画像３（ｎ＝３）の画素［２，１］と画素［４，３］とを対角２点とする矩形領域が、矩形Ｒ＿３１として与えられたとする。この場合、ｎ＝３、ｉ＝１、ｊ＝１のループのステップＳ４７において、行列Ｐ＿１の要素［２，１］と要素［４，３］とを対角２点とする範囲に含まれる要素の各々を、１インクリメントする。従って、ｎ＝２の段階で値が１となっている要素の値は２、値が２となっている要素の値は３になる。これにより、行列Ｐ＿１は、図４の右下の図に示すように更新される。 Next, as shown in the upper right diagram of FIG. 4, a rectangular region having two diagonal points of the pixel [2, 1] and the pixel [4, 3] of the learning image 3 (n = 3) is a rectangle R_31. Is given as In this case, in step S47 of the loop of n = 3, i = 1, j = 1, elements included in a range in which element [2,1] and element [4,3] of matrix P_1 are two diagonal points Is incremented by one. Therefore, the value of the element whose value is 1 at the stage of n = 2 is 2, and the value of the element whose value is 2 is 3. Thereby, the matrix P_1 is updated as shown in the lower right diagram of FIG.

このように、学習画像におけるアトリビュートの出現位置を示す矩形が重なる領域に対応する行列Ｐ＿１の要素は値が高くなる。 Thus, the value of the element of the matrix P_1 corresponding to the region where the rectangles indicating the appearance positions of the attributes in the learning image overlap is high.
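The accumulation of P_1 over the three learning images above can be sketched as follows. The function name `add_rectangle` and the list-of-lists representation are assumptions made for illustration; the rectangle coordinates are the ones given in the text for R_11, R_21, and R_31.

```python
def add_rectangle(matrix, top_left, bottom_right):
    """Increment every element inside the rectangle (1-based, inclusive corners)."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    for x in range(x1, x2 + 1):        # x: row position, increasing downward
        for y in range(y1, y2 + 1):    # y: column position, increasing rightward
            matrix[x - 1][y - 1] += 1


P_1 = [[0] * 5 for _ in range(5)]
rectangles = [((1, 2), (3, 4)),   # R_11 in learning image 1
              ((2, 3), (4, 5)),   # R_21 in learning image 2
              ((2, 1), (4, 3))]   # R_31 in learning image 3
for top_left, bottom_right in rectangles:
    add_rectangle(P_1, top_left, bottom_right)

for row in P_1:
    print(row)
```

Elements [2, 3] and [3, 3] are covered by all three rectangles and therefore end with the value 3, the largest in the matrix.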

次に、アトリビュート間の相対位置関係についての事前分布を表す行列Ｐ＿ｉｊの更新について説明する。ここでは、他の種類のアトリビュートが、アトリビュート２（ｊ＝２）である場合について説明する。 Next, update of the matrix P_ij representing the prior distribution regarding the relative positional relationship between attributes will be described. Here, a case where the other type of attribute is attribute 2 (j = 2) will be described.

図５の左上段の図に示すように、学習画像１（ｎ＝１）の画素［１，２］と画素［３，４］とを対角２点とする矩形領域が、矩形Ｒ＿１１として与えられ、図５の左中段の図に示すように、学習画像１の画素［３，１］と画素［５，１］とを対角２点とする矩形領域が、矩形Ｒ＿１２として与えられたとする。このとき、ｎ＝１、ｉ＝１、ｊ＝２のループのステップＳ４８において、矩形Ｒ＿１１の重心画素Ｇ（図５中に示す学習画像における太枠の画素）の画素位置は［（１＋３）／２，（２＋４）／２］＝［２，３］と求められる。５×５画素の学習画像の中心画素Ｃの画素位置は［３，３］であるので、移動量Ｄは、［（３−２），（３−３）］＝［＋１，０］と求められる。 As shown in the upper-left diagram of FIG. 5, suppose that the rectangular region whose two diagonal corners are pixel [1, 2] and pixel [3, 4] of learning image 1 (n = 1) is given as rectangle R_11 and that, as shown in the middle-left diagram of FIG. 5, the rectangular region whose two diagonal corners are pixel [3, 1] and pixel [5, 1] of learning image 1 is given as rectangle R_12. Then, in step S48 of the loop with n = 1, i = 1, j = 2, the pixel position of the centroid pixel G of rectangle R_11 (the thick-framed pixel in the learning image shown in FIG. 5) is obtained as [(1 + 3) / 2, (2 + 4) / 2] = [2, 3]. Since the center pixel C of the 5 × 5 pixel learning image is at [3, 3], the movement amount D is obtained as [(3 - 2), (3 - 3)] = [+1, 0].

ここで、Ｐ＿１２は学習画像の倍のサイズ、すなわち、５×５画素のサイズの上、下、右、左方向の各々に２画素ずつ拡張したサイズ（９×９）である。なお、図５に示す行列Ｐ＿１２における太枠の要素は、学習画像の中心画素Ｃに対応する要素である。従って、矩形Ｒ＿１２内の画素を移動量Ｄだけ移動させた画素に対応するＰ＿１２の要素は、［３，１］＋［２，２］＋［＋１，０］＝［６，３］と、［５，１］＋［２，２］＋［＋１，０］＝［８，３］とを対角２点とする範囲に含まれる要素となる。そこで、ｎ＝１、ｉ＝１、ｊ＝２のループのステップＳ４９において、これらの要素の値が１インクリメントされる。 Here, P_12 is a size (9 × 9) that is twice the size of the learning image, that is, a size that is 5 × 5 pixels expanded by 2 pixels in each of the upper, lower, right, and left directions. In addition, the element of the thick frame in the matrix P_12 shown in FIG. 5 is an element corresponding to the center pixel C of the learning image. Therefore, the elements of P_12 corresponding to the pixel in which the pixel in the rectangle R_12 is moved by the movement amount D are [3,1] + [2,2] + [+ 1,0] = [6,3], [ 5,1] + [2,2] + [+ 1,0] = [8,3] are elements included in a range having two diagonal points. Therefore, the values of these elements are incremented by 1 in step S49 of the loop of n = 1, i = 1, j = 2.

同様に、ｎ＝２、ｉ＝１、ｊ＝２のループのステップＳ４８及びＳ４９において、図５の中央上段に示すような矩形Ｒ＿２１と、中央中段に示すような矩形Ｒ＿２２とに基づいて、中央下段に示すように、行列Ｐ＿１２が更新される。また同様に、ｎ＝３、ｉ＝１、ｊ＝２のループのステップＳ４８及びＳ４９において、図５の右上段に示すような矩形Ｒ＿３１と、右中段に示すような矩形Ｒ＿３２とに基づいて、右下段に示すように、行列Ｐ＿１２が更新される。 Similarly, in steps S48 and S49 of the loop of n = 2, i = 1, j = 2, the center R is based on the rectangle R_21 as shown in the upper center of FIG. 5 and the rectangle R_22 as shown in the middle middle. As shown in the lower part, the matrix P_12 is updated. Similarly, in steps S48 and S49 of the loop of n = 3, i = 1, j = 2, based on a rectangle R_31 as shown in the upper right part of FIG. 5 and a rectangle R_32 as shown in the right middle part, As shown in the lower right column, the matrix P_12 is updated.

このように、一方のアトリビュートから見た他のアトリビュートの出現位置を示す矩形が重なる領域に対応する行列Ｐ＿ｉｊの要素は値が高くなる。 Thus, the value of the element of the matrix P_ij corresponding to the region where the rectangles indicating the appearance positions of the other attributes viewed from one attribute overlap is high.
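Steps S48 and S49 for one learning image can be sketched as below, using the values of the n = 1, i = 1, j = 2 example. The function names are hypothetical. Two points are assumptions not stated in the text: floor division is used when a centroid coordinate is not an integer, and the margin added on each side of the doubled matrix is computed as (H - 1) / 2, which gives the offset [2, 2] used in the example.

```python
def centroid(rect):
    """Centroid pixel of a rectangle given as ((x1, y1), (x2, y2)), 1-based."""
    (x1, y1), (x2, y2) = rect
    return (x1 + x2) // 2, (y1 + y2) // 2     # floor division: an assumption


def update_relative_prior(P_ij, rect_i, rect_j, height, width):
    """Vote for the position of rect_j relative to rect_i; return the movement D."""
    cx, cy = (height + 1) // 2, (width + 1) // 2   # center pixel C of the image
    gx, gy = centroid(rect_i)
    dx, dy = cx - gx, cy - gy                      # movement amount D
    ox, oy = (height - 1) // 2, (width - 1) // 2   # margin of the doubled matrix
    (x1, y1), (x2, y2) = rect_j
    for x in range(x1, x2 + 1):
        for y in range(y1, y2 + 1):
            P_ij[x + ox + dx - 1][y + oy + dy - 1] += 1
    return dx, dy


P_12 = [[0] * 9 for _ in range(9)]             # doubled size for a 5 x 5 image
R_11 = ((1, 2), (3, 4))
R_12 = ((3, 1), (5, 1))
D = update_relative_prior(P_12, R_11, R_12, 5, 5)
print(D)
```

With these inputs D is (1, 0), and the incremented elements are [6, 3], [7, 3], and [8, 3] of P_12 in the 1-based notation of the text.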

図３に示す事前分布学習処理の説明に戻る。次のステップＳ５６で、事前分布学習部２４が、図６に示すように、全てのｉについて、行列Ｐ＿ｉの全要素の合計値ｓ＿ｉを算出し、行列Ｐ＿ｉの各要素を合計値ｓ＿ｉで割ることにより、行列Ｐ＿ｉを正規化する。同様に、図７に示すように、全てのｉ及びｊについて、行列Ｐ＿ｉｊの全要素の合計値ｓ＿ｉｊを算出し、行列Ｐ＿ｉｊの各要素を合計値ｓ＿ｉｊで割ることにより、行列Ｐ＿ｉｊを正規化する。 Returning to the description of the prior distribution learning process shown in FIG. 3: in the next step S56, as shown in FIG. 6, the prior distribution learning unit 24 calculates, for every i, the total value s_i of all elements of the matrix P_i and normalizes P_i by dividing each of its elements by s_i. Similarly, as shown in FIG. 7, for every i and j it calculates the total value s_ij of all elements of the matrix P_ij and normalizes P_ij by dividing each of its elements by s_ij.
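The normalization of step S56 can be sketched as follows. The function name `normalize` is hypothetical, and returning an all-zero matrix unchanged is an assumption, since the text does not cover an attribute that never appears. The count matrix is the P_1 obtained from the three rectangles of the FIG. 4 example.

```python
def normalize(matrix):
    """Divide every element by the sum of all elements (s_i or s_ij in the text)."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        # Assumption: an attribute that never appears keeps an all-zero prior.
        return [list(row) for row in matrix]
    return [[value / total for value in row] for row in matrix]


counts = [[0, 1, 1, 1, 0],
          [1, 2, 3, 2, 1],
          [1, 2, 3, 2, 1],
          [1, 1, 2, 1, 1],
          [0, 0, 0, 0, 0]]
P_1 = normalize(counts)
print(round(sum(map(sum, P_1)), 6))
```

After normalization the elements sum to 1, so sums over sub-rectangles of P_1 can be read as probabilities.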

次に、ステップＳ５７で、事前分布学習部２４は、全てのｉ及びｊについて、行列Ｐ＿ｉ及びＰ＿ｉｊに対してぼかし処理を行う。学習データとして与えられるＲ＿ｎｍは、それが矩形領域として与えられる場合であっても、マスク画像として与えられる場合であっても、実用上アトリビュートの位置を正確に捉えることは困難である。そのため、学習データにおいて特定されるアトリビュートの位置情報にはゆらぎが存在する。すなわち、Ｒ＿ｎｍが、本来のアトリビュートの位置に対応する画素を含まない場合や、逆にアトリビュート以外に対応する画素を含む場合がある。これらのゆらぎを低減するために、ぼかし処理を行うことで、最終的な検出精度の向上に効果がある。 Next, in step S57, the prior distribution learning unit 24 performs a blurring process on the matrices P_i and P_ij for all i and j. Whether R_nm given as learning data is given as a rectangular area or a mask image, it is practically difficult to accurately grasp the position of the attribute. Therefore, fluctuation exists in the position information of the attribute specified in the learning data. That is, there are cases where R_nm does not include a pixel corresponding to the original attribute position, or conversely includes a pixel corresponding to other than the attribute. In order to reduce these fluctuations, blurring processing is effective in improving the final detection accuracy.

具体的には、ぼかし処理にガウシアンフィルタ（例えばσ＝１０、フィルタサイズを１０×１０等とする）を用いることができる。例えば、フィルタサイズは、アトリビュート毎に学習データの矩形のうち、最小面積を持つ矩形の短辺の１／１０程度を基準にすると、経験的に良好な結果が得られる。 Specifically, a Gaussian filter (for example, σ = 10, filter size 10 × 10, etc.) can be used for the blurring process. For example, if the filter size is based on about 1/10 of the short side of the rectangle having the smallest area among the rectangles of the learning data for each attribute, an empirically good result can be obtained.
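A Gaussian blurring step of the kind described for step S57 might look like the sketch below. It is written in plain Python for illustration; the function names are hypothetical, the treatment of elements outside the matrix as zero is an assumption, and an odd filter size is used so that the kernel has a well-defined center (the text's example mentions a 10 × 10 filter, whose alignment is not specified).

```python
import math


def gaussian_kernel(size, sigma):
    """Return a size x size Gaussian kernel whose elements sum to 1."""
    center = (size - 1) / 2
    kernel = [[math.exp(-((r - center) ** 2 + (c - center) ** 2) / (2 * sigma ** 2))
               for c in range(size)] for r in range(size)]
    total = sum(map(sum, kernel))
    return [[value / total for value in row] for row in kernel]


def blur(matrix, size=3, sigma=1.0):
    """Filter the matrix with a Gaussian kernel; outside elements count as 0."""
    rows, cols = len(matrix), len(matrix[0])
    kernel = gaussian_kernel(size, sigma)
    half = size // 2                      # assumes an odd kernel size
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    rr, cc = r + i - half, c + j - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += kernel[i][j] * matrix[rr][cc]
            out[r][c] = acc
    return out


impulse = [[0.0] * 5 for _ in range(5)]
impulse[2][2] = 1.0                       # a single vote at the center
smoothed = blur(impulse, size=3, sigma=1.0)
```

The single vote is spread over its 3 × 3 neighborhood, which is the intended effect: small errors in the annotated rectangle positions no longer produce hard edges in the prior.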

次に、ステップＳ５８で、事前分布学習部２４は、行列Ｐ＿ｉをアトリビュートｉ単体の出現位置についての事前分布として、行列Ｐ＿ｉｊをアトリビュート間の相対位置関係についての事前分布として、事前分布ＤＢ３３に格納して、事前分布学習処理を終了する。 Next, in step S58, the prior distribution learning unit 24 stores the matrix P_i in the prior distribution DB 33 as a prior distribution for the appearance position of the attribute i alone and the matrix P_ij as a prior distribution for the relative positional relationship between the attributes. This completes the prior distribution learning process.

次に、上記の学習処理が実行されて、アトリビュート毎の検出器が検出器ＤＢ３２に格納され、アトリビュート毎の事前分布が事前分布ＤＢ３３に格納された状態で、画像処理装置１０が、図８に示す検出処理を実行する。 Next, once the above learning process has been executed, so that the detector for each attribute is stored in the detector DB 32 and the prior distribution for each attribute is stored in the prior distribution DB 33, the image processing apparatus 10 executes the detection process shown in FIG. 8.

図８に示す検出処理のステップＳ６０で、画像入力部４１が、入力画像３６の入力を受け付け、入力画像３６のサイズが、学習データの学習画像のサイズと異なる場合には、学習画像と同一サイズにリサイズする。そして、画像入力部４１は、入力画像３６を特徴抽出部４２へ出力する。 In step S60 of the detection process shown in FIG. 8, the image input unit 41 receives the input image 36 and, if its size differs from that of the learning images in the learning data, resizes it to the same size as the learning images. The image input unit 41 then outputs the input image 36 to the feature extraction unit 42.

次に、ステップＳ７０で、特徴抽出部４２が、入力画像３６に対して、任意の大きさの矩形をずらしながら当てはめ、矩形内の領域から、学習部２０の特徴抽出部２２で抽出される特徴量と同様の特徴量を抽出する。そして、特徴抽出部４２は、抽出した特徴量と、特徴量を抽出した領域の位置、すなわち入力画像３６に当てはめた矩形の位置とを対応付けて、検出処理部４３へ出力する。 Next, in step S70, the feature extraction unit 42 fits the input image 36 while shifting a rectangle of an arbitrary size, and the features extracted by the feature extraction unit 22 of the learning unit 20 from the region within the rectangle. A feature quantity similar to the quantity is extracted. Then, the feature extraction unit 42 associates the extracted feature amount with the position of the region from which the feature amount has been extracted, that is, the position of the rectangle applied to the input image 36, and outputs it to the detection processing unit 43.

次に、ステップＳ８０で、検出処理部４３が、特徴抽出部４２から出力された特徴量及び矩形の位置を取得する。また、検出処理部４３は、検出器ＤＢ３２から、アトリビュートの種類毎の検出器を取得する。そして、検出処理部４３は、特徴抽出部４２で各矩形内の領域から抽出された特徴量を、アトリビュートの種類毎の検出器の各々に入力し、検出器の出力として、アトリビュートの種類毎の検出スコアを得る。 Next, in step S80, the detection processing unit 43 acquires the feature quantities and rectangle positions output from the feature extraction unit 42. The detection processing unit 43 also acquires the detector for each attribute type from the detector DB 32. It then inputs the feature quantity extracted from the region in each rectangle by the feature extraction unit 42 to each of the detectors and obtains, as the detector outputs, a detection score for each attribute type.

そして、検出処理部４３は、検出スコアに基づいて、アトリビュートの種類毎に所定個の検出候補を検出する。ここでは、検出スコアの上位Ｋ個に対応する矩形内の画像を検出候補とする場合について説明する。以下では、入力画像３６から検出されたアトリビュートｉの検出候補を「矩形Ｒ＿ｋｉ（ｋ＝［１，２，・・・，Ｋ］），ｉ＝［１，２，・・・，Ｍ］」と表記する。また、矩形Ｒ＿ｋｉについて得られた検出スコアを「検出スコアＳ＿ｋｉ」と表記する。 The detection processing unit 43 then detects a predetermined number of detection candidates for each attribute type based on the detection scores. Here, the case where the images within the rectangles corresponding to the top K detection scores are taken as detection candidates is described. In the following, a detection candidate for attribute i detected from the input image 36 is denoted "rectangle R_ki (k = [1, 2, ..., K]), i = [1, 2, ..., M]", and the detection score obtained for rectangle R_ki is denoted "detection score S_ki".

検出処理部４３は、アトリビュートの種類毎の検出候補である矩形Ｒ＿ｋｉと、その検出候補の位置情報（矩形の位置情報）と、検出スコアＳ＿ｋｉとを対応付けて、事前分布反映部４４へ出力する。 The detection processing unit 43 associates the rectangle R_ki, which is a detection candidate for each attribute type, the position information of the detection candidate (rectangular position information), and the detection score S_ki, and outputs the associated information to the prior distribution reflecting unit 44. .
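Selecting the top K candidates for one attribute type can be sketched as follows. The function name `top_k_candidates` is hypothetical, the detector itself is not modeled, and the third rectangle with its score of 0.10 is made up for illustration; the other two rectangles and scores are taken from the examples of FIGS. 11 and 12.

```python
def top_k_candidates(rectangles, scores, k):
    """Return the k (score, rectangle) pairs with the highest scores, best first."""
    ranked = sorted(zip(scores, rectangles), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]


rects = [((2, 2), (3, 4)), ((3, 3), (4, 5)), ((1, 1), (2, 2))]
scores = [0.78, 0.85, 0.10]
candidates = top_k_candidates(rects, scores, k=2)
for score, rect in candidates:
    print(score, rect)
```

Each returned pair keeps the rectangle position together with its detection score, matching the association that the detection processing unit 43 passes on to the prior distribution reflection unit 44.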

次に、ステップＳ９０で、事前分布反映部４４が、詳細を図９及び図１０に示す事前分布反映処理を実行し、検出候補の各々の検出スコアに事前分布を反映させる。 Next, in step S90, the prior distribution reflecting unit 44 executes the prior distribution reflecting process shown in detail in FIGS. 9 and 10, and reflects the prior distribution on each detection score of the detection candidates.

次に、ステップＳ１２０で、検出結果出力部４５が、事前分布反映部４４から出力されたアトリビュートの種類毎の最終検出結果Ｒｐ＿ｉの位置情報を入力画像３６に紐づけて、検出結果３７として出力し、検出処理は終了する。 Next, in step S120, the detection result output unit 45 associates the position information of the final detection result Rp_i for each attribute type output from the prior distribution reflection unit 44 with the input image 36, and outputs the result as the detection result 37. The detection process ends.

ここで、図９及び図１０を参照して、事前分布反映処理について詳述する。 Here, the prior distribution reflection processing will be described in detail with reference to FIGS. 9 and 10.

ステップＳ９１で、事前分布反映部４４が、事前分布ＤＢ３３から、Ｍ種類のアトリビュートの出現位置についての事前分布を表す行列Ｐ＿ｉ（ｉ＝［１，２，・・・，Ｍ］）、及び行列Ｐ＿ｉｊ（ｊ＝［１，２，・・・，Ｍ］，ｉ≠ｊ）を取得する。次に、ステップＳ９２で、事前分布反映部４４が、ループ変数ｉを１に初期化する。 In step S91, the prior distribution reflecting unit 44 receives from the prior distribution DB 33 a matrix P_i (i = [1, 2,..., M]) representing a prior distribution for the appearance positions of M types of attributes, and a matrix P_ij. (J = [1, 2,..., M], i ≠ j) is acquired. Next, in step S92, the prior distribution reflection unit 44 initializes the loop variable i to 1.

次に、ステップＳ９３で、事前分布反映部４４が、検出処理部４３から出力されたアトリビュートｉの検出候補であるＫ個の矩形Ｒ＿ｋｉ、及びその検出スコアＳ＿ｋｉを取得する。次に、ステップＳ９４で、事前分布反映部４４が、ループ変数ｋを１に、最大値スコアＳＰを０に初期化する。 Next, in step S93, the prior distribution reflection unit 44 acquires K rectangles R_ki that are detection candidates for the attribute i output from the detection processing unit 43, and the detection score S_ki thereof. Next, in step S94, the prior distribution reflection unit 44 initializes the loop variable k to 1 and the maximum value score SP to 0.

次に、ステップＳ９５で、事前分布反映部４４が、行列Ｐ＿ｉのＲ＿ｋｉに対応する要素の値を合計した値Ｐｓを算出し、Ｐｓに検出候補Ｒ＿ｋｉの検出スコアＳ＿ｋｉを掛けた値ＳＰ’を算出し、Ｓ＿ｋｉをＳＰ’に更新する。 Next, in step S95, the prior distribution reflection unit 44 calculates a value Ps obtained by summing values of elements corresponding to R_ki of the matrix P_i, and calculates a value SP ′ obtained by multiplying Ps by the detection score S_ki of the detection candidate R_ki. Then, S_ki is updated to SP ′.

次に、ステップＳ９６で、事前分布反映部４４が、上記ステップＳ９５で算出したＳＰ’と最大スコアＳＰとを比較し、ＳＰよりＳＰ’の方が大きいか否かを判定する。ＳＰ’の方が大きい場合には、ステップＳ９７へ移行し、最大スコアＳＰをＳＰ’に更新し、アトリビュートｉの最終検出結果Ｒｐ＿ｉをＲ＿ｋｉに更新し、ステップＳ９８へ移行する。一方、ＳＰ’の方が小さい場合には、ステップＳ９７をスキップして、ステップＳ９８へ移行する。 Next, in step S96, the prior distribution reflection unit 44 compares SP' calculated in step S95 with the maximum score SP and determines whether SP' is greater than SP. If SP' is greater, the process proceeds to step S97, where the maximum score SP is updated to SP' and the final detection result Rp_i of attribute i is updated to R_ki, and then proceeds to step S98. If SP' is smaller, step S97 is skipped and the process proceeds to step S98.

ステップＳ９８では、事前分布反映部４４が、ループ変数ｋがＫと同値であるか否かを判定する。すなわち、アトリビュートｉについての全ての検出候補に対応する検出スコアＳ＿ｋｉを更新したか否かを判定する。ｋとＫとが同値ではない場合には、ステップＳ９９で、ループ変数ｋを１インクリメントし、ステップＳ９５に戻り、ステップＳ９５以降の処理を繰り返す。ｋとＫとが同値の場合には、ステップＳ１００へ移行する。 In step S98, the prior distribution reflection unit 44 determines whether or not the loop variable k is the same value as K. That is, it is determined whether or not the detection scores S_ki corresponding to all detection candidates for the attribute i have been updated. If k and K are not the same value, the loop variable k is incremented by 1 in step S99, the process returns to step S95, and the processes in and after step S95 are repeated. If k and K are the same value, the process proceeds to step S100.

ステップＳ１００では、事前分布反映部４４が、ループ変数ｉがＭと同値であるか否かを判定する。すなわち、全てのアトリビュートの種類について、最終検出結果Ｒｐが算出されたか否かを判定する。ｉとＭとが同値ではない場合には、ステップＳ１０１へ移行し、ループ変数ｉを１インクリメントして、ステップＳ９３に戻り、ステップＳ９３以降の処理を繰り返す。ｉとＭとが同値の場合には、ステップＳ１０２へ移行する。 In step S100, the prior distribution reflecting unit 44 determines whether or not the loop variable i is the same value as M. That is, it is determined whether or not the final detection result Rp has been calculated for all attribute types. If i and M are not the same value, the process proceeds to step S101, the loop variable i is incremented by 1, the process returns to step S93, and the processes after step S93 are repeated. If i and M are the same value, the process proceeds to step S102.

上記ステップＳ９２〜Ｓ１００の処理の具体例を、図１１及び図１２を参照して説明する。ここでは、ｋ＝１，２の場合を例に説明する。 A specific example of the processing in steps S92 to S100 will be described with reference to FIGS. 11 and 12. Here, the case where k = 1, 2 is used as an example.

図１１の左図に示すように、アトリビュート１（ｉ＝１）の１番目（ｋ＝１）の検出候補として、入力画像の画素［２，２］と画素［３，４］とを対角２点とする矩形領域が、矩形Ｒ＿１１として与えられたとする。また、この矩形Ｒ＿１１の検出スコアＳ＿１１が０．７８で与えられているとする。この場合、図１１の右図に示すように、行列Ｐ＿１の要素［２，２］と要素［３，４］とを対角２点とする範囲（図１１中の破線内）に含まれる要素の各々の値の合計０．５２がＰｓとして求まる。そして、検出スコアＳ＿１１とＰｓとを掛け合わせた値ＳＰ’が０．４１と算出される。ｉ＝１、ｋ＝１のループにおけるステップＳ９５では、この値ＳＰ’＝０．４１が、矩形Ｒ＿１１の検出スコアＳ＿１１として更新される。 As shown in the left diagram of FIG. 11, pixel [2, 2] and pixel [3,4] of the input image are diagonally set as the first (k = 1) detection candidate of attribute 1 (i = 1). It is assumed that a rectangular area having two points is given as a rectangle R_11. Further, it is assumed that the detection score S_11 of the rectangle R_11 is given as 0.78. In this case, as shown in the right diagram of FIG. 11, the elements included in the range (within the broken line in FIG. 11) in which the elements [2, 2] and [3, 4] of the matrix P_1 are two diagonal points. A total of 0.52 of each value is obtained as Ps. Then, a value SP ′ obtained by multiplying the detection score S_11 by Ps is calculated as 0.41. In step S95 in the loop of i = 1 and k = 1, this value SP ′ = 0.41 is updated as the detection score S_11 of the rectangle R_11.

また、ｉ＝１、ｋ＝１のループにおけるステップＳ９６では、ＳＰ＝０、ＳＰ’＝０．４１であるため、肯定判定されて、ステップＳ９７で、ＳＰがＳＰ’＝０．４１に更新されると共に、アトリビュートｉについての最終検出結果Ｒｐ＿ｉがＲ＿１１に更新される。 In step S96 in the loop of i = 1 and k = 1, since SP = 0 and SP ′ = 0.41, an affirmative determination is made, and SP is updated to SP ′ = 0.41 in step S97. In addition, the final detection result Rp_i for the attribute i is updated to R_11.

次に、図１２の左図に示すように、アトリビュート１（ｉ＝１）の２番目（ｋ＝２）の検出候補として、入力画像の画素［３，３］と画素［４，５］とを対角２点とする矩形領域が、矩形Ｒ＿２１として与えられたとする。また、この矩形Ｒ＿２１の検出スコアＳ＿２１が０．８５で与えられているとする。この場合、図１２の右図に示すように、行列Ｐ＿１の要素［３，３］と要素［４，５］とを対角２点とする範囲（図１２中の破線内）に含まれる要素の各々の値の合計０．３７がＰｓとして求まる。そして、検出スコアＳ＿２１とＰｓとを掛け合わせた値ＳＰ’が０．３１と算出される。ｉ＝１、ｋ＝２のループにおけるステップＳ９５では、この値ＳＰ’＝０．３１が、矩形Ｒ＿２１の検出スコアＳ＿２１として更新される。 Next, as shown in the left diagram of FIG. 12, as the second (k = 2) detection candidate of attribute 1 (i = 1), pixel [3, 3] and pixel [4, 5] of the input image Is assumed to be given as a rectangle R_21. Further, it is assumed that the detection score S_21 of the rectangle R_21 is given by 0.85. In this case, as shown in the right diagram of FIG. 12, the elements included in the range (within the broken line in FIG. 12) in which the elements [3, 3] and [4, 5] of the matrix P_1 are two diagonal points. A total of 0.37 of the respective values is obtained as Ps. Then, a value SP ′ obtained by multiplying the detection score S_21 and Ps is calculated as 0.31. In step S95 in the loop of i = 1 and k = 2, this value SP ′ = 0.31 is updated as the detection score S_21 of the rectangle R_21.

また、ｉ＝１、ｋ＝２のループにおけるステップＳ９６では、ＳＰ＝０．４１、ＳＰ’＝０．３１であるため、否定判定されて、ステップＳ９７がスキップされる。すなわち、ＳＰは、前のループにおける０．４１のままであり、アトリビュートｉについての最終検出結果Ｒｐ＿ｉもＲ＿１１のままである。 In step S96 of the loop with i = 1 and k = 2, SP = 0.41 and SP' = 0.31, so the determination is negative and step S97 is skipped. That is, SP remains at 0.41 from the previous loop, and the final detection result Rp_i for attribute i also remains R_11.
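Steps S94 to S99 for one attribute can be sketched as below. The function names are hypothetical. As an assumption, P_1 is taken to be the count matrix of the FIG. 4 example normalized by its total of 27, with no blurring applied; under that assumption the rectangle sums round to the values 0.52 and 0.37 quoted in the text.

```python
def rect_sum(matrix, rect):
    """Sum the elements inside a rectangle ((x1, y1), (x2, y2)), 1-based inclusive."""
    (x1, y1), (x2, y2) = rect
    return sum(matrix[x - 1][y - 1]
               for x in range(x1, x2 + 1) for y in range(y1, y2 + 1))


def reflect_single_prior(P_i, candidates):
    """candidates: list of (rectangle, score). Returns (best_rect, SP, details)."""
    best_rect, best_score = None, 0.0          # SP is initialized to 0 (step S94)
    details = []
    for rect, score in candidates:
        ps = rect_sum(P_i, rect)               # Ps (step S95)
        sp_new = ps * score                    # SP' replaces S_ki
        details.append((rect, ps, sp_new))
        if sp_new > best_score:                # steps S96 and S97
            best_rect, best_score = rect, sp_new
    return best_rect, best_score, details


counts = [[0, 1, 1, 1, 0],
          [1, 2, 3, 2, 1],
          [1, 2, 3, 2, 1],
          [1, 1, 2, 1, 1],
          [0, 0, 0, 0, 0]]
P_1 = [[v / 27 for v in row] for row in counts]   # assumption: unblurred prior
candidates = [(((2, 2), (3, 4)), 0.78),           # R_11 with S_11
              (((3, 3), (4, 5)), 0.85)]           # R_21 with S_21
best_rect, best_score, details = reflect_single_prior(P_1, candidates)
```

Although R_21 has the higher raw detection score, R_11 lies in a region where the attribute appeared more often in the learning data, so R_11 is kept as the final detection result, in agreement with the text.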

図９に示す事前分布反映処理の説明に戻る。次のステップＳ１０２で、事前分布反映部４４が、アトリビュート間の相対位置関係についての事前分布も検出結果に反映するか否かを判定する。この判定は、例えば、予め定めた設定に基づいて判定してもよいし、Ｍ＞１の場合には反映させると判定するようにしてもよい。反映させる場合は、図１０のステップＳ１０３へ移行し、反映させない場合には、図１０のステップＳ１１６へ移行する。 Returning to the description of the prior distribution reflection process shown in FIG. 9: in the next step S102, the prior distribution reflection unit 44 determines whether the prior distribution for the relative positional relationship between attributes is also to be reflected in the detection result. This determination may be made based on a predetermined setting, for example, or the unit may determine that it is to be reflected whenever M > 1. If it is to be reflected, the process proceeds to step S103 in FIG. 10; otherwise, the process proceeds to step S116 in FIG. 10.

図１０のステップＳ１０３では、事前分布反映部４４が、ループ変数ｉを１に初期化する。次に、ステップＳ１０４で、事前分布反映部４４が、ループ変数ｊを１に初期化する。次に、ステップＳ１０５で、事前分布反映部４４が、ループ変数ｊがｉと同値であるか否かを判定する。ｉとｊとが同値の場合には、ステップＳ１０６へ移行し、ループ変数ｊを１インクリメントして、ステップＳ１０５に戻る。ｉとｊとが同値ではない場合には、ステップＳ１０７へ移行し、事前分布反映部４４が、ループ変数ｋを１に初期化する。 In step S103 of FIG. 10, the prior distribution reflection unit 44 initializes the loop variable i to 1. Next, in step S104, the prior distribution reflection unit 44 initializes the loop variable j to 1. Next, in step S105, the prior distribution reflection unit 44 determines whether or not the loop variable j is the same value as i. If i and j are the same value, the process proceeds to step S106, the loop variable j is incremented by 1, and the process returns to step S105. If i and j are not the same value, the process proceeds to step S107, and the prior distribution reflection unit 44 initializes the loop variable k to 1.

次に、ステップＳ１０８で、事前分布反映部４４が、アトリビュートｊ単体の事前分布を反映させた結果得られた最終検出結果Ｒｐ＿ｊの重心画素Ｇから、入力画像３６の中心画素Ｃまでの移動量Ｄを算出する。 Next, in step S108, the prior distribution reflecting unit 44 moves from the centroid pixel G of the final detection result Rp_j obtained as a result of reflecting the prior distribution of the attribute j alone to the center pixel C of the input image 36. Is calculated.

次に、ステップＳ１０９で、事前分布反映部４４が、矩形Ｒ＿ｋｉ内の各画素から移動量Ｄだけ移動した位置の画素に対応する行列Ｐ＿ｉｊの要素の値を合計した値Ｐｓを算出する。また、事前分布反映部４４は、Ｐｓに検出候補Ｒ＿ｋｉの検出スコアＳ＿ｋｉを掛けた値ＳＰ’を算出し、Ｓ＿ｋｉをＳＰ’で更新する。 Next, in step S109, the prior distribution reflection unit 44 calculates a value Ps obtained by summing the values of the elements of the matrix P_ij corresponding to the pixel at the position moved by the movement amount D from each pixel in the rectangle R_ki. Further, the prior distribution reflection unit 44 calculates a value SP ′ obtained by multiplying Ps by the detection score S_ki of the detection candidate R_ki, and updates S_ki with SP ′.

A specific example of the processing in steps S108 and S109 is described with reference to FIG. 13, taking the case of k = 1, i = 1, and j = 2.

図１３の左上の図に示すように、アトリビュート１（ｉ＝１）の１番目（ｋ＝１）の検出候補である矩形Ｒ＿１１の現段階での検出スコアＳ＿１１が０．４１であるとする。ここで、矩形Ｒ＿１１について、アトリビュート２の位置から見たアトリビュート１の検出位置としての妥当性を反映させることを考える。そこで、矩形Ｒ＿１１を、アトリビュート２の出現位置を基準としたアトリビュート１の相対的な出現位置についての事前分布を表す行列Ｐ＿２１へマッピングする。 As shown in the upper left diagram of FIG. 13, it is assumed that the detection score S_11 at the current stage of the rectangle R_11 that is the first (k = 1) detection candidate of the attribute 1 (i = 1) is 0.41. Here, it is considered that the validity of the detection position of the attribute 1 as seen from the position of the attribute 2 is reflected on the rectangle R_11. Therefore, the rectangle R_11 is mapped to the matrix P_21 representing the prior distribution of the relative appearance position of the attribute 1 with the appearance position of the attribute 2 as a reference.

Here, P_21 is twice the size of a learning image, which has the same size as the input image. That is, the 5 × 5 pixel size is extended by two pixels in each of the upward, downward, rightward, and leftward directions, giving 9 × 9. In FIG. 13, the thick-framed element in the pixel positions R'_11 mapped onto the matrix P_21, and in the matrix P_21 itself, is the element corresponding to the center pixel C of the input image. Accordingly, the elements of P_21 that correspond to the pixels of the rectangle R_11 after moving them by the movement amount D are those within the range whose diagonal corners are [2, 2] + [2, 2] + [-2, +1] = [2, 5] and [3, 4] + [2, 2] + [-2, +1] = [3, 7] (inside the broken line in FIG. 13). In step S109 of the loop for k = 1, i = 1, and j = 2, the sum of these element values, 0.42, is obtained as Ps. The value SP', the product of the detection score S_11 and Ps, is then calculated as 0.41 × 0.42 ≈ 0.17, and the detection score S_11 of the rectangle R_11 is updated to 0.17.
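The arithmetic of this example can be traced with a minimal Python sketch. The function names `shifted_region` and `update_score` are hypothetical, the coordinates are treated as zero-based [row, column] indices purely for illustration, and the 9 × 9 prior below is invented so that the region sums to 0.42; only the corner coordinates, the padding, the movement amount D, and the scores 0.41, 0.42, and 0.17 come from the description.

```python
def shifted_region(top_left, bottom_right, pad, move):
    """Map a rectangle's two diagonal corners into prior-matrix coordinates.

    pad is the extension added on each side of the image ([2, 2] here) and
    move is the movement amount D ([-2, +1] here).
    """
    def shift(p):
        return (p[0] + pad[0] + move[0], p[1] + pad[1] + move[1])
    return shift(top_left), shift(bottom_right)


def update_score(score, prior, region):
    """Sum the prior over the inclusive region (Ps) and return (Ps, score * Ps)."""
    (r0, c0), (r1, c1) = region
    ps = sum(prior[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))
    return ps, score * ps


# Corners of R_11, padding, and D as given in the FIG. 13 example.
region = shifted_region((2, 2), (3, 4), pad=(2, 2), move=(-2, 1))

# Hypothetical 9 x 9 prior: 0.07 in each of the six region cells, 0 elsewhere.
prior = [[0.0] * 9 for _ in range(9)]
for r in range(2, 4):
    for c in range(5, 8):
        prior[r][c] = 0.07

ps, new_score = update_score(0.41, prior, region)
print(region, round(ps, 2), round(new_score, 2))
```

With these assumed values the region comes out as [2, 5] to [3, 7], and the updated score rounds to 0.17, matching the figures quoted in the example.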

Returning to the description of the prior distribution reflection process shown in FIG. 10, in the next step S110 the prior distribution reflection unit 44 determines whether the loop variable k is equal to K. That is, it determines whether the prior distribution of the relative positional relationship with attribute j has been reflected in the detection scores S_ki of all detection candidates of attribute i. If k is not equal to K, the process proceeds to step S111, increments the loop variable k by 1, and returns to step S108 to repeat the processing from step S108 onward. If k is equal to K, the process proceeds to step S112.

ステップＳ１１２では、事前分布反映部４４が、ループ変数ｊがＭと同値であるか判定する。すなわち、アトリビュートｉに対して他の全てのアトリビュートとの相対位置関係についての事前分布を反映させたか否かを判定する。ｊとＭとが同値ではない場合には、ステップＳ１０６へ移行し、ループ変数ｊを１インクリメントして、ステップＳ１０５に戻り、ステップＳ１０５以降の処理を繰り返す。ｊとＭとが同値の場合には、ステップＳ１１３へ移行する。 In step S112, the prior distribution reflection unit 44 determines whether the loop variable j is the same value as M. That is, it is determined whether or not the prior distribution of the relative positional relationship with all other attributes is reflected on the attribute i. If j and M are not the same value, the process proceeds to step S106, the loop variable j is incremented by 1, the process returns to step S105, and the processes after step S105 are repeated. If j and M are the same value, the process proceeds to step S113.

In step S113, the prior distribution reflection unit 44 takes, from among the detection-candidate rectangles R_ki, the rectangle R_ki with the highest detection score S_ki as the final detection result Rp_i for attribute i, in which the relative positional relationship between attributes has been reflected. The final detection result Rp_i need not be the rectangle with the highest detection score S_ki; any rectangle R_ki whose detection score S_ki is equal to or greater than a predetermined threshold may also be taken as the final detection result Rp_i.

次に、ステップＳ１１４で、事前分布反映部４４が、ループ変数ｉがＭと同値であるか否かを判定する。すなわち、全ての種類のアトリビュートについて、検出候補の検出スコアにアトリビュート間の相対位置関係を反映させたか否かを判定する。ｉとＭとが同値ではない場合には、ステップＳ１１５へ移行し、ループ変数ｉを１インクリメントして、ステップＳ１０４に戻り、ステップＳ１０４以降の処理を繰り返す。ｉとＭとが同値の場合には、ステップＳ１１６へ移行する。 Next, in step S114, the prior distribution reflection unit 44 determines whether or not the loop variable i is the same value as M. That is, for all types of attributes, it is determined whether or not the relative positional relationship between the attributes is reflected in the detection score of the detection candidate. If i and M are not the same value, the process proceeds to step S115, the loop variable i is incremented by 1, the process returns to step S104, and the processes after step S104 are repeated. If i and M are the same value, the process proceeds to step S116.
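The loop structure of steps S103 to S115 can be summarized in a short Python sketch. This is an illustrative reading of the flowchart, not the patented implementation: `reflect_relative_priors` and the `weight` callback are hypothetical names, indices are zero-based, and `weight(i, j, k)` stands in for the value Ps computed in steps S108 and S109.

```python
def reflect_relative_priors(scores, weight):
    """Apply relative-position priors to every candidate score.

    scores[i][k] is the detection score of candidate k of attribute i.
    weight(i, j, k) is assumed to return Ps for that candidate as seen
    from attribute j. Returns the updated scores and, per attribute, the
    index of the best candidate (the maximum-score variant of step S113).
    """
    m = len(scores)
    updated = [list(row) for row in scores]
    best = []
    for i in range(m):                        # S103, S114, S115
        for j in range(m):                    # S104, S112, S106
            if j == i:                        # S105: skip the attribute itself
                continue
            for k in range(len(updated[i])):  # S107, S110, S111
                updated[i][k] *= weight(i, j, k)  # S108, S109
        best.append(max(range(len(updated[i])), key=lambda k: updated[i][k]))
    return updated, best


# Hypothetical data: two attributes with two candidates each.
example_scores = [[0.41, 0.30], [0.20, 0.50]]
example_ps = [[0.42, 0.90], [0.50, 0.50]]
updated, best = reflect_relative_priors(
    example_scores, lambda i, j, k: example_ps[i][k])
print(best)
```

The sketch multiplies each score once per other attribute, so with M attributes every candidate score is scaled M - 1 times before the best candidate is chosen.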

ステップＳ１１６では、事前分布反映部４４が、上記ステップＳ９７またはＳ１１３で算出した最終検出結果Ｒｐ＿ｉ（ｉ＝［１，２，・・・，Ｍ］）の位置情報を、検出結果出力部４５へ出力し、事前分布反映処理を終了する。 In step S116, the prior distribution reflection unit 44 outputs the position information of the final detection result Rp_i (i = [1, 2,..., M]) calculated in step S97 or S113 to the detection result output unit 45. Then, the prior distribution reflection process is terminated.

As described above, the image processing apparatus according to the first embodiment obtains a detection result that reflects not only the feature amounts extracted from the image but also, for the positions at which those feature amounts were detected, the prior distribution of the appearance position of the detection target in the image. As a result, the detection target can be detected accurately even when it occupies only a small region of the image, as is the case for parts or elements of an object.

In addition, by also using the prior distribution of the relative positional relationship in the image between detection targets of different types as a prior distribution of the appearance position to be reflected in the detection result, the detection target can be detected with still higher accuracy.

<Second Embodiment>
Next, a second embodiment will be described. The first embodiment described the case in which the image processing apparatus 10 scans the entire input image 36, extracts feature amounts and detects detection candidates, and then reflects the prior distribution on the detection candidates to obtain the final detection result. The second embodiment describes the case in which the prior distribution is reflected in a stage preceding the feature extraction unit. Components identical to those of the image processing apparatus 10 according to the first embodiment are given the same reference numerals, and their detailed description is omitted.

The image processing apparatus 210 according to the second embodiment is configured as a computer including a CPU, a RAM, and a ROM that stores programs for executing a learning processing routine and a detection processing routine described later. Functionally, as shown in FIG. 14, the image processing apparatus 210 can be represented as a configuration including a learning unit 20 and a detection unit 240. The detection unit 240 includes an image input unit 41, a prior distribution reflection unit 244, a feature extraction unit 242, a detection processing unit 243, and a detection result output unit 45. The prior distribution reflection unit 244 is an example of the specifying means of the present invention, and the feature extraction unit 242 and the detection processing unit 243 are an example of the detection means of the present invention.

学習部２０については、第１の実施の形態に係る画像処理装置１０の学習部２０と同様であるため、以下では、検出部２４０の各部について詳述する。 Since the learning unit 20 is the same as the learning unit 20 of the image processing apparatus 10 according to the first embodiment, each unit of the detection unit 240 will be described in detail below.

The prior distribution reflection unit 244 acquires the prior distribution stored in the prior distribution DB 33, reflects it on the input image 36 output from the image input unit 41, and specifies the scanning range in which the detection target is to be detected from the input image 36. For example, the prior distribution reflection unit 244 can specify, as the scanning range, a region containing the pixels of the input image 36 other than those corresponding to elements of value 0 in the matrix P_i representing the prior distribution. Alternatively, it can specify, as the scanning range, a region containing the pixels of the input image 36 that correspond to elements of the matrix P_i whose values are equal to or greater than a predetermined threshold. The threshold can be, for example, half the maximum value of the elements of the matrix P_i. Setting a higher threshold narrows the scanning range further; if no threshold is set, the entire input image 36 is scanned, as in the first embodiment.
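As a rough illustration of the two selection rules just described, the Python sketch below derives a scanning range from a prior matrix. The names `scan_range` and `half_max` are hypothetical, and representing the "region containing" the selected pixels as an inclusive bounding box is an assumption made for this sketch; the description does not fix the shape of the region.

```python
def half_max(prior):
    """Half of the largest element, the example threshold named in the text."""
    return max(max(row) for row in prior) / 2


def scan_range(prior, threshold=None):
    """Return (top, left, bottom, right), inclusive, of the selected elements.

    With threshold=None every nonzero element is selected; otherwise every
    element whose value is >= threshold. Returns None if nothing is selected.
    """
    if threshold is None:
        cells = [(r, c) for r, row in enumerate(prior)
                 for c, v in enumerate(row) if v != 0]
    else:
        cells = [(r, c) for r, row in enumerate(prior)
                 for c, v in enumerate(row) if v >= threshold]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)


# Hypothetical 4 x 4 prior matrix P_i.
p_i = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.1, 0.0],
    [0.0, 0.1, 0.4, 0.3],
    [0.0, 0.0, 0.0, 0.0],
]
print(scan_range(p_i))                 # nonzero rule
print(scan_range(p_i, half_max(p_i)))  # threshold rule, T = max / 2
```

In this invented example the threshold rule yields a smaller box than the nonzero rule, which is the narrowing effect the paragraph attributes to a higher threshold.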

The feature extraction unit 242 fits a rectangle of arbitrary size to the scanning range specified by the prior distribution reflection unit 244 while shifting the rectangle, in the same way as the feature extraction unit 42 in the first embodiment, and extracts a feature amount from the region inside the rectangle. The feature extraction unit 242 associates each extracted feature amount with the position of the region from which it was extracted, that is, the position of the rectangle fitted to the input image 36, and outputs them to the detection processing unit 243.

検出処理部２４３は、特徴抽出部２４２から出力された特徴量と、検出器ＤＢ３２から取得したアトリビュートの種類毎の検出器とに基づいて、各矩形について、アトリビュートの各種類に対する検出スコアを得る。そして、検出処理部２４３は、検出スコアが最大の矩形の位置情報を、検出結果出力部４５へ出力する。 The detection processing unit 243 obtains a detection score for each type of attribute for each rectangle based on the feature amount output from the feature extraction unit 242 and the detector for each type of attribute acquired from the detector DB 32. Then, the detection processing unit 243 outputs the position information of the rectangle with the maximum detection score to the detection result output unit 45.

Next, the operation of the image processing apparatus 210 according to the second embodiment will be described. The image processing apparatus 210 executes a learning process, which learns a detector and a prior distribution for each attribute type, and a detection process, which detects the attributes to be detected from an input image. Because the learning process is the same as in the first embodiment, only the detection process is described below. Steps that are the same as in the detection process of the first embodiment are given the same reference numerals, and their detailed description is omitted.

In step S60 of the detection process shown in FIG. 15, the image input unit 41 receives the input image 36 and outputs it to the prior distribution reflection unit 244.

次に、ステップＳ２９０で、事前分布反映部２４４が、詳細を図１６に示す事前分布反映処理を実行し、事前分布を反映させた走査範囲を特定する。 Next, in step S290, the prior distribution reflecting unit 244 executes the prior distribution reflecting process shown in detail in FIG. 16, and specifies a scanning range in which the prior distribution is reflected.

The prior distribution reflection process is now described in detail with reference to FIG. 16.

In step S291, the prior distribution reflection unit 244 acquires N input images 36. Next, in step S292, the prior distribution reflection unit 244 acquires, from the prior distribution DB 33, the matrices P_i (i = [1, 2, ..., M]) representing the prior distribution of the appearance position of each of the M types of attributes taken individually.

次に、ステップＳ２９３で、事前分布反映部２４４が、走査範囲を特定するための、行列Ｐ＿ｉの各要素に対する閾値Ｔを設定する。次に、ステップＳ２９４で、事前分布反映部２４４が、ループ変数ｍを１に初期化する。 Next, in step S293, the prior distribution reflection unit 244 sets a threshold value T for each element of the matrix P_i for specifying the scanning range. Next, in step S294, the prior distribution reflection unit 244 initializes the loop variable m to 1.

次に、ステップＳ２９５で、事前分布反映部２４４が、行列Ｐ＿ｍの各要素の値と閾値Ｔとを比較し、値が閾値Ｔ以下の要素の集合を、集合Ｒ＿ｍとして取得する。 Next, in step S295, the prior distribution reflection unit 244 compares the value of each element of the matrix P_m with the threshold T, and acquires a set of elements whose value is equal to or less than the threshold T as a set R_m.

次に、ステップＳ２９６で、事前分布反映部２４４が、ループ変数ｎを１に初期化する。次に、ステップＳ２９７で、事前分布反映部２４４が、集合Ｒ＿ｍに含まれる要素に対応するｎ枚目の入力画像３６の画素をマスクしたマスク画像Ｉ＿ｎｍを生成する。マスク画像は、例えば、マスクされた画素の値を０、それ以外の画素の値を１にした画像である。 Next, in step S296, the prior distribution reflection unit 244 initializes the loop variable n to 1. Next, in step S297, the prior distribution reflection unit 244 generates a mask image I_nm that masks the pixels of the nth input image 36 corresponding to the elements included in the set R_m. The mask image is, for example, an image in which the value of the masked pixel is 0 and the values of the other pixels are 1.

次に、ステップＳ２９８で、事前分布反映部２４４が、ループ変数ｎがＮと同値であるか否かを判定する。ｎとＮとが同値ではない場合には、ステップＳ２９９へ移行し、ループ変数ｎを１インクリメントして、ステップＳ２９７に戻り、ステップＳ２９７移行の処理を繰り返す。ｎとＮとが同値の場合には、ステップＳ３００へ移行する。 Next, in step S298, the prior distribution reflection unit 244 determines whether or not the loop variable n is equal to N. If n and N are not the same value, the process proceeds to step S299, the loop variable n is incremented by 1, the process returns to step S297, and the process of step S297 is repeated. When n and N are the same value, the process proceeds to step S300.

ステップＳ３００では、事前分布反映部２４４が、ループ変数ｍがＭと同値であるか否かを判定する。ｍとＭとが同値ではない場合には、ステップＳ３０１へ移行し、ループ変数ｍを１インクリメントして、ステップＳ２９５に戻り、ステップＳ２９５以降の処理を繰り返す。ｍとＭとが同値の場合には、ステップＳ３０２へ移行する。 In step S300, the prior distribution reflection unit 244 determines whether or not the loop variable m has the same value as M. If m and M are not the same value, the process proceeds to step S301, the loop variable m is incremented by 1, the process returns to step S295, and the processes after step S295 are repeated. If m and M are the same value, the process proceeds to step S302.

In step S302, the prior distribution reflection unit 244 outputs the n × m mask images I_nm obtained by the above processing, ends the prior distribution reflection process, and returns to the detection process shown in FIG. 15.
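Steps S293 to S302 amount to thresholding each prior matrix into a binary mask. The following Python sketch shows one way this could look; `build_masks` is a hypothetical name, and it returns one mask per attribute on the assumption that all N input images have the same size as the prior matrices, so that the masks I_nm for a given m would be identical across n.

```python
def build_masks(priors, threshold):
    """Build one 0/1 mask per attribute from its prior matrix P_m.

    Elements whose value is <= threshold form the set R_m and are masked (0);
    all other elements stay in the scanning range (1).
    """
    return [[[0 if v <= threshold else 1 for v in row] for row in p]
            for p in priors]


# Hypothetical 2 x 2 priors for M = 2 attributes and threshold T = 0.2.
example_priors = [
    [[0.0, 0.3], [0.1, 0.6]],
    [[0.5, 0.0], [0.2, 0.2]],
]
masks = build_masks(example_priors, threshold=0.2)
print(masks)
```

Note that an element exactly equal to T is masked here, following the "equal to or less than T" wording of step S295.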

Next, in step S270 of FIG. 15, the feature extraction unit 242 fits a rectangle of arbitrary size to the scanning range specified by the prior distribution reflection unit 244 while shifting the rectangle, and extracts a feature amount from the region inside the rectangle. Specifically, the feature extraction unit 242 extracts the feature amounts F_nm (m = [1, 2, ..., M]) from the n-th input image 36 to which each of the mask images I_nm (m = [1, 2, ..., M]) output from the prior distribution reflection unit 244 has been applied. Applying the mask image I_nm to the input image 36 means that, when the input image 36 and the mask image I_nm are placed in correspondence, the region not masked by the mask image I_nm is taken as the scanning range.
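A sliding-window scan restricted by a mask could be sketched in Python as follows. The name `windows_in_scan_range` is hypothetical, and the rule that a window is kept only when every pixel under it is unmasked is an assumption of this sketch; the description says only that the unmasked region is the scanning range.

```python
def windows_in_scan_range(mask, height, width, step=1):
    """Yield (top, left) of each height x width window on unmasked pixels only.

    mask holds 1 for pixels in the scanning range and 0 for masked pixels.
    """
    rows, cols = len(mask), len(mask[0])
    for top in range(0, rows - height + 1, step):
        for left in range(0, cols - width + 1, step):
            if all(mask[r][c] == 1
                   for r in range(top, top + height)
                   for c in range(left, left + width)):
                yield top, left


# Hypothetical 4 x 4 mask image.
example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
positions = list(windows_in_scan_range(example_mask, height=2, width=2))
print(positions)
```

On this invented mask only two of the nine possible 2 × 2 window positions survive, which illustrates how masking reduces the number of regions passed to feature extraction.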

特徴抽出部２４２は、Ｎ枚の入力画像３６の全てから、上記のように特徴量Ｆ＿ｎｍを抽出する。そして、特徴抽出部２４２は、抽出した特徴量Ｆ＿ｎｍと、特徴量Ｆ＿ｎｍを抽出した領域の位置、すなわち入力画像３６に当てはめた矩形の位置とを対応付けて、検出処理部２４３へ出力する。 The feature extraction unit 242 extracts the feature amount F_nm from all the N input images 36 as described above. Then, the feature extraction unit 242 associates the extracted feature amount F_nm with the position of the region where the feature amount F_nm is extracted, that is, the position of the rectangle applied to the input image 36, and outputs it to the detection processing unit 243.

Next, in step S280, the detection processing unit 243 obtains, for each rectangle, a detection score for each attribute type, based on the feature amounts output from the feature extraction unit 242 and the detectors for each attribute type acquired from the detector DB 32. That is, the detection processing unit 243 inputs the feature amount F_nm to the detector for attribute m and obtains a detection score indicating the likelihood that the region inside each rectangle fitted to the n-th input image 36 is attribute m. The detection processing unit 243 then outputs, for each input image and for each attribute type, the position information of the rectangle with the highest detection score to the detection result output unit 45. The output need not be limited to the rectangle with the highest detection score; the position information of any rectangle whose detection score is equal to or greater than a predetermined threshold may be output instead.
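The two output rules of step S280 (highest score, or all scores at or above a threshold) can be expressed compactly. In this Python sketch `select_detections` and the dictionary layout keyed by (n, m) are hypothetical choices for illustration; the scores themselves would come from the learned detectors, which are not modeled here.

```python
def select_detections(scored, threshold=None):
    """Pick rectangles per (image n, attribute m) pair.

    scored maps (n, m) to a list of (rectangle, score) pairs. With
    threshold=None the single best rectangle is kept; otherwise every
    rectangle whose score is >= threshold is kept.
    """
    result = {}
    for key, items in scored.items():
        if threshold is None:
            result[key] = [max(items, key=lambda it: it[1])[0]] if items else []
        else:
            result[key] = [rect for rect, score in items if score >= threshold]
    return result


# Hypothetical scores; rectangles are (top, left, bottom, right) tuples.
example = {
    (1, 1): [((0, 0, 1, 1), 0.2), ((1, 1, 2, 2), 0.7), ((2, 2, 3, 3), 0.6)],
    (1, 2): [((0, 2, 1, 3), 0.4)],
}
best_only = select_detections(example)
above = select_detections(example, threshold=0.5)
print(best_only, above)
```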

Next, in step S120, the detection result output unit 45 associates the position information of the highest-scoring rectangle output from the detection processing unit 243 with the input image 36 and outputs it as the detection result 37, and the detection process ends.

As described above, the image processing apparatus according to the second embodiment reduces false detections by limiting the scanning range in accordance with the prior distribution of the appearance position of the detection target. It also achieves high-speed processing by reducing the number of times a feature amount is input to a detector to obtain a detection score. Note that, in the second embodiment, feature extraction is performed on m times as many images, because each of the mask images I_nm is applied to the original input image. In practice, however, the matrix P_i representing the prior distribution contains many zero elements, so the scanning range is smaller than when the entire input image is scanned, and detection can still be performed at high speed.

The second embodiment has been described for the case in which the scanning range is specified by reflecting the prior distribution of the appearance position of each attribute alone, but the prior distribution of the relative positional relationship between attributes may also be reflected. In that case, the scanning range of attribute i is specified based on the elements of the prior distributions P_ji (j = [1, 2, ..., M], i ≠ j) of the appearance position of attribute i as seen from each other attribute type j. For example, the scanning range of attribute i can be specified as the range that excludes the pixels of the input image corresponding to elements whose value is 0 in every P_ji, elements whose value is 0 in at least half of the P_ji, or elements whose average value is equal to or less than a predetermined threshold.
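The three exclusion rules listed above can be compared side by side in a small Python sketch. `excluded_cells` and the rule names are hypothetical, and the sketch works on matrix elements only; how elements of the relative-position matrices P_ji are placed in correspondence with input-image pixels is not modeled here.

```python
def excluded_cells(priors, rule="all_zero", threshold=0.0):
    """Return the set of (row, col) elements to exclude for attribute i.

    priors is the list of equally sized matrices P_ji for one fixed i.
    rule is one of:
      "all_zero":  the value is 0 in every P_ji
      "half_zero": the value is 0 in at least half of the P_ji
      "mean":      the mean value over the P_ji is <= threshold
    """
    rows, cols = len(priors[0]), len(priors[0][0])
    out = set()
    for r in range(rows):
        for c in range(cols):
            values = [p[r][c] for p in priors]
            zeros = sum(1 for v in values if v == 0)
            if rule == "all_zero":
                drop = zeros == len(values)
            elif rule == "half_zero":
                drop = 2 * zeros >= len(values)
            elif rule == "mean":
                drop = sum(values) / len(values) <= threshold
            else:
                raise ValueError(f"unknown rule: {rule}")
            if drop:
                out.add((r, c))
    return out


# Hypothetical 2 x 2 matrices P_ji for two other attributes j.
p_ji = [
    [[0.0, 0.0], [0.2, 0.4]],
    [[0.0, 0.3], [0.0, 0.4]],
]
print(excluded_cells(p_ji, "all_zero"))
print(excluded_cells(p_ji, "half_zero"))
print(excluded_cells(p_ji, "mean", threshold=0.12))
```

The "all_zero" rule is the most conservative of the three, while "half_zero" and "mean" exclude more elements and therefore narrow the scanning range further.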

また、第２の実施の形態においても、第１の実施の形態のように、検出結果に事前分布を反映させて最終的な検出結果を得るようにしてもよい。 Also in the second embodiment, the final detection result may be obtained by reflecting the prior distribution in the detection result as in the first embodiment.

The learning unit 20 and the detection unit 40 in each of the above embodiments may also be configured as separate apparatuses. In addition, although the above embodiments describe the case in which the prior distribution DB and the detector DB are held within the learning processing apparatus, the detector DB and the prior distribution DB may instead be stored in an external storage device. In that case, the image processing apparatus reads the detector for each attribute type and the matrices representing the prior distributions from the external device at the time of the detection process.

Although this specification describes the program as being installed in advance, the program can also be provided stored on a computer-readable recording medium. A computer-readable recording medium here means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. The program may also be provided over a network such as the Internet or a communication line such as a telephone line.

また、上記プログラムは、前述した機能の一部を実現するためのものであってもよく、さらに前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるものであってもよく、ＰＬＤ（Programmable Logic Device）やＦＰＧＡ（Field Programmable Gate Array）等のハードウェアを用いて実現されるものであってもよい。 Further, the program may be for realizing a part of the functions described above, and may be a program capable of realizing the functions described above in combination with a program already recorded in the computer system. It may be realized by using hardware such as PLD (Programmable Logic Device) or FPGA (Field Programmable Gate Array).

また、上述の画像処理装置は、内部にコンピュータシステムを有しているが、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。 Further, the above-described image processing apparatus has a computer system therein, but the “computer system” includes a homepage providing environment (or display environment) if a WWW system is used.

なお、本発明は、上述した各実施の形態に限定されるものではなく、この発明の要旨を逸脱しない範囲内で様々な変形や応用が可能である。本実施の形態の主要な特徴を満たす範囲内において、任意の用途と構成を取ることができる。 The present invention is not limited to the above-described embodiments, and various modifications and applications can be made without departing from the gist of the present invention. Arbitrary uses and configurations can be adopted within a range that satisfies the main characteristics of the present embodiment.

DESCRIPTION OF SYMBOLS
10, 210 Image processing apparatus
20 Learning unit
21 Learning data input unit
22 Feature extraction unit
23 Detector learning unit
24 Prior distribution learning unit
31 Learning data database (DB)
32 Detector DB
33 Prior distribution DB
36 Input image
37 Detection result
40, 240 Detection unit
41 Image input unit
42, 242 Feature extraction unit
43, 243 Detection processing unit
44, 244 Prior distribution reflection unit
45 Detection result output unit

Claims

An image processing apparatus comprising:
detection means for detecting, for each type, candidates for the detection target from an input image, based on a detector trained by associating features of each type of detection target, extracted from each of a plurality of learning images containing a plurality of types of detection targets, with the type of the detection target, and on features extracted from the input image; and
specifying means for specifying, for each type, the detection target from among the candidates for the detection target, based on a result of reflecting, on the appearance position of each of the candidates in the input image, a prior distribution learned from the appearance position of each type of detection target in each of the plurality of learning images and a prior distribution learned from the relative positional relationship between detection targets of different types.

An image processing apparatus comprising:
specifying means for specifying, for each type, a range in which the detection target is to be detected from an input image, based on a prior distribution learned from the appearance position of each type of detection target in each of a plurality of learning images containing a plurality of types of detection targets and a prior distribution learned from the relative positional relationship between detection targets of different types; and
detection means for detecting, for each type, the detection target from the input image, based on a detector trained by associating features of each type of detection target, extracted from each of the plurality of learning images, with the type of the detection target, and on features extracted from the range specified by the specifying means.

The image processing apparatus according to claim 1 or claim 2, wherein the prior distributions are represented by matrices obtained by applying blurring with a Gaussian filter to a matrix whose values are the appearance probabilities of each type of detection target at each position of the learning image, and to a matrix whose values are the appearance probabilities of another type of detection target as seen from one type of detection target.

An image processing program for causing a computer to function as each means constituting the image processing apparatus according to any one of claims 1 to 3.