JP4964798B2

JP4964798B2 - Image dictionary generating device, image dictionary generating method, image dictionary generating program and recording medium thereof

Info

Publication number: JP4964798B2
Application number: JP2008031210A
Authority: JP
Inventors: 泳青孫; 聡嶌田; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2008-02-13
Filing date: 2008-02-13
Publication date: 2012-07-04
Anticipated expiration: 2028-02-13
Also published as: JP2009193183A

Description

The present invention relates to a technique for generating an image dictionary used when automatically assigning to an image a label indicating its meaning, and in particular to an apparatus, a method, and a program for generating an image dictionary using image region information, and to a recording medium for the program.

Conventional image dictionary generation methods include the following.

(1) First, a group of images related to a certain meaning is collected as learning data. Next, L kinds of features, such as color, texture, and shape, are extracted separately from the learning data. Finally, a learning method is used to build a feature identification model in each feature space, giving L models. This processing yields an image dictionary consisting of the L feature identification models obtained from the learning data and a weighting coefficient for each model (see Non-Patent Document 1).

(2) Because the region is considered the basic unit that expresses the meaning of an image, region segmentation is first applied to the collected learning data. Next, the regions of the learning data are clustered, and the region cluster containing the largest number of regions is extracted as the maximum region cluster representing the meaning of the image. Finally, an image dictionary is generated by learning a region model from the maximum region cluster (see Non-Patent Document 2).

Non-Patent Document 3 describes an example of an image region segmentation method that can be used in embodiments of the present invention.

Non-Patent Document 1: A. Yanagawa, S.-F. Chang, L. Kennedy, and W. Hsu, "Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts", Columbia University ADVENT Technical Report #222-2006-8, March 20, 2007.
Non-Patent Document 2: Yongqing Sun, Satoshi Shimada, Masashi Morimoto, "Visual pattern discovery using web images", ACM MIR workshop, 2006.
Non-Patent Document 3: Yongqing Sun, Shinji Ozawa, "HIRBIR: A Hierarchical Approach for Region-based Image Retrieval", ACM Multimedia Systems Journal, 10(6): 559-569 (2005).

The image dictionary generation method of Non-Patent Document 1 obtains the image dictionary from physical features such as color, texture, and shape computed over the entire image of each learning sample, so the correspondence between the physical features of an image and its meaning cannot be clearly defined. As a result, accuracy is low.

The image dictionary generation method of Non-Patent Document 2 models an image with only one region cluster, so accuracy degrades for images composed of multiple regions. For example, the meaning of a "beach" image is expressed by a set of representative regions showing several representative objects such as "sea", "sun", and "sand", so a region model corresponding to a single object (for example, the sea) is not sufficient to express the meaning of "beach".

An object of the present invention is to solve the above problems and to provide means for generating an accurate image dictionary.

The expression of the meaning of an image can be viewed from the following two standpoints.
(1) The basic unit expressing the meaning of an image is considered to be an image region. Here, an image region corresponds to a real-world object (for example, a lawn, a person, or a mountain).
(2) The meaning of an image is expressed by a plurality of representative regions, each showing a representative object in the image.

Based on these points, and in order to solve the prior-art problem of low image dictionary accuracy, the present invention extracts from the learning data representative region clusters corresponding to representative objects that express the meaning well, builds a representative region cluster model for each representative region cluster, and obtains an appropriate weighting coefficient for each representative region cluster model. It thereby provides means for generating an image dictionary composed of a plurality of representative region cluster models and their corresponding weighting coefficients, which improves the accuracy of the image dictionary. Here, an object means a semantically coherent imaging target in an image.

Specifically, the present invention uses: learning data acquisition means for acquiring a group of images related to a certain meaning as learning data; region segmentation means for segmenting the acquired learning data into regions; representative region cluster extraction means for clustering the group of regions obtained by the region segmentation means and extracting a plurality of representative region clusters corresponding to a plurality of objects that express the meaning well; representative region cluster model learning means for building, for each extracted representative region cluster, a representative region cluster model using the image samples contained in that cluster as learning data; and weighting coefficient calculation means for calculating, for each representative region cluster model, a weighting coefficient representing the importance of that model to the meaning. An image dictionary is generated from the information calculated by these means.

In the present invention, the weighting coefficient calculation means can calculate the weighting coefficient based on the number of regions belonging to each representative region cluster and the variation of the distribution within that representative region cluster.

In the present invention, a plurality of representative region clusters corresponding to objects that express the meaning of an image are extracted from the learning data, a representative region cluster model is built for each representative region cluster, and an appropriate weighting coefficient is obtained for each representative region cluster. Providing means for generating an image dictionary composed of the plurality of representative region cluster models and their corresponding weighting coefficients makes it possible to generate a highly accurate image dictionary.

Embodiments of the present invention are described below. FIG. 1 shows a configuration example of an image dictionary generation apparatus according to an embodiment of the present invention. The image dictionary generation apparatus 10 in FIG. 1 comprises a learning data storage unit 100, a learning data acquisition unit 101, a region segmentation unit 102, a representative region cluster extraction unit 103, a representative region cluster model learning unit 104, a weighting coefficient calculation unit 105, and an image dictionary storage unit 106. The processing performed by each unit is described below.

The learning data storage unit 100 stores semantic labels and related images that have been collected manually in advance. That is, the learning data storage unit 100 stores correspondence information between a large number of images and the semantic labels assigned to each image by humans. In response to a learning data acquisition request from the learning data acquisition unit 101 that specifies a semantic label, the learning data storage unit 100 collects the images having the specified semantic label from the stored images and outputs them to the learning data acquisition unit 101 as learning data.

The learning data acquisition unit 101 issues a learning data acquisition request specifying a semantic label to the learning data storage unit 100, and thereby acquires representative images having the same semantic label from the learning data storage unit 100 as learning data. It outputs the acquired learning data to the region segmentation unit 102.

On receiving the learning data for a meaning from the learning data acquisition unit 101, the region segmentation unit 102 performs region segmentation on each image. It outputs the region group, made up of the regions obtained from all of the learning data, to the representative region cluster extraction unit 103.

On receiving the region group of the learning data from the region segmentation unit 102, the representative region cluster extraction unit 103 clusters the region group and extracts, as representative region clusters, a plurality of region clusters corresponding to representative objects that express the meaning of the image well. It outputs the extracted representative region clusters to the representative region cluster model learning unit 104. Details of this processing are described later with reference to FIG. 3.

On receiving the representative region clusters from the representative region cluster extraction unit 103, the representative region cluster model learning unit 104 obtains a representative region cluster model for each representative region cluster by a learning method. It outputs the representative region clusters and the obtained representative region cluster models to the weighting coefficient calculation unit 105.

The weighting coefficient calculation unit 105 receives the representative region clusters and the obtained representative region cluster models from the representative region cluster model learning unit 104. Using the representative region clusters, it calculates a weighting coefficient according to the importance of each representative region cluster. It outputs the representative region cluster models and the calculated weighting coefficients to the image dictionary storage unit 106. Details of this processing are described later with reference to FIG. 4.

The image dictionary storage unit 106 stores, as an image dictionary, the representative region cluster models received from the weighting coefficient calculation unit 105 together with the weighting coefficients associated with them. With the above configuration, an image dictionary can be generated.

Next, the basic operation of the above configuration is described. FIG. 2 is a flowchart showing the basic operation of the image dictionary generation apparatus 10 according to an embodiment of the present invention.

(1) Step S201: The learning data acquisition unit 101 acquires learning data related to a certain meaning from the learning data storage unit 100.

(2) Step S202: Next, the region segmentation unit 102 performs region segmentation on the learning data acquired by the learning data acquisition unit 101 in step S201. A conventional technique, such as the one described in Non-Patent Document 3, may be used for region segmentation. Because various region segmentation methods are known, a detailed description is omitted here.

(3) Step S203: From the region group of the learning data obtained in step S202, the representative region cluster extraction unit 103 extracts representative region clusters corresponding to a plurality of representative objects that express the meaning well (described later with reference to FIG. 3). Let M be the number of extracted representative region clusters, and let m (m = 1, 2, ..., M) be the number (index) of each extracted representative region cluster.

(4) Step S204: The representative region cluster model learning unit 104 first sets m = 1 and selects the first representative region cluster as the processing target.

(5) Step S205: For each representative region cluster, the representative region cluster model learning unit 104 models the distribution, in the feature space, of the regions belonging to the cluster. As one example of modeling, a Gaussian Bayes classifier may be used as the learning method. The model parameters of the representative region cluster model obtained with the Gaussian Bayes classifier are:
・the mean vector υ of the learning data in the feature space, and
・the variance-covariance matrix Σ of the learning data in the feature space.
They are calculated as follows.

Let L be the number of regions belonging to the m-th representative region cluster. Let X_j (j = 1, 2, ..., L) be the feature obtained from the learning data of each of these regions. Each feature X_j is represented as vector data in an n-dimensional feature space.

Mean vector υ:

$$\upsilon = \frac{1}{L} \sum_{j=1}^{L} X_j$$

Variance-covariance matrix Σ:

$$\Sigma = \frac{1}{L} \sum_{j=1}^{L} (X_j - \upsilon)(X_j - \upsilon)^{T}$$

Here, $\sum_{j=1}^{L} f(j)$ denotes the sum of f(j) from j = 1 to j = L.
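The two model parameters of step S205 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `fit_gaussian_model` and the list-of-lists representation of feature vectors are hypothetical, and the covariance divides by L (not L - 1), as in the formulas above.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def fit_gaussian_model(features: List[Vector]) -> Tuple[Vector, Matrix]:
    """Return (mean vector, covariance matrix) for the L feature vectors of one cluster.

    Hypothetical helper: 'features' holds the n-dimensional vectors X_j of the
    regions that belong to a single representative region cluster.
    """
    L = len(features)
    if L == 0:
        raise ValueError("a cluster must contain at least one region")
    n = len(features[0])

    # Mean vector: component-wise average over the L regions.
    mean = [sum(x[s] for x in features) / L for s in range(n)]

    # Covariance: average outer product of the centered vectors (divided by L).
    cov = [[0.0] * n for _ in range(n)]
    for x in features:
        d = [x[s] - mean[s] for s in range(n)]
        for a in range(n):
            for b in range(n):
                cov[a][b] += d[a] * d[b] / L
    return mean, cov
```

In use, the function would be called once per representative region cluster (the loop of steps S204 to S206), producing one (υ, Σ) pair for each of the M clusters.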

(6) Step S206: It is determined whether the processing of step S205 has been performed for all M representative region clusters. If not, m is set to m + 1 and the processing of step S205 is repeated for the next representative region cluster. When all M representative region clusters have been processed, the process proceeds to step S207.

(7) Step S207: For the representative region cluster models obtained by the representative region cluster model learning unit 104 in step S205, the weighting coefficient calculation unit 105 calculates the weighting coefficient to be associated with each representative region cluster model. An example of a specific calculation method is described later with reference to FIG. 4.

(8) Step S208: The M representative region cluster models and the weighting coefficients associated with each of them, as obtained through step S207, are stored in the image dictionary storage unit 106 as the image dictionary database for that meaning.

FIG. 3 is a processing flowchart of the representative region cluster extraction unit 103 and shows the detailed processing of step S203 in FIG. 2.

(1) Step S301: The region group of the learning data detected by the region segmentation unit 102 is read.

(2) Step S302: A feature is extracted for each region. For example, a color histogram may be extracted as the region feature.

(3) Step S303: To make clustering accurate, the features extracted from each region of the learning data are normalized. Normalization may be performed as follows. The feature space is assumed to be n-dimensional.

Let (x_i1, ..., x_is, ..., x_in), s = 1, 2, ..., n, be the feature of each of the R regions i (i = 1, 2, ..., R) obtained by segmenting the images of the learning data. The normalized feature (x'_i1, ..., x'_is, ..., x'_in) is then given by the following equation.

$$x'_{is} = \frac{x_{is} - x_{\min}(s)}{x_{\max}(s) - x_{\min}(s)}$$

Here, x_max(s) is the maximum value and x_min(s) the minimum value of the s-th feature component over the R regions i.
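The min-max normalization of step S303 can be sketched as follows. The function name `min_max_normalize` is hypothetical, and the handling of a feature component that is constant over all regions (mapped to 0.0 to avoid division by zero) is an assumption; the source does not address that case.

```python
from typing import List


def min_max_normalize(features: List[List[float]]) -> List[List[float]]:
    """Scale each feature component to [0, 1] over all R regions."""
    n = len(features[0])
    # Per-component minimum and maximum over the R regions.
    lo = [min(x[s] for x in features) for s in range(n)]
    hi = [max(x[s] for x in features) for s in range(n)]

    normalized = []
    for x in features:
        row = []
        for s in range(n):
            span = hi[s] - lo[s]
            # Assumption: a constant component carries no information, so map it to 0.0.
            row.append((x[s] - lo[s]) / span if span > 0 else 0.0)
        normalized.append(row)
    return normalized
```

Normalizing in this way keeps components with large numeric ranges from dominating the distance computations used by the clustering in step S304.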

(4) Step S304: In the normalized feature space, the region group of the learning data is classified into N region clusters using a suitable clustering method. As one example, the conventional Fuzzy K-means algorithm can be used. Various other well-known clustering methods can also be used. Clustering is basically a process that iteratively or hierarchically merges groups of regions with similar features.
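As an illustration of step S304, the following is a minimal sketch of the standard fuzzy c-means (Fuzzy K-means) update rules in plain Python. It is not taken from the patent: the function name, the `fuzzifier`, `iterations`, `initial_centers`, and `seed` parameters, and the final hard assignment of each region to its highest-membership cluster are all assumptions made for the example.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def fuzzy_k_means(
    points: Sequence[Sequence[float]],
    k: int,
    fuzzifier: float = 2.0,
    iterations: int = 50,
    initial_centers: Optional[Sequence[Sequence[float]]] = None,
    seed: int = 0,
) -> Tuple[List[List[float]], List[List[float]], List[int]]:
    """Return (cluster centers, membership matrix, hard labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    if fuzzifier <= 1.0:
        raise ValueError("fuzzifier must be greater than 1")
    n = len(points[0])
    if initial_centers is None:
        # Assumption: start from k randomly chosen data points.
        centers = [list(p) for p in random.Random(seed).sample(list(points), k)]
    else:
        centers = [list(c) for c in initial_centers]

    exponent = -2.0 / (fuzzifier - 1.0)
    memberships: List[List[float]] = []
    for _ in range(iterations):
        # Membership update: closer centers receive larger membership values.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: full membership there.
                raw = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                raw = [d ** exponent for d in dists]
            total = sum(raw)
            memberships.append([v / total for v in raw])

        # Center update: membership-weighted mean of all points.
        for c in range(k):
            weights = [u[c] ** fuzzifier for u in memberships]
            total = sum(weights)
            if total > 0.0:
                centers[c] = [
                    sum(w * p[s] for w, p in zip(weights, points)) / total
                    for s in range(n)
                ]

    # Assumption: assign each region to its highest-membership cluster.
    labels = [max(range(k), key=lambda c: u[c]) for u in memberships]
    return centers, memberships, labels
```

The hard labels give each region cluster a definite region count, which the later threshold tests on cluster size rely on; a production system might instead use a library implementation and a convergence test rather than a fixed iteration count.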

(5) Step S305: Next, steps S306 to S309 are repeated for each region cluster to select representative region clusters from the N region clusters. To this end, n is first set to 1 and the first region cluster is selected as the processing target.

(6) Step S306: It is determined whether the number of regions in the n-th region cluster is equal to or greater than a preset threshold. If it is, the process proceeds to step S307. Otherwise, the process proceeds to step S309.

(7) Step S307: It is determined whether the mean area of the regions in the region cluster is equal to or greater than a preset threshold. If it is, the process proceeds to step S308. Otherwise, the process proceeds to step S309.

(8) Step S308: The n-th region cluster currently being processed is selected as a representative region cluster.

(9) Step S309: It is determined whether the processing of steps S306 to S308 has been performed for all N region clusters. If an unprocessed region cluster remains, n is set to n + 1, the process returns to step S306, and the same processing is repeated for the next region cluster. When all region clusters have been processed, the representative region cluster extraction process ends.

Through the processing of steps S301 to S309, the M region clusters whose number of regions and mean region area are both equal to or greater than the preset thresholds can be extracted from the N region clusters as representative region clusters. In step S307, the total area of all regions in the region cluster, rather than the mean region area, may be compared with a predetermined threshold. The area of a region may be a value in units of pixels, or a value calculated as the ratio of the region's area to the total area of the image.
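The selection loop of steps S305 to S309 amounts to filtering the N region clusters by two thresholds. The sketch below assumes a hypothetical `RegionCluster` record that stores only the area of each member region; the names and the threshold parameters are illustrative and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RegionCluster:
    """Hypothetical record: the area of every region assigned to the cluster."""
    region_areas: List[float]


def select_representative_clusters(
    clusters: List[RegionCluster],
    min_region_count: int,
    min_mean_area: float,
) -> List[RegionCluster]:
    """Keep clusters that pass both the region-count and mean-area thresholds."""
    selected = []
    for cluster in clusters:
        count = len(cluster.region_areas)
        # Step S306: the cluster must contain enough regions.
        if count == 0 or count < min_region_count:
            continue
        # Step S307: the regions must be large enough on average.
        mean_area = sum(cluster.region_areas) / count
        if mean_area < min_mean_area:
            continue
        # Step S308: the cluster is selected as a representative region cluster.
        selected.append(cluster)
    return selected
```

Areas may be given in pixels or as a fraction of the image area, as the text notes; the only requirement is that `min_mean_area` uses the same unit.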

Here, the number of regions and the region area in a region cluster are used as the conditions for selecting representative region clusters, but other conditions may additionally be used to identify region clusters that correspond to the objects expressing the meaning well.

FIG. 4 is a processing flowchart of the weighting coefficient calculation unit 105 and shows the detailed processing of step S207 in FIG. 2.

(1) Step S401: The weighting coefficient calculation unit 105 reads the M representative region clusters.

(2) Step S402: To calculate the weighting coefficient for each of the M representative region clusters, m is first set to 1 and the first representative region cluster is selected as the processing target.

(3) Step S403: The variation δ_m of the feature distribution in the representative region cluster is obtained. One example of how to calculate δ_m is as follows.

Let the feature X_j of each region j (j = 1, 2, ..., L) in the m-th (m = 1, 2, ..., M) representative region cluster be (x_j1, ..., x_js, ..., x_jn), s = 1, 2, ..., n.

The variation δ_m of the m-th representative region cluster is calculated by the following equations.

$$\delta_m = \frac{1}{n \times L} \sum_{s=1}^{n} \sum_{j=1}^{L} (x_{js} - \mu_s)^2$$

$$\mu_s = \frac{1}{L} \sum_{j=1}^{L} x_{js}$$

(Here, $\sum_{s=1}^{n}$ denotes the sum from s = 1 to n, and $\sum_{j=1}^{L}$ denotes the sum from j = 1 to L.)

This method of calculating the variation δ_m is only an example; the variance, the standard deviation, or a similar measure can also be used. The variation δ_m need only represent how far the features of the regions in the representative region cluster deviate from their mean.
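The variation δ_m of step S403 is the squared deviation from the per-component mean, averaged over all n components and all L regions. A minimal sketch follows; the function name `cluster_variation` is hypothetical.

```python
from typing import List


def cluster_variation(features: List[List[float]]) -> float:
    """Return delta_m for one cluster, given the feature vectors of its L regions."""
    L = len(features)
    n = len(features[0])
    # mu_s: mean of the s-th feature component over the L regions.
    mu = [sum(x[s] for x in features) / L for s in range(n)]
    # Sum of squared deviations over all components and regions.
    total = sum((x[s] - mu[s]) ** 2 for x in features for s in range(n))
    return total / (n * L)
```

If the features have been min-max normalized beforehand, δ_m is bounded, which keeps the exponential term of the weighting coefficient in a usable range; whether the normalized features are used at this step is not stated in the source.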

(4) Step S404: The weighting coefficient of the representative region cluster model is calculated. Two considerations apply when weighting a representative region cluster.
[Consideration 1] If many regions belong to a representative region cluster, the cluster is considered to correspond to a frequently recurring object that is important for expressing the meaning of the image.
[Consideration 2] A representative region cluster with small variation is considered to correspond to a representative object that is important for expressing the meaning of the image. For example, in a group of images related to the meaning "tiger", objects such as the tiger's head and body recur frequently and are highly similar from image to image.

In view of these considerations, the weighting coefficient w_m associated with the m-th (m = 1, 2, ..., M) representative region cluster can be calculated from the number of regions L_m and the variation δ_m of the m-th representative region cluster by the following equation.

$$w_m = \frac{L_m}{\sum_{m'=1}^{M} L_{m'}} \times e^{-\delta_m}$$

(5) Step S405: It is determined whether the processing of steps S403 and S404 has been performed for all M representative region clusters. If not, m is set to m + 1, the process returns to step S403, and the same processing is repeated for the next representative region cluster. When all clusters have been processed, the weighting coefficient calculation process ends.
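The weighting formula combines the two considerations: the share of regions that fall in the cluster, multiplied by an exponential penalty on its variation. The sketch below assumes the region counts L_m and variations δ_m have already been computed for all M clusters; the function name is hypothetical.

```python
import math
from typing import List


def weighting_coefficients(region_counts: List[int], variations: List[float]) -> List[float]:
    """Return w_m = (L_m / sum of all L_m) * exp(-delta_m) for each cluster."""
    if len(region_counts) != len(variations):
        raise ValueError("one variation value is required per cluster")
    total_regions = sum(region_counts)
    return [
        (count / total_regions) * math.exp(-delta)
        for count, delta in zip(region_counts, variations)
    ]
```

For example, two clusters with 30 and 10 regions and zero variation receive weights 0.75 and 0.25. Because of the exponential factor, the weights do not generally sum to 1, and the source does not describe any renormalization.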

FIG. 5 shows an example of the learning data stored in the learning data storage unit 100. The learning data storage unit 100 stores a large amount of image data, collected in advance, related to particular meanings. For each semantic label, for example "tiger", many varied images of the subject (usually color images), such as the tiger images shown in FIGS. 5(A) and 5(B), are stored in the learning data storage unit 100.

FIG. 6 shows examples of representative region clusters obtained from the image data of FIGS. 5(A) and 5(B), which have the semantic label "tiger", through the processing of the learning data acquisition unit 101, the region segmentation unit 102, and the representative region cluster extraction unit 103.

FIG. 6(A) is a representative region cluster obtained from the image of FIG. 5(A); the region other than the part filled in black is the representative region cluster. FIGS. 6(B1) and 6(B2) are representative region clusters obtained from the image of FIG. 5(B). For the learning data of FIG. 5(B), multiple representative region clusters are obtained from a single image.

As these examples show, the main feature of the present invention is that it uses representative region clusters to express the meaning of an image as "a combination of image components that occupy a large area of the image and appear frequently".

FIG. 7 shows an example of the data structure of the image dictionary stored in the image dictionary storage unit 106. As shown in FIG. 7(A), the image dictionary storage unit 106 stores the number of representative objects M and the data of the M representative object models i (i = 1, 2, ..., M). The data of each representative object model i consists of the model parameters calculated by the representative region cluster model learning unit 104, namely the mean vector υ_i and the variance-covariance matrix Σ_i, together with the weighting coefficient w_i calculated by the weighting coefficient calculation unit 105.

The number of representative objects is the number of representative region clusters obtained from the group of images related to a certain meaning. A representative region cluster is a collection of characteristic regions in images and is considered to correspond to some meaningful target (object) captured in the images. For this reason, the image information of a representative region cluster is referred to here as a representative object.

In the example of representative region clusters shown in FIG. 6, the image dictionary for "tiger" stored in the image dictionary storage unit 106 contains, as shown in FIG. 7(B), the number of representative objects (the number of representative region clusters); the model parameters υ_1, Σ_1 and weighting coefficient w_1 of the representative region cluster of FIG. 6(A); the model parameters υ_2, Σ_2 and weighting coefficient w_2 of the representative region cluster of FIG. 6(B1); the model parameters υ_3, Σ_3 and weighting coefficient w_3 of the representative region cluster of FIG. 6(B2); and so on.
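The data structure of FIG. 7 can be sketched as a pair of Python dataclasses. The class and field names below are hypothetical; the patent specifies only what is stored (the count M and, for each representative object model, its mean vector, variance-covariance matrix, and weighting coefficient), not how it is encoded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepresentativeObjectModel:
    """One entry of the dictionary: the Gaussian parameters and weight of a cluster."""
    mean: List[float]               # mean vector (n components)
    covariance: List[List[float]]   # variance-covariance matrix (n x n)
    weight: float                   # weighting coefficient w_i


@dataclass
class ImageDictionary:
    """Image dictionary for one semantic label, for example "tiger"."""
    label: str
    models: List[RepresentativeObjectModel] = field(default_factory=list)

    @property
    def num_representative_objects(self) -> int:
        # M is derived from the stored models rather than kept as a separate field.
        return len(self.models)
```

Deriving M from the list length is a design choice of this sketch; it avoids the count and the stored models drifting out of sync, whereas FIG. 7 shows M as an explicit field.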

The image dictionary generated by the image dictionary generation device 10 can be used, for example, to assign semantic labels to unknown images. FIG. 8 shows a flowchart of the process of assigning a semantic label to an unknown image using the image dictionary. The flow of this process is described below with reference to FIG. 8.

(1) Step S501: A new image to which a semantic label is to be assigned (referred to as an unknown image) is input.

(2) Step S502: The input unknown image is divided into regions by the same method as that used by the region division unit 102 when the image dictionary was generated (see the description of step S202 in FIG. 2).

(3) Step S503: Suppose that the unknown image has been divided into T regions. The feature quantity R_t (t = 1, 2, …, T) of each region in the n-dimensional feature space is extracted.

(4) Step S504: From the image dictionary for a given meaning, in which information such as that shown in FIG. 7(A) is stored for each semantic label, the model parameters of the representative object models, namely the mean vectors υ_m and the variance-covariance matrices Σ_m, and the weighting coefficients w_m (m = 1, 2, …, M) are read out.

(5) Step S505: The similarity Sim between the read representative object models and the unknown image is calculated according to the following equation. Let the feature quantities of the regions of the unknown image be R₁, R₂, …, R_t, …, R_T, and let the model information read from the image dictionary for a given meaning be (υ₁, Σ₁, w₁), (υ₂, Σ₂, w₂), …, (υ_m, Σ_m, w_m), …, (υ_M, Σ_M, w_M).

In the equation, ‖Σ_m‖ denotes the norm of Σ_m, (R_t − υ_m)^T denotes the transpose of (R_t − υ_m), and Σ_m⁻¹ denotes the inverse matrix of Σ_m.

(6) Step S506: The calculated similarity Sim is compared with a preset threshold. If the similarity Sim is greater than the threshold, the semantic label of the current image dictionary is assigned to the unknown image. If the similarity Sim is less than the threshold, no semantic label is assigned.

(7) Step S507: If there is an image dictionary with another semantic label, the process returns to step S504 and the same processing is repeated for that image dictionary.
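The similarity calculation of steps S504 and S505 can be illustrated as follows. The defining equation for Sim is not reproduced in this text, so the sketch below rests on assumptions: each representative object model is treated as a multivariate Gaussian, the term described as the norm of Σ_m is taken to be its determinant, each model is scored by its best-matching region, and the scores are combined as a sum weighted by w_m. The function names are hypothetical, and the actual equation of the specification may combine the terms differently.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]
Model = Tuple[Vector, Matrix, float]  # (mean υ_m, covariance Σ_m, weight w_m)


def solve_and_det(a: Matrix, b: Vector) -> Tuple[List[float], float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Returns the solution x and the determinant of a.
    """
    n = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= aug[col][col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - rest) / aug[i][i]
    return x, det


def gaussian_density(r: Vector, mean: Vector, cov: Matrix) -> float:
    """Multivariate Gaussian density of region feature r under (mean, cov)."""
    n = len(mean)
    diff = [ri - mi for ri, mi in zip(r, mean)]
    solved, det = solve_and_det(cov, diff)  # solved = Σ^-1 (R_t − υ_m)
    mahalanobis = sum(d * s for d, s in zip(diff, solved))
    return math.exp(-0.5 * mahalanobis) / math.sqrt((2 * math.pi) ** n * det)


def similarity(regions: Sequence[Vector], models: Sequence[Model]) -> float:
    """Assumed form of Sim: weighted sum over models of the best region match."""
    return sum(
        w * max(gaussian_density(r, mean, cov) for r in regions)
        for mean, cov, w in models
    )
```

Solving the linear system instead of forming Σ_m⁻¹ explicitly avoids a separate matrix inversion and yields the determinant from the same elimination pass.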

In the above process, the similarity Sim may instead be calculated for all image dictionaries associated with semantic labels, and the semantic label of the image dictionary giving the maximum similarity Sim may be assigned to the unknown image. Alternatively, that semantic label may be assigned only when the maximum similarity Sim is equal to or greater than a certain threshold.
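The labeling strategies described above differ only in how the per-dictionary similarities are turned into a decision. The following is a minimal sketch, assuming the similarities have already been computed and collected in a mapping from semantic label to Sim; the function names and the example scores are hypothetical.

```python
from typing import Dict, List, Optional


def labels_above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Steps S506 and S507: keep every label whose Sim exceeds the threshold."""
    return [label for label, sim in scores.items() if sim > threshold]


def best_label(scores: Dict[str, float],
               threshold: Optional[float] = None) -> Optional[str]:
    """Variant: assign only the label with the maximum Sim.

    If a threshold is given, the label is assigned only when the maximum
    Sim is equal to or greater than that threshold.
    """
    if not scores:
        return None
    label, sim = max(scores.items(), key=lambda item: item[1])
    if threshold is not None and sim < threshold:
        return None
    return label


# Made-up similarity values for illustration only.
scores = {"tiger": 0.8, "grass": 0.6, "sky": 0.1}
print(labels_above_threshold(scores, 0.5))
print(best_label(scores))
print(best_label(scores, threshold=0.9))
```

The first function can assign several labels to one image, whereas the second assigns at most one, which is the practical difference between the two strategies.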

An example in which the image dictionary is used to assign semantic labels to unknown images has been described here, but an image dictionary generated by the present invention can also be used for image retrieval, for example to search a large group of images for images of a "tiger". In this case as well, the similarity Sim is calculated to determine whether each image is a target image.

The image dictionary generation processing described above can be realized by a computer and a software program. The program can be provided recorded on a computer-readable recording medium or provided over a network.

FIG. 1 is a diagram showing a configuration example of an image dictionary generation device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the basic operation of the image dictionary generation device.
FIG. 3 is a processing flowchart of the representative region cluster extraction unit.
FIG. 4 is a processing flowchart of the weighting coefficient calculation unit.
FIG. 5 is a diagram showing an example of learning data.
FIG. 6 is a diagram showing an example of representative region clusters.
FIG. 7 is a diagram showing an example of the data structure of the image dictionary.
FIG. 8 is a flowchart showing a usage example of the image dictionary.

Explanation of symbols

10 Image dictionary generation device
100 Learning data storage unit
101 Learning data acquisition unit
102 Region division unit
103 Representative region cluster extraction unit
104 Representative region cluster model learning unit
105 Weighting coefficient calculation unit
106 Image dictionary storage unit

Claims

An image dictionary generation device that generates an image dictionary used for analyzing the meaning of images, comprising:
learning data acquisition means for acquiring, as learning data, a group of images to which a semantic label relating to a given meaning has been assigned;
region division means for dividing each image of the acquired learning data into a plurality of regions based on feature quantities of the image;
representative region cluster extraction means for performing clustering on the group of regions of the learning data obtained by the region division means by merging regions having similar feature quantities, and for extracting a plurality of representative region clusters from the region clusters resulting from the clustering, based on a predetermined representative region cluster selection condition that includes, as a condition, at least the number of regions in a region cluster or the area of the regions;
representative region cluster model learning means for constructing, for each extracted representative region cluster, a representative region cluster model using the image samples included in the representative region cluster as learning data, and for outputting model parameters representing the constructed representative region cluster model; and
weighting coefficient calculation means for calculating, for each representative region cluster model, a weighting coefficient representing the importance of that representative region cluster model to the meaning, based on the feature quantities of the group of regions belonging to the representative region cluster,
wherein the model parameters of each representative region cluster model and the weighting coefficients are stored as an image dictionary for each semantic label assigned to the group of images of the learning data.

The image dictionary generation device according to claim 1, wherein the weighting coefficient calculation means calculates the weighting coefficient based on the number of regions in the group of regions belonging to each representative region cluster and on the dispersion of the distribution of the feature quantities of the group of regions belonging to that representative region cluster, such that the weighting coefficient is larger as the number of regions is larger and smaller as the dispersion is larger.

An image dictionary generation method by which an image dictionary generation device generates an image dictionary used for analyzing the meaning of images, comprising:
a learning data acquisition step of acquiring, as learning data, a group of images to which a semantic label relating to a given meaning has been assigned;
a region division step of dividing each image of the acquired learning data into a plurality of regions based on feature quantities of the image;
a representative region cluster extraction step of performing clustering on the group of regions of the learning data obtained in the region division step by merging regions having similar feature quantities, and of extracting a plurality of representative region clusters from the region clusters resulting from the clustering, based on a predetermined representative region cluster selection condition that includes, as a condition, at least the number of regions in a region cluster or the area of the regions;
a representative region cluster model learning step of constructing, for each extracted representative region cluster, a representative region cluster model using the image samples included in the representative region cluster as learning data, and of outputting model parameters representing the constructed representative region cluster model; and
a weighting coefficient calculation step of calculating, for each representative region cluster model, a weighting coefficient representing the importance of that representative region cluster model to the meaning, based on the feature quantities of the group of regions belonging to the representative region cluster,
wherein the model parameters of each representative region cluster model and the weighting coefficients are stored as an image dictionary for each semantic label assigned to the group of images of the learning data.

The image dictionary generation method according to claim 3, wherein, in the weighting coefficient calculation step, the weighting coefficient is calculated based on the number of regions in the group of regions belonging to each representative region cluster and on the dispersion of the distribution of the feature quantities of the group of regions belonging to that representative region cluster, such that the weighting coefficient is larger as the number of regions is larger and smaller as the dispersion is larger.

An image dictionary generation program for causing a computer to execute the image dictionary generation method according to claim 3 or claim 4.

A computer-readable recording medium on which the image dictionary generation program according to claim 5 is recorded.