JP4937395B2 - Feature vector generation apparatus, feature vector generation method and program

Info

Publication number: JP4937395B2
Application number: JP2010225304A
Authority: JP
Inventors: ゾランステイチ
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2010-10-05
Filing date: 2010-10-05
Publication date: 2012-05-23
Anticipated expiration: 2030-10-05
Also published as: JP2012079187A

Description

The present invention relates to a feature vector generation device, a feature vector generation method, and a program.

A known technique searches for images similar to an image given as a search key (hereinafter, the “query image”) by comparing image feature amounts, that is, numerical representations of image characteristics such as color scheme, texture, and shape. When a user inputs a query image, feature amounts are extracted from it, and similar images are retrieved by calculating the similarity between those feature amounts and the feature amounts of the search target images (for example, Patent Document 1).

A technique called visual keywords is also known as a similar-image search technique that focuses on partial regions within an image. It is based on viewing a single image as being composed of a plurality of partial images, and it generates a feature vector through the following processing.

Specifically, a plurality of partial images are extracted from an image, and each partial image is classified, based on its feature amount, into one of the reference clusters formed in advance by clustering images (hereinafter, “reference clusters”). A feature vector is then generated based on the number of partial images classified into each reference cluster. Using visual keywords in this way enables accurate image search based on feature amounts that capture the image as fine-grained regions, rather than on feature amounts extracted from the image as a whole.

Patent Document 1: JP 2001-52175 A

However, the conventional techniques described above have the following problems.
First, the feature amount extracted from a single image by the technique of Patent Document 1 represents the characteristics of the image as a whole, so it is effective for finding images whose overall composition is similar. However, a single image generally contains a wide variety of objects, and it is difficult to express them with the feature amount of Patent Document 1.

An image can also be divided into a plurality of regions and characterized by the feature amount of each resulting partial image. Comparing the per-partial-image feature amounts between two images is expensive, however: for two images each divided into 100 regions, feature amounts must be compared and similarities calculated for 100 × 100 combinations, so the amount of calculation becomes enormous.

With visual keywords, by contrast, similarity can be calculated from feature vectors expressed by the number of partial images classified into each reference cluster. However, the distance between every partial image and every reference cluster must be calculated. When hundreds to thousands of partial images are extracted from a single image and there are tens of thousands to hundreds of thousands of clusters to classify into, the cost of calculating distances for all of these combinations becomes enormous.

The present invention has been made in view of the above problems, and its object is to reduce the calculation cost of generating a feature vector using visual keywords.

To achieve the above object, the first invention is a feature vector generation device that classifies a plurality of partial images extracted from an image into any of a plurality of reference clusters and generates a feature vector based on the number of partial images classified into each reference cluster, the device comprising: clustering means for clustering the extracted partial images based on their feature amounts; distance calculation means for calculating, with each cluster formed by the clustering taken as a target cluster, the distance between the target cluster and each of the reference clusters; and classification means for classifying the partial images belonging to the target cluster into the reference cluster closest to that target cluster.

According to the first invention, the partial images extracted from an image are first grouped by clustering, and the partial images belonging to each target cluster formed by this clustering are classified together into a reference cluster based on the distance between the target cluster and the reference clusters. This removes the cost of calculating the distance between every partial image and every reference cluster. The calculation cost of generating a feature vector using visual keywords can therefore be reduced.
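The saving can be illustrated by counting distance evaluations. The sketch below is only a rough illustration: the function name and the example figures (1,000 partial images, 100,000 reference clusters, one target cluster per 10 partial images) are assumptions chosen to match the orders of magnitude mentioned in the text, and the count deliberately ignores the cost of the clustering step itself.

```python
def distance_counts(num_partial_images, num_reference_clusters, reduction_factor):
    """Count distance evaluations against the reference clusters for two strategies.

    reduction_factor is a hypothetical parameter: one target cluster is formed
    per `reduction_factor` partial images.
    """
    # Direct mapping: every partial image is compared with every reference cluster.
    direct = num_partial_images * num_reference_clusters
    # Cluster-first mapping: only the target clusters are compared.
    num_target_clusters = max(1, num_partial_images // reduction_factor)
    clustered = num_target_clusters * num_reference_clusters
    return direct, clustered


direct, clustered = distance_counts(1000, 100_000, 10)
print(direct, clustered)  # 100000000 10000000
```

Under these assumed figures the comparisons against reference clusters drop by the same factor as the number of items being mapped; the clustering of the partial images adds its own cost, which is not modelled here.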

In the second invention, the classification means calculates the distance between each individual partial image and the reference clusters into which the partial images extracted from the image have been classified, and changes the classification destination to the reference cluster with the smallest distance.

According to the second invention, the classification destination of each partial image is changed to the closest of the reference clusters into which partial images have been classified. Consequently, among partial images that were classified together cluster by cluster, those that are far from their reference cluster can be reassigned to a more appropriate reference cluster. In addition to the above effect, the accuracy of the feature vector can therefore be improved.

The third invention further comprises cluster distance calculation means for calculating the distance between a reference cluster into which classification has been performed and each partial image classified into that reference cluster. The classification means selects the partial images for which the distance calculated by the cluster distance calculation means is equal to or greater than a predetermined value, calculates the distance between each selected partial image and the reference clusters into which partial images have been classified, and changes the classification destination.

According to the third invention, the classification destination is changed, based on the distance between the partial image and the reference clusters, only for partial images whose distance to their assigned reference cluster is equal to or greater than a fixed value. The classification destination can thus be changed for partial images that are far from the target cluster. The partial images whose classification is reviewed can therefore be narrowed down, which keeps down the calculation cost of improving the accuracy of the feature vector.
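The selection step of the third invention might be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name, the use of Euclidean distance, and the data layout (a feature tuple per partial image, paired with the center of its currently assigned reference cluster) are all assumptions.

```python
import math


def select_for_review(features, assigned_centers, threshold):
    """Return indices of partial images that lie far from their assigned reference cluster.

    features[i] is the feature vector of partial image i, and assigned_centers[i]
    is the center of the reference cluster it is currently classified into.
    A partial image is selected when its distance is >= threshold.
    """
    return [
        i
        for i, (feature, center) in enumerate(zip(features, assigned_centers))
        if math.dist(feature, center) >= threshold
    ]


features = [(0.0, 0.0), (3.0, 4.0), (1.0, 0.0)]
centers = [(0.0, 0.0)] * 3
print(select_for_review(features, centers, threshold=2.0))  # [1]
```

Only the selected indices would then go through the per-image distance calculation, which is what limits the cost of the review.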

According to the present invention, the calculation cost of generating a feature vector using visual keywords can be reduced.

FIG. 1 is a block diagram showing the functional configuration of the image search device.
FIG. 2 is a flowchart of the feature vector generation process.
FIG. 3 is a conceptual diagram explaining the processing up to generation of a feature vector.
FIG. 4 is a conceptual diagram explaining the process of changing the visual keyword to which a partial image is mapped.

[Configuration of image search device]
An embodiment in which the feature vector generation device of the present invention is applied to the feature vector generation unit 20 and the indexing unit 70 in the image search device 1 shown in FIG. 1 is described below with reference to the drawings.

FIG. 1 is a functional block diagram of the image search device 1. The image search device 1 is connected to the Internet via a communication network and can collect image data from the web. The collected data is accumulated in a database (DB) to build the set of search target images.

The image search device 1 receives, as a search request, a query image transmitted from a client terminal such as a personal computer or mobile terminal connected via the communication network. It then performs a similar-image search in response to the request and returns search results ranked in order of similarity to the client terminal.

The image search device 1 in this embodiment uses the visual keyword technique to generate and index feature vectors of images. Image search by visual keywords represents a single image as a set of image regions and generates an index (feature vector) for the image based on the feature amounts obtained from the image regions that make up each image (hereinafter, “partial images”). It can be regarded as an application of text search technology, which derives the feature amount of a document from the keywords in its text.

Image search by visual keywords therefore treats the image regions in an image as keywords, which allows techniques from text search (inverted indexes, the vector space model, word occurrence frequency, and so on) to be applied to image region search, achieving large scale and high speed.

Reference technical literature on image search by visual keywords includes, for example, the following:
・Sivic and Zisserman: “Efficient visual search for objects in videos”, Proceedings of the IEEE, Vol. 96, No. 4, pp. 548-566, Apr. 2008.
・Yang and Hauptmann: “A text categorization approach to video scene classification using keypoint features”, Carnegie Mellon University Technical Report, pp. 25, Oct. 2006.
・Jiang and Ngo: “Bag-of-visual-words expansion using visual relatedness for video indexing”, Proc. 31st ACM SIGIR Conf., pp. 769-770, Jul. 2008.
・Jiang, Ngo, and Yang: “Towards optimal bag-of-features for object categorization and semantic video retrieval”, Proc. 6th ACM CIVR Conf., pp. 494-501, Jul. 2007.
・Yang, Jiang, Hauptmann, and Ngo: “Evaluating bag-of-visual-words representations in scene classification”, Proc. 15th ACM MM Conf., Workshop on MMIR, pp. 197-206, Sep. 2007.

Furthermore, representing an image as a set of partial images makes it possible, unlike general similar-image search, to search using as the query image a portion cut out of an image at an arbitrary size and position. To obtain the desired search results, the user can thus express a query more directly and accurately, for example by designating part of an image.

As shown in FIG. 1, the image search device 1 comprises a query image reception unit 10, a feature vector generation unit 20, a similarity calculation unit 30, a search result output unit 50, a visual keyword generation unit 60, a visual keyword DB 65, an indexing unit 70, an index DB 75, a region management DB 80, and a search target image DB 90.

These functional units are implemented by a so-called computer, through the cooperation of a CPU (Central Processing Unit) as the arithmetic/control device, RAM (Random Access Memory) and ROM (Read Only Memory) as storage media, a communication interface, and the like.

The query image reception unit 10 receives and accepts a query image, transmitted from a client terminal, that serves as the search key for similar-image search. The query image may be an image stored in the search target image DB 90, an image cut out by an operation that designates a partial region of such image data, or a newly received image. The query may consist of a single image or a combination of multiple images.

The feature vector generation unit 20 extracts partial images from the query image and performs the feature vector generation process (see FIG. 2), which generates a feature vector from the query image based on the feature amounts of those partial images. As shown in FIG. 1, the feature vector generation unit 20 comprises a partial image extraction unit 22, a clustering unit 24 serving as the clustering means, and a mapping unit 26 serving as the distance calculation means and the classification means; these cooperate to carry out the feature vector generation process described later.

The similarity calculation unit 30 calculates the similarity between the feature vector of each search target image stored in the index DB 75 and the feature vector generated from the query image. Known techniques such as the cosine distance or the Bhattacharyya distance are used to calculate this similarity. Details of cutting out comparison regions and calculating the similarity are described later.
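As a rough illustration of the two measures named above, the sketch below compares two visual-keyword histograms. It assumes the feature vectors are plain lists of non-negative counts of equal length; the function names are hypothetical, and the first function returns the cosine similarity (1 for identical directions) rather than a distance.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bhattacharyya_distance(u, v):
    """Bhattacharyya distance between two histograms after normalising each to sum 1."""
    total_u, total_v = sum(u), sum(v)
    if total_u == 0 or total_v == 0:
        return math.inf
    # Bhattacharyya coefficient: overlap between the two normalised histograms.
    coefficient = sum(math.sqrt((a / total_u) * (b / total_v)) for a, b in zip(u, v))
    coefficient = min(1.0, coefficient)  # guard against rounding slightly above 1
    return math.inf if coefficient == 0 else -math.log(coefficient)


query = [5, 0, 2]
candidate = [4, 1, 2]
print(cosine_similarity(query, candidate), bhattacharyya_distance(query, candidate))
```

A higher cosine similarity or a lower Bhattacharyya distance indicates a closer match, so a ranking would sort in opposite directions depending on which measure is used.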

The search result output unit 50 generates data in which the search target images are ranked based on the similarity calculated by the similarity calculation unit 30. The data it outputs is, for example, the image IDs of the search target images sorted by similarity. An address (URL) for accessing the search target image DB 90 may be attached to each image ID.

The visual keyword generation unit 60 generates the classification destinations (reference clusters) to which the partial images in an image are mapped when a feature vector of image data is generated. It extracts a plurality of partial images from the images used for image search or from image data prepared in advance for training, and clusters them based on their feature amounts. Standard clustering methods such as k-means or Hierarchical Agglomerative Clustering (HAC) are used.

The feature vector generation unit 20 generates a feature vector by mapping (classifying) the partial images detected in an image to the reference clusters formed by the clustering of the visual keyword generation unit 60. These reference clusters are called “visual keywords”, in the sense of a feature space for expressing an image as a collection of visual keywords.

The visual keyword DB 65 is a database that stores, in association with one another, a visual keyword ID (VKID) identifying each cluster formed by the clustering of the visual keyword generation unit 60, the center coordinates of that cluster (the coordinates of its center point in the multidimensional feature space), and a radius indicating the extent of the cluster.

The center coordinates are the mean of the feature amounts of the images belonging to each cluster, expressed as multidimensional coordinates in the feature space. The radius is obtained, for example, as the distance from the center coordinates to the farthest image belonging to the cluster.
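The center coordinates and radius described above can be computed as in the following sketch. It assumes each member of a cluster is given as a tuple of floats and that distance is Euclidean; the function name is hypothetical.

```python
import math


def cluster_center_and_radius(features):
    """Return (center, radius) for one cluster.

    center is the component-wise mean of the member feature vectors, and
    radius is the distance from the center to the farthest member.
    """
    count = len(features)
    center = tuple(sum(component) / count for component in zip(*features))
    radius = max(math.dist(center, feature) for feature in features)
    return center, radius


members = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0)]
print(cluster_center_and_radius(members))  # ((1.0, 1.0), 2.0)
```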

The indexing unit 70 generates a feature vector for the image data stored in the search target image DB 90 by the feature vector generation process of FIG. 2, and stores the generated feature vector in the index DB 75 as the index of that image data. The indexing unit 70 comprises a partial image extraction unit 72, a clustering unit 74, and a mapping unit 76; these cooperate to carry out the feature vector generation process described later.

The indexing unit 70 also assigns a region ID to each partial image detected in the image data, and stores the VKID of the visual keyword to which the partial image was mapped in the region management DB 80 in association with the image ID and the region ID. The region ID may be XY coordinates within the image, or the row and column numbers of the region when the image is divided into regions.

The index DB 75 is a database that stores the image ID of each piece of image data stored in the search target image DB 90 in association with the feature vector generated from that image data (the frequency of occurrence of partial images for each visual keyword).

The region management DB 80 is a database that manages the correspondence between the partial images in search target images and the visual keywords to which they are mapped. As shown in FIG. 1, it stores the image ID of the search target image, the region ID, and the VKID in association with one another.

The search target image DB 90 is a database that accumulates image data collected from the Internet as targets of similar-image search (“search target images”). As shown in FIG. 1, it stores image IDs in association with image data. The image ID is identification information that uniquely identifies each piece of image data, and it is assigned when the keyword and image data are stored.

[Feature vector generation process]
Next, the feature vector generation process is described with reference to the flowchart of FIG. 2 and the conceptual diagrams of FIGS. 3 and 4. The process is performed by the feature vector generation unit 20 on query images and by the indexing unit 70 on search target images; the following description takes the case of the feature vector generation unit 20.

First, the partial image extraction unit 22 extracts a plurality of partial images from the image data of the query image (step S11). Partial images can be extracted either by detecting characteristic regions (feature regions) in the image or by dividing the image into predetermined regions.

Methods for detecting feature regions include the following:
・Harris-affine
・Hessian-affine
・Maximally stable extremal regions (MSER)
・Difference of Gaussians (DoG)
・Laplacian of Gaussian (LoG)
・Determinant of Hessian (DoH)

Feature region detection techniques are published in, for example, “Local Invariant Feature Detectors: A Survey” (Foundations and Trends in Computer Graphics and Vision, Vol. 3, No. 3, pp. 177-280, 2007), and known techniques can be adopted as appropriate.

Methods for extracting partial images by dividing the image into predetermined regions include dividing it into a predetermined M × N grid of blocks, or dividing it so that each block has a predetermined size of m × n pixels. For example, when a 640 × 480 pixel image is divided into 10 × 10 blocks, each block is 64 × 48 pixels.
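The grid division in the example above can be written out as a short sketch. The function names and the (x, y, width, height) block representation are assumptions made for illustration, and any remainder pixels are simply ignored when the image size is not an exact multiple of the grid.

```python
def block_size(width, height, cols, rows):
    """Pixel size of one block when the image is split into cols x rows blocks."""
    return width // cols, height // rows


def split_into_blocks(width, height, cols, rows):
    """Return (x, y, block_width, block_height) for every block, row by row."""
    block_width, block_height = block_size(width, height, cols, rows)
    return [
        (col * block_width, row * block_height, block_width, block_height)
        for row in range(rows)
        for col in range(cols)
    ]


print(block_size(640, 480, 10, 10))  # (64, 48)
blocks = split_into_blocks(640, 480, 10, 10)
print(len(blocks), blocks[0], blocks[-1])  # 100 (0, 0, 64, 48) (576, 432, 64, 48)
```

The row and column indices used to build each block correspond to one of the region ID schemes mentioned for the region management DB.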

Next, the feature amount of each extracted partial image is calculated (step S12). When feature regions have been extracted, local feature amounts that are robust to affine transformations such as changes in scale, rotation, and viewing angle are extracted. Examples of local feature amounts include the following.

・SIFT
・Gradient location and orientation histogram
・Shape context
・PCA-SIFT
・Spin images
・Steerable filters
・Differential invariants
・Complex filters
・Moment invariants

The extraction of local feature amounts is published in, for example, “A performance evaluation of local descriptors” (IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 27, No. 10, pp. 1615-1630, 2005), and known techniques can be adopted as appropriate.

A feature vector generated from the feature amounts extracted from feature regions is derived from regions where an object is likely to be present, so it is effective as an indicator of the characteristics of objects in the image.

When partial images have been extracted by region division, image feature amounts that numerically express characteristics such as color scheme, texture, and shape are used. A feature vector generated from the feature amounts of region images obtained by this division is derived from every part of the image, so it is effective as an indicator of the overall composition of the image.

Then, the clustering unit 24 clusters the partial images extracted from the image data based on their feature amounts (step S13). The clusters formed by this clustering are referred to as “target clusters”. Standard clustering methods such as k-means or Hierarchical Agglomerative Clustering (HAC) are used. The number of clusters M is set to roughly 1/5 (L = 5) to 1/10 (L = 10) of the number of partial images.

For example, suppose the partial image extraction unit 22 extracts K partial images from image G1 as shown in FIG. 3. The clustering unit 24 clusters these partial images based on their feature amounts, forming M target clusters as in FIG. 3. Each target cluster gathers partial images with similar feature amounts.
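Step S13 might look like the following k-means sketch, with the number of target clusters M derived from the number of partial images K and the factor L. Several details are assumptions made to keep the example small and deterministic: the function name, Euclidean distance, a fixed iteration count, and initialising the centers with the first M feature vectors (practical k-means implementations usually choose initial centers differently).

```python
import math


def cluster_partial_images(features, reduction_factor=5, iterations=10):
    """Group K feature vectors into M = K // reduction_factor target clusters.

    Returns (labels, centers): labels[i] is the target cluster index of
    partial image i, and centers[k] is the mean of cluster k.
    """
    num_clusters = max(1, len(features) // reduction_factor)
    centers = list(features[:num_clusters])  # assumed deterministic initialisation
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: each partial image joins its nearest center.
        labels = [
            min(range(num_clusters), key=lambda k: math.dist(feature, centers[k]))
            for feature in features
        ]
        # Update step: each center moves to the mean of its members.
        for k in range(num_clusters):
            members = [f for f, label in zip(features, labels) if label == k]
            if members:
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


features = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0),
            (11, 10), (1, 1), (11, 11), (0, 2), (10, 12)]
labels, centers = cluster_partial_images(features, reduction_factor=5)
print(labels)  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
```

With K = 10 and L = 5 the sketch forms M = 2 target clusters, one around each group of nearby feature vectors.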

Next, the mapping unit 26 selects one of the target clusters formed by the clustering in step S13 (step S14) and calculates the distance between that target cluster and each visual keyword (the reference clusters formed in advance by the visual keyword generation unit 60) (step S15). This distance is the distance in the feature space between the center coordinates of the target cluster and the center coordinates of the visual keyword. Besides the centroid, which is the mean of the data (feature amounts) belonging to the cluster, the value used as the cluster's center coordinates may be the medoid, which is the data point closest to the center, or the median, which is the data point in the middle when the data are sorted in ascending order.
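The three choices of center can be compared with a small sketch. The function names are hypothetical. Two interpretations are assumptions: the medoid follows the description above (the member closest to the centroid), and the median is taken component by component, since the text does not say how multidimensional data are ordered.

```python
import math
import statistics


def centroid(features):
    """Component-wise mean of the member feature vectors."""
    return tuple(sum(c) / len(features) for c in zip(*features))


def medoid(features):
    """The actual member closest to the centroid (the reading used in the text)."""
    center = centroid(features)
    return min(features, key=lambda feature: math.dist(feature, center))


def componentwise_median(features):
    """Median of each dimension taken separately (an assumed interpretation)."""
    return tuple(statistics.median(c) for c in zip(*features))


members = [(0.0, 0.0), (1.0, 0.0), (8.0, 3.0)]
print(centroid(members))              # (3.0, 1.0)
print(medoid(members))                # (1.0, 0.0)
print(componentwise_median(members))  # (1.0, 0.0)
```

In this example the outlying member (8.0, 3.0) pulls the centroid away from the other two, while the medoid and median stay near them, which is one reason such alternatives might be preferred.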

Then, the partial images in the selected target cluster are mapped (classified) to the visual keyword with the smallest calculated distance (step S16). At this time, the mapping unit 26 adds the number of those partial images to the scalar value for the corresponding visual keyword in the feature vector in the index DB 75. It also assigns the VKID of the visual keyword to which each partial image was mapped to the region ID of that partial image in the region management DB 80, and stores them in association with each other.

For example, suppose that in FIG. 3 the distances between cluster #1 and the visual keywords VK1 to VKN are calculated and visual keyword VK3 is found to be the closest. In this case, the partial images G11 to G15 belonging to cluster #1 are mapped to visual keyword VK3.
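Steps S14 to S17 can be sketched as a loop over target clusters. This is an illustrative outline under stated assumptions: in-memory dictionaries stand in for the visual keyword DB, index DB, and region management DB, distance is Euclidean, and the function name, coordinates, and the identifiers G16 and G17 are invented for the example.

```python
import math
from collections import Counter


def map_clusters_to_keywords(target_clusters, keyword_centers):
    """Map every member of each target cluster to the nearest visual keyword.

    target_clusters: list of (cluster_center, [partial_image_id, ...]).
    keyword_centers: dict of VKID -> center coordinates.
    Returns (assignment, histogram), where assignment maps partial image ID
    to VKID and histogram counts partial images per VKID (the feature vector).
    """
    assignment = {}
    histogram = Counter()
    for cluster_center, member_ids in target_clusters:
        # One distance per visual keyword for the whole cluster (step S15).
        nearest = min(
            keyword_centers,
            key=lambda vkid: math.dist(cluster_center, keyword_centers[vkid]),
        )
        # All members share the same destination (step S16).
        for member_id in member_ids:
            assignment[member_id] = nearest
        histogram[nearest] += len(member_ids)
    return assignment, histogram


keyword_centers = {"VK1": (0.0, 0.0), "VK3": (5.0, 5.0), "VK5": (9.0, 9.0)}
target_clusters = [
    ((4.5, 5.0), ["G11", "G12", "G13", "G14", "G15"]),  # cluster #1
    ((0.5, 0.2), ["G16", "G17"]),                       # a second, assumed cluster
]
assignment, histogram = map_clusters_to_keywords(target_clusters, keyword_centers)
print(assignment["G15"], dict(histogram))  # VK3 {'VK3': 5, 'VK1': 2}
```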

Next, the mapping unit 26 determines whether all target clusters have been selected in step S14 (step S17). If any target cluster remains unselected (step S17; No), the process returns to step S14 and the processing up to step S16 is repeated.

If it is determined that all target clusters have been selected (step S17; Yes), the distance between each partial image and the visual keywords to which mapping was performed in step S16 is calculated, and the partial images are remapped based on that distance.

That is, one partial image is selected (step S18), and the distance between that partial image and the visual keywords to which partial images have been mapped is calculated (step S19). This distance is calculated not only for the visual keyword to which the selected partial image was mapped but for every visual keyword to which a mapping was made in step S16. The distance to a visual keyword is the distance between the coordinates of the partial image's feature values in the feature space and the center coordinates of the visual keyword.

Then, the visual keyword closest to the selected partial image is chosen, and the partial image is remapped to it by assigning the VKID of that visual keyword to the partial image (step S20).

For example, suppose that in step S16 the partial images G11 to G15 belonging to cluster #1 were mapped to visual keyword VK3, as shown in FIG. 4. If calculating the distances between partial image G15 and each visual keyword shows that visual keyword VK5 is the closest, partial image G15 is remapped from visual keyword VK3 to visual keyword VK5. Thus, even when partial images have been mapped together as a cluster, each can be remapped to an appropriate visual keyword.
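A minimal sketch of the remapping in steps S18 to S20 follows. The names and sample coordinates are hypothetical and Euclidean distance is assumed. The candidate set is restricted to the visual keywords that received a mapping in step S16, as described above.

```python
import math

def remap(region_features, region_to_vkid, keyword_centers):
    """Reassign each partial image to the nearest visual keyword used in step S16."""
    used = sorted(set(region_to_vkid.values()))      # keywords that received a mapping
    remapped = {}
    for region, feature in region_features.items():  # S18, repeated via S21
        remapped[region] = min(                       # S19 and S20
            used, key=lambda vk: math.dist(feature, keyword_centers[vk]))
    return remapped

# Hypothetical data following FIG. 4: G15 was mapped to VK3 with its cluster
# but lies closer to VK5. VK1 is never a candidate because nothing mapped to it.
features = {"G11": (0.5, 0.5), "G15": (2.8, 3.9), "G21": (3.2, 4.1)}
initial = {"G11": "VK3", "G15": "VK3", "G21": "VK5"}
keywords = {"VK1": (2.8, 3.8), "VK3": (0.0, 0.0), "VK5": (3.0, 4.0)}
print(remap(features, initial, keywords))
```

In this example G15 moves from VK3 to VK5, even though the unused keyword VK1 is nearer, because only keywords already in use are compared.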

The mapping unit 26 determines whether all partial images have been selected in step S18 (step S21). If an unselected partial image remains (step S21; No), the process returns to step S18.

If it is determined that all partial images have been selected (step S21; Yes), a feature vector is generated based on the visual keywords to which the partial images were mapped (step S22). That is, a multidimensional feature vector is generated whose components are the frequencies with which partial images occur in each visual keyword as a result of the mapping. In this way, the feature vector V for image G1 is generated as shown in FIG. 3.
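Step S22 amounts to building a histogram over the visual keywords. A minimal sketch, with a hypothetical function name and sample mapping:

```python
from collections import Counter

def feature_vector(region_to_vkid, vkids):
    """One component per visual keyword: the number of partial images mapped to it."""
    counts = Counter(region_to_vkid.values())
    return [counts[vk] for vk in vkids]   # a Counter returns 0 for unused keywords

# Hypothetical final mapping for image G1 after remapping.
mapping = {"G11": "VK3", "G12": "VK3", "G13": "VK3", "G14": "VK3", "G15": "VK5"}
vkids = ["VK1", "VK2", "VK3", "VK4", "VK5"]
print(feature_vector(mapping, vkids))  # [0, 0, 4, 0, 1]
```

The vector has one dimension per visual keyword, so its length equals the number N of visual keywords regardless of how many partial images were extracted.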

When the feature vector generation unit 20 outputs the feature vector generated by the above processing to the similarity calculation unit 30, the similarity calculation unit 30 calculates the similarity between it and the feature vectors of the search target images. The indexing unit 70 stores the feature vector generated by the above processing in the index DB 75 in association with the image ID of the search target image.

As described above, according to the present embodiment, the partial images extracted from an image are clustered so that similar partial images are grouped together and mapped to visual keywords cluster by cluster. This reduces the calculation cost of generating a feature vector using visual keywords.

More specifically, let K be the number of extracted partial images and N the number of visual keywords. A conventional mapping method based on the distance between each partial image and each visual keyword incurs a calculation cost of K × N. In contrast, the cost of the method of the present embodiment is the sum of the cost K × M of clustering the K partial images into M clusters and the cost M × N of mapping by calculating the distances between the M clusters and the N visual keywords.

For example, suppose that 1,000 partial images are extracted from one image, that the number of clusters is 200 (1/5 of the number of partial images), and that there are 450,000 visual keywords. In this case, the calculation cost of the conventional mapping method is 450,000,000 (= 1,000 × 450,000), whereas the calculation cost of the method of the present embodiment is 90,200,000 (= 1,000 × 200 + 200 × 450,000). The method of the present embodiment thus reduces the calculation cost by about 80% compared with the conventional method.
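The arithmetic above can be checked with a short sketch. The function names are hypothetical, and cost is counted in distance calculations as in the text.

```python
def conventional_cost(k, n):
    """Every partial image is compared with every visual keyword."""
    return k * n

def clustered_cost(k, m, n):
    """Clustering cost K x M plus cluster-to-keyword mapping cost M x N."""
    return k * m + m * n

K, M, N = 1_000, 200, 450_000
conventional = conventional_cost(K, N)
clustered = clustered_cost(K, M, N)
reduction = 1 - clustered / conventional

print(conventional)         # 450000000
print(clustered)            # 90200000
print(f"{reduction:.1%}")   # 80.0%
```

Because the M × N term dominates when N is large, the saving is close to 1 − M/K, which is 80% for M = K/5.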

Distances are also calculated in steps S18 to S21 in order to remap the partial images once mapped, but these distances are calculated only to the visual keywords to which a mapping was made in step S16, not to all visual keywords. That is, of the N visual keywords generated in advance, only Q (≤ N) have partial images mapped to them, so distances need to be calculated only to these Q visual keywords, at a calculation cost of K × Q. The calculation cost of improving the accuracy of the feature vector is therefore also kept low.
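The effect of the K × Q term can be bounded with a small sketch. Because step S16 maps each of the M target clusters to exactly one visual keyword, Q cannot exceed M (nor N). The function name is hypothetical; the sample figures of 1,000 partial images, 200 clusters and 450,000 visual keywords are assumed for illustration.

```python
def total_cost_with_remap(k, m, n, q=None):
    """K x M clustering + M x N cluster mapping + K x Q remapping.

    If q is not given, the worst case q = min(m, n) is assumed, since each
    target cluster is mapped to exactly one visual keyword.
    """
    if q is None:
        q = min(m, n)
    return k * m + m * n + k * q

K, M, N = 1_000, 200, 450_000
worst = total_cost_with_remap(K, M, N)
print(worst)                    # 90400000
print(f"{1 - worst / (K * N):.1%}")
```

Even in the worst case, remapping adds only K × M = 200,000 distance calculations here, which is small next to the M × N term.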

The embodiment described above is one example of applying the present invention, and the applicable scope is not limited to it. For example, the present embodiment was described as performing remapping based on the distance between partial images and visual keywords after the mapping in step S16, but the mapping may be completed without this processing. As described above, however, remapping improves the accuracy of the generated feature vector.

In the embodiment described above, the partial images in a cluster are mapped to the visual keyword closest to that cluster. Alternatively, for example, a partial image within the cluster whose distance from the cluster's center coordinates is equal to or greater than a fixed value may be mapped by individually measuring its distance to the visual keywords.

That is, the clustering unit 24 (74), acting as cluster distance calculation means, retains the distance between the center coordinates of each cluster and each partial image at clustering time. A partial image whose distance is equal to or greater than the fixed value is excluded from the processing of step S16, which maps the partial images in a cluster to a visual keyword. For each partial image so excluded, the mapping unit 26 (76) individually calculates the distances to the visual keywords and maps the partial image to the closest one.

As a result, most of the clustered partial images can still be mapped based on the distance between their cluster and the visual keywords, so the accuracy of the feature vector can be improved while the calculation cost of mapping is reduced.
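This variation can be sketched as follows. The names and the threshold value are hypothetical and Euclidean distance is assumed; for brevity the distance to the cluster center is recomputed here rather than retained from clustering as the text describes.

```python
import math

def nearest_keyword(point, keyword_centers):
    """VKID of the visual keyword whose center is closest to the given point."""
    return min(sorted(keyword_centers),
               key=lambda vk: math.dist(point, keyword_centers[vk]))

def map_with_outliers(cluster_center, members, keyword_centers, threshold):
    """Map cluster members together, except those far from the cluster center.

    members: {region_id: feature coordinates}
    """
    cluster_vk = nearest_keyword(cluster_center, keyword_centers)
    mapping = {}
    for region, feature in members.items():
        if math.dist(feature, cluster_center) >= threshold:
            # Excluded from step S16: measured individually against the keywords.
            mapping[region] = nearest_keyword(feature, keyword_centers)
        else:
            mapping[region] = cluster_vk
    return mapping

members = {"G11": (1.0, 1.2), "G12": (0.8, 1.0), "G15": (3.0, 3.5)}
keywords = {"VK3": (0.0, 0.0), "VK5": (3.0, 4.0)}
print(map_with_outliers((1.0, 1.0), members, keywords, threshold=2.0))
```

Only the outlying partial image G15 incurs a full scan over the visual keywords; G11 and G12 reuse the single result computed for the cluster.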

In steps S18 to S21, the distances to the visual keywords are calculated for all partial images extracted from the image. Alternatively, for example, of the partial images mapped in step S16, only those whose distance from the center of the visual keyword they were mapped to is equal to or greater than a predetermined value may be selected for distance calculation and remapping.

Specifically, in FIG. 4, the distances between the partial images G11 to G15 and the center C3 of visual keyword VK3 are calculated, and for partial image G15, whose distance is equal to or greater than the predetermined value, the distances to the other visual keywords are calculated and the partial image is remapped. The partial images to be remapped can thus be narrowed down.
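A minimal sketch of this selective remapping follows. The names, coordinates and threshold are hypothetical, Euclidean distance is assumed, and the candidates are taken to be the visual keywords that already received a mapping in step S16.

```python
import math

def selective_remap(region_features, region_to_vkid, keyword_centers, threshold):
    """Remap only partial images far from the center of their assigned keyword."""
    used = sorted(set(region_to_vkid.values()))
    result = dict(region_to_vkid)
    for region, vk in region_to_vkid.items():
        feature = region_features[region]
        if math.dist(feature, keyword_centers[vk]) >= threshold:
            # Far from its keyword's center: compare against the keywords in use.
            result[region] = min(
                used, key=lambda c: math.dist(feature, keyword_centers[c]))
    return result

features = {"G11": (0.5, 0.5), "G15": (2.8, 3.9), "G21": (3.1, 4.0)}
initial = {"G11": "VK3", "G15": "VK3", "G21": "VK5"}
keywords = {"VK3": (0.0, 0.0), "VK5": (3.0, 4.0)}
print(selective_remap(features, initial, keywords, threshold=2.0))
```

Here only G15 exceeds the threshold, so only one partial image triggers the additional distance calculations.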

The feature vector may also be weighted by TF/IDF (term frequency-inverse document frequency), a word weighting method used in text retrieval.

A known reference on TF/IDF is:
C. D. Manning, P. Raghavan and H. Schütze: "Introduction to Information Retrieval", Cambridge University Press, 2008.

TF/IDF is an algorithm for extracting characteristic words from text. It is calculated from two measures: TF, the frequency with which a word occurs, and IDF, the inverse document frequency. Specifically, it is given by the following formulas.
TF/IDF(i, j) = TF(i, j) / T(i) * IDF(j)
IDF(j) = log(N / DF(j))

Here,
TF(i, j) is the number of occurrences of keyword j in document i, the document from which keywords are extracted,
T(i) is the total number of words in document i,
N is the total number of documents, and
DF(j) is the number of documents that contain keyword j.
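As an illustration of these definitions for ordinary text, the following sketch computes the weight of keyword j in document i. The function name and sample documents are hypothetical, and the natural logarithm is assumed because the base of the logarithm is not specified.

```python
import math

def tf_idf(docs, i, j):
    """TF(i, j) / T(i) * log(N / DF(j)) for tokenized documents.

    docs: list of token lists; i: document index; j: keyword.
    """
    tf = docs[i].count(j)                  # TF(i, j): occurrences of j in document i
    t = len(docs[i])                       # T(i): number of words in document i
    n = len(docs)                          # N: number of documents
    df = sum(1 for d in docs if j in d)    # DF(j): documents containing j
    if tf == 0 or df == 0:
        return 0.0
    return (tf / t) * math.log(n / df)

docs = [["red", "car", "red"], ["blue", "car"], ["green", "tree"]]
print(tf_idf(docs, 0, "red"))   # frequent in one document, rare overall
print(tf_idf(docs, 0, "car"))   # shared with another document, so weighted lower
```

A word that occurs in every document gets IDF = log(1) = 0 and therefore contributes nothing to the weight.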

To apply this, a document is treated as an image and a word as the partial images belonging to the same visual keyword. A TF/IDF value is obtained for each visual keyword of each image, and the feature vector is generated by adding this TF/IDF value to the visual keyword to which the partial images were mapped.

Here, with i denoting the image ID and k denoting a visual keyword, the weight value TF/IDF(i, k) of each visual keyword is calculated by the following formulas.

TF/IDF(i, k) = TF(i, k) / T(i) * IDF(k)
IDF(k) = log(N / DF(k))

TF(i, k) is the weighted number of partial images extracted from image i that occur in visual keyword k. The weight of each partial image belonging to (occurring in) visual keyword k is the above-described weight value (0 to 1) based on the distance between that partial image and the center point of visual keyword k.

T(i) is the total number of partial images extracted from image i, weighted by the distance to the visual keywords. It is the sum of the weight values, each based on the distance between a partial image extracted from image i and the cluster to which it belongs.

DF(k) is the number of partial images classified into visual keyword k that occur in visual keyword k, weighted by the distance to the visual keyword. N is the total number of images in the search target image DB 90.

In this way, by treating a document in TF/IDF as an image and a word in the document as the partial images belonging to the same visual keyword, the scalar values of the feature vector can be weighted so that partial images that occur in every image become less important and characteristic partial images that occur prominently in particular images become more important.
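The weighting can be sketched in simplified form. This sketch makes two assumptions that depart from the text: every partial image is given weight 1 (the distance-based weights of 0 to 1 are omitted), and DF(k) is taken as the number of images containing visual keyword k, by analogy with the text definition of DF(j). The names and sample data are hypothetical, and the natural logarithm is assumed.

```python
import math
from collections import Counter

def tfidf_vectors(image_mappings, vkids):
    """TF/IDF-weighted feature vectors, one per image.

    image_mappings: {image_id: [VKID assigned to each partial image]}
    """
    n = len(image_mappings)                       # N: number of images
    df = Counter()
    for assigned in image_mappings.values():
        df.update(set(assigned))                  # DF(k): images containing keyword k
    vectors = {}
    for image_id, assigned in image_mappings.items():
        tf = Counter(assigned)                    # TF(i, k), unweighted
        t = len(assigned)                         # T(i), unweighted
        vectors[image_id] = [
            (tf[k] / t) * math.log(n / df[k]) if tf[k] else 0.0
            for k in vkids
        ]
    return vectors

mappings = {"G1": ["VK1", "VK3", "VK3"], "G2": ["VK1", "VK5"], "G3": ["VK1"]}
vkids = ["VK1", "VK3", "VK5"]
for image_id, vec in tfidf_vectors(mappings, vkids).items():
    print(image_id, vec)
```

In this data VK1 occurs in every image and receives weight 0, while VK3, which occurs only in G1, dominates the vector of G1.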

This TF/IDF weighting may also be used to obtain a similarity score between the visual keywords to which the partial images in the query image belong and the visual keywords to which the partial images in a search target image belong.
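The text does not specify how the similarity score is computed. As one common choice, assumed here purely for illustration, the cosine similarity between two weighted feature vectors could be used; the function name and sample vectors are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

query = [0.0, 0.7, 0.0]        # hypothetical weighted vector of a query image
candidate = [0.0, 0.5, 0.2]    # hypothetical weighted vector of a search target image
print(cosine_similarity(query, candidate))
```

Cosine similarity ignores vector length, so images with different numbers of partial images remain comparable.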

The embodiments disclosed here should be regarded as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the above description, and is intended to include all modifications within the meaning and scope equivalent to the claims.

Description of symbols
1 Image search device
10 Query image reception unit
20 Feature vector generation unit
22 Partial image extraction unit
24 Clustering unit
26 Mapping unit
30 Similarity calculation unit
50 Search result output unit
60 Visual keyword generation unit
70 Indexing unit
72 Partial image extraction unit
74 Clustering unit
76 Mapping unit
65 Visual keyword DB
75 Index DB
80 Region management DB
90 Search target image DB

Claims

A feature vector generation device that classifies a plurality of partial images extracted from an image into any of a plurality of reference clusters and generates a feature vector based on the number of partial images classified into each reference cluster, the device comprising:
clustering means for clustering the extracted plurality of partial images based on feature values of the partial images;
distance calculation means for calculating, with each cluster formed by the clustering taken as a target cluster, the distance between the target cluster and each of the reference clusters; and
classification means for classifying the partial images belonging to the target cluster into the reference cluster at the shortest distance from the target cluster.

The feature vector generation device according to claim 1, wherein the classification means calculates the distance between each individual partial image and the reference clusters into which the partial images extracted from the image have been classified, and changes the classification destination to the reference cluster at the shortest distance.

The feature vector generation device according to claim 2, further comprising cluster distance calculation means for calculating the distance between a reference cluster into which the classification was performed and the partial images classified into that reference cluster,
wherein the classification means selects the partial images for which the distance calculated by the cluster distance calculation means is equal to or greater than a predetermined value, calculates the distance between each selected partial image and the reference clusters into which the partial images have been classified, and changes the classification destination.

A feature vector generation method in which a computer classifies a plurality of partial images extracted from an image into any of a plurality of reference clusters and generates a feature vector based on the number of partial images classified into each reference cluster, the method comprising the following steps executed by the computer:
a clustering step of clustering the extracted plurality of partial images based on feature values of the partial images;
a distance calculation step of calculating, with each cluster formed by the clustering taken as a target cluster, the distance between the target cluster and each of the reference clusters; and
a classification step of classifying the partial images belonging to the target cluster into the reference cluster at the shortest distance from the target cluster.

A program for causing a computer to execute the feature vector generation method according to claim 4.