JP2017102869A - Importance calculation device, method, and program

Info

Publication number: JP2017102869A
Application number: JP2015238006A
Authority: JP
Inventors: Minoru Mori; Xiaomeng Wu; Kunio Kashino
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-12-04
Filing date: 2015-12-04
Publication date: 2017-06-08
Anticipated expiration: 2035-12-04
Also published as: JP6427480B2

Abstract

PROBLEM TO BE SOLVED: To provide an importance calculation device capable of calculating appropriate weights even when a large number of features are extracted.
SOLUTION: The importance calculation device includes a feature extraction unit 24 that extracts features from a plurality of pieces of learning data, and a feature importance calculation unit 26 that calculates, for each feature, an ITFF (Inverse Total Feature Frequency) as the weight of that feature. The ITFF is the logarithm of the total number of extractions of all features divided by the number of times the feature itself was extracted.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an importance calculation device, method, and program for calculating the importance of extracted features.

Conventionally, a weight calculation method called IDF (Inverse Document Frequency) has been widely used. IDF extracts features from a previously acquired set of objects (images, documents, and so on) and counts, for each feature, the number of objects that contain it: for images, the number of images in the previously acquired image set that contain the feature of interest; for documents, the number of documents that contain the word of interest. The IDF of the k-th feature, idf(v_k), is the logarithm of the total number of objects N divided by that count d_k, as given by equation (1):

$$\mathrm{idf}(v_k) = \log \frac{N}{d_k} \tag{1}$$

Here, d_k is the number of objects that contain the k-th feature.

With idf(v_k), a feature contained in fewer objects (small d_k) receives a larger weight, and a feature contained in more objects receives a smaller weight. When idf(v_k) is reflected in the similarity calculation, rare features shared by the objects being compared raise the similarity value, making the objects more likely to be judged similar. Conversely, features that are extracted from many images and therefore occur frequently receive small weights, so their influence on the similarity value is small. Similarity can therefore be compared more accurately than when no weights are applied.
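The IDF weighting described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `idf_weights`, the toy data, the representation of each object as a set of feature ids, and the use of the natural logarithm are not taken from the publication, which does not specify a log base.

```python
import math
from typing import Dict, List, Set


def idf_weights(objects: List[Set[str]]) -> Dict[str, float]:
    """Return idf(v_k) = log(N / d_k) for every feature seen in `objects`.

    Each object is a set of feature ids, so a feature counts at most once
    per object no matter how often it was extracted from that object.
    """
    n_objects = len(objects)
    containing: Dict[str, int] = {}  # d_k: number of objects containing feature k
    for features in objects:
        for feature in features:
            containing[feature] = containing.get(feature, 0) + 1
    return {f: math.log(n_objects / d_k) for f, d_k in containing.items()}


# Hypothetical toy data: three "images" reduced to the feature ids they contain.
toy_objects = [{"a", "b"}, {"a"}, {"a", "c"}]
toy_idf = idf_weights(toy_objects)
```

In this toy data, feature "a" occurs in all three objects and receives weight log(3/3) = 0, while "b" and "c" each occur in one object and receive log(3).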

Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in videos. In: International Conference on Computer Vision, pp. 1470-1477 (2003)

However, when a very large number of features are extracted from each image, or when a single document contains a very large number of feature words, the number of images or documents containing each feature increases. The IDF values then approach 0 and the differences between features shrink, so similarity can no longer be judged accurately.
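A small worked example may make this concrete. The numbers are hypothetical and the natural logarithm is assumed. With N = 1000 objects, two features contained in 990 and 950 objects get nearly identical, near-zero weights, whereas a feature contained in only 10 objects gets a weight roughly two orders of magnitude larger:

```latex
\begin{align*}
\mathrm{idf}(v_a) &= \log\frac{1000}{990} \approx 0.0101, \\
\mathrm{idf}(v_b) &= \log\frac{1000}{950} \approx 0.0513, \\
\mathrm{idf}(v_c) &= \log\frac{1000}{10}  \approx 4.6052 .
\end{align*}
```

When dense extraction pushes most features toward the first two cases, the weights carry little information for distinguishing features.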

In addition, IDF records only whether an object contains the feature of interest. Whether the feature is extracted once or many times from the same object, d_k does not change, so the IDF value does not change either. However, whether a feature was extracted once or several times is relevant to the similarity between objects, so this property of IDF is undesirable.

Furthermore, when a very large number of features whose IDF is close to 0, and whose importance is therefore low, are extracted, many small similarity values are accumulated. This increases the likelihood that objects are falsely detected as similar, and the larger number of similarity calculations increases processing time and memory requirements.

The present invention has been made to solve the above problems, and its object is to provide an importance calculation device, method, and program capable of calculating appropriate weights even when a large number of features are extracted.

To achieve the above object, an importance calculation method according to a first invention is a method performed in an importance calculation device that includes a feature extraction unit and a feature importance calculation unit. The feature extraction unit extracts features from a plurality of pieces of learning data, and the feature importance calculation unit calculates, for each feature, an ITFF (Inverse Total Feature Frequency) as the weight of the feature, the ITFF being the logarithm of the total number of extractions of all features divided by the number of times the feature was extracted.

An importance calculation device according to a second invention includes a feature extraction unit that extracts features from a plurality of pieces of learning data, and a feature importance calculation unit that calculates, for each feature, an ITFF (Inverse Total Feature Frequency) as the weight of the feature, the ITFF being the logarithm of the total number of extractions of all features divided by the number of times the feature was extracted.

According to the first and second inventions, the feature extraction unit extracts features from a plurality of pieces of learning data, and the feature importance calculation unit calculates, for each feature, the ITFF (Inverse Total Feature Frequency), which is the logarithm of the total number of extractions of all features divided by the number of times the feature was extracted, as the weight of the feature.

By extracting features from a plurality of pieces of learning data and calculating the ITFF of each feature as its weight in this way, appropriate weights can be calculated even when a large number of features are extracted.

In the first and second inventions, when the feature importance calculation unit calculates the weight of a feature, the weight may be set to 0 if the ITFF value is less than a predetermined threshold and to the ITFF value if the ITFF value is equal to or greater than the threshold.

In the first and second inventions, the feature extraction unit may further extract features from query data, and the device may further include a search unit that searches the plurality of pieces of learning data for learning data similar to the query data, based on the features of each piece of learning data, the features of the query data, and the weight of each feature.

The program of the present invention is a program for causing a computer to function as each unit of the importance calculation device described above.

As described above, according to the importance calculation device, method, and program of the present invention, features are extracted from a plurality of pieces of learning data and, for each feature, the ITFF, which is the logarithm of the total number of extractions of all features divided by the number of times the feature was extracted, is calculated as the weight of the feature. Appropriate weights can thus be calculated even when a large number of features are extracted.

In addition, setting the weights of unimportant features to 0 by threshold processing makes it possible to lower the likelihood of false detection and to reduce both the amount of similarity computation and the amount of memory required.

FIG. 1 is a block diagram showing the functional configuration of the importance calculation device according to the first embodiment.
FIG. 2 is a flowchart showing the weight learning process routine in the importance calculation device according to the first embodiment.
FIG. 3 is a flowchart showing the search process routine in the importance calculation device according to the first embodiment.
FIG. 4 is a block diagram showing the functional configuration of the importance calculation device according to the second embodiment.
FIG. 5 is a flowchart showing the weight learning process routine in the importance calculation device according to the second embodiment.
FIG. 6 is a diagram showing an example of experimental results.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

<Outline of Embodiment of the Present Invention>
First, an outline of the present embodiment will be described.

The weight calculation method used in the present embodiment can calculate more appropriate feature weights even when many features are extracted or when a single object contains a very large number of features.

Rather than the number of objects that contain the feature of interest, the method uses the number of times features are extracted. As a result, even when many features are extracted, the weights still reflect the differences between features and do not approach 0.

Specifically, let f_k^n be the number of times the k-th feature is extracted from the n-th object. The weight itff_k for the k-th feature is then calculated according to equation (2):

$$\mathrm{itff}_k = \log \frac{\sum_{k'} \sum_{n} f_{k'}^{n}}{\sum_{n} f_{k}^{n}} \tag{2}$$

Here, ITFF (Inverse Total Feature Frequency) is based on the ratio of the number of extractions of the feature of interest to the total number of extractions of all features, so the weight does not become 0 even for a feature that is extracted from every object. Moreover, because ITFF is based on extraction counts, a feature extracted once from an object and a feature extracted several times yield different values, which emphasizes the differences between features.
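The ITFF calculation can be sketched as below. The function name `itff_weights`, the per-object count dictionaries, the toy data, and the natural logarithm are assumptions made for illustration; the publication defines only the ratio and the logarithm.

```python
import math
from typing import Dict, List


def itff_weights(counts_per_object: List[Dict[str, int]]) -> Dict[str, float]:
    """itff_k = log(total extractions of all features / extractions of feature k).

    `counts_per_object[n][k]` plays the role of f_k^n: how many times
    feature k was extracted from the n-th object.
    """
    per_feature: Dict[str, int] = {}
    for counts in counts_per_object:
        for feature, count in counts.items():
            per_feature[feature] = per_feature.get(feature, 0) + count
    total = sum(per_feature.values())  # sum over all features and all objects
    return {f: math.log(total / c) for f, c in per_feature.items() if c > 0}


# Hypothetical toy data: feature "a" is extracted from every object, several times.
toy_counts = [{"a": 2, "b": 1}, {"a": 1}, {"a": 3, "c": 1}]
toy_itff = itff_weights(toy_counts)
```

In this toy data there are 8 extractions in total, 6 of them for "a". Because "a" occurs in every object, a weight based only on object counts would be 0, whereas its ITFF is log(8/6), which is small but positive. Changing how many times "a" is extracted from any one object changes its weight.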

Note that IDF_k can be calculated according to equation (1) above.

<Configuration of Importance Calculation Device According to First Embodiment>
Next, the configuration of the importance calculation device according to the first embodiment will be described. As shown in FIG. 1, the importance calculation device 100 according to the first embodiment can be implemented as a computer including a CPU, a RAM, and a ROM that stores programs for executing the processing routines described later and various data. Functionally, the importance calculation device 100 includes an input unit 10, a calculation unit 20, and an output unit 90, as shown in FIG. 1.

The input unit 10 receives learning images, which are the learning data used to learn the features and their weights. The input unit 10 also receives an image serving as query data (hereinafter, a query image).

The calculation unit 20 includes a storage unit 22, a feature extraction unit 24, a feature importance calculation unit 26, and a search unit 28.

The storage unit 22 stores the learning images received by the input unit 10. After processing by the feature extraction unit 24, described later, each stored learning image is associated with its feature vector. After processing by the feature importance calculation unit 26, described later, the storage unit 22 also stores the importance of each feature.

For each learning image stored in the storage unit 22, the feature extraction unit 24 extracts the features from that image. For each learning image, the feature extraction unit 24 also creates a feature vector in which the extraction counts of the individual features are arranged in a predetermined order, and stores it in the storage unit 22 as the feature vector of that learning image. In the first embodiment, since the objects from which features are extracted are images, the feature extraction unit 24 extracts color and geometric information from each image as features.

The feature extraction unit 24 likewise extracts features from the query image received by the input unit 10 and creates a feature vector for the query image. Any method may be used to extract the features in the feature extraction unit 24 as long as geometric information can be obtained. Each feature is obtained as a vector.

The feature importance calculation unit 26 calculates ITFF as the weight of each feature according to equation (2) above, based on the features of each learning image obtained by the feature extraction unit 24. The feature importance calculation unit 26 also stores the resulting feature weights in the storage unit 22.

The search unit 28 searches for learning images similar to the query image, based on the feature vector of the query image obtained by the feature extraction unit 24, the feature weights stored in the storage unit 22, and the feature vector of each learning image stored in the storage unit 22, and outputs the similar images from the output unit 90.

Specifically, for each learning image, the search unit 28 calculates the distance between the feature vector of the learning image and that of the query image as the element-wise differences between the two vectors multiplied by the weights of the corresponding features. The search unit 28 then obtains, as similar images, the learning images whose distance is equal to or less than a predetermined threshold, and outputs them from the output unit 90.
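One way to read the weighted distance described above is as a weighted L1 distance, which the following sketch assumes. The publication does not state the norm, so this choice, the function names `weighted_distance` and `search_similar`, and the toy values are all illustrative assumptions.

```python
from typing import List, Sequence


def weighted_distance(x: Sequence[float], q: Sequence[float], w: Sequence[float]) -> float:
    """Sum over features of weight * |element-wise difference| (weighted L1, an assumption)."""
    if not (len(x) == len(q) == len(w)):
        raise ValueError("vectors and weights must have the same length")
    return sum(wk * abs(xk - qk) for xk, qk, wk in zip(x, q, w))


def search_similar(
    query: Sequence[float],
    learning_vectors: List[Sequence[float]],
    weights: Sequence[float],
    max_distance: float,
) -> List[int]:
    """Return indices of learning vectors within `max_distance`, nearest first."""
    scored = [(weighted_distance(v, query, weights), i) for i, v in enumerate(learning_vectors)]
    return [i for d, i in sorted(scored) if d <= max_distance]


# Hypothetical feature-count vectors and per-feature weights.
toy_vectors = [[2, 1, 0], [1, 0, 0], [3, 0, 1]]
toy_weights = [0.5, 2.0, 2.0]
toy_hits = search_similar([2, 1, 0], toy_vectors, toy_weights, max_distance=3.0)
```

Here the distances to the three learning vectors are 0, 2.5, and 4.5, so with a threshold of 3.0 only the first two are returned. A low weight on the first feature means that differences in it contribute little to the distance.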

<Operation of Importance Calculation Device According to First Embodiment>
Next, the operation of the importance calculation device 100 according to the first embodiment will be described. When the input unit 10 receives the learning images and stores them in the storage unit 22, the importance calculation device 100 executes the weight learning process routine shown in FIG. 2. When the input unit 10 receives a query image, the importance calculation device 100 executes the search process routine shown in FIG. 3.

First, the weight learning process routine shown in FIG. 2 will be described.

In step S100 of the weight learning process shown in FIG. 2, the learning images stored in the storage unit 22 are read.

Next, in step S102, for each learning image read in step S100, the features of the learning image are extracted, a feature vector of the learning image is created from the extracted features, and the feature vector is stored in the storage unit 22.

Next, in step S104, ITFF is calculated as the weight of each feature according to equation (2) above, based on the features of each learning image obtained in step S102.

Next, in step S106, the feature weights obtained in step S104 are stored in the storage unit 22, and the weight learning process routine ends.

Next, the search process routine shown in FIG. 3 will be described.

In step S120 of the search process shown in FIG. 3, the learning images stored in the storage unit 22, each associated with its feature vector, and the feature weights are read.

Next, in step S122, features are extracted from the query image received by the input unit 10.

Next, in step S124, a feature vector of the query image is created from the features of the query image obtained in step S122.

Next, in step S126, images similar to the query image are searched for, based on the feature vector of the query image obtained in step S124, the feature vector of each learning image and the feature weights read in step S120, and a predetermined threshold on the distance between vectors.

Next, in step S128, the images similar to the query image found in step S126 are output from the output unit 90, and the search process routine ends.

As described above, according to the importance calculation device of the first embodiment, features are extracted from a plurality of pieces of learning data and, for each feature, the ITFF, which is the logarithm of the total number of extractions of all features divided by the number of times the feature was extracted, is calculated as the weight of the feature. Appropriate weights can thus be calculated even when a large number of features are extracted.

In addition, in processing that implements similar-image search, similar-document search, and the like by extracting features from the images or documents to be processed and comparing the extracted features to evaluate the similarity between objects, weights representing the importance of the extracted features can be calculated.

Furthermore, reflecting these importance weights when calculating a similarity value or distance value makes it possible to evaluate similarity more accurately.

Note that the present invention is not limited to the embodiments described above, and various modifications and applications are possible without departing from the gist of the invention.

Next, a second embodiment will be described.

The second embodiment differs from the first mainly in that the weights of features satisfying a specific condition are set to 0. Components and operations that are the same as those of the importance calculation device 100 according to the first embodiment are given the same reference numerals, and their description is omitted.

<Configuration of Importance Calculation Device According to Second Embodiment>
The configuration of the importance calculation device according to the second embodiment will be described. As shown in FIG. 4, the importance calculation device 200 according to the second embodiment can be implemented as a computer including a CPU, a RAM, and a ROM that stores programs for executing the processing routines described later and various data. Functionally, the importance calculation device 200 includes an input unit 10, a calculation unit 220, and an output unit 90, as shown in FIG. 4.

The calculation unit 220 includes a storage unit 222, a feature extraction unit 24, a feature importance calculation unit 226, and a search unit 28.

The storage unit 222 stores the learning images received by the input unit 10. After processing by the feature extraction unit 24, each stored learning image is associated with its feature vector. After processing by the feature importance calculation unit 226, described later, the storage unit 222 also stores the importance of each feature.

The feature importance calculation unit 226 calculates the ITFF of each feature according to equation (2) above, based on the features of each learning image obtained by the feature extraction unit 24. For each feature, if the ITFF value is less than a predetermined threshold (for example, 1), the weight of the feature is set to 0. If the ITFF value is equal to or greater than the threshold, the ITFF value is used as the weight of the feature. An appropriate threshold value is determined in advance by experiment.
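The threshold rule above can be sketched in a few lines. The function name `threshold_weights` and the sample ITFF values are hypothetical; the default threshold of 1 follows the example value given in the text.

```python
from typing import Dict


def threshold_weights(itff: Dict[str, float], threshold: float = 1.0) -> Dict[str, float]:
    """Keep the ITFF value as the weight when it is >= threshold, otherwise use 0."""
    return {feature: (value if value >= threshold else 0.0) for feature, value in itff.items()}


# Hypothetical ITFF values; "c" sits exactly on the threshold and is kept.
toy_thresholded = threshold_weights({"a": 0.29, "b": 2.08, "c": 1.0})
```

A feature whose ITFF equals the threshold keeps its weight, because the rule zeroes only values strictly below the threshold.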

The feature importance calculation unit 226 also stores the resulting feature weights in the storage unit 222.

The other components of the importance calculation device 200 are the same as those of the importance calculation device 100 according to the first embodiment, and their description is omitted.

<Operation of Importance Calculation Device According to Second Embodiment>
Next, the operation of the importance calculation device 200 according to the second embodiment will be described. When the input unit 10 receives the learning images and stores them in the storage unit 222, the importance calculation device 200 executes the weight learning process routine shown in FIG. 5. When the input unit 10 receives a query image, the importance calculation device 200 executes the search process routine shown in FIG. 3.

In step S204 of the weight learning process shown in FIG. 5, the ITFF value of each feature is calculated according to equation (2) above, based on the features of each learning image obtained in step S102, and the weight is calculated from the ITFF value and the predetermined threshold.

The other operations of the importance calculation device 200 are the same as those of the importance calculation device 100 according to the first embodiment, and their description is omitted.

<Experimental Example>
An evaluation experiment was conducted using the importance calculation device 100 according to the first embodiment and the importance calculation device 200 according to the second embodiment. For 100 query images, retrieval was performed over a total of 5,314 images, consisting of 314 images similar to the queries and 5,000 dissimilar images, to evaluate how well similar images could be detected. FIG. 6 shows the results. Each value indicates the rate at which similar images are included among the top M results.
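The text does not define the top-M measure precisely. One plausible reading, assumed here purely for illustration, is the fraction of queries for which at least one similar image appears among the top M results. The function name `top_m_hit_rate` and the toy rankings are hypothetical and do not reproduce the data behind FIG. 6.

```python
from typing import List, Sequence, Set


def top_m_hit_rate(rankings: List[Sequence[int]], relevant: List[Set[int]], m: int) -> float:
    """Fraction of queries with at least one relevant item in their top-m results."""
    if not rankings or len(rankings) != len(relevant):
        raise ValueError("need one non-empty ranking and one relevant set per query")
    hits = sum(
        1
        for ranked, rel in zip(rankings, relevant)
        if any(item in rel for item in ranked[:m])
    )
    return hits / len(rankings)


# Two hypothetical queries; image 1 is the only similar image for both.
toy_rankings = [[3, 1, 2], [2, 3, 1]]
toy_relevant = [{1}, {1}]
toy_rates = [top_m_hit_rate(toy_rankings, toy_relevant, m) for m in (1, 2, 3)]
```

Under this reading the rate can only stay the same or rise as M grows, which is how such a table is usually read.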

The results in FIG. 6 show that search accuracy is higher with ITFF than with conventional IDF. They further show that search accuracy is higher with thresholded ITFF than with ITFF used as is. The approach of using the extraction count f_k^n of the feature of interest is not limited to IDF; it can also be applied to other weighting schemes, such as BM25, that use only d_k, which records whether or not each object contains the feature of interest.

From the above, the importance calculation device according to the second embodiment can improve search accuracy and shorten search time.

The first and second embodiments have been described for the case where the objects are images, but the invention is not limited to this; the objects may be anything, such as documents. In that case, the feature extraction unit extracts features appropriate to the object. For example, when the objects are documents, the words contained in each document are extracted as features.

The first and second embodiments have been described for the case where similarity is determined from the distance between feature vectors, but the invention is not limited to this. Any method may be used.

The first and second embodiments have been described for the case where feature vectors are created for the learning images and the query image, but the invention is not limited to this. The features of each image may be represented by any method.

In the second embodiment, the case has been described in which the weight of a feature whose ITFF value is below the threshold is stored as 0 and the feature is still processed, but the invention is not limited to this. For example, features whose ITFF value is below the threshold may be excluded from processing altogether. In that case, such features are excluded from every process performed on the learning images and the query image.
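The variation in which low-ITFF features are excluded rather than stored with weight 0 could look like the sketch below. The function name `prune_features`, the list-based vector layout, and the toy values are assumptions for illustration.

```python
from typing import List, Sequence, Tuple


def prune_features(
    vectors: List[Sequence[float]],
    itff: Sequence[float],
    threshold: float,
) -> Tuple[List[int], List[List[float]], List[float]]:
    """Drop every feature dimension whose ITFF is below `threshold`.

    Returns the kept dimension indices, the shortened vectors, and the
    weights of the kept dimensions. The same `kept` indices would also
    have to be applied to a query vector before comparison.
    """
    kept = [k for k, value in enumerate(itff) if value >= threshold]
    pruned_vectors = [[vec[k] for k in kept] for vec in vectors]
    pruned_weights = [itff[k] for k in kept]
    return kept, pruned_vectors, pruned_weights


# Hypothetical data: the first feature has a low ITFF and is removed.
toy_kept, toy_pruned, toy_pruned_weights = prune_features(
    [[2, 1, 0], [1, 0, 0]], [0.29, 2.08, 2.08], threshold=1.0
)
```

Compared with storing zero weights, removing the dimensions shortens every stored vector, which is where the savings in computation and memory mentioned in the text would come from.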

Although this specification describes embodiments in which the program is installed in advance, the program may also be provided stored on a computer-readable recording medium or provided via a network.

Description of Symbols
10 Input unit
20 Computation unit
22 Storage unit
24 Feature extraction unit
26 Feature importance calculation unit
28 Search unit
90 Output unit
100 Importance calculation device
200 Importance calculation device
220 Computation unit
222 Storage unit
226 Feature importance calculation unit

Claims

An importance calculation method in an importance calculation device including a feature extraction unit and a feature importance calculation unit, wherein
the feature extraction unit extracts features from a plurality of pieces of learning data, and
the feature importance calculation unit calculates, for each of the features, as the weight of that feature, an ITFF (Inverse Total Feature Frequency), which is the logarithm of the value obtained by dividing the total number of times all features were extracted by the number of times that feature was extracted.
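Written as a formula, the weight defined in this claim can be sketched as follows. The symbols $t_k$ (number of times the k-th feature $v_k$ was extracted), $K$ (number of distinct features), and $T$ (total number of extractions over all features) are notation assumed here for illustration; the base of the logarithm is not specified in this section.

```latex
\mathrm{itff}(v_k) = \log\frac{T}{t_k}, \qquad T = \sum_{j=1}^{K} t_j
```

For instance, with $T = 1000$ and the natural logarithm, a feature extracted 10 times gets $\ln 100 \approx 4.61$, whereas a feature extracted 500 times gets $\ln 2 \approx 0.69$.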

The importance calculation method according to claim 1, wherein, in calculating the weight of a feature, the feature importance calculation unit sets the weight of the feature to 0 when the ITFF value is less than a predetermined threshold, and sets the weight of the feature to the ITFF value when the ITFF value is equal to or greater than the threshold.

The importance calculation method according to claim 1 or 2, wherein
the feature extraction unit further extracts features from query data,
and further comprising a search unit that searches the plurality of pieces of learning data for learning data similar to the query data, based on the features of each piece of learning data, the features of the query data, and the weight of each feature.

An importance calculation device comprising:
a feature extraction unit that extracts features from a plurality of pieces of learning data; and
a feature importance calculation unit that calculates, for each of the features, as the weight of that feature, an ITFF (Inverse Total Feature Frequency), which is the logarithm of the value obtained by dividing the total number of times all features were extracted by the number of times that feature was extracted.

The importance calculation device according to claim 4, wherein the feature importance calculation unit sets the weight of the feature to 0 when the ITFF value is less than a predetermined threshold, and sets the weight of the feature to the ITFF value when the ITFF value is equal to or greater than the threshold.

The importance calculation device according to claim 4 or 5, wherein
the feature extraction unit further extracts features from query data,
and further comprising a search unit that searches the plurality of pieces of learning data for learning data similar to the query data, based on the features of each piece of learning data, the features of the query data, and the weight of each feature.

A program for causing a computer to function as each unit of the importance calculation device according to any one of claims 4 to 6.