JP6012814B1

JP6012814B1 - Sequential clustering apparatus, method, and program

Info

Publication number: JP6012814B1
Application number: JP2015104608A
Authority: JP
Inventors: 島村　潤; 潤島村; 大我吉田; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2015-05-22
Filing date: 2015-05-22
Publication date: 2016-10-25
Anticipated expiration: 2035-05-22
Also published as: JP2016218847A

Abstract

[Problem] To realize high-speed clustering processing. [Solution] In a sequential clustering apparatus, each of a plurality of feature data has been clustered into one of a predetermined number of clusters, each with a defined centroid, and an approximate neighborhood search index for clustering new feature data and a gap threshold for determining whether the approximate neighborhood search index needs to be updated are determined in advance. Each time new feature data is received, the movement amount of the centroid of the cluster to which the received new feature data belongs is calculated, and a centroid movement amount, which is the cumulative amount of the calculated movement amounts of the centroids, is calculated (68). It is determined whether the calculated centroid movement amount exceeds the gap threshold (70). The approximate neighborhood search index is updated only when the calculated centroid movement amount is determined to exceed the gap threshold (72). [Selected drawing] FIG. 4

Description

The present invention relates to a sequential clustering apparatus, method, and program for clustering feature data that is input sequentially.

Among techniques that cluster a set of feature data into groups of similar items, sequential clustering techniques make it possible to cluster feature data that is input sequentially.

For example, the method described in Non-Patent Document 1 starts from a state in which a set of feature data of a certain size has already been divided into K clusters on the basis of K centroids (centers of gravity). When new feature data is input in this state, the centroid nearest to the newly input feature data is determined from among the K centroids on the basis of the newly input feature data and the K centroids. The determined centroid is then updated using the input feature data. Sequential clustering is realized by repeating this processing each time new feature data is input.
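The prior-art step described above can be sketched as follows. This is a minimal illustration, not the method of Non-Patent Document 1 itself: the function names, the per-cluster `counts` list, and the running-mean form of the centroid update are assumptions made for the example.

```python
import math

def nearest_centroid(x, centroids):
    """Exhaustive search: compute the distance from x to all K centroids."""
    return min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))

def sequential_update(x, centroids, counts):
    """Assign x to its nearest cluster and move that centroid toward x."""
    k = nearest_centroid(x, centroids)
    counts[k] += 1
    # Running-mean update (an assumption): equivalent to recomputing the mean
    # of all members as long as counts[k] holds the cluster size.
    centroids[k] = [c + (xi - c) / counts[k] for c, xi in zip(centroids[k], x)]
    return k

centroids = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]
assigned = sequential_update([2.0, 0.0], centroids, counts)
```

The exhaustive search costs K distance computations per input, which is the cost the following paragraphs set out to reduce.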

In Non-Patent Document 1, the process of determining, from among the K centroids, the centroid nearest to the newly input feature data calculates the distance between the input feature data and each centroid and selects the closest centroid. Calculating each of these distances takes time, and the amount of calculation becomes enormous particularly when K is large. For this reason, the method described in Non-Patent Document 2 performs the neighborhood search using approximate neighborhood search (approximate nearest neighbor search) processing, typified by FLANN. To realize a high-speed search, this processing performs in advance an indexing process that creates an approximate neighborhood search index, which is information for determining which centroid newly input feature data is closest to.

Non-Patent Document 1: Pham, Duc Truong, Dimov, Stefan Simeonov and Nguyen, C. D., "An incremental K-means algorithm", 2004. Proceedings of the Institution of Mechanical Engineers, Part C: Journal of Mechanical Engineering Science 218 (7), pp. 783-795.
Non-Patent Document 2: Marius Muja and David G. Lowe, "Scalable Nearest Neighbor Algorithms for High Dimensional Data", Pattern Analysis and Machine Intelligence (PAMI), Vol. 36, 2014.

However, the indexing process in the approximate neighborhood search processing also takes time when K is large. Moreover, in a sequential clustering technique, the indexing process for the approximate neighborhood search must be performed every time a centroid is changed by the update process, which is time-consuming.

The present invention has been made to solve the above problem, and an object thereof is to provide a sequential clustering apparatus, method, and program capable of realizing high-speed clustering processing.

To achieve the above object, a sequential clustering apparatus according to the present invention is a sequential clustering apparatus in which each of a plurality of feature data has been clustered into one of a predetermined number of clusters, each with a defined centroid, in which an approximate neighborhood search index for clustering new feature data and a threshold for determining whether the approximate neighborhood search index needs to be updated are determined in advance, and which, each time new feature data is received, clusters the received new feature data using the approximate neighborhood search index. The apparatus includes: a calculation unit that, each time new feature data is received, calculates the movement amount of the centroid of the cluster to which the received new feature data belongs and calculates a centroid movement amount that is the cumulative amount of the calculated movement amounts of the centroids; a determination unit that determines whether the calculated centroid movement amount exceeds the threshold; and an update unit that updates the approximate neighborhood search index only when the calculated centroid movement amount is determined to exceed the threshold.

A sequential clustering method according to the present invention is a sequential clustering method in which each of a plurality of feature data has been clustered into one of a predetermined number of clusters, each with a defined centroid, in which an approximate neighborhood search index for clustering new feature data and a threshold for determining whether the approximate neighborhood search index needs to be updated are determined in advance, and in which, each time new feature data is received, the received new feature data is clustered using the approximate neighborhood search index. The method includes: a calculation unit calculating, each time new feature data is received, the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculating a centroid movement amount that is the cumulative amount of the calculated movement amounts of the centroids; a determination unit determining whether the calculated centroid movement amount exceeds the threshold; and an update unit updating the approximate neighborhood search index only when the calculated centroid movement amount is determined to exceed the threshold.

According to the present invention, each time new feature data is received, the calculation unit calculates the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculates the centroid movement amount, which is the cumulative amount of the calculated movement amounts of the centroids.

The determination unit determines whether the calculated centroid movement amount exceeds the threshold, and the update unit updates the approximate neighborhood search index only when the calculated centroid movement amount is determined to exceed the threshold.

Because the approximate neighborhood search index is thus updated only when the calculated centroid movement amount is determined to exceed the threshold, the number of relatively time-consuming updates of the approximate neighborhood search index can be reduced, and high-speed clustering processing can be realized.

A program according to the present invention is a program for causing a computer to function as each unit of the above sequential clustering apparatus.

As described above, the sequential clustering apparatus, method, and program of the present invention provide the effect that high-speed clustering processing can be realized.

FIG. 1 is a block diagram showing the configuration of the sequential clustering apparatus 10 according to the present embodiment.
FIG. 2 is a flowchart showing the sequential clustering processing program executed by the sequential clustering apparatus 10.
FIG. 3 is a flowchart showing the initial centroid creation processing program executed by the initial centroid creation unit 16.
FIG. 4 is a flowchart showing the centroid update process executed by the centroid update unit 18.
FIG. 5 is a diagram showing the contents of the storage area 14A of the memory 14.
FIG. 6 is a diagram showing the contents of the storage area 14B of the memory 14.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[First embodiment]
The sequential clustering apparatus 10 according to the first embodiment of the present invention will be described below with reference to the drawings.

FIG. 1 is a block diagram showing the configuration of the sequential clustering apparatus 10 according to the present embodiment. FIG. 5 is a diagram showing the contents of the storage area 14A of the memory 14. As shown in FIG. 1, the sequential clustering apparatus 10 includes a preprocessing unit 12 that performs preprocessing such as determining whether the initial centroids have been created. The sequential clustering apparatus 10 includes a memory 14 provided with storage areas 80 to 88 (see FIG. 5) for storing the feature data, the initial centroid group, the gap threshold, the centroid movement amount, and the approximate neighborhood search index. The sequential clustering apparatus 10 also includes an initial centroid creation unit 16 that creates the initial centroids and a centroid update unit 18 that updates the centroids.

The initial centroid creation unit 16 includes a clustering unit 20 that performs processing such as clustering the feature data, and a determination unit 22 that performs processing such as determining the gap threshold. The centroid update unit 18 includes a data processing unit 24 that performs processing such as receiving feature data, and a calculation unit 26 that performs processing such as calculating the centroid movement amount. The centroid update unit 18 further includes a determination unit 28 that determines whether the centroid movement amount is larger than the gap threshold, and an update unit 30 that performs processing such as updating the approximate neighborhood search index.

The sequential clustering apparatus 10 includes a CPU and a ROM (not shown), the memory 14, and the like. When the CPU executes the sequential clustering processing program described later, the CPU functions as each of the above units (12, 16 (20, 22), 18 (24 to 30)).

Next, the operation of the present embodiment will be described. FIG. 2 is a flowchart showing the sequential clustering processing program (stored in the memory 14) executed by the sequential clustering apparatus 10. As shown in FIG. 2, in step 32, the preprocessing unit 12 of the sequential clustering apparatus 10 receives feature data. The feature data is the average value of the amounts of change between the luminance of a feature point pixel, determined on the basis of image data, and the luminance of each of a plurality of pixels located around the feature point pixel. A feature point is the position of a pixel, such as a corner in the image, determined on the basis of, for example, the amount of change in luminance between adjacent pixels.

In step 34, the preprocessing unit 12 determines whether the initial centroids have been created by determining whether an initial centroid group is stored in the centroid group storage area 82 (see FIG. 5) of the memory 14. If the initial centroids are not determined to have been created, the preprocessing unit 12 stores the feature data in the feature data storage area 80 of the memory 14 in step 36. In step 38, the preprocessing unit 12 determines whether the number of stored feature data has reached a predetermined number. If the preprocessing unit 12 does not determine that the number of stored feature data has reached the predetermined number, the sequential clustering process returns to step 32. If the preprocessing unit 12 determines that the number of stored feature data has reached the predetermined number, the preprocessing unit 12 activates the initial centroid creation unit 16 in step 40. In step 42, the initial centroid creation unit 16 creates the initial centroids. The sequential clustering process then returns to step 32. Once the initial centroids have been created in this way and feature data is received in step 32, the determination in step 34 is affirmative. In this case, in step 44, the preprocessing unit 12 outputs the feature data received in step 32 to the centroid update unit 18. In step 46, the centroid update unit 18 executes the centroid update process.
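The control flow of FIG. 2 can be sketched roughly as below. The class name, the `receive` method, and the two callables standing in for the initial centroid creation unit 16 and the centroid update unit 18 are hypothetical names chosen for this illustration; only the step numbers come from the description.

```python
class SequentialClusterer:
    """Sketch of the FIG. 2 flow; all names are illustrative assumptions."""

    def __init__(self, init_size, create_initial, update):
        self.init_size = init_size            # "predetermined number" of step 38
        self.create_initial = create_initial  # stands in for unit 16
        self.update = update                  # stands in for unit 18
        self.buffer = []                      # feature data storage area 80
        self.centroids = None                 # centroid group storage area 82

    def receive(self, x):                     # step 32
        if self.centroids is None:            # step 34: not yet created
            self.buffer.append(x)             # step 36
            if len(self.buffer) >= self.init_size:                 # step 38
                self.centroids = self.create_initial(self.buffer)  # steps 40, 42
            return "buffered"
        self.update(x, self.centroids)        # steps 44, 46
        return "updated"

# Trivial stand-ins so the control flow can be exercised on its own.
clusterer = SequentialClusterer(
    init_size=2,
    create_initial=lambda data: [list(data[0])],
    update=lambda x, centroids: None,
)
results = [clusterer.receive(x) for x in ([0.0], [1.0], [2.0])]
```

Here the item that completes the initial batch is only buffered; the first item handed to the update routine is the one received after the initial centroids exist, matching the return to step 32 after step 42.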

FIG. 3 is a flowchart showing the initial centroid creation processing program executed by the initial centroid creation unit 16. In step 52 of FIG. 3, the clustering unit 20 of the initial centroid creation unit 16 reads the predetermined number of feature data from the feature data storage area 80 (see FIG. 5) of the memory 14. In step 54, the clustering unit 20 executes a clustering process. The clustering process can be realized by, for example, K-means clustering. Through the clustering process, each item of feature data is classified into one of the predetermined number of clusters, the centroid of each cluster (initial centroid) is calculated, and the initial centroid of each cluster is stored in the centroid group storage area 82 (see FIG. 5).

In step 56, the clustering unit 20 creates the approximate neighborhood search index using the initial centroid group obtained as the result of the clustering process. The approximate neighborhood search index is information for determining, when new feature data is received, which cluster that feature data belongs to. For example, if the centroids of the clusters are A, B, C, and so on, the approximate neighborhood search index is formed as a tree structure of the centroids A, B, C, ... organized by position (see the storage area 88 in FIG. 5). Specifically, the centroids A, B, C, ... are divided, with a certain position (the root node) as a reference, into a group of centroids located on the right side and a group of centroids located on the left side. Each group is in turn divided in the same way, with another position as a reference, into smaller groups located on the right and left sides. When new feature data is received and is located to the right of the reference position, the relationship with the centroids in the left group need not be considered, so the cluster into which the new feature data is classified can be found more quickly. This processing is realized by existing approximate neighborhood search processing, typified by FLANN.
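As a rough illustration of the left/right tree structure described above, the following sketch builds a plain k-d tree over the centroids and searches it. This is an assumption-laden stand-in, not FLANN: it performs an exact search with pruning, whereas FLANN-style indexes trade some accuracy for speed. The function names and the dictionary node layout are invented for the example.

```python
import math

def build_kdtree(points, ids=None, depth=0):
    """Recursively split centroid ids at the median along alternating axes."""
    if ids is None:
        ids = list(range(len(points)))
    if not ids:
        return None
    axis = depth % len(points[0])
    ids = sorted(ids, key=lambda i: points[i][axis])
    mid = len(ids) // 2
    return {
        "id": ids[mid],
        "axis": axis,
        "left": build_kdtree(points, ids[:mid], depth + 1),
        "right": build_kdtree(points, ids[mid + 1:], depth + 1),
    }

def nearest(node, points, x, best=None):
    """Return (distance, id) of the stored point closest to x."""
    if node is None:
        return best
    d = math.dist(x, points[node["id"]])
    if best is None or d < best[0]:
        best = (d, node["id"])
    diff = x[node["axis"]] - points[node["id"]][node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, points, x, best)
    # Visit the other side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, points, x, best)
    return best

centroids = [(1.0, 1.0), (5.0, 1.0), (9.0, 1.0), (5.0, 8.0)]
tree = build_kdtree(centroids)
dist, cluster = nearest(tree, centroids, (8.0, 2.0))
```

Building the tree touches every centroid, which is why rebuilding it after each centroid change becomes expensive when K is large.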

In step 58, the determination unit 22 determines, by calculation, the gap threshold used to decide whether the approximate neighborhood search index needs to be updated. The method of calculating the gap threshold is described below. Let K be the number of centroids and N be the number of received feature data. The gap threshold is then calculated as follows.

When N > 2K, the gap threshold is calculated according to (Equation 1).
[Equation 1]
$$g_{TH} = \frac{\sqrt{e}}{N} \cdot K \cdot \mathrm{bias} \quad \cdots \text{(Equation 1)}$$

Here, e is a compactness measure of the clustering result and is obtained by (Equation 2).
[Equation 2]
$$e = \sum_i \lVert \mathrm{samples}_i - \mathrm{centers}_{\mathrm{labels}_i} \rVert^2 \quad \cdots \text{(Equation 2)}$$
Here, samples_i denotes the i-th feature data, and centers_{labels_i} denotes the centroid of the cluster to which the i-th feature data was assigned by clustering.

In (Equation 1), bias is a parameter that affects how often the indexing process is performed. Increasing it lowers the frequency of indexing and speeds up processing, but reduces the accuracy of sequential clustering. Conversely, decreasing it raises the frequency of indexing and improves the accuracy of sequential clustering, but increases the processing time. A value such as 1.0 can be used, for example.

Thus, for example, when every item of feature data is located at its corresponding centroid (the most tightly grouped case), the gap threshold is 0. The gap threshold increases as the feature data deviates from the corresponding centroids. The gap threshold is therefore a value that indicates, comprehensively over all clusters, how tightly the feature data is grouped around the corresponding centroids.

When N <= 2K, the gap threshold is calculated according to (Equation 3).
[Equation 3]
$$g_{TH} = \alpha \cdot \mathrm{bias} \quad \cdots \text{(Equation 3)}$$
Here, α is a fixed value given in advance, and a relatively large value such as 100 is used.
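Equations 1 to 3 can be worked through with a short sketch. The function names and the toy data are assumptions made for illustration; the defaults bias = 1.0 and α = 100 are the example values mentioned in the description.

```python
import math

def compactness(samples, centers, labels):
    """Equation 2: sum of squared distances from each sample to its centroid."""
    return sum(math.dist(s, centers[l]) ** 2 for s, l in zip(samples, labels))

def gap_threshold(samples, centers, labels, bias=1.0, alpha=100.0):
    n, k = len(samples), len(centers)
    if n > 2 * k:
        e = compactness(samples, centers, labels)
        return math.sqrt(e) / n * k * bias   # Equation 1
    return alpha * bias                      # Equation 3

# Toy data: two clusters whose centers are the means of their members.
samples = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0), (11.0, 0.0)]
centers = [(1.0, 0.0), (11.0, 0.0)]
labels = [0, 0, 1, 1, 1]

g_th = gap_threshold(samples, centers, labels)             # N = 5 > 2K = 4
g_small = gap_threshold(samples[:2], centers, labels[:2])  # N = 2 <= 2K
```

In the toy data e = 4, so Equation 1 gives (2 / 5) * 2 * 1.0 = 0.8, while the two-sample case falls back to α * bias = 100.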

In step 60, the determination unit 22 stores the initial centroid group, the approximate neighborhood search index, and the gap threshold in the corresponding storage areas 82, 88, and 84A, respectively.

FIG. 4 is a flowchart showing the centroid update process executed by the centroid update unit 18. In step 62 of FIG. 4, the data processing unit 24 receives the feature data that the preprocessing unit 12 output in step 44 following an affirmative determination in step 34 of FIG. 2. In step 64, the data processing unit 24 reads the initial centroid group, the approximate neighborhood search index, and the gap threshold from the memory 14.

In step 66, the calculation unit 26 determines the cluster to which the received feature data belongs, and recalculates and updates the centroid position. More specifically, the calculation unit 26 first performs a neighborhood search for the received feature data on the basis of the received feature data and the read approximate neighborhood search index, thereby determining the centroid at the nearest-neighbor distance and hence the cluster to which the feature data belongs. The calculation unit 26 then updates the position of that centroid using the received feature data and all the feature data belonging to the cluster of that centroid. This is because the position of the cluster's centroid moves when new feature data is added. The processing in step 66 is the same as in the conventional method.

In step 68, the calculation unit 26 adds the movement amount of the updated centroid to the centroid movement amount and stores the result in the centroid movement amount storage area 86A.
The movement amount of the updated centroid is calculated, for example, by obtaining the Euclidean distance between the centroid before the update and the centroid after the update.
The centroid movement amount, on the other hand, is the cumulative amount of the movement amounts of the centroids. Because the gap threshold is, as described above, a value that indicates comprehensively over all clusters how tightly the feature data is grouped around the corresponding centroids, the centroid movement amount is likewise a single value covering all clusters. Specifically, when a certain centroid is updated and its movement amount is calculated, the calculated movement amount is added to the centroid movement amount. When another centroid is updated and its movement amount is calculated, that movement amount is added to the same centroid movement amount. Thus, a single centroid movement amount exists for all the centroids (clusters). The centroid movement amount is assumed to be initialized to 0 when the sequential clustering apparatus 10 is started.

In step 70, the determination unit 28 determines whether the calculated centroid movement amount is larger than the read gap threshold. When the centroid movement amount is larger than the gap threshold, the centroid group used to create the current approximate neighborhood search index deviates greatly from the actual centroid group. It can therefore be determined that new feature data cannot be properly clustered with the current approximate neighborhood search index.

Accordingly, in step 72, the update unit 30 updates the approximate neighborhood search index using the current centroid group. Because the approximate neighborhood search index is now based on the current centroid group, the update unit 30 resets the centroid movement amount to 0 in step 74. In step 76, the update unit 30 stores the current centroid group in the centroid group storage area 82 of the memory 14. The centroid update process then ends.

On the other hand, when the calculated centroid movement amount is equal to or less than the gap threshold, the determination in step 70 is negative, and the centroid update process skips steps 72 and 74 and proceeds to step 76. The approximate neighborhood search index is therefore not updated; in step 76, the update unit 30 stores the current centroid group in the centroid group storage area 82 of the memory 14, and the centroid update process ends.
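Steps 66 to 76 can be sketched as a single update routine. This is a minimal sketch under stated assumptions: the class and attribute names are hypothetical, the "index" is just a frozen copy of the centroids searched exhaustively (standing in for a FLANN-style index whose rebuild is the costly step), and the centroid is updated with a running mean rather than by re-reading all cluster members.

```python
import math

class GatedIndexUpdater:
    """Sketch of FIG. 4: rebuild the index only when accumulated drift is large."""

    def __init__(self, centroids, counts, gap_threshold):
        self.centroids = [list(c) for c in centroids]
        self.counts = list(counts)
        self.gap_threshold = gap_threshold
        self.moved = 0.0      # cumulative centroid movement amount (area 86A)
        self.rebuilds = 0     # number of index updates performed in step 72
        self._rebuild_index()

    def _rebuild_index(self):
        # Stand-in for the costly indexing step: a frozen copy of the centroids.
        self.index = [tuple(c) for c in self.centroids]

    def update(self, x):
        # Step 66: search the (possibly stale) index, then move the centroid.
        k = min(range(len(self.index)), key=lambda i: math.dist(x, self.index[i]))
        old = self.centroids[k]
        self.counts[k] += 1
        new = [c + (xi - c) / self.counts[k] for c, xi in zip(old, x)]
        self.centroids[k] = new
        # Step 68: accumulate the Euclidean distance the centroid moved.
        self.moved += math.dist(old, new)
        # Steps 70 to 74: rebuild and reset only above the gap threshold.
        if self.moved > self.gap_threshold:
            self._rebuild_index()
            self.rebuilds += 1
            self.moved = 0.0
        return k

updater = GatedIndexUpdater([[0.0, 0.0], [10.0, 0.0]], [1, 1], gap_threshold=1.5)
first = updater.update([2.0, 0.0])   # centroid 0 moves by 1.0, below the threshold
rebuilds_after_first = updater.rebuilds
second = updater.update([4.0, 0.0])  # cumulative movement 2.0 exceeds 1.5
```

Between rebuilds the search runs against centroid positions that are slightly out of date, which is the accuracy cost the gap threshold is meant to bound.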

As described above, the sequential clustering apparatus 10 according to the first embodiment calculates the gap threshold for determining whether the approximate neighborhood search index needs to be updated, using the compactness measure of the clusters generated when the initial clusters are created. The gap threshold is a value that indicates, comprehensively over all clusters, how tightly the feature data is grouped.

If the centroid movement amount, which is the cumulative value of the movement amounts of the updated centroids, does not exceed the gap threshold, it can be judged that the centroids have changed only within the range of the spread of the feature data. New feature data can therefore still be properly clustered with the existing approximate neighborhood search index, and there is no need to update the existing index.

However, when the centroid movement amount exceeds the gap threshold, the centroids have changed beyond the range of the spread of the feature data, so new feature data cannot be properly clustered with the existing approximate neighborhood search index. In that case, the approximate neighborhood search index is updated using the latest centroid group.

Updating the approximate neighborhood search index takes processing time because all the centroids must be taken into account. However, because the approximate neighborhood search index is updated only when the centroid movement amount exceeds the gap threshold, the number of times the index update process is performed can be reduced. High-speed clustering processing can therefore be realized even when the number of centroids K is large.

[Second embodiment]
Next, the sequential clustering apparatus 10 according to the second embodiment will be described. The sequential clustering apparatus 10 of the second embodiment has the same configuration as that of the first embodiment, so its description is omitted. The operation of the sequential clustering apparatus 10 according to the second embodiment is almost the same as that of the first embodiment, so only the differences are described.

Clusters include, first, compact clusters with a relatively high degree of cohesion, in which the feature data is located relatively close to the corresponding centroid. Second, they include non-compact clusters with a relatively low degree of cohesion, in which the feature data is located relatively far from the corresponding centroid.

In the method of the first embodiment, however, only one gap threshold is set for all centroids. As a result, the movement amount of the centroid of a compact cluster and that of a non-compact cluster are treated identically. Consequently, even when the centroid of a compact cluster has moved by a large amount, the approximate neighborhood search index is not updated as long as the centroid movement amount does not exceed the gap threshold, and the accuracy of clustering new feature data may deteriorate. The second embodiment has been made to solve this problem.

In the second embodiment, the determination unit 22 of the initial centroid creation unit 16 calculates, in step 58 of FIG. 3, a gap threshold gTH_c for each centroid by the following processing. FIG. 6 shows the contents of the storage area 14B of the memory 14. As shown in FIG. 6, each gap threshold is stored in the storage area corresponding to its centroid within the gap threshold storage area 84B.

When N > 2K, the gap threshold gTH_c for each centroid is calculated according to (Expression 4).
[Equation 4]
gTH_c = (√(e) / Nc) * K * bias ... (Expression 4)

Here, gTH_c is the gap threshold for a given centroid c, and Nc is the number of feature data belonging to the corresponding cluster. The gap threshold gTH_c is a per-cluster value indicating the degree of cohesion of the feature data relative to the corresponding centroid.

When N <= 2K, the gap threshold gTH_c for each cluster is calculated according to (Expression 5).
[Equation 5]
gTH_c = α * bias ... (Expression 5)
Here, α and bias are the same as described above, but they may also be set for each cluster.
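The two cases of (Expression 4) and (Expression 5) can be written as one small Python function. This is a sketch only: the function and parameter names are hypothetical, and the quantities e, α, and bias are taken as given inputs with the meanings assigned to them earlier in the description.

```python
import math


def gap_threshold_per_centroid(e, n_c, n_total, k, bias, alpha):
    """Return gTH_c for one centroid c.

    e       -- the quantity "e" of Expression 4 (taken as given)
    n_c     -- Nc, number of feature data in the cluster of centroid c
    n_total -- N, total number of feature data
    k       -- K, number of centroids
    bias    -- the bias factor (taken as given)
    alpha   -- the constant alpha of Expression 5 (taken as given)
    """
    if n_total > 2 * k:
        if n_c <= 0:
            # Guard added for this sketch; Expression 4 divides by Nc.
            raise ValueError("Nc must be positive for Expression 4")
        return (math.sqrt(e) / n_c) * k * bias  # Expression 4
    return alpha * bias  # Expression 5


# Usage: with e = 16, Nc = 4, K = 10, bias = 0.5 and N = 100 (> 2K),
# Expression 4 gives (4 / 4) * 10 * 0.5.
threshold = gap_threshold_per_centroid(16.0, 4, 100, 10, 0.5, 2.0)
```

Because Nc appears in the denominator of Expression 4, a cluster holding more feature data receives a smaller threshold for the same value of e.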

In step 68 of FIG. 4, the calculation unit 26 of the centroid update unit 18 of the second embodiment adds the calculated movement amount to CentTrans_c, the centroid movement amount of the centroid c of the cluster to which the new feature data belongs. As shown in FIG. 6, the calculation unit 26 stores the result in the storage area corresponding to that centroid within the centroid movement amount storage area 86B.

Here, the per-centroid centroid movement amount CentTrans_c holds the accumulated movement of a given centroid c, and all of its values are assumed to be initialized to 0 when the sequential clustering apparatus 10 starts up.

In step 70 of FIG. 4, the determination of whether the centroid movement amount exceeds the gap threshold is made by checking whether the centroid movement amount CentTrans_c exceeds the gap threshold gTH_c of the centroid c.

When the centroid movement amount CentTrans_c is greater than the gap threshold gTH_c, the approximate neighborhood search index is updated (step 72) and the centroid movement amount CentTrans_c is reset to 0 (step 74). When the centroid movement amount CentTrans_c is less than or equal to the gap threshold gTH_c, the approximate neighborhood search index is not updated.
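The per-centroid bookkeeping of steps 68 to 74 can be sketched as follows. This is a hedged illustration rather than the embodiment itself: the function name `update_movement` and the list-based storage are assumptions, and the actual index update of step 72 is left to the caller, which is signalled by the return value.

```python
def update_movement(cent_trans, g_th, c, movement):
    """Accumulate movement for centroid c and report whether to update the index.

    cent_trans -- list of accumulated movement amounts CentTrans_c (mutated)
    g_th       -- list of per-centroid gap thresholds gTH_c
    c          -- index of the centroid whose cluster received the new data
    movement   -- movement amount of centroid c caused by the new data
    """
    cent_trans[c] += movement      # step 68: accumulate for centroid c only
    if cent_trans[c] > g_th[c]:    # step 70: compare with that centroid's threshold
        cent_trans[c] = 0.0        # step 74: reset (caller performs step 72)
        return True
    return False                   # at or below the threshold: no update


# Usage: centroid 0 has a tight threshold, centroid 1 a loose one.
cent_trans = [0.0, 0.0]
g_th = [0.5, 5.0]
first = update_movement(cent_trans, g_th, 0, 0.3)   # 0.3 <= 0.5
second = update_movement(cent_trans, g_th, 1, 0.6)  # 0.6 <= 5.0
third = update_movement(cent_trans, g_th, 0, 0.3)   # 0.6 > 0.5
```

The same movement of 0.6 triggers an update for the centroid with the tight threshold but not for the one with the loose threshold, which is the distinction the second embodiment is designed to make.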

With the processing described above, the movement amounts of the centroids of compact clusters and of non-compact clusters can be handled separately, and high-speed clustering can be achieved without degrading accuracy.

In each of the embodiments described above, the sequential clustering processing program executed by the sequential clustering apparatus 10 may be recorded on a computer-readable recording medium, and the various processes of the sequential clustering apparatus 10 described above may be performed by having a computer system read and execute the program recorded on that medium. The "computer system" here may include an OS and hardware such as peripheral devices. When a WWW system is used, the "computer system" also includes the homepage providing environment (or display environment). The "computer-readable recording medium" refers to a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, or a storage device such as a hard disk built into a computer system.

The "computer-readable recording medium" further includes media that hold a program for a certain period of time, such as volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The sequential clustering processing program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium, or by a transmission wave in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium having the function of transmitting information, such as a network (communication network) like the Internet or a communication line such as a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements the functions described above in combination with a program already recorded in the computer system.

The feature data may be, for example, LPC cepstrum coefficients of sound data.

The present invention has been described above in detail based on example embodiments. The description of these embodiments is intended to explain the present invention and should not be construed as limiting the invention recited in the claims or as narrowing its scope. The configuration of each part of the present invention is not limited to the embodiments described above, and various modifications are of course possible within the technical scope recited in the claims.

Description of Symbols
10 Sequential clustering apparatus
12 Preprocessing unit
14 Memory
14A Storage area
14B Storage area
16 Initial centroid creation unit
18 Centroid update unit
20 Clustering unit
22 Determination unit
24 Data processing unit
26 Calculation unit
28 Determination unit
30 Update unit
80 Feature data storage area
82 Centroid group storage area
84A Gap threshold storage area
84B Gap threshold storage area
86A Centroid movement amount storage area
86B Centroid movement amount storage area
88 Approximate neighborhood search index storage area

Claims

A sequential clustering apparatus in which each of a plurality of feature data has been clustered into one of a predetermined number of clusters, each having a defined centroid, in which an approximate neighborhood search index for clustering new feature data and a threshold for determining whether the approximate neighborhood search index needs to be updated are determined in advance, and which, each time new feature data is received, clusters the received new feature data using the approximate neighborhood search index, the apparatus comprising:
a calculation unit that, each time new feature data is received, calculates the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculates a centroid movement amount that is the cumulative amount of the calculated movement amounts of the centroid;
a determination unit that determines whether the calculated centroid movement amount exceeds the threshold; and
an update unit that updates the approximate neighborhood search index only when it is determined that the calculated centroid movement amount exceeds the threshold.

The sequential clustering apparatus according to claim 1, wherein
the threshold is determined for each of the predetermined number of clusters,
the centroid movement amount is calculated for each of the predetermined number of clusters,
the calculation unit calculates the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculates the centroid movement amount of the cluster to which the new feature data belongs, and
the determination unit determines whether the calculated centroid movement amount exceeds the threshold determined for the cluster to which the new feature data belongs.

The sequential clustering apparatus according to claim 1 or 2, wherein the feature data is an average value of the amounts of change between the luminance of a pixel at a feature point determined based on image data and the luminance of each of a plurality of pixels located around the pixel at the feature point.

A sequential clustering method in which each of a plurality of feature data has been clustered into one of a predetermined number of clusters, each having a defined centroid, in which an approximate neighborhood search index for clustering new feature data and a threshold for determining whether the approximate neighborhood search index needs to be updated are determined in advance, and in which, each time new feature data is received, the received new feature data is clustered using the approximate neighborhood search index, the method comprising:
calculating, by a calculation unit, each time new feature data is received, the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculating a centroid movement amount that is the cumulative amount of the calculated movement amounts of the centroid;
determining, by a determination unit, whether the calculated centroid movement amount exceeds the threshold; and
updating, by an update unit, the approximate neighborhood search index only when it is determined that the calculated centroid movement amount exceeds the threshold.

The sequential clustering method according to claim 4, wherein
the threshold is determined for each of the predetermined number of clusters,
the centroid movement amount is calculated for each of the predetermined number of clusters,
the calculation unit calculates the movement amount of the centroid of the cluster to which the received new feature data belongs, and calculates the centroid movement amount of the cluster to which the new feature data belongs, and
the determination unit determines whether the calculated centroid movement amount exceeds the threshold determined for the cluster to which the new feature data belongs.

The sequential clustering method according to claim 4 or 5, wherein the feature data is an average value of the amounts of change between the luminance of a pixel at a feature point determined based on image data and the luminance of each of a plurality of pixels located around the pixel at the feature point.

A program for causing a computer to function as each unit of the sequential clustering apparatus according to any one of claims 1 to 3.