JP3089572B2

JP3089572B2 - Clustering method

Info

Publication number: JP3089572B2
Application number: JP04034382A
Authority: JP
Inventors: 裕酒匂; 正博阿部
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-01-24
Filing date: 1992-01-24
Publication date: 2000-09-18
Anticipated expiration: 2015-09-18
Also published as: JPH05205058A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to a clustering method in the field of pattern recognition, that is, a method of grouping mutually similar items among a plurality of data into one cluster. In particular, it relates to a method that can be applied, for example, to identifying the type or shape of an object from feature data.

[0002]

[Description of the Related Art] Representative conventional clustering methods are the fusion method and the rearrangement method (Image Processing Handbook, edited by Mikio Takagi, University of Tokyo Press, Functions volume, "Classification").
In the fusion method, the similarity between two data items is taken as the distance between them, and clusters are built by successively fusing the items that are closest to each other. In the rearrangement method, suitable clusters are given as an initial state, and their member data are reassigned step by step to obtain progressively better clusters.
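For illustration only, the fusion method as summarized above can be sketched as follows. This is a minimal sketch, not taken from the cited handbook: it assumes Euclidean distance and measures the distance between two clusters by the distance between their centroids (one of several possible linkage rules), and the names `fusion_clustering` and `centroid` are hypothetical. The caller must supply the final cluster count `k`, which illustrates the limitation that the number of clusters has to be known in advance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def fusion_clustering(data: Sequence[Point], k: int) -> List[List[Point]]:
    """Repeatedly fuse the two closest clusters until k clusters remain.

    Assumption: cluster-to-cluster distance is the Euclidean distance
    between centroids (centroid linkage); requires 1 <= k <= len(data).
    """
    clusters: List[List[Point]] = [[p] for p in data]  # one cluster per item
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair seen so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = math.dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))  # j > i, so index i stays valid
    return clusters
```

For example, `fusion_clustering([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], 2)` returns the two obvious pairs.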

[0003]

[Problems to Be Solved by the Invention] Both of the above methods have the following problems: (1) they can be applied only when the final number of clusters is known in advance, whereas in general the optimal number of clusters often cannot be estimated; and (2) when data are added, all of the data must be clustered again from the beginning, so the amount of processing tends to be large and the processing time long. The present invention has been devised in view of these circumstances. Its object is to provide a method that determines the optimal number of clusters and, even when data are added, adds or merges clusters as needed so that the optimal number of clusters is always obtained quickly.

[0004]

[Means for Solving the Problems] To solve the above problems, the present invention sets a number of clusters, clusters a plurality of data into that number of clusters, and computes, by Equation 3, a clustering state evaluation amount A from the number of data members in each cluster and the variance of the data; A is defined so that it becomes minimal when the number of clusters is optimal. The number of clusters is then changed successively, A is obtained in the same way for each setting, the smallest of the values of A obtained for the individual cluster numbers is identified, and the clustering corresponding to that A is taken as the optimal clustering. Further, when data are added, states in which a cluster is added, clusters are merged, or the current state is maintained are each assumed; A is obtained for each assumed state, the smallest A is identified, and the clustering corresponding to it is taken as the optimal clustering.

[0005]

[Operation] The optimal number of clusters can be determined by calculating the clustering state evaluation amount A with Equation 3 above and selecting the minimum A. Even when data are added, a new evaluation amount A can be obtained from the variances and other quantities already computed, without clustering all of the data again, so the optimal clustering state can be obtained more quickly.

[0006]

[Embodiments] An embodiment of the present invention is described below with reference to FIGS. 1 through 6. As a concrete example, this embodiment applies the invention to the following task: parts making up a product are conveyed on a belt conveyor; an image pickup tube placed above the conveyor captures an image (silhouette image) of each part; physical data such as the length of the outline of the silhouette image (that is, its perimeter) and the area of the silhouette image are obtained from that image; and the type of product is determined from these physical data. It is also shown that, when classification of a new product is required, clusters are added adaptively so that each classification remains possible. FIG. 6 shows an example of a silhouette image of a part.

[0007] First, a method of performing clustering as learning on given data and determining the optimal clustering is described with reference to FIG. 2. Basically, this method calculates the clustering state evaluation amount A (hereinafter "evaluation amount A") for each clustering into 1 through N clusters, and takes the clustering whose number of clusters gives the smallest evaluation amount A as the best result.

[0008] In step 9, the number of clusters I is set to I = 1, and the maximum number of clusters N is set. Steps 10 to 12 are first performed for I = 1. In step 13, it is determined whether the number of clusters being processed has reached the maximum N; if not, I is set to I + 1 and the process returns to step 10, and if so, the process proceeds to step 14.

[0009] The clustering execution unit of step 10 performs clustering by the processing shown in the flowchart of FIG. 3. Clustering is first performed for I = 1, and I is then increased successively up to I = N.

[0010] Step 15 in FIG. 3 randomly places the standard data (one item at first) near the center of the distribution of all the sample data (which are stored in memory); the values of the placed standard data are stored in the memory.

[0011] Step 16 computes the distance between each sample data item and each of the I standard data items (a single standard data item at first, since I = 1), determines for each sample the nearest standard data item, and thereby obtains the sample data belonging to each cluster represented by a standard data item. The correspondence between each sample data item in memory and its standard data item is stored.

[0012] Step 17 obtains the error E, expressed by Equation 1, between each standard data item and the sample data (members) belonging to its cluster. Concretely, Equation 1, the formula for E, computes for each class the average distance between its samples and its standard data, and adds these averages over all classes. Therefore, when every standard data item is well positioned, E takes a small value.
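Equation 1 itself is not reproduced in this text, so the following is only a sketch of error E as paragraph [0012] describes it in words: for each cluster, the mean distance from its members to its standard data item, summed over all clusters. Euclidean distance, the function name `error_e`, and letting empty clusters contribute zero are assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def error_e(samples: Sequence[Point], standards: Sequence[Point],
            labels: Sequence[int]) -> float:
    """Sum over clusters of the mean member-to-standard-data distance.

    labels[k] is the index of the standard data item that samples[k]
    belongs to. Clusters without members contribute nothing (assumption).
    """
    total = 0.0
    for i, m in enumerate(standards):
        dists = [math.dist(x, m) for x, lab in zip(samples, labels) if lab == i]
        if dists:
            total += sum(dists) / len(dists)  # mean distance within cluster i
    return total
```

With samples (0, 0), (2, 0), (10, 0), standard data (1, 0) and (10, 0), and labels [0, 0, 1], the two per-cluster means are 1 and 0, so E = 1.0.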

[0013]

(Equation 1)

[0014] Step 18 takes the current error as E(t) and the previously obtained and stored error as E(t−1), and determines whether the difference between E(t−1) and E(t) is smaller than a predetermined small constant.

[0015] Step 19 is executed when step 18 finds the difference between the errors to be larger than the constant: it moves each standard data item according to Equation 2 and returns to step 16. Equation 2 is the standard data update formula, which expresses how the standard data are changed; the second term on its right-hand side is the amount of change. This corresponds to a change in the direction of steepest descent on the error surface, so the update moves the standard data in the direction that decreases the error E.

[0016]

(Equation 2)

[0017] When step 18 finds the difference between the errors to be smaller than the constant, the standard data at that point are taken as the final standard data, each sample data item is regarded as belonging to the cluster of its nearest standard data item, and the process ends.
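Steps 15 to 19 of FIG. 3 can be sketched as the loop below. This is a sketch under stated assumptions, not the exact procedure of the patent: because Equations 1 and 2 are not reproduced here, E is taken as the sum over clusters of the mean Euclidean member-to-standard distance (as paragraph [0012] describes), and the Equation 2 update is taken as a plain steepest-descent step m ← m − η·∂E/∂m with a hypothetical fixed step size `eta`. For this E, −∂E/∂m for one standard data item is the mean of the unit vectors pointing from it toward its members. The initialization width `spread`, tolerance `tol`, iteration cap `max_iter`, the optional `init` argument (useful for reproducible runs), and all function names are also hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def assign(samples: Sequence[Point], standards: Sequence[Point]) -> List[int]:
    """Step 16: index of the nearest standard data item for each sample."""
    return [min(range(len(standards)), key=lambda i: math.dist(x, standards[i]))
            for x in samples]


def error_e(samples: Sequence[Point], standards: Sequence[Point],
            labels: Sequence[int]) -> float:
    """Step 17 (assumed form): sum over clusters of mean member distance."""
    total = 0.0
    for i, m in enumerate(standards):
        dists = [math.dist(x, m) for x, lab in zip(samples, labels) if lab == i]
        if dists:
            total += sum(dists) / len(dists)
    return total


def move(samples: Sequence[Point], standards: Sequence[Point],
         labels: Sequence[int], eta: float) -> List[Point]:
    """Step 19 (assumed form of Equation 2): one steepest-descent step on E."""
    moved = []
    for i, m in enumerate(standards):
        members = [x for x, lab in zip(samples, labels) if lab == i]
        if not members:
            moved.append(m)  # no members: nothing pulls this item
            continue
        # -dE/dm = mean of the unit vectors from m toward each member
        step = [0.0] * len(m)
        for x in members:
            d = math.dist(x, m)
            if d > 0.0:  # a member exactly at m contributes no direction
                for k in range(len(m)):
                    step[k] += (x[k] - m[k]) / d
        moved.append(tuple(m[k] + eta * step[k] / len(members)
                           for k in range(len(m))))
    return moved


def cluster_fig3(samples: Sequence[Point], num_clusters: int, eta: float = 0.5,
                 tol: float = 1e-6, spread: float = 0.1, max_iter: int = 1000,
                 rng: Optional[random.Random] = None,
                 init: Optional[Sequence[Point]] = None):
    """Return (standard data, labels, E) once the loop of FIG. 3 settles."""
    if init is None:
        # Step 15: random placement near the center of all sample data
        rng = rng or random.Random()
        center = [sum(c) / len(samples) for c in zip(*samples)]
        standards = [tuple(c + rng.uniform(-spread, spread) for c in center)
                     for _ in range(num_clusters)]
    else:
        standards = [tuple(p) for p in init]
    prev_e = None
    for _ in range(max_iter):
        labels = assign(samples, standards)               # step 16
        e = error_e(samples, standards, labels)           # step 17
        if prev_e is not None and abs(prev_e - e) < tol:  # step 18
            break
        prev_e = e
        standards = move(samples, standards, labels, eta)  # step 19, back to 16
    # Final membership: each sample belongs to its nearest standard data item
    labels = assign(samples, standards)
    return standards, labels, error_e(samples, standards, labels)
```

As with any descent of this kind, the outcome depends on where the standard data start, and a poor random placement can leave two standard data items splitting one natural group.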

[0018] Returning to the flow of FIG. 2, step 11 obtains, for each of the I clusters produced by the clustering execution unit of step 10 (the processing of FIG. 3 described above), the number of members (sample data) belonging to the cluster and the variance of those members. Step 12 uses the variances obtained in step 11 to calculate the evaluation amount A given by Equation 3, the formula for the clustering state evaluation amount A. Equation 3 expresses A concretely and consists of four terms. For A to become small, the first term requires the number of members of the clusters to be large, and the second term requires the variance to be small. Because these requirements generally conflict, the optimum is found at the point where the two are balanced. The third and fourth terms are correction terms.
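Step 11 needs only the member count and the per-coordinate variance of each cluster. Below is a minimal sketch, assuming population variance (dividing by n) and representing an empty cluster as `(0, [])`; the name `cluster_stats` is hypothetical. Since Equation 3 is not reproduced in this text, the sketch stops at the quantities that equation consumes and does not compute A.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def cluster_stats(samples: Sequence[Point], labels: Sequence[int],
                  num_clusters: int) -> List[Tuple[int, List[float]]]:
    """Step 11: (member count, per-coordinate variances) for each cluster."""
    stats = []
    for i in range(num_clusters):
        members = [x for x, lab in zip(samples, labels) if lab == i]
        n = len(members)
        if n == 0:
            stats.append((0, []))  # empty cluster: no variance defined
            continue
        dims = len(members[0])
        means = [sum(x[k] for x in members) / n for k in range(dims)]
        # population variance (divide by n), one value per coordinate
        var = [sum((x[k] - means[k]) ** 2 for x in members) / n
               for k in range(dims)]
        stats.append((n, var))
    return stats
```

For example, two members (0, 0) and (2, 4) give a count of 2 and variances [1.0, 4.0].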

[0019]

(Equation 3)

[0020] In step 13, as described above, it is determined whether the number of clusters has reached the maximum N; if not, I is set to I + 1 and the process returns to step 10, and if so, the process proceeds to step 14. In step 14, the smallest of the evaluation amounts A obtained for the respective numbers of clusters is selected, and the clustering with the number of clusters that gives the selected A is taken as the best result.
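The outer loop of FIG. 2 (steps 9 to 14) can be sketched independently of the clustering routine and of Equation 3. In this sketch both are passed in as callables: `cluster_fn(i)` stands for the FIG. 3 clustering into i clusters, and `evaluate_a(result)` stands for steps 11 and 12 (Equation 3, which is not reproduced here). Both names, `best_cluster_count` itself, and the tie rule (the smallest I wins when two values of A are equal) are assumptions.

```python
from typing import Callable, Optional, Tuple, TypeVar

R = TypeVar("R")  # one clustering result, in whatever form cluster_fn returns


def best_cluster_count(cluster_fn: Callable[[int], R],
                       evaluate_a: Callable[[R], float],
                       max_clusters: int) -> Tuple[int, R, float]:
    """FIG. 2: try I = 1..N clusters and keep the result with the smallest A."""
    best: Optional[Tuple[int, R, float]] = None
    for i in range(1, max_clusters + 1):  # steps 9 and 13: I = 1, 2, ..., N
        result = cluster_fn(i)            # step 10: FIG. 3 clustering
        a = evaluate_a(result)            # steps 11-12: counts, variances, A
        if best is None or a < best[2]:   # strict "<": smallest I wins ties
            best = (i, result, a)
    if best is None:
        raise ValueError("max_clusters must be at least 1")
    return best                           # step 14: minimum-A clustering
```

For example, with the toy stand-ins `cluster_fn=lambda i: i * 10` and `evaluate_a=lambda r: abs(r - 30)`, which have nothing to do with Equation 3, `best_cluster_count(cluster_fn, evaluate_a, 6)` returns `(3, 30, 0)`.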

[0021] FIG. 4 illustrates, with actual physical data (the perimeter and area of the silhouette images mentioned above), the process of determining the optimal clustering by the method just described. FIG. 4(a) shows the distribution of the two-dimensional data (perimeter and area of the silhouette images) to be clustered. FIGS. 4(b) to 4(f) show the results of clustering into I = 2 to 6 clusters and the value of the evaluation amount A in each case. The dot (·) in each panel is the standard data representing the center of each cluster. As the figure shows, comparing the evaluation amounts A indicates that clustering into four clusters is optimal, which agrees with a human judgment of the data distribution.

[0022] After the optimal clustering has been obtained in this way, the parts are sorted by type as follows: each part is conveyed on the belt conveyor and imaged by the image pickup tube placed above it; the perimeter and area of the part's silhouette image are obtained as physical data; the standard data item nearest to these physical data is determined; and the part type assigned to the cluster corresponding to that standard data item is taken as the type of the imaged part.
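The sorting step amounts to a nearest-standard-data lookup. Below is a minimal sketch, assuming Euclidean distance on the raw (perimeter, area) pair; the text does not mention any scaling of the two features, and the function name `classify_part` and the part-type labels in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def classify_part(features: Point, standards: Sequence[Point],
                  part_types: Sequence[str]) -> str:
    """Return the part type assigned to the cluster with the nearest standard data.

    part_types[i] is the (hypothetical) label given to cluster i.
    """
    nearest = min(range(len(standards)),
                  key=lambda i: math.dist(features, standards[i]))
    return part_types[nearest]
```

For instance, `classify_part((110.0, 520.0), [(100.0, 500.0), (300.0, 2000.0)], ["type A", "type B"])` returns `"type A"`. Because area grows roughly with the square of a part's size while perimeter grows linearly, the area coordinate tends to dominate an unscaled distance; normalizing the features is a common remedy, though not one the text describes.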

[0023] Next, a method of changing to the optimal clustering state when data are added is described with reference to FIGS. 1 and 5. FIG. 5(a) shows the initial clustering state, which can be obtained by the method described above. The dots (·) in FIG. 5(a) indicate the standard data of each cluster. Here, let n₁, n₂, and n₃ be the numbers of members of clusters C1, C2, and C3, and let (V₁₁, V₁₂), (V₂₁, V₂₂), and (V₃₁, V₃₂) be the per-coordinate variances of the respective clusters.

[0024] First, in step 1, the cluster nearest to the added input data p is found by computing the distance to the standard data (·) of each cluster. In this example it is cluster C1, so the input data are assumed to belong to C1. The number of members of C1 then becomes n₁ + 1, and the new variances can be obtained provided that only the previous variances (V₁₁, V₁₂), the member count n₁, and the per-coordinate sums of squares over all members of the cluster are stored (a well-known formula can be used to obtain the new variances). Then, in step 2, the new member count and new variances of C1, together with the member counts and variances of the other clusters, are substituted into Equation 3 to obtain the evaluation amount A1. FIG. 5(b) shows the clustering state and the evaluation amount A1 under this assumption.
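The "well-known formula" is not spelled out in the text. One common choice is sketched below, assuming that each cluster keeps its member count n, its per-coordinate sum S1, and its per-coordinate sum of squares S2 (the text lists the previous variance, n, and the sum of squares; the per-coordinate sum is added here so that the mean is available). The population variance is then V = S2/n − (S1/n)², and adding a point p only updates n, S1, and S2. The class name `ClusterStats` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class ClusterStats:
    n: int            # number of members
    s1: List[float]   # per-coordinate sum (assumed to be stored as well)
    s2: List[float]   # per-coordinate sum of squares

    @classmethod
    def from_members(cls, members: Sequence[Point]) -> "ClusterStats":
        dims = len(members[0])
        return cls(len(members),
                   [sum(x[k] for x in members) for k in range(dims)],
                   [sum(x[k] ** 2 for x in members) for k in range(dims)])

    def variances(self) -> List[float]:
        """Population variance per coordinate: S2/n - (S1/n)**2."""
        return [b / self.n - (a / self.n) ** 2 for a, b in zip(self.s1, self.s2)]

    def with_point(self, p: Point) -> "ClusterStats":
        """Statistics after adding p, without revisiting the old members."""
        return ClusterStats(self.n + 1,
                            [a + x for a, x in zip(self.s1, p)],
                            [b + x * x for b, x in zip(self.s2, p)])
```

The S2/n − (S1/n)² form is cheap but can lose precision when coordinate values are large relative to their spread; Welford's online update is a numerically steadier alternative.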

[0025] In steps 3 and 4, the members of the nearest cluster C1 together with the input data are clustered into two clusters by the method described with reference to FIG. 3. This yields the member counts n₁′ and n₁″ and the variances (V₁₁′, V₁₂′) and (V₁₁″, V₁₂″) of the two new clusters, so in step 5 the evaluation amount A2 can easily be calculated from these values together with the member counts and variances of the other clusters C2 and C3. FIG. 5(c) shows the clustering state and the evaluation amount A2 under this assumption.

[0026] In step 6, the cluster nearest to the added input data p (C1) and the next-nearest cluster (C2) are found, and the input data are assumed to belong to the cluster formed by merging the two. In this case too, the number of members of the new cluster C12 is n₁ + n₂ + 1, and the new variances can be obtained provided that only the previous variances (V₁₁, V₁₂) and (V₂₁, V₂₂), the member counts n₁ and n₂, and the per-coordinate sums of squares over all members of each cluster are stored. Then, in step 7, the new member count and new variances of C12, together with the member count and variances of the remaining cluster C3, are substituted into Equation 3 to obtain the evaluation amount A3. FIG. 5(d) shows the clustering state and the evaluation amount A3 in this case.
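For the merged cluster C12 the same bookkeeping simply adds up. Below is a minimal sketch, again assuming each cluster stores (n, per-coordinate sums, per-coordinate sums of squares), which goes slightly beyond the quantities the text lists; the names `merge_with_point` and `variances` are hypothetical.

```python
from typing import List, Sequence, Tuple

# (member count, per-coordinate sums, per-coordinate sums of squares)
Stats = Tuple[int, List[float], List[float]]


def merge_with_point(a: Stats, b: Stats, p: Sequence[float]) -> Stats:
    """Statistics of C12 = members of cluster a + members of cluster b + p."""
    n = a[0] + b[0] + 1                                    # n1 + n2 + 1
    s1 = [u + v + x for u, v, x in zip(a[1], b[1], p)]
    s2 = [u + v + x * x for u, v, x in zip(a[2], b[2], p)]
    return n, s1, s2


def variances(s: Stats) -> List[float]:
    """Population variance per coordinate from the stored sums."""
    n, s1, s2 = s
    return [q / n - (m / n) ** 2 for m, q in zip(s1, s2)]
```

Merging a cluster holding (0, 0) with one holding (2, 2) and adding p = (4, 4) gives n = 3 and a variance of 8/3 on each coordinate, the same as computing it directly from the three points.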

[0027] In step 8, the smallest of the obtained evaluation amounts A1, A2, and A3 is found, and the corresponding clustering is selected as optimal. In this example, the clustering result of FIG. 5(b) is selected. FIG. 5(e) shows the result after many additional data have been added; it can be seen that a new cluster C4 has been added.
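Step 8 is a minimum over the three hypotheses. In the sketch below, the keys "keep" (A1, FIG. 5(b)), "split" (A2, FIG. 5(c)), and "merge" (A3, FIG. 5(d)), the function name `choose_update`, and the `evaluate_a` callable are hypothetical; `evaluate_a` stands in for Equation 3, which is not reproduced in this text, and each candidate is whatever per-cluster statistics that equation would consume.

```python
from typing import Callable, Dict, Tuple, TypeVar

C = TypeVar("C")  # one candidate clustering, e.g. per-cluster counts and variances


def choose_update(candidates: Dict[str, C],
                  evaluate_a: Callable[[C], float]) -> Tuple[str, float]:
    """Return (name, A) for the candidate clustering with the smallest A.

    On ties, the candidate listed first in the dict wins.
    """
    scores = {name: evaluate_a(c) for name, c in candidates.items()}
    best = min(scores, key=lambda name: scores[name])
    return best, scores[best]
```

For instance, with illustrative numbers A1 = 1.5, A2 = 2.0, A3 = 3.0 (not values taken from FIG. 5), `choose_update({"keep": 1.5, "split": 2.0, "merge": 3.0}, lambda a: a)` returns `("keep", 1.5)`, matching the kind of outcome in this example where FIG. 5(b) is selected.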

[0028] With the method described above, when data are added, the optimal clustering state can be obtained using only the data around the added data, without clustering all of the data again. Next, a system configuration for carrying out the above embodiment of the present invention is described with reference to FIG. 7. Reference numeral 1 denotes an imaging device, 2 a clustering unit, 3 a determination unit, 4 an image processing unit, 5 a storage unit, and 6 a CPU. The storage unit 5 stores the sample data and the standard data, and the CPU 6 controls the entire system, that is, the capture of images and the activation of the clustering unit 2, the determination unit 3, the image processing unit 4, and the storage unit 5.

[0029] In advance, image data consisting of silhouette images of learning parts are input from the imaging device, and the image processing unit 4 obtains the perimeter and area of each silhouette image and stores them in the storage unit 5 as sample data. The clustering unit 2 uses the sample data stored in the storage unit 5 to create standard data by the clustering process shown in the flowchart of FIG. 2, and stores the standard data in the storage unit 5.
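The text does not say how the image processing unit 4 measures the perimeter and area. One simple possibility, sketched here purely as an assumption, is to count pixels in a binary silhouette: the area is the number of foreground pixels, and the perimeter is approximated by the number of foreground pixels that touch the background (or the image edge) through a 4-neighbour. The function name `perimeter_and_area` is hypothetical.

```python
from typing import Sequence, Tuple


def perimeter_and_area(mask: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return (perimeter, area) of a binary silhouette (1 = part, 0 = background).

    Area: number of foreground pixels. Perimeter (a rough approximation):
    number of foreground pixels with a background or out-of-image 4-neighbour.
    """
    rows, cols = len(mask), len(mask[0])
    area = perimeter = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            area += 1
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    perimeter += 1  # boundary pixel, counted once
                    break
    return perimeter, area
```

A filled 3 × 3 square inside a blank border gives an area of 9 and a perimeter of 8 (every pixel except the center touches the background).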

[0030] When the type of a part is actually determined, a silhouette image of a part of unknown type is captured with the imaging device 1, the image processing unit 4 obtains the perimeter and area of the silhouette image, and the determination unit 3 computes the distance between these data and each standard data item stored in the storage unit 5 and takes the part type corresponding to the standard data item at the shortest distance as the type of the unknown part.

[0031] For convenience of illustration, this example has been described using three clusters of two-dimensional data, but the invention naturally also applies to larger amounts of higher-dimensional data. Furthermore, although this embodiment determines the type of a product from two static physical quantities (perimeter and area), the invention can also be applied to clustering dynamic time-series data obtained from multiple sensors, such as those of a data glove. In that case, treating each sensor as one dimension, the data from the multiple sensors at each time instant are represented as one point in a multidimensional space, so clustering this set of points makes it possible to detect typical data (finger shapes, in the case of a data glove) automatically.

[0032]

[Effects of the Invention] As described in the embodiment above, the present invention defines an evaluation amount A for evaluating a clustering state, which takes its minimum value for the clustering that a human would judge optimal, and always determines the number of clusters so that this amount is minimized. The invention can therefore be applied even to targets whose number of clusters cannot be estimated. In addition, when data are added, clustering is possible using only the data around the added data, without clustering all of the data again, so the optimal clustering state can be obtained quickly with less processing.

[Brief Description of the Drawings]

FIG. 1 is a flowchart illustrating a method of changing to the optimal clustering state when data are added.

FIG. 2 is a flowchart illustrating a method of determining the optimal clustering.

FIG. 3 is a flowchart illustrating a method of performing clustering.

FIG. 4 is a diagram showing, with actual physical data, the process of determining the optimal clustering and the clustering state evaluation amount A for each clustering.

FIG. 5 is a diagram showing, with actual physical data, the process of determining the optimal clustering when further data are added to an already determined clustering, and the clustering state evaluation amount A for each case.

FIG. 6 is a diagram showing an example of a silhouette image of a part.

FIG. 7 is a diagram showing a system configuration for carrying out an embodiment of the present invention.

Claims

(57) [Claims]

[Claim 1] A clustering method for clustering a plurality of data, comprising: successively changing and setting the number of clusters; for each set number of clusters, clustering the plurality of data into that number of clusters and obtaining, from the number of data members in each cluster and the variance of the data, a clustering state evaluation amount A that is minimized when the number of clusters is optimal; identifying the smallest clustering state evaluation amount A among those obtained for the respective numbers of clusters; and determining the clustering corresponding to that clustering state evaluation amount A as the optimal clustering.

[Claim 2] A clustering method used when further data are added to the plurality of data clustered by the clustering method according to claim 1, comprising: assuming, for the addition of the data, states in which a cluster is added, clusters are merged, or the current state is maintained; obtaining the clustering state evaluation amount A for each state; identifying the smallest clustering state evaluation amount A among those obtained; and determining the clustering corresponding to that clustering state evaluation amount A as the optimal clustering.

[Claim 3] A clustering method used when further data are added to the plurality of data clustered by the clustering method according to claim 1, comprising: obtaining a clustering state evaluation amount A1 under the assumption that the added data belong to the cluster nearest to them and that this cluster remains one cluster; obtaining a clustering state evaluation amount A2 for the case where the added data belong to the nearest cluster and that cluster is clustered into two clusters; obtaining a clustering state evaluation amount A3 under the assumption that the added data belong to one cluster formed by combining the nearest cluster and the second-nearest cluster; identifying the smallest clustering state evaluation amount among A1, A2, and A3; and determining the clustering corresponding to that clustering state evaluation amount as the optimal clustering.

[Claim 4] The clustering method according to any one of claims 1 to 3, wherein, when the plurality of data are clustered into the set number of clusters: standard data equal in number to the clusters are placed and successively repositioned; the distance between each data item and each standard data item is obtained; an error E is obtained on the basis that each data item belongs to the standard data item at the shortest distance from it; when the difference between this error E and the error E for the previous placement of the standard data is larger than a predetermined constant, the standard data are moved and the error E and the difference are obtained again; and when the difference is smaller than the predetermined constant, the clustering is fixed.