JP5120254B2

JP5120254B2 - Clustering system and defect type determination apparatus

Info

Publication number: JP5120254B2
Application number: JP2008523694A
Authority: JP
Inventors: 信 楜澤; 昭男 勝呂; 孝二 大西
Original assignee: Asahi Glass Co Ltd
Current assignee: AGC Inc
Priority date: 2006-07-06
Filing date: 2007-07-03
Publication date: 2013-01-16
Anticipated expiration: 2027-07-03
Also published as: TW200818060A; CN101484910A; JPWO2008004559A1; WO2008004559A1; KR100998456B1; TWI434229B; CN101484910B; KR20090018920A

Description

The present invention relates to a clustering system and a defect type determination apparatus that cut out a partial image of a defective portion from an image of an inspection object, extract feature signals of the defect from that partial image, and classify the defect by type.

Clustering based on the distance between unknown data and learning data, for example the Mahalanobis generalized distance, has long been common practice: unknown data is classified by determining whether it belongs to a cluster, i.e., a population learned in advance. For example, the cluster to which the unknown data belongs is determined from the magnitudes of its Mahalanobis distances to a plurality of clusters (see, for example, Patent Document 1).
In addition, to calculate these distances efficiently, clustering is performed on a selected subset of the feature amounts.

It is also common to determine the cluster to which unknown data belongs by voting on the results of many classifiers; for example, the classification results for the outputs of different sensors, or for different regions of a single image, are used (see, for example, Patent Document 2).
Among such clustering methods, one approach to diagnosing disease from blood-test parameters (i.e., clustering the data by the disease it indicates) forms every pair of clusters, judges for each pair which of the two clusters the test data is more similar to, and, after counting the judgments over all pairs, assigns the data to the cluster that was chosen most often (see, for example, Patent Document 3).
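The pairwise voting scheme attributed to Patent Document 3 can be sketched as follows. This is a minimal illustration, not the cited document's method: the hypothetical helper `closer_cluster` decides each pair by Euclidean distance to the cluster means, and a tied vote goes to the cluster listed first.

```python
import math
from collections import Counter
from itertools import combinations


def closer_cluster(x, a, b, means):
    """Hypothetical pairwise judge: pick whichever of clusters a, b has the nearer mean."""
    return a if math.dist(x, means[a]) <= math.dist(x, means[b]) else b


def pairwise_vote(x, means):
    """Judge every pair of clusters once and return the cluster with the most wins."""
    votes = Counter({c: 0 for c in means})
    for a, b in combinations(means, 2):
        votes[closer_cluster(x, a, b, means)] += 1
    # most_common keeps insertion order among equal counts, so ties go to the earlier cluster
    return votes.most_common(1)[0][0]
```

With K clusters the loop makes K(K-1)/2 judgments, which is the growth in the number of combinations criticized below.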

In another approach, when each defect on an LCD glass substrate is classified into one of a set of preset defect types, each feature amount used for classification is optimized for the discrimination being performed, the feature amounts are weighted accordingly, and the optimized feature amounts are used to determine which cluster the defect belongs to (see, for example, Patent Document 4).
Patent Document 1: JP 2005-214682 A
Patent Document 2: JP 2001-56861 A
Patent Document 3: JP H07-105166 A
Patent Document 4: JP 2002-99916 A

However, the clustering of Patent Document 3 does not optimize the individual pairwise combinations, so it does not fully exploit the feature amounts available for discrimination; moreover, as the number of clusters to be discriminated grows, the number of combinations becomes enormous and the determination takes longer.
The clustering of Patent Document 4 weights the feature amounts according to the determination rate in order to improve discrimination accuracy, but it has no notion of optimizing the feature amounts for each cluster. As with Patent Document 3, the feature amounts are therefore not fully exploited, and highly accurate classification cannot be achieved.

The present invention has been made in view of these circumstances. It provides a clustering system and a defect type determination apparatus that make full use of the feature amounts extracted from the data to be classified (the data to be assigned to the cluster it belongs to) and classify such data faster and more accurately than the conventional examples; for example, defects on a glass surface can be classified into clusters corresponding to defect types.

To solve the problems described above, the present invention departs from the conventional approach, in which the classification destination is decided by computing the distance between the classification target data and every cluster with the same kinds of feature amounts. Instead, a set of feature amounts that separates each cluster from the others is defined for each cluster, and the distance to each cluster is computed with that cluster's own feature amounts, so classification is more accurate than before.
Because each feature amount set is determined from the characteristics of the learning data belonging to its cluster, it consists of feature amounts that distinguish that cluster from the other clusters.
That is, the present invention adopts the following configuration.

The clustering system according to the present invention classifies input data into clusters, each formed from a population of learning data, according to the feature amounts (parameters) of the input data. It comprises: a feature amount set storage unit that stores, for each cluster, a feature amount set (parameter set), i.e., a combination of feature amounts used for classification; a feature amount extraction unit that extracts preset feature amounts from the input data; a distance calculation unit that, for each feature amount set corresponding to each cluster, calculates and outputs as a set distance the distance between the center of that cluster's population and the input data, based on the feature amounts included in that feature amount set; and a rank extraction unit that arranges the set distances in ascending order. A plurality of feature amount sets are set for each cluster, and the system further comprises a cluster classification unit that detects which cluster the input data belongs to by applying, to the set distances obtained for the feature amount sets, a rule pattern that specifies the criterion for classifying input data into each cluster on the basis of the ranks of those set distances.
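The set-distance computation and ascending ranking in this configuration can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the set distance is taken as the standardized Euclidean distance (one of the distance options named in the first embodiment), and the function names and dictionary layouts are hypothetical.

```python
import math


def set_distance(x, mean, std, features):
    """Standardized Euclidean distance over only the features in one feature set."""
    return math.sqrt(sum(((x[f] - mean[f]) / std[f]) ** 2 for f in features))


def rank_set_distances(x, clusters, feature_sets):
    """Return (distance, cluster_id, feature_set) tuples sorted in ascending order of distance.

    clusters: {cluster_id: (mean_dict, std_dict)} computed from that cluster's learning data.
    feature_sets: {cluster_id: [tuple_of_feature_names, ...]}, several sets per cluster.
    """
    ranked = []
    for cid, sets in feature_sets.items():
        mean, std = clusters[cid]
        for fs in sets:
            ranked.append((set_distance(x, mean, std, fs), cid, fs))
    ranked.sort(key=lambda t: t[0])
    return ranked
```

Each cluster contributes one entry per feature set, and the sorted list is the input a rank-based rule pattern would consume.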

In a preferred clustering system of the present invention, the cluster classification unit detects which cluster the input data belongs to from the ranks of the set distances, and detects as that cluster the one that owns the largest number of highly ranked set distances.

In a preferred clustering system of the present invention, the cluster classification unit has a threshold on the number of highly ranked set distances, and detects a cluster as the one the input data belongs to only if that cluster's count is equal to or greater than the threshold.
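The rank-based rule of the two preceding paragraphs can be sketched as a vote over the top-ranked set distances. The window size `top_k`, the function name, and the tie-breaking behavior are assumptions for illustration; the source does not specify them. Ties go to the cluster whose first entry ranks highest, because `Counter.most_common` keeps first-occurrence order among equal counts.

```python
from collections import Counter


def vote_top_ranked(ranked, top_k, threshold):
    """Classify by counting which cluster owns the most of the top_k smallest set distances.

    ranked: entries whose first two fields are (set_distance, cluster_id), sorted ascending.
    Returns the winning cluster, or None when its count is below the threshold.
    """
    counts = Counter(entry[1] for entry in ranked[:top_k])
    if not counts:
        return None
    cluster, n = counts.most_common(1)[0]
    return cluster if n >= threshold else None
```

Returning `None` leaves the input unclassified rather than forcing it into a weakly supported cluster.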

In a preferred clustering system of the present invention, the distance calculation unit multiplies each set distance by a correction coefficient set for the corresponding feature amount set, thereby standardizing the set distances across feature amount sets.
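The correction step can be sketched as follows. The source does not say how the coefficients are chosen, so the default used here is an assumption: squared set distances are multiplied by 1/n, where n is the number of features in the set, which mirrors the 1/n factor of the Mahalanobis squared distance described later and gives in-cluster data an expected value near 1 when the standardized features are uncorrelated. The function name is hypothetical.

```python
def standardize_set_distances(sq_distances, coefficients=None):
    """Multiply each squared set distance by a per-feature-set correction coefficient.

    sq_distances: {(cluster_id, feature_set_tuple): squared set distance}
    coefficients: {feature_set_tuple: factor}; if omitted, 1/len(feature_set) is
    used (an assumed default, not taken from the source).
    """
    corrected = {}
    for (cid, fs), d2 in sq_distances.items():
        k = coefficients[fs] if coefficients is not None else 1.0 / len(fs)
        corrected[(cid, fs)] = d2 * k
    return corrected
```

Without such a correction, a feature set with four features would tend to produce larger distances than a single-feature set simply because it sums more terms.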

A preferred clustering system of the present invention further comprises a feature amount set creation unit that creates the feature amount set for each cluster. For each of a plurality of combinations of feature amounts, the feature amount set creation unit takes the mean of the learning data in each cluster's population as the origin, computes the mean distance between this origin and each item of learning data in the populations of the other clusters, and selects the combination with the largest mean distance as the feature amount set used to distinguish that cluster from the others.
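The selection rule in this paragraph can be sketched as an exhaustive search over feature combinations. The distance is an assumption: each feature is standardized with the target cluster's mean and population standard deviation, and the root-mean-square over the features in the combination is used so that combinations of different sizes stay comparable. The function name and data layout are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def choose_feature_set(target, others, features):
    """Pick the feature combination that places other clusters' data farthest from the target mean.

    target: learning data of the cluster being configured, a list of {feature: value}
    others: learning data of every other cluster, same layout
    Returns (best_combination, its mean distance).
    """
    origin = {f: mean(s[f] for s in target) for f in features}
    # fall back to 1.0 when a feature is constant in the target cluster (avoids division by zero)
    scale = {f: pstdev([s[f] for s in target]) or 1.0 for f in features}
    best, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for fs in combinations(features, r):
            score = mean(
                math.sqrt(sum(((s[f] - origin[f]) / scale[f]) ** 2 for f in fs) / len(fs))
                for s in others
            )
            if score > best_score:
                best, best_score = fs, score
    return best, best_score
```

The search visits every non-empty combination, so its cost grows as 2^n - 1 in the number of features n.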

The defect type determination apparatus according to the present invention includes any of the clustering systems described above; the input data is image data of a product defect, and the defects in the image data are classified by defect type according to the feature amounts that characterize the defects.
In a preferred defect type determination apparatus according to the present invention, the product is a glass article, and the defects of the glass article are classified by defect type.

The defect detection apparatus according to the present invention includes the defect type determination apparatus described above and detects the types of defects in a product.

The manufacturing state determination apparatus according to the present invention includes the defect type determination apparatus described above, classifies product defects by type, and detects the causes of defects in the manufacturing process from the correspondence between each defect type and its causes.

A preferred manufacturing state determination apparatus of the present invention includes any of the clustering systems described above; the input data consists of feature amounts indicating the manufacturing conditions in a product manufacturing process, and these feature amounts are classified by the manufacturing state of each step of the process.
In a preferred manufacturing state determination apparatus of the present invention, the product is a glass article, and the feature amounts of the glass article manufacturing process are classified by the manufacturing state of each step of the process.

The manufacturing state detection apparatus according to the present invention includes the manufacturing state determination apparatus described above and detects the type of manufacturing state in each step of a product manufacturing process.

The product manufacturing management apparatus according to the present invention includes the manufacturing state determination apparatus described above, detects the type of manufacturing state in each step of a product manufacturing process, and performs process control of those steps based on the control items corresponding to each type.

As described above, according to the present invention, an optimal combination of feature amounts, one that places the cluster far from the other clusters, is preset for each destination cluster from among the feature amounts of the classification target data. The distance between the classification target data and each cluster is then calculated, and the data is classified into the cluster with the smallest distance, so the classification target data can be assigned to the correct cluster more accurately than with conventional methods.
Furthermore, according to the present invention, a plurality of such combinations are set for each cluster, the resulting distances between the classification target data and all clusters are arranged in ascending order, and the data is classified into the cluster that appears most often among a preset number of top-ranked entries, so classification is more accurate than before.

A block diagram showing a configuration example of the clustering system according to the first and second embodiments of the present invention.
A table explaining the process of selecting a feature set based on the discrimination criterion value λ.
A table explaining the process of selecting a feature set based on the discrimination criterion value λ.
A histogram illustrating the effect of selecting a feature set based on the discrimination criterion value λ.
A flowchart showing an operation example of the process of selecting a feature amount set for each cluster according to the first embodiment.
A flowchart showing an operation example of the clustering process for classification target data according to the first embodiment.
A flowchart showing an operation example of generating the rule pattern table used in the clustering process of the second embodiment.
A flowchart showing an operation example of the clustering process for classification target data according to the second embodiment.
A flowchart showing an operation example of another clustering process for classification target data according to the second embodiment.
A flowchart showing an operation example of the clustering process for classification target data according to the third embodiment.
A flowchart showing an operation example of setting an arithmetic expression as the feature amount conversion method.
A flowchart showing an operation example of calculating the evaluation value in the flowchart of FIG. 11.
A flowchart showing an operation example of calculating a distance using feature amounts converted by the set conversion method.
A table showing the learning data belonging to each cluster.
A result table showing the result of classifying the learning data of FIG. 14 by a conventional clustering method.
A conceptual diagram explaining how the overall corrected determination rate is calculated.
A result table showing the result of classifying the learning data of FIG. 14 with the clustering system of the first embodiment.
A result table showing the result of classifying the learning data of FIG. 14 with the clustering system of the second embodiment.
A result table showing the result of classifying the learning data of FIG. 14 with the clustering system of the second embodiment.
A block diagram showing a configuration example of an inspection apparatus using the clustering system of the present invention.
A flowchart showing an operation example of feature amount set selection in the inspection apparatus of FIG. 20.
A flowchart showing an operation example of the clustering process in the inspection apparatus of FIG. 20.
A block diagram showing a configuration example of a defect type determination apparatus using the clustering system of the present invention.
A block diagram showing a configuration example of a manufacturing management apparatus using the clustering system of the present invention.
A block diagram showing a configuration example of another manufacturing management apparatus using the clustering system of the present invention.

Explanation of symbols

1 … Feature amount set creation unit
2 … Feature amount extraction unit
3 … Distance calculation unit
4 … Feature amount set storage unit
5 … Cluster database
100 … Inspected object
101 … Image acquisition unit
102 … Illumination device
103 … Imaging device
104 … Defect candidate detection unit
105 … Clustering unit
200, 300 … Control device
201, 202 … Image acquisition device
301, 302 … Manufacturing apparatus
303 … Notification unit
304 … Recording unit

The clustering system of the present invention classifies input data into clusters, each formed from learning data as its population, according to the feature amounts of the input data. It has a feature amount set storage unit that stores, for each cluster, a feature amount set, i.e., the combination of feature amounts used for classification. A feature amount extraction unit extracts feature amounts from the input data according to the preset feature amount sets; a distance calculation unit calculates, for each feature amount set corresponding to each cluster, the distance between the cluster's population and the input data as a set distance, based on the feature amounts included in that set; and a rank extraction unit arranges the set distances in ascending order, so that the input data is assigned to a cluster according to that order.

<First Embodiment>
Hereinafter, a clustering system according to a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the clustering system according to this embodiment.
As shown in FIG. 1, the clustering system of this embodiment includes a feature amount set creation unit 1, a feature amount extraction unit 2, a distance calculation unit 3, a feature amount set storage unit 4, and a cluster database 5.
The feature amount set storage unit 4 stores, in association with the identification information of each cluster, a feature amount set that is set individually for that cluster and indicates a combination of feature amounts of the classification target data. For example, when the classification target data has the feature amounts {a, b, c, d}, the feature amount set of each cluster is set as a combination such as [a, b], [a, b, c, d], or [c]. In the following description, a "combination of feature amounts" means all of the feature amounts in the set, any several of them (in the above example, any two or three), or any single one.

Here, when clusters A, B, and C are set as classification destinations, the feature amount set corresponding to each cluster is obtained, using learning data classified into each cluster in advance, as the combination of feature amounts that maximizes the distance between that cluster and the other clusters, and is stored in the feature amount set storage unit 4.
For example, the feature amount set for cluster A is the combination of feature amounts that maximizes the distance between the vector of mean feature values of the learning data belonging to cluster A and the vector of mean feature values of the learning data belonging to the other clusters B and C.
The classification target data and the learning data of the population of each cluster consist of the same set of feature amounts.

When the distance from the input classification target data to each cluster is calculated, the feature amount extraction unit 2 reads the feature amount set of the cluster in question from the feature amount set storage unit 4, extracts the feature amounts in that set from the feature amounts of the classification target data, and outputs them to the distance calculation unit 3.
Using the identification information of the cluster in question as a key, the distance calculation unit 3 reads from the cluster database 5 the vector of mean feature values of that cluster's learning data. Based on the cluster's feature amount set, it then calculates the distance between the vector of feature amounts extracted from the classification target data and the vector of mean feature values of the learning data (the centroid vector, which indicates the centroid of the cluster's learning data).

When calculating the distance, the distance calculation unit 3 normalizes each feature amount v(i) of the classification target data by the following equation (1), to remove differences in units between feature amounts and standardize their values.
$$V(i) = \frac{v(i) - \mathrm{avg}(i)}{\mathrm{std}(i)} \tag{1}$$
Here, v(i) is a feature amount, avg(i) is the mean of that feature amount over the learning data in the cluster being measured, std(i) is its standard deviation over the same learning data, and V(i) is the normalized feature amount. Consequently, the distance calculation unit 3 must normalize the feature amounts separately for each feature amount set when calculating distances.
That is, for each feature amount of the classification target data used in the distance calculation, the distance calculation unit 3 performs the normalization using the mean and standard deviation of the corresponding feature amount of the learning data.
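Equation (1) can be sketched as follows, computing avg(i) and std(i) from the learning data of the cluster being measured and normalizing only the features in one feature set. The source does not say whether the sample or population standard deviation is meant; this sketch assumes the sample version (`statistics.stdev`), and the function name is hypothetical.

```python
from statistics import mean, stdev


def normalize(x, cluster_data, feature_set):
    """Eq. (1): V(i) = (v(i) - avg(i)) / std(i), using the measured cluster's learning data.

    x: the classification target, {feature: value}
    cluster_data: learning data of the cluster being measured, a list of such dicts
    Only the features in feature_set are normalized; all other features are ignored.
    """
    V = {}
    for f in feature_set:
        column = [s[f] for s in cluster_data]
        V[f] = (x[f] - mean(column)) / stdev(column)
    return V
```

Because the statistics come from the cluster being measured, the same target yields different normalized vectors for different clusters, which is why the normalization has to be repeated per cluster and per feature set.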

As the distance, any of the standardized Euclidean distance, the Mahalanobis distance, the Minkowski distance, and so on, computed from the standardized feature amounts above, may be used.
When the Mahalanobis distance is used, the Mahalanobis squared distance MHD is obtained by the following equation (2).
$$\mathrm{MHD} = \frac{1}{n}\, V^{\mathsf{T}} R^{-1} V \tag{2}$$
Each element V(i) of the vector V in equation (2) is the feature amount obtained by equation (1) from the multidimensional feature amount v(i) of the unknown data, using the mean avg(i) and standard deviation std(i) of that feature amount over the learning data in the relevant cluster. n is the number of degrees of freedom; in this embodiment it is the number of feature amounts in the feature amount set (described later). The Mahalanobis squared distance thus accumulates contributions from the n transformed feature amounts, and dividing it by n makes the unit distance of the population mean equal to 1. V^T is the transpose of V, and R^−1 is the inverse of the correlation matrix R between the feature amounts of the learning data in the cluster.
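Equation (2) can be sketched in pure Python so that it runs without third-party packages; with NumPy, `numpy.linalg.solve` would replace the hand-written solver. The assumptions: R is estimated as the correlation matrix of the cluster's standardized learning data (sample statistics, divisor m - 1), R^-1 V is obtained by solving R y = V rather than by forming the inverse explicitly, and the function names are hypothetical.

```python
from statistics import mean, stdev


def solve(A, b):
    """Solve A y = b by Gauss-Jordan elimination with partial pivoting (A small and square)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def mahalanobis_sq_per_feature(x, cluster_data, features):
    """Eq. (2): MHD = (1/n) V^T R^-1 V over the features of one feature set."""
    avg = {f: mean(s[f] for s in cluster_data) for f in features}
    sd = {f: stdev([s[f] for s in cluster_data]) for f in features}
    # standardized learning data, one row per sample
    Z = [[(s[f] - avg[f]) / sd[f] for f in features] for s in cluster_data]
    m, k = len(cluster_data), len(features)
    # correlation matrix: covariance of the standardized learning data
    R = [[sum(z[i] * z[j] for z in Z) / (m - 1) for j in range(k)] for i in range(k)]
    V = [(x[f] - avg[f]) / sd[f] for f in features]
    y = solve(R, V)
    return sum(v * w for v, w in zip(V, y)) / k
```

For a single feature R is the 1x1 identity and the result reduces to V(i)^2; correlated features shrink the contribution of deviations that the learning data already shows moving together.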

The feature amount set creation unit 1 computes, for each cluster, the feature amount set that the distance calculation unit 3 uses when calculating the distance between the classification target data and that cluster, and writes the result to the feature amount set storage unit 4 in association with the cluster's identification information.
To compute the feature amount set, the feature amount set creation unit 1 calculates, for each cluster, the value λ of the discriminant criterion by the following equation (3), based on the distance between the centroid vector of the learning data belonging to the target cluster (the cluster whose feature amount set is being created) and the centroid vector of the learning data belonging to all the other clusters. In what follows, a combination of feature amounts under evaluation is referred to as a feature amount set.

$$\lambda = \frac{\omega_o\,\omega_i\,(\mu_o - \mu_i)^2}{\omega_o\,\sigma_o^2 + \omega_i\,\sigma_i^2} \tag{3}$$
In equation (3), μ_i is the centroid vector formed by the means of the feature amounts in the feature amount set over the learning data belonging to the target cluster (the in-cluster population), σ_i is the standard deviation of the feature vectors of the learning data in the in-cluster population, and ω_i is the ratio of the number of learning data in the in-cluster population to the number of learning data in all clusters. Likewise, μ_o is the centroid vector formed by the means of the feature amounts in the feature amount set over the learning data belonging to clusters other than the target cluster (the out-of-cluster population), σ_o is the standard deviation of the feature vectors of the learning data in the out-of-cluster population, and ω_o is the ratio of the number of learning data in the out-of-cluster population to the number of learning data in all clusters. The term (μ_o − μ_i) in equation (3) may also be replaced by its logarithm or its square root. When calculating each vector, the feature amount set creation unit 1 uses feature amounts normalized per feature by equation (1). The ratios ω_i and ω_o may also be set to fixed values computed in advance so as to increase the separation.
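Equation (3) has the same form as the ratio of between-class to within-class variance used in Otsu's discriminant analysis. A minimal sketch follows, with the vector quantities read in one plausible way that the source does not spell out: (μ_o − μ_i)^2 is taken as the squared Euclidean distance between the two centroids, and σ^2 as the total variance (sum of per-feature population variances) of each group. Inputs are assumed to be already normalized by equation (1); the function name is hypothetical.

```python
from statistics import mean, pvariance


def discriminant_criterion(inside, outside):
    """Eq. (3): lambda = w_o w_i |mu_o - mu_i|^2 / (w_o s_o^2 + w_i s_i^2).

    inside / outside: normalized feature vectors (tuples) of the learning data in / outside
    the target cluster, restricted to the features of one combination.
    Raises ZeroDivisionError if both groups have zero variance.
    """
    n_i, n_o = len(inside), len(outside)
    w_i, w_o = n_i / (n_i + n_o), n_o / (n_i + n_o)
    dims = range(len(inside[0]))
    mu_i = [mean(v[d] for v in inside) for d in dims]
    mu_o = [mean(v[d] for v in outside) for d in dims]
    # total variance: sum of the per-feature population variances of each group
    s2_i = sum(pvariance([v[d] for v in inside]) for d in dims)
    s2_o = sum(pvariance([v[d] for v in outside]) for d in dims)
    between = sum((a - b) ** 2 for a, b in zip(mu_o, mu_i))
    return w_o * w_i * between / (w_o * s2_o + w_i * s2_i)
```

A larger λ means the target cluster's centroid is farther from the rest relative to the spread of both groups, which is why the combination with the largest λ is kept.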

Then, for each target cluster, the feature set creation unit 1 uses equation (3) to calculate λ against the other clusters. It does this for every candidate combination of the features that make up the learning data: each single feature, combinations of several features, or all of them. It sorts the resulting λ values in descending order and outputs a ranked list of λ.
The feature set creation unit 1 takes the feature combination with the largest λ as the feature set of the target cluster. It stores this feature set and its λ value in the feature set storage unit 4, associated with the cluster's identification information.

FIG. 2(a) illustrates how λ is used to set the feature set of each cluster. Suppose the learning data and the data to be classified have four features a, b, c, and d. The feature set creation unit 1 calculates λ for every combination of these features: all four together, every combination of several of them, and each one alone.
The feature set creation unit 1 then selects the combination with the highest λ. In FIG. 2(a), this is the combination of features b and c.
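The exhaustive search above can be sketched in Python as follows. This is not the patented implementation. The criterion is passed in as a function so that any variant of equation (3) can be plugged in. `rank_all_subsets`, `WEIGHTS`, and `toy_criterion` are hypothetical names, and the toy criterion only exercises the search.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

# A criterion maps a feature combination to its lambda value (equation (3)).
Criterion = Callable[[Tuple[str, ...]], float]


def rank_all_subsets(features: Sequence[str], criterion: Criterion) -> List[Tuple[float, Tuple[str, ...]]]:
    """Evaluate lambda for every non-empty combination of features, largest first."""
    ranked = []
    for size in range(1, len(features) + 1):  # single features up to all of them
        for combo in combinations(features, size):
            ranked.append((criterion(combo), combo))
    ranked.sort(key=lambda item: item[0], reverse=True)  # ranked list of lambda
    return ranked


# Hypothetical stand-in for equation (3), used only to exercise the search.
WEIGHTS = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.5}


def toy_criterion(combo: Tuple[str, ...]) -> float:
    return sum(WEIGHTS[f] for f in combo) - 0.5 * len(combo) ** 2


ranking = rank_all_subsets(["a", "b", "c", "d"], toy_criterion)
print(len(ranking), ranking[0])  # 15 (3.0, ('b', 'c'))
```

With four features the search evaluates 2^4 − 1 = 15 combinations. The cost grows exponentially with the number of features, which motivates the stepwise methods described next.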

As another way of determining λ, the feature set creation unit 1 may use the BSS (backward stepwise selection) method shown in FIG. 2(b). The procedure is as follows:

1. Calculate λ using all n features in the set of data to be classified.
2. Calculate λ for every combination of n−1 features taken from those n features, and select the combination with the largest λ.
3. Calculate λ for every combination of n−2 features taken from the selected n−1 features.

Removing one feature at a time in this way, each step starting from the best combination of the previous step, lets the feature set creation unit 1 select a combination that can discriminate the clusters with fewer features.
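A minimal Python sketch of the BSS procedure follows. The text does not say which combination is finally chosen. This sketch assumes the one with the largest λ among all combinations evaluated, mirroring the FSS description. `backward_stepwise` and the toy criterion are hypothetical.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Criterion = Callable[[Tuple[str, ...]], float]


def backward_stepwise(features: Sequence[str], criterion: Criterion) -> Tuple[float, Tuple[str, ...]]:
    """BSS sketch: start from all n features and drop one feature per step."""
    current = tuple(features)
    best = (criterion(current), current)  # lambda with all n features
    while len(current) > 1:
        # Every combination that removes exactly one feature from the current set.
        candidates = [(criterion(c), c) for c in combinations(current, len(current) - 1)]
        score, current = max(candidates, key=lambda item: item[0])
        if score > best[0]:  # remember the best combination seen at any size (assumption)
            best = (score, current)
    return best


WEIGHTS = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.5}  # hypothetical stand-in for equation (3)


def toy_criterion(combo: Tuple[str, ...]) -> float:
    return sum(WEIGHTS[f] for f in combo) - 0.5 * len(combo) ** 2


print(backward_stepwise("abcd", toy_criterion))  # (3.0, ('b', 'c'))
```

Each step evaluates at most n combinations, so the whole search costs O(n²) evaluations of λ instead of 2^n − 1.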

As yet another way of determining λ, the feature set creation unit 1 may use the FSS (forward stepwise selection) method shown in FIG. 2(c). The procedure is as follows:

1. Take each of the n features in the set of data to be classified in turn, calculate λ for each, and select the feature with the largest λ.
2. Form every two-feature combination of that feature with one of the remaining features, calculate λ for each, and select the combination with the largest λ.
3. Form every three-feature combination of that combination with one feature not yet in it, and calculate λ for each.

The unit repeats this, each time adding one feature to the best combination from the previous step and keeping the combination with the largest λ. Finally, from all the combinations for which λ was calculated, it selects the combination with the largest λ as the feature set.
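The FSS procedure above can be sketched in Python as shown below. As in the text, the result is the best combination among all combinations evaluated. `forward_stepwise` and the toy criterion are hypothetical names used for illustration.

```python
from typing import Callable, Sequence, Tuple

Criterion = Callable[[Tuple[str, ...]], float]


def forward_stepwise(features: Sequence[str], criterion: Criterion) -> Tuple[float, Tuple[str, ...]]:
    """FSS sketch: grow the combination one feature at a time, keeping the best each step."""
    remaining = list(features)
    current: Tuple[str, ...] = ()
    best = (float("-inf"), current)
    while remaining:
        # Add each feature not yet in the combination and evaluate lambda.
        candidates = [(criterion(current + (f,)), current + (f,)) for f in remaining]
        score, current = max(candidates, key=lambda item: item[0])
        remaining.remove(current[-1])
        if score > best[0]:  # best over every combination evaluated so far
            best = (score, current)
    return best


WEIGHTS = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 0.5}  # hypothetical stand-in for equation (3)


def toy_criterion(combo: Tuple[str, ...]) -> float:
    return sum(WEIGHTS[f] for f in combo) - 0.5 * len(combo) ** 2


print(forward_stepwise(["a", "b", "c", "d"], toy_criterion))  # (3.0, ('b', 'c'))
```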

Next, FIGS. 3 and 4 show how effective λ is for selecting the feature set used in clustering.
FIG. 3 shows three candidate feature sets taken from features a, b, c, d, and e: features a and g, features a and h, and features d and e. From these combinations, a feature set is selected that separates cluster 1 from clusters 2 and 3 with better classification performance than the conventional example.
In FIG. 3, μ1, μ2, σ1, σ2, ω1, and ω2 correspond to μ_i, μ_o, σ_i, σ_o, ω_i, and ω_o, respectively.

Among these combinations, features a and h give the largest λ, so this pair is used to separate cluster 1 from the other clusters. FIG. 4 shows the resulting classification of cluster 1 against the other clusters (clusters 2 and 3).
In FIG. 4, the horizontal axis is the logarithm of the Mahalanobis distance calculated with the feature combination. The vertical axis is the number of data to be separated that fall in each bin, forming a histogram.

- A label such as 1.4 on the horizontal axis denotes the bin in which the log Mahalanobis distance is at least 1.2 (the value to its left) and less than 1.4. The other labels are read the same way.
- "1.4≦" denotes values of 1.4 or more.

The Mahalanobis distances in FIG. 4 are calculated with the feature set for cluster 1, for the data to be classified that belong to cluster 1 and to the other clusters.
FIG. 4(a) uses features a and g, FIG. 4(b) uses features a and h, and FIG. 4(c) uses features d and e.
The histograms in FIG. 4 show that the larger λ is, the more cleanly cluster 1 is separated from the other clusters.

Next, the operation of the clustering system according to the first embodiment of FIG. 1 is described with reference to FIGS. 5 and 6. FIG. 5 is a flowchart of an example operation of the feature set creation unit 1. FIG. 6 is a flowchart of an example clustering operation on the data to be classified.
The following description assumes that the data to be classified are sets of features of scratches on a glass article. The following features are obtained from image processing and measurement results:

- a: length of the scratch
- b: area of the scratch
- c: width of the scratch
- d: transmittance of a predetermined region including the scratch
- e: reflectance of a predetermined region including the scratch

The set of all features (hereinafter, the feature collection) is therefore {a, b, c, d, e}. In this embodiment, the distance used for clustering is the Mahalanobis distance computed from normalized features. Examples of the glass article include plate glass and glass substrates for displays.

A. Feature set creation process (flowchart of FIG. 5)
The user detects scratches on the glass and captures images of them to obtain image data. The user then extracts features from the image data by image processing, for example by measuring the length of each scratch, and collects feature data made up of the feature collection above. The clusters into which scratches are to be classified are defined by criteria such as cause or shape. For each such cluster, the user assigns feature data to it as learning data, based on known information about the cause or shape of each scratch. This forms the learning-data population of each cluster, which the user stores in the cluster database 5 from a processing terminal (not shown), associated with the cluster identification information (step S1).

Next, the processing terminal issues a control command to generate a feature set for each cluster. On receiving it, the feature set creation unit 1 reads the learning-data population of each cluster from the cluster database 5, using the identification information of each cluster.
For each cluster, the feature set creation unit 1 calculates the mean and standard deviation of each feature in the in-cluster population. It uses these values to calculate the normalized features of each learning datum according to equation (1).
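Equation (1) is not reproduced in this section. The Python sketch below assumes it is the usual z-score, (x − avg.(i)) / std.(i), computed with the population standard deviation. `normalize_cluster` is a hypothetical name. A feature that is constant within a cluster would have std.(i) = 0 and would need special handling, which the sketch omits.

```python
import numpy as np


def normalize_cluster(learning: np.ndarray):
    """Normalize each feature of one cluster's learning data (assumed z-score form of equation (1)).

    learning: (n, k) raw features of the learning data in one in-cluster population.
    Returns the normalized data together with the per-feature mean and standard deviation.
    """
    avg = learning.mean(axis=0)  # mean of each feature, avg.(i)
    std = learning.std(axis=0)   # population standard deviation, std.(i) (assumption)
    return (learning - avg) / std, avg, std


data = np.array([[1.0, 10.0], [3.0, 30.0]])
z, avg, std = normalize_cluster(data)
print(z.tolist())  # [[-1.0, -1.0], [1.0, 1.0]]
```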

Next, the feature set creation unit 1 calculates λ according to equation (3) for every feature set, that is, for every combination of the features in the feature collection.
For each cluster and each feature set, it calculates the following quantities:

- From the normalized features of the in-cluster population: the mean vector (centroid vector) μ_i of the vectors formed by the features in the feature set, and the standard deviation σ_i of those learning-data vectors.
- From the normalized features of the out-of-cluster population: the centroid vector μ_o and the standard deviation σ_o, calculated in the same way.
- The ratio ω_i of the number of learning data in the in-cluster population to the total number of learning data.
- The ratio ω_o of the number of learning data in the out-of-cluster population to the total number of learning data.

Using μ_i, μ_o, σ_i, σ_o, ω_i, and ω_o, the feature set creation unit 1 then applies equation (3) to calculate λ for each cluster. This λ measures how well that cluster separates from the other clusters, and it is calculated for the feature set of every combination in the feature collection.
Once all λ values have been calculated, the feature set creation unit 1 sorts them in descending order for each cluster. It designates the feature set with the largest λ as that cluster's feature set: the combination of features used to calculate distances when determining membership in the cluster (step S2).

Next, the feature set creation unit 1 calculates the data that the distance calculation unit 3 will need for its distance calculations. These are the correlation coefficients R between the features in each feature set, and the mean avg.(i) and standard deviation std.(i) of each feature of the learning data in each in-cluster population (step S3).
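Step S3 and the inverse matrix R^(−1) stored in step S4 can be sketched with NumPy as follows. The choice of population standard deviation is an assumption, and `distance_calculation_data` is a hypothetical name. `np.linalg.inv` raises an error if two features in the set are perfectly correlated, because R is then singular.

```python
import numpy as np


def distance_calculation_data(learning: np.ndarray):
    """avg.(i), std.(i), correlation matrix R and its inverse for one cluster and feature set.

    learning: (n, k) raw features of the cluster's learning data, restricted to the feature set.
    """
    avg = learning.mean(axis=0)
    std = learning.std(axis=0)  # population standard deviation (assumption)
    # Correlation coefficients between the features; atleast_2d also covers k == 1.
    r = np.atleast_2d(np.corrcoef(learning, rowvar=False))
    r_inv = np.linalg.inv(r)    # raises LinAlgError if R is singular
    return avg, std, r, r_inv


data = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
avg, std, r, r_inv = distance_calculation_data(data)
print(round(float(r[0, 1]), 3))  # 0.6
```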

Next, the feature set creation unit 1 calculates the correction coefficient λ^(−1/2) from λ. This coefficient standardizes distances across feature sets. Because the distance from a cluster to the other clusters varies from cluster to cluster, the feature sets must be standardized against each other to improve classification accuracy. Instead of λ^(−1/2), log(λ) or simply (μ_o − μ_i) may be used as the correction coefficient. Any function of λ that standardizes the feature sets against each other may be used.
When calculating the centroid vector μ_o of the out-of-cluster population in equation (3), one of the following three kinds of learning data is used as the out-of-cluster population:
a. all learning data outside the target cluster;
b. specific learning data outside the target cluster that correspond to the purpose of the classification;
c. the learning data outside the target cluster among the learning data used for selecting the features.
In case b, the purpose of the classification is to distinguish the cluster of interest clearly from particular other clusters. The learning data of those other clusters are used.
Finally, the feature set creation unit 1 stores the distance calculation data in the feature set storage unit 4, associated with the identification information of each cluster (step S4). The distance calculation data consist of:

- the feature set;
- the correction coefficient for the feature set, which is λ^(−1/2) in this embodiment;
- the inverse matrix R^(−1);
- the means avg.(i);
- the standard deviations std.(i).

B. Clustering process (flowchart of FIG. 6)
When data to be classified are input, the feature extraction unit 2 reads the feature set of each cluster from the feature set storage unit 4, using the identification signal of each cluster.
For each cluster, the feature extraction unit 2 extracts from the data to be classified the features of the types listed in that cluster's feature set. It stores the extracted features in its internal storage unit, associated with the cluster's identification information (step S11).

Next, for each feature extracted from the data to be classified, the distance calculation unit 3 reads the corresponding mean avg.(i) and standard deviation std.(i) from the feature set storage unit 4. It normalizes the feature using equation (2) and replaces the stored feature with the normalized value.
The distance calculation unit 3 then forms a matrix V whose elements are the normalized values V(i) and calculates its transpose V^T. Using equation (3), it calculates the Mahalanobis distance between the data to be classified and each cluster in turn, and stores each distance in the internal storage unit, associated with the cluster's identification information (step S12).
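The distance formula itself is not reproduced in this section. The Python sketch below assumes the Mahalanobis-Taguchi form D² = V R^(−1) V^T / k, where V is the row vector of normalized features and k is the number of features in the set. Dividing by k is part of that assumption. `mahalanobis_sq` is a hypothetical name, and it returns D² rather than D.

```python
import numpy as np


def mahalanobis_sq(x, avg: np.ndarray, std: np.ndarray, r_inv: np.ndarray) -> float:
    """Squared Mahalanobis distance of one datum to one cluster's feature set.

    Assumption: D^2 = V R^-1 V^T / k (Mahalanobis-Taguchi convention), where V is the
    row vector of normalized features and k is the number of features in the set.
    """
    v = (np.asarray(x, dtype=float) - avg) / std  # normalize with avg.(i) and std.(i)
    return float(v @ r_inv @ v) / v.size


r = np.array([[1.0, 0.5], [0.5, 1.0]])
d2 = mahalanobis_sq([3.0, 4.0], np.array([1.0, 2.0]), np.array([2.0, 2.0]), np.linalg.inv(r))
print(round(d2, 4))  # 0.6667
```

With R equal to the identity matrix, the same datum gives D² = 1.0. This shows how positive correlation between the two features shrinks the distance along their shared direction.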

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) of the corresponding feature set. The resulting corrected distance replaces the Mahalanobis distance (step S13). Optionally, the logarithm or square root of the Mahalanobis distance may be taken before the correction coefficient is applied.
The distance calculation unit 3 then compares the corrected distances to the clusters held in the internal storage unit (step S14) and finds the smallest. The cluster whose identification information corresponds to that distance is taken as the cluster to which the data belong. The classified data are stored in the cluster database 5, associated with the identification information of that cluster (step S15).
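Steps S13 to S15 reduce to a correction followed by an argmin. The Python sketch below is a minimal version under the assumption that the distances and λ values are already available per cluster identifier. `classify` is a hypothetical name. The optional `transform` argument stands for the logarithm or square root mentioned above.

```python
from typing import Callable, Dict


def classify(distances: Dict[str, float], lambdas: Dict[str, float],
             transform: Callable[[float], float] = lambda d: d) -> str:
    """Correct each distance by lambda^(-1/2) and return the cluster with the smallest result.

    distances: Mahalanobis distance from the datum to each cluster (cluster id -> distance)
    lambdas:   lambda of the feature set used for each cluster
    transform: optional log or square root applied before the correction
    """
    corrected = {cid: transform(d) * lambdas[cid] ** -0.5 for cid, d in distances.items()}
    return min(corrected, key=corrected.get)


print(classify({"A": 4.0, "B": 3.0}, {"A": 16.0, "B": 1.0}))  # A
```

Without the correction, cluster B (raw distance 3.0) would win. Cluster A's feature set has the larger λ, so its distance is scaled down to 1.0.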

<Second Embodiment>
The first embodiment described above uses a single feature set per cluster for clustering. The second embodiment, described below, sets multiple feature sets for each cluster instead. The Mahalanobis distance is calculated for each feature set, the corrected distances are calculated and sorted in ascending order, and the corrected distances within a predetermined top rank are used to determine the cluster to which the data to be classified belong, according to preset rules.

In this embodiment, the distance calculation unit 3 uses rule patterns to determine which cluster the data to be classified belong to. A rule pattern expresses the criterion for assigning data to a cluster. It is based on the rank order of the distances between the data and each cluster, obtained for each feature set.
The configuration of the second embodiment is the same as that of the first embodiment shown in FIG. 1. The same reference numerals are used for each component, and only the operations that differ from the first embodiment are described, with reference to FIG. 7. The second embodiment adds a process that sets the rule patterns from the learning data. FIG. 7 is a flowchart of an example of learning the distance-rank patterns used to set the rule patterns. FIGS. 8 and 9 are flowcharts of an example clustering operation in the second embodiment.

In the first embodiment, the feature set creation unit 1 calculated λ for multiple feature sets (combinations of features) for each cluster, and set the feature set with the largest λ as that cluster's feature set.
In the second embodiment, the feature set creation unit 1 instead evaluates each cluster against other clusters in one of three ways: against a single other cluster, against a combination of several other clusters, or against all other clusters. For each number of features in the combination, it takes the feature set with the largest λ. This yields multiple λ values, and therefore multiple feature sets per cluster for separating it from the other clusters.
The feature set creation unit 1 then obtains the distance calculation data for each feature set. It stores the multiple feature sets and their distance calculation data in the feature set storage unit 4, associated with the cluster identification information.

In FIG. 7, when learning data are input, the feature extraction unit 2 reads the multiple feature sets of each cluster from the feature set storage unit 4, using the identification signal of each cluster.
For each cluster and each feature set read, the feature extraction unit 2 extracts from the learning data the features of the types in that feature set. It stores them in the internal storage unit per feature set, associated with the cluster's identification information (step S21).

Next, for each feature set and each feature extracted from the learning data, the distance calculation unit 3 reads the corresponding mean avg.(i) and standard deviation std.(i) from the feature set storage unit 4. It normalizes the feature using equation (2) and replaces the stored feature with the normalized value.
The distance calculation unit 3 then forms the matrix V of the elements V(i) and calculates its transpose V^T. Using equation (3), it calculates the Mahalanobis distance between the learning data and each cluster in turn, and stores the results per feature set in the internal storage unit, associated with each cluster's identification information (step S22).

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) of the corresponding feature set. The resulting corrected distance replaces the Mahalanobis distance (step S23).
The distance calculation unit 3 then sorts the corrected distances to the clusters in the internal storage unit in ascending order, so that smaller corrected distances rank higher. The cluster identifiers are thus ordered with the clusters closest to the learning data first (step S24).

Next, the distance calculation unit 3 takes the cluster identification information for the n smallest (highest-ranked) corrected distances. It counts how many times each cluster's identification information occurs among these n entries; in effect, each cluster receives votes.
The distance calculation unit 3 then detects rule patterns: patterns in the per-cluster vote counts that are common to the learning data belonging to the same cluster. The following examples use n = 10.

- Rule R1: suppose the learning data of cluster B produce the vote pattern cluster A: 5, cluster B: 3, cluster C: 2. This pattern becomes rule R1.
- Rule R2: suppose every learning datum in which cluster C receives 3 votes belongs to cluster C, even when cluster A receives 7 votes and cluster B receives 0. Rule R2 then assigns the data to cluster C whenever cluster C receives 3 or more votes, regardless of the other clusters' counts.
- Rule R3: suppose the learning data of cluster A show cluster A occupying the first and second ranks. Rule R3 then assigns such data to cluster A regardless of the other clusters' counts, even if cluster B receives 8 votes.
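The voting step and the example rules R1 to R3 can be sketched in Python as follows. The text does not specify a priority among rules. This sketch checks R3, then R2, then R1, which is an assumption. `top_n_votes` and `apply_rules` are hypothetical names.

```python
from collections import Counter
from typing import List, Optional, Tuple


def top_n_votes(scored: List[Tuple[float, str]], n: int) -> Tuple[List[str], Counter]:
    """Rank (corrected distance, cluster id) pairs ascending and count votes in the top n."""
    ranked = sorted(scored, key=lambda item: item[0])
    top = [cid for _, cid in ranked[:n]]
    return top, Counter(top)


def apply_rules(top: List[str], votes: Counter) -> Optional[str]:
    """Example rule patterns R1 to R3; the order R3, R2, R1 is an assumption."""
    if top[:2] == ["A", "A"]:                              # R3: cluster A holds ranks 1 and 2
        return "A"
    if votes["C"] >= 3:                                    # R2: cluster C has 3 or more votes
        return "C"
    if (votes["A"], votes["B"], votes["C"]) == (5, 3, 2):  # R1: exact count pattern
        return "B"
    return None                                            # no rule matched


labels = ["A", "B", "A", "B", "A", "C", "A", "B", "C", "A"]
scored = [(float(i), cid) for i, cid in enumerate(labels)]
top, votes = top_n_votes(scored, 10)
print(apply_rules(top, votes))  # B
```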

In this way, the regularities in the per-cluster vote counts of the learning data classified into the same cluster are detected and stored internally as a pattern table for each cluster's identification information. One rule or several rules may be set for each cluster. In the description above, the distance calculation unit 3 extracts the rule patterns. However, the user may instead set count-based or order-based rule patterns directly, in order to adjust the classification accuracy for each cluster.
Some clusters have feature characteristics similar to those of other clusters. For such clusters, classifying data on the basis of the relationships among several clusters can be more accurate. These relationships are captured as target patterns, such as the vote count of each cluster or the order of clusters from the top rank. This embodiment addresses such cases.

Next, the clustering process of the second embodiment, which uses the rules recorded in the table above, is described with reference to the flowchart of FIG. 8.
When data to be classified are input, the feature extraction unit 2 reads the multiple feature sets of each cluster from the feature set storage unit 4, using the identification signal of each cluster.
For each cluster and each feature set read, the feature extraction unit 2 extracts from the data to be classified the features of the types in that feature set. It stores them in the internal storage unit per feature set, associated with the cluster's identification information (step S31).

Next, for each feature quantity set, the distance calculation unit 3 reads from the feature quantity set storage unit 4 the average value avg.(i) and the standard deviation std.(i) corresponding to each feature quantity extracted from the classification target data, normalizes the feature quantity by the calculation of equation (2), and replaces the feature quantity held in the internal storage unit with the normalized value.
The distance calculation unit 3 then forms the matrix V from the elements V(i) obtained above, calculates its transpose V^T, and uses equation (3) to calculate in turn the Mahalanobis distance between the classification target data and each cluster. Each distance is stored in the internal storage unit for each feature quantity set, associated with the cluster's identification information (step S32).
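As a rough illustration of steps S31 and S32 for a single feature quantity set, the following Python sketch normalizes a feature vector and computes its Mahalanobis distance to one cluster. It rests on assumptions not stated in this section: equation (2) is taken to be the z-score V(i) = (x(i) − avg.(i)) / std.(i), and equation (3) is taken to be D = √(VᵀR⁻¹V), where R is the correlation matrix of the cluster's normalized learning data. Formulations such as the Mahalanobis-Taguchi method additionally divide VᵀR⁻¹V by the number of features k; that scaling is omitted here. The helper names `normalize`, `solve`, `fit_cluster` and `mahalanobis` are hypothetical.

```python
import math
from statistics import mean, stdev


def normalize(x, avg, std):
    """Assumed form of equation (2): V(i) = (x(i) - avg(i)) / std(i)."""
    return [(xi - a) / s for xi, a, s in zip(x, avg, std)]


def solve(matrix, rhs):
    """Solve matrix @ y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (a[r][n] - sum(a[r][c] * y[c] for c in range(r + 1, n))) / a[r][r]
    return y


def fit_cluster(samples):
    """Mean, standard deviation and correlation matrix of one cluster's
    learning data, restricted to the features of one feature quantity set."""
    cols = list(zip(*samples))
    avg = [mean(c) for c in cols]
    std = [stdev(c) for c in cols]
    z = [normalize(s, avg, std) for s in samples]
    k, n = len(cols), len(samples)
    corr = [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(k)]
            for i in range(k)]
    return avg, std, corr


def mahalanobis(x, avg, std, corr):
    """Distance D = sqrt(V^T R^-1 V) between feature vector x and one cluster."""
    v = normalize(x, avg, std)
    w = solve(corr, v)                      # w = R^-1 V
    d2 = sum(vi * wi for vi, wi in zip(v, w))
    return math.sqrt(max(d2, 0.0))          # guard against tiny negative rounding
```

Pure Python with a small elimination solver keeps the sketch dependency-free; with NumPy one would normally call `numpy.linalg.solve` instead.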

Next, the distance calculation unit 3 multiplies each calculated Mahalanobis distance by the correction coefficient λ^(−1/2) of the corresponding feature quantity set to obtain a corrected distance, and replaces the Mahalanobis distance with it (step S33).
The distance calculation unit 3 then sorts the corrected distances to the clusters held in the internal storage unit in ascending order, so that the identification information of clusters with smaller corrected distances to the classification target data ranks higher (step S34).
After sorting, the distance calculation unit 3 detects the cluster identification information corresponding to the n smallest (highest-ranked) corrected distances and counts how many times each cluster's identification information appears among these n entries; in other words, it performs a vote for each cluster.
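Steps S33 and S34 and the top-n vote can be sketched as below. Each input tuple is a hypothetical layout of (cluster ID, Mahalanobis distance, λ), where λ is the discrimination criterion value of the feature quantity set that produced the distance; `vote_top_n` is an illustrative name. Ties in the corrected distance keep their input order because Python's sort is stable.

```python
from collections import Counter


def vote_top_n(distances, n):
    """Correct, rank and vote (steps S33, S34 and the top-n count).

    distances: iterable of (cluster_id, mahalanobis_distance, lam) tuples,
    one per feature quantity set; lam is that set's discrimination
    criterion value.
    """
    # Step S33: corrected distance = distance * lam ** (-1/2)
    corrected = [(d * lam ** -0.5, cid) for cid, d, lam in distances]
    # Step S34: ascending order, smallest corrected distance ranks highest
    corrected.sort(key=lambda t: t[0])
    # Vote: count each cluster ID among the n highest-ranked entries
    top = [cid for _, cid in corrected[:n]]
    return top, Counter(top)
```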

Next, the distance calculation unit 3 checks whether the pattern of vote counts for each cluster (or the ordering pattern) among the top n entries of each classification target data exists in the internally stored table (step S35).
When this check finds that a rule pattern matching the target pattern of the classification target data is described in the table, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, and classifies the data into that cluster (step S36).

Another clustering process of the second embodiment using the rules described in the above table will now be explained with reference to the flowchart of FIG. 9.
In the process of FIG. 9, steps S31 to S35 are the same as in FIG. 8: in step S35, the distance calculation unit 3 compares the rule patterns stored in the table with the target pattern of the classification target data.
The distance calculation unit 3 then determines whether the comparison has found a rule pattern that matches the target pattern. If a matching rule pattern is found, the process proceeds to step S47; if none is found, the process proceeds to step S48 (step S46).

If a matching rule pattern is found, the distance calculation unit 3 determines that the classification target data belongs to the cluster whose identification information corresponds to the matched rule, classifies the data into that cluster, and stores the classified data in the cluster database 5 in association with the identification information of the destination cluster (step S47).
If no matching rule pattern is found, the distance calculation unit 3 detects the identification information with the largest count, that is, the most votes, and classifies the classification target data into the corresponding cluster.
The distance calculation unit 3 then stores the classified data in the cluster database 5 in association with the identification information of the cluster to which it belongs (step S48).
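The decision of FIG. 9 (steps S35 and S46 to S48) can be sketched as follows, assuming the count variant of the target pattern: the vote counts are arranged in a fixed cluster order to form a tuple, which is looked up in a pattern table mapping patterns to cluster IDs. The table layout and the name `classify_with_rules` are assumptions for illustration; the ordering-pattern variant mentioned in the text would use a tuple of ranked cluster IDs as the key instead.

```python
from collections import Counter


def classify_with_rules(top_ids, cluster_order, pattern_table):
    """Rule-pattern lookup with majority-vote fallback (FIG. 9).

    top_ids: cluster IDs of the top-n corrected distances (step S34).
    cluster_order: fixed order of cluster IDs used to build the count pattern.
    pattern_table: {count_pattern_tuple: cluster_id}; several patterns may
    map to the same cluster.
    """
    votes = Counter(top_ids)
    # Target pattern: vote count of every cluster, in a fixed order
    target = tuple(votes.get(c, 0) for c in cluster_order)
    # Steps S35/S46: does the target pattern match a stored rule pattern?
    if target in pattern_table:
        return pattern_table[target]        # step S47: cluster of the matched rule
    # Step S48: no match, so fall back to the cluster with the most votes
    return votes.most_common(1)[0][0]
```

Replacing the final majority-vote line with `return None` corresponds to the FIG. 8 flow, which describes classification only for the matched case.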

<Third Embodiment>
The second embodiment described above prepares a table of rule patterns over the top n clusters ranked by smallest distance (largest similarity) to the classification target data, and classifies each classification target data according to whether its pattern matches a rule pattern in the table. Alternatively, as in the third embodiment described below, a plurality of feature quantity sets may be set for each cluster, the Mahalanobis distance and the corrected distance may be calculated for each feature quantity set, and the cluster with the most corrected distances within a predetermined top rank may be taken as the cluster to which the classification target data belongs.
The configuration of the third embodiment is the same as that of the first and second embodiments shown in FIG. 1. The same reference numerals are used for each component, and only the operations that differ from the second embodiment are described below with reference to FIG. 10, a flowchart showing an example of the clustering operation in the third embodiment. In the third embodiment, there is no process of setting rules from the learning data; step S48 of FIG. 9 is performed directly.

In the clustering process shown in FIG. 10, steps S31 to S34 are the same as in FIG. 8. As already described, in step S34 the distance calculation unit 3 sorts the corrected distances to the clusters held in the internal storage unit in ascending order, so that the identification information of clusters with smaller corrected distances to the classification target data ranks higher (step S34).
Next, the distance calculation unit 3 detects the cluster identification information corresponding to the n smallest (highest-ranked) corrected distances and counts the occurrences of each cluster's identification information among these n entries; that is, it votes for each cluster (step S55).
The distance calculation unit 3 then detects the identification information with the largest count (number of votes), takes the corresponding cluster as the cluster to which the classification target data belongs, and stores the classified data in the cluster database 5 in association with the identification information of that cluster (step S56).

Alternatively, the user may set in advance, in the distance calculation unit 3, a cut-off threshold on the number of votes for each piece of identification information; if the vote count of the identification information with the most votes is below its threshold, the data may be treated as belonging to no cluster.
For example, suppose classification target data is classified into three clusters A, B and C, and the identification information of cluster A receives five votes, that of cluster B three votes and that of cluster C two votes. The distance calculation unit 3 then detects cluster A as the identification information with the most votes.
However, if the threshold for cluster A is set to six, the vote count for cluster A is below the threshold, so the distance calculation unit 3 determines that the data belongs to no cluster.
This improves the reliability of classifying the classification target data into clusters whose feature quantities differ only slightly from those of other clusters.
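Steps S55 and S56, together with the optional per-cluster vote threshold, reduce to a few lines. In this sketch `classify_by_votes` and the dictionary layouts are hypothetical; `None` stands for "belongs to no cluster", and a cluster absent from `thresholds` is treated as having no threshold.

```python
from collections import Counter


def classify_by_votes(votes, thresholds=None):
    """Majority vote (steps S55-S56) with an optional per-cluster cut-off.

    votes: mapping {cluster_id: vote_count} or an iterable of cluster IDs.
    thresholds: optional {cluster_id: minimum_votes}.
    Returns the winning cluster ID, or None for "belongs to no cluster".
    """
    counts = Counter(votes)
    if not counts:
        return None
    winner, count = counts.most_common(1)[0]
    if thresholds and count < thresholds.get(winner, 0):
        return None                         # winner falls short of its threshold
    return winner
```

With the votes from the example above (A = 5, B = 3, C = 2), the call returns "A" without thresholds and `None` when cluster A's threshold is 6.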

<Feature conversion method>
Clustering is performed on the expectation that the population of each feature quantity follows a normal distribution. Depending on the type of feature quantity (area, length, and so on), however, the population may not be normally distributed and may instead be skewed. This can lower the accuracy of the distance calculation between the classification target data and each cluster, that is, of the judgment of their similarity.
For some feature quantities it is therefore necessary to convert the population's values by a predetermined method so that their distribution approaches a normal distribution, improving the accuracy of the similarity judgment.
The feature quantity is converted toward a normal distribution by one of the following: a logarithm, an n-th root such as the square root (√) or cube root (∛), a factorial, or an arithmetic expression containing a function obtained by numerical calculation.

The process of setting the conversion method for each feature quantity is described below with reference to FIG. 11, a flowchart showing an operation example of this setting process. The conversion method is set for each cluster, per feature quantity contained in that cluster, using the learning data belonging to the cluster. The following processing is described as being performed by the feature quantity set creation unit 1, but a separate processing unit may be provided for it instead.
Using the identification information of the target cluster as a key, the feature quantity set creation unit 1 reads the learning data belonging to that cluster from the cluster database 5 and calculates (normalizes) the feature quantities of each learning data (step S61).

Next, the feature quantity set creation unit 1 converts the feature quantities of each read learning data by applying one of the internally stored conversion expressions (step S62).
When the feature quantities of all learning data have been converted, the feature quantity set creation unit 1 calculates an evaluation value indicating how close the resulting distribution is to a normal distribution (step S63).

Next, the feature quantity set creation unit 1 checks whether evaluation values have been calculated for all internally stored arithmetic expressions, that is, all expressions preset as conversion methods. If the evaluation values of the distributions obtained with every expression have been calculated, the process proceeds to step S65; if expressions remain, the process returns to step S62 to apply the next one (step S64).
When conversion with all expressions is complete, the feature quantity set creation unit 1 identifies the distribution with the smallest evaluation value, that is, the one closest to a normal distribution, adopts the expression that produced it as the conversion method, and stores it internally as the conversion method for that feature quantity of the cluster (step S65).
The feature quantity set creation unit 1 performs the above processing for every feature quantity of every cluster, thereby setting a conversion method for each feature quantity of each cluster.
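The selection loop of steps S62 to S65 can be sketched as below for the values of one feature quantity in one cluster. The candidate set is an assumption: it covers only the log, square-root and cube-root expressions plus an identity (no-conversion) entry, and omits the factorial and numerically derived functions. The `evaluate` argument stands for the evaluation value of step S63 (the z-value error described with FIG. 12), and candidates undefined for the data, such as the log of a non-positive value, are skipped. All names are hypothetical.

```python
import math


def _cbrt(x):
    """Real cube root that also handles negative values."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


# Candidate conversion expressions (illustrative subset)
TRANSFORMS = {
    "identity": lambda x: x,
    "log": math.log,
    "sqrt": math.sqrt,
    "cbrt": _cbrt,
}


def select_transform(values, evaluate, transforms=TRANSFORMS):
    """Apply every candidate expression and keep the one whose converted
    distribution has the smallest evaluation value (steps S62-S65)."""
    best_name, best_score = None, math.inf
    for name, f in transforms.items():
        try:
            converted = [f(v) for v in values]   # step S62
        except ValueError:                        # e.g. log of a non-positive value
            continue
        score = evaluate(converted)               # step S63
        if score < best_score:
            best_name, best_score = name, score
    return best_name, best_score                  # step S65
```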

Next, the calculation of the evaluation value in step S63 is described with reference to FIG. 12, a flowchart showing an operation example of obtaining the evaluation value of a distribution produced by a conversion expression.
The feature quantity set creation unit 1 converts the feature quantity of each learning data belonging to the target cluster with the set expression (step S71).
After converting the feature quantities of all learning data, the feature quantity set creation unit 1 calculates the mean μ and standard deviation σ of the distribution (population) formed by the converted values (step S72).
Using this mean μ and standard deviation σ, the feature quantity set creation unit 1 calculates z value (1) as (x − μ)/σ (step S73).

Next, the feature quantity set creation unit 1 calculates the cumulative probability of each value within the population (step S74).
From these cumulative probabilities, it calculates z value (2) as the value of the inverse of the cumulative distribution function of the standard normal distribution (step S75).
The feature quantity set creation unit 1 then calculates the difference between the two z values of the distribution, z value (1) and z value (2), that is, the error between them (step S76).
Having obtained these errors, it calculates their total, namely the sum of squared errors, as the evaluation value (step S77).
The smaller the error between the two z values, the closer the distribution is to a normal distribution; with no error at all the distribution is normal, and the error grows as the distribution departs from normality.
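Steps S72 to S77 amount to comparing each value's standardized score with the normal quantile of its rank, much like a normal Q-Q plot. The sketch below assumes the cumulative probability of step S74 is the empirical plotting position (rank − 0.5)/N and that σ is the population standard deviation; the text specifies neither choice, and `normality_error` is a hypothetical name. It uses `statistics.NormalDist.inv_cdf` (Python 3.8+) for the inverse of the standard normal cumulative distribution function.

```python
from statistics import NormalDist, mean, pstdev


def normality_error(values):
    """Evaluation value of FIG. 12: sum of squared z-value errors.

    values: one feature quantity of a cluster's learning data after the
    conversion of step S71. Smaller results mean closer to normal.
    """
    n = len(values)
    mu, sigma = mean(values), pstdev(values)    # step S72
    std_normal = NormalDist()                    # mean 0, sigma 1
    error = 0.0
    for rank, x in enumerate(sorted(values), start=1):
        z1 = (x - mu) / sigma                    # step S73: z value (1)
        p = (rank - 0.5) / n                     # step S74: assumed plotting position
        z2 = std_normal.inv_cdf(p)               # step S75: z value (2)
        error += (z1 - z2) ** 2                  # steps S76-S77: squared error, summed
    return error
```

A constant feature (σ = 0) raises `ZeroDivisionError` and would need separate handling.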

Next, the calculation of the feature quantities of the classification target data, performed before the clustering process of the first to third embodiments, is described with reference to FIG. 13, a flowchart showing an operation example of this calculation.
The distance calculation unit 3 extracts from the input classification target data the feature quantities used for identification, according to the feature quantity set defined for each cluster, and applies the normalization already described (step S81).
Next, the distance calculation unit 3 converts each feature quantity of the classification target data that is used for classification into a given cluster, using the conversion method (arithmetic expression) set for that feature quantity of the cluster (step S82).
The distance calculation unit 3 then calculates the distance to that cluster as described in the first to third embodiments (step S83).

Next, the distance calculation unit 3 checks whether, for every candidate cluster, the feature quantities have been converted with the conversion methods set for that cluster and the distance to the cluster has been calculated from the converted values. If distances have been obtained for all candidate clusters, the process proceeds to step S85; if clusters remain, the process returns to step S82 (step S84).
Each of the first to third embodiments then continues from the point at which distance calculation is complete (step S85).
The Mahalanobis distance used in these embodiments assumes normally distributed feature quantities when the distance between the classification target data and each cluster is calculated. The above processing brings the distribution of each feature quantity in the population closer to normal, and the closer it is, the more accurately the distance (similarity) to each cluster can be obtained, so the classification accuracy for each cluster can be expected to improve.

<Calculation example>
Next, using the clustering systems of the first, second and third embodiments described above, classification accuracy was compared with that of the conventional example on the sample data shown in FIG. 14. Although the number of samples is small, the results show a correct-answer rate equal to or higher than that of the conventional example even though fewer feature quantities are used. In FIG. 14, three clusters, category 1, category 2 and category 3, are each defined by ten learning data, and each learning data has eight feature quantities a, b, c, d, e, f, g and h. In this example, the feature quantity sets used for clustering are first determined from the learning data belonging to each cluster in FIG. 14, and clustering is then performed using the same learning set as the classification target data.

As a calculation result, FIG. 15 shows the conventional calculation method: the Mahalanobis distance is calculated for each learning data of clusters 1 to 3 in FIG. 14 using the combination of feature quantities a and g, and the determination results are shown. In FIG. 15(a), the Cluster1, Cluster2 and Cluster3 columns give the Mahalanobis distances to clusters 1, 2 and 3, respectively. The category column gives the cluster to which each learning data actually belongs, and the determination result gives the cluster with the smallest Mahalanobis distance to that learning data. Rows in which the category and the determination result match are correctly classified.

In FIG. 15(b), the column number indicates the cluster to which the learning data actually belongs, and the row number indicates the cluster determined. For example, "8" at mark R1 indicates that 8 of the 10 learning data of cluster 1 were determined to be cluster 1, and "2" at mark R2 indicates that 2 of those 10 were determined to be cluster 3. p0 is the rate of agreement between the correct answers and the determinations, p1 is the probability that the two agree by chance, and κ is the overall corrected agreement rate (Cohen's kappa), obtained from the following equations. A higher κ indicates higher classification accuracy.

$$
\kappa = \frac{p_0 - p_1}{1 - p_1}, \qquad
p_0 = \frac{a + d}{a + b + c + d}, \qquad
p_1 = \frac{(a + b)(a + c) + (c + d)(b + d)}{(a + b + c + d)^2}
$$

The relationship among a, b, c and d in the above equations is explained with reference to FIG. 16.
a is the number of data belonging to cluster 1 that were classified as cluster 1, and b is the number of data belonging to cluster 1 that were classified as cluster 2, so a + b is the number of data belonging to cluster 1. Similarly, d is the number of data belonging to cluster 2 that were classified as cluster 2, and c is the number of data belonging to cluster 2 that were classified as cluster 1, so c + d is the number of data belonging to cluster 2. Of all a + b + c + d data, a + c were classified into cluster 1 and b + d into cluster 2.
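The two-cluster formulas generalize directly to the three-cluster tables of FIGS. 15 and 17 to 19: p0 sums the diagonal, and p1 sums the product of each cluster's true-count total and determined-count total. The sketch below is an illustrative implementation of that standard Cohen's kappa computation; `kappa` is a hypothetical name. Because the statistic is unchanged by transposing the matrix, it does not matter that FIG. 15(b) places the true clusters on the columns.

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples whose true cluster is i and whose
    determined cluster is j. Returns (p0, p1, kappa).
    """
    k = len(confusion)
    total = sum(map(sum, confusion))
    p0 = sum(confusion[i][i] for i in range(k)) / total       # observed agreement
    row = [sum(r) for r in confusion]                           # true-cluster totals
    col = [sum(confusion[i][j] for i in range(k)) for j in range(k)]  # determined totals
    p1 = sum(row[i] * col[i] for i in range(k)) / total ** 2   # chance agreement
    return p0, p1, (p0 - p1) / (1 - p1)
```

For any 3×3 matrix with 10 learning data per true cluster and 25 on the diagonal, it returns p0 ≈ 0.8333, p1 = 1/3 and κ = 0.75, matching the values reported for FIGS. 18 and 19.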

Next, FIG. 17 shows the determination results obtained by calculating the Mahalanobis distance for each learning data of clusters 1 to 3 in FIG. 14 with the calculation method of the first embodiment. FIGS. 17(a) and 17(b) are read in the same way as FIG. 15, so their explanation is omitted. The correct-answer rate p0, the chance-agreement probability p1 and the overall corrected agreement rate κ are equivalent to those of the conventional method in FIG. 15. Here, the feature quantity set for each cluster was obtained by selecting, from all the combinations described above, the combination with the largest discrimination criterion value λ for that cluster: feature quantities a and h for cluster 1, a and d for cluster 2, and a and g for cluster 3.

Next, FIG. 18 shows the determination results obtained by calculating the Mahalanobis distance for each learning data of clusters 1 to 3 in FIG. 14 with the calculation method of the second embodiment. FIGS. 18(a) and 18(b) are read in the same way as FIG. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected agreement rate κ is 0.75, an improvement in classification accuracy over the conventional method in FIG. 15. Here, the feature quantity sets for each cluster were obtained by selecting, from all the combinations described above, the combinations with the three largest discrimination criterion values λ for that cluster: a·h, a·g and d·e for cluster 1; a·f, a·d and a·b for cluster 2; and e·g, a·c and a·g for cluster 3.
For the vote, the Mahalanobis distances were sorted in ascending order, the occurrences of each cluster among the three smallest were counted, and the cluster with the most occurrences was taken as the cluster to which the classification target data belongs.

Next, FIG. 19 shows the determination results obtained with the calculation method of the second embodiment when the Mahalanobis distance calculated for each learning data of clusters 1 to 3 in FIG. 14 is further multiplied by the correction coefficient λ^(−1/2) before the distances are ranked. FIGS. 19(a) and 19(b) are read in the same way as FIG. 15, so their explanation is omitted. The correct-answer rate p0 is 0.8333, the chance-agreement probability p1 is 0.3333, and the overall corrected agreement rate κ is 0.75, an improvement in classification accuracy over the conventional method in FIG. 15. The feature quantity sets were selected in the same way as for FIG. 18, using the combinations with the three largest discrimination criterion values λ for each cluster: a·h, a·g and d·e for cluster 1; a·f, a·d and a·b for cluster 2; and e·g, a·c and a·g for cluster 3.
For the vote, the distances were sorted in ascending order, the occurrences of each cluster among the three smallest were counted, and the cluster with the most occurrences was taken as the cluster to which the classification target data belongs.

The classification results shown in FIGS. 15, 17, 18 and 19 indicate that the present embodiments perform clustering faster and with higher accuracy than the conventional example, confirming their advantage over it.

<Application Examples of the Present Invention>
A. Inspection Device
As shown in FIG. 20, an inspection device (defect detection device) that classifies the types of scratches on the surface of an inspection object, for example a glass substrate, will now be described. FIG. 21 is a flowchart of an example of the feature amount set selection operation, and FIG. 22 is a flowchart of an example of the clustering operation.
First, the feature amount set selection operation is described. The collection of learning data in step S1 of the flowchart of FIG. 5 corresponds to steps S101 to S105 of the flowchart of FIG. 21.
Steps S2 to S4 in FIG. 21 are the same as those in the flowchart of FIG. 5, so their description is omitted.

The operator collects samples for the learning data corresponding to each cluster into which scratches are to be classified (step S101).
The illumination device 102 illuminates each scratch collected as learning data, and the imaging device 103 captures image data of the scratch portion, which the image acquisition unit 101 acquires (step S102).
The feature amounts of the scratch in each item of learning data are then calculated from the image data acquired by the image acquisition unit 101 (step S103).
The feature amounts of the learning data are assigned to the classes determined by visual inspection, which identifies the learning data belonging to each cluster (step S104).
The processing from step S101 to step S102 is repeated until the learning data of each cluster reaches a predetermined (preset) number of samples, for example about 300 per cluster. Once that number is reached, the clustering unit 105 performs the processing from step S2 onward, already described with reference to FIG. 5. Here, the clustering unit 105 is the clustering system of the first or second embodiment.
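Steps S101 to S104 fill each cluster with a fixed number of visually labelled feature vectors (about 300 in the text) before the clustering unit 105 takes over. The pure-Python sketch below assumes, hypothetically, that the cluster database keeps each cluster's mean vector and unbiased sample covariance, so that Mahalanobis distances can be computed later. The names `build_cluster_database` and `mahalanobis` are illustrative. Gaussian elimination is written out only to keep the example free of dependencies.

```python
def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def covariance(vectors, mu):
    # Unbiased sample covariance (divides by n - 1); needs at least two samples.
    n, d = len(vectors), len(mu)
    return [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in vectors) / (n - 1)
             for j in range(d)] for i in range(d)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for r in reversed(range(d)):
        x[r] = (m[r][d] - sum(m[r][c] * x[c] for c in range(r + 1, d))) / m[r][r]
    return x


def mahalanobis(x, mu, cov):
    """sqrt((x - mu)^T cov^-1 (x - mu)) without forming the inverse explicitly."""
    diff = [xi - mi for xi, mi in zip(x, mu)]
    return sum(di * si for di, si in zip(diff, solve(cov, diff))) ** 0.5


def build_cluster_database(samples, clusters, target=300):
    """samples: iterable of (cluster label, feature vector) pairs labelled visually (S104)."""
    per_cluster = {c: [] for c in clusters}
    for label, features in samples:
        if len(per_cluster[label]) < target:
            per_cluster[label].append(list(features))
        if all(len(v) >= target for v in per_cluster.values()):
            break  # every cluster has reached the preset number of samples
    return {c: (mean(v), covariance(v, mean(v))) for c, v in per_cluster.items()}


labelled = [("A", [0, 0]), ("B", [4, 4]), ("A", [2, 2]), ("A", [9, 9]), ("B", [6, 8])]
db = build_cluster_database(labelled, ["A", "B"], target=2)
print(db["B"])                                        # ([5.0, 6.0], [[2.0, 4.0], [4.0, 8.0]])
print(mahalanobis([3, 4], [0, 0], [[1, 0], [0, 1]]))  # 5.0
```

In the usage lines the target is reduced to 2 so the example stays small. The third sample for cluster A is ignored because that cluster is already full.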

Next, the clustering process in the inspection apparatus of FIG. 4 is described with reference to FIG. 22. Steps S31 to S34, S55, and S56 of FIG. 22 are the same as those in the flowchart of FIG. 10, so their description is omitted.
In the inspection apparatus of FIG. 20, when inspection starts, the illumination device 102 illuminates the glass substrate serving as the inspection object 100, and the imaging device 103 captures the surface of the glass substrate and outputs the captured image to the image acquisition unit 101. When the defect candidate detection unit 104 detects, in the captured image input from the image acquisition unit 101, a portion that differs from the flat surface, it treats that portion as a defect candidate to be classified (step S201).

Next, the defect candidate detection unit 104 cuts out the image data of the defect candidate portion from the captured image as classification target data.
The defect candidate detection unit 104 then calculates feature amounts from the image data of the classification target data and outputs the classification target data, consisting of the set of extracted feature amounts, to the clustering unit 105 (step S202).
The subsequent clustering process has already been described with reference to the steps of FIG. 10, so it is omitted here. As described above, the inspection apparatus of the present invention can classify scratches on a glass substrate by scratch type with high accuracy.
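In steps S201 and S202, the device detects portions of the captured image that depart from the flat glass surface, cuts them out, and computes their feature amounts. The text does not specify the detection rule or the features, so the sketch below is purely illustrative and makes several assumptions:

- the image is grayscale and stored as a list of rows;
- the median gray level stands in for the flat surface;
- pixels that deviate from it by more than a threshold are grouped into 4-connected regions;
- each region yields a few hypothetical features: area, bounding-box size, and peak contrast.

```python
from collections import deque
from statistics import median


def detect_defect_candidates(image, threshold):
    h, w = len(image), len(image[0])
    background = median(p for row in image for p in row)  # stand-in for the flat surface
    mask = [[abs(image[y][x] - background) > threshold for x in range(w)] for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    candidates = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first search collects one 4-connected defect candidate (S201).
            queue, pixels = deque([(y0, x0)]), []
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            ys, xs = [p[0] for p in pixels], [p[1] for p in pixels]
            top, left, bottom, right = min(ys), min(xs), max(ys), max(xs)
            # Cut out the partial image and compute hypothetical feature amounts (S202).
            candidates.append({
                "bbox": (top, left, bottom, right),
                "area": len(pixels),
                "width": right - left + 1,
                "height": bottom - top + 1,
                "contrast": max(abs(image[y][x] - background) for y, x in pixels),
                "patch": [row[left:right + 1] for row in image[top:bottom + 1]],
            })
    return candidates


sample = [[10] * 6 for _ in range(5)]
sample[1][1:4] = [60, 60, 60]  # a short bright scratch
sample[3][5] = 0               # a single dark point
found = detect_defect_candidates(sample, threshold=5)
for c in found:
    print(c["bbox"], c["area"], c["contrast"])
# (1, 1, 1, 3) 3 50.0
# (3, 5, 3, 5) 1 10.0
```

A real inspection line would use a more robust background model than a global median. The feature dictionaries correspond to the classification target data passed to the clustering unit 105.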

B. Defect Type Determination Device
In the defect type determination device shown in FIG. 23, the clustering unit 105 corresponds to the clustering system of the present invention described above.
The image acquisition device 201 consists of the image acquisition unit 101, the illumination device 102, and the imaging device 103 shown in FIG. 20.
The learning data of each cluster into which classification target data is to be classified has already been acquired and stored in the cluster database 5 of the clustering unit 105. Accordingly, the feature amount set selection of FIG. 5 has also been completed.

Defect candidates are detected in the captured images input from the image acquisition device 202 attached to each manufacturing apparatus. Their image data are cut out, feature amounts are extracted, and the results are output to the data collection device 203. The control device 200 transfers the classification target data input to the data collection device 203 to the clustering unit 105. As already described, the clustering unit 105 then classifies the input classification target data into the clusters corresponding to the scratch types.

C. Manufacturing Management Device
As shown in FIG. 24, the manufacturing management device of the present invention includes a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, a defective device determination unit 305, and a defect type determination device 306. The defect type determination device 306 is the same as the defect type determination device described in section B.
The defect type determination device 306 has the corresponding defect candidate detection units 104 process the captured images from the image acquisition devices 201 and 202, provided in the manufacturing devices 301 and 302 respectively, to extract feature amounts, and then classifies the resulting classification target data.

Next, the defective device determination unit 305 holds a table relating the identification information of each cluster to the cause of the defects in that cluster. It reads from the table the cause corresponding to the identification information of the destination cluster input from the defect type determination device 306, and determines which manufacturing device is responsible. In other words, the defective device determination unit 305 uses the cluster identification information to identify the cause of the defect in the product manufacturing process.
The defective device determination unit 305 then notifies the operator through the notification unit 303. It also stores a history entry in the recording unit 304, keyed by the date and time of the determination, containing the identification number of the cluster into which the defect was classified, the cause, and the identification information of the manufacturing device. The control device 300 either stops the manufacturing device identified by the defective device determination unit 305 or adjusts its control parameters.
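The table lookup and history recording performed by the defective device determination unit 305 can be sketched as below. All table contents (cluster names, causes, device numbers) are hypothetical, a Python list stands in for the recording unit 304, and a `print` call stands in for the notification unit 303.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class DefectiveDeviceJudge:
    # Hypothetical cause table: cluster id -> (cause of the defect, responsible device id).
    cause_table: Dict[str, Tuple[str, str]]
    history: List[dict] = field(default_factory=list)  # plays the role of recording unit 304

    def judge(self, cluster_id: str, when: Optional[datetime] = None) -> str:
        cause, device = self.cause_table[cluster_id]
        # Record the determination time, cluster, cause, and device as one history entry.
        self.history.append({
            "time": when or datetime.now(),
            "cluster": cluster_id,
            "cause": cause,
            "device": device,
        })
        # Stand-in for the notification unit 303.
        print(f"cluster {cluster_id}: {cause}; check manufacturing device {device}")
        return device  # the control device 300 would stop or retune this device


judge = DefectiveDeviceJudge({
    "bubble": ("melting temperature drift", "301"),  # hypothetical entries
    "scratch": ("roller contact", "302"),
})
judge.judge("scratch", datetime(2007, 7, 3, 9, 0))
# prints: cluster scratch: roller contact; check manufacturing device 302
```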

D. Manufacturing Management Device
As shown in FIG. 25, another manufacturing management device of the present invention includes a control device 300, manufacturing devices 301 and 302, a notification unit 303, a recording unit 304, and a clustering unit 105. The clustering unit 105 has the same configuration as described in sections A and B.
Unlike cases A to C above, here the clustering unit 105 classifies the classification target data by the manufacturing state of each step of the manufacturing process. The feature amounts it uses consist of manufacturing conditions (material quantities, processing temperature, pressure, processing speed, etc.) in the manufacturing process of an industrial product such as a glass substrate. These feature amounts are input to the clustering unit 105 as process information detected by sensors provided in the manufacturing devices 301 and 302.

That is, based on the feature amounts of the classification target data, the clustering unit 105 classifies the manufacturing state of the glass manufacturing process at each step in each manufacturing device into clusters such as "normal", "defects likely; adjustment required", and "dangerous; adjustment required". The clustering unit 105 then notifies the operator of the classification result through the notification unit 303 and outputs the identification information of the resulting cluster to the control device 300. It also stores a history in the recording unit 304, keyed by the date and time of the determination, containing the identification number of the cluster into which the manufacturing state of each step was classified, the manufacturing condition that is the most problematic feature amount, and the identification information of the manufacturing device.
The control device 300 holds a table associating cluster identification information with the adjustment items, and their values, that return the manufacturing conditions to normal. It reads the adjustment items and values corresponding to the cluster identification information input from the clustering unit 105 and controls the corresponding manufacturing device with the data read.
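The state classification and adjustment lookup can be sketched as follows. The sketch makes two simplifying assumptions. First, each state cluster is summarized by a mean and a per-feature standard deviation, so the distance is a Mahalanobis distance with a diagonal covariance. Second, the "most problematic" manufacturing condition is the one with the largest standardized deviation from the normal cluster. All feature names, statistics, and adjustment values are hypothetical.

```python
import math

FEATURES = ("temperature", "pressure", "speed")
# Hypothetical cluster statistics: state -> (mean vector, per-feature standard deviation).
STATES = {
    "normal":       ([1100.0, 101.0, 12.0], [5.0, 1.0, 0.5]),
    "needs_adjust": ([1120.0, 104.0, 12.0], [5.0, 1.0, 0.5]),
    "dangerous":    ([1150.0, 110.0, 10.0], [5.0, 1.0, 0.5]),
}
# Hypothetical control table: state -> adjustment items and values.
ADJUSTMENTS = {
    "normal": {},
    "needs_adjust": {"temperature": -20.0},
    "dangerous": {"temperature": -50.0, "speed": 2.0},
}


def diag_distance(x, mean, std):
    """Mahalanobis distance under a diagonal covariance (features assumed uncorrelated)."""
    return math.sqrt(sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std)))


def classify_state(x):
    # Assign the process reading to the nearest state cluster.
    state = min(STATES, key=lambda s: diag_distance(x, *STATES[s]))
    mean, std = STATES["normal"]
    # Most problematic condition: largest standardized deviation from the normal cluster.
    worst = max(range(len(x)), key=lambda i: abs(x[i] - mean[i]) / std[i])
    return state, FEATURES[worst], ADJUSTMENTS[state]


print(classify_state([1121.0, 104.5, 12.0]))
# ('needs_adjust', 'temperature', {'temperature': -20.0})
```

A full-covariance Mahalanobis distance would also account for correlated conditions, such as temperature and pressure moving together. The diagonal form is used here only to keep the sketch short.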

The clustering of classification target data may also be performed by recording a program that implements the functions of the clustering system of FIG. 1 on a computer-readable recording medium, loading the recorded program into a computer system, and executing it. The "computer system" here includes an OS and hardware such as peripheral devices. It also includes a WWW system provided with a homepage providing environment (or display environment). A "computer-readable recording medium" means a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or a storage device such as a hard disk built into a computer system. A "computer-readable recording medium" also includes media that hold a program for a certain period of time, such as the volatile memory (RAM) inside a computer system that acts as a server or client when the program is transmitted over a network such as the Internet or a communication line such as a telephone line.

The program may also be transmitted from a computer system that stores it in a storage device or the like to another computer system, either via a transmission medium or by transmission waves in a transmission medium. Here, a "transmission medium" is a medium capable of transmitting information, such as a network (communication network) like the Internet or a communication line like a telephone line. The program may implement only part of the functions described above. It may also be a so-called difference file (difference program), which implements those functions in combination with a program already recorded in the computer system.

The present invention can be applied to fields in which information with many kinds of feature amounts must be classified and discriminated with high accuracy, such as detecting defects in glass articles. It can also be used in manufacturing state detection devices and product manufacturing management devices.

The entire contents of the specification, claims, drawings, and abstract of Japanese Patent Application No. 2006-186628, filed on July 6, 2006, are incorporated herein by reference as part of the disclosure of this specification.

Claims

A clustering system that classifies input data into each of a plurality of clusters, each formed by a population of learning data, according to feature amounts of the input data, the clustering system comprising:
a feature amount set storage unit that stores, for each of the clusters, a feature amount set, which is a combination of feature amounts used for classification;
a feature amount extraction unit that extracts preset feature amounts from the input data;
a distance calculation unit that, for each feature amount set corresponding to each cluster, calculates the distance between the center of the population of that cluster and the input data, based on the feature amounts included in the feature amount set, and outputs it as a set distance; and
a rank extraction unit that arranges the set distances in ascending order,
wherein a plurality of the feature amount sets are set for each cluster, and
the clustering system further comprises a cluster classification unit that detects, from the set distances obtained for the respective feature amount sets, which cluster the input data belongs to, using a rule pattern that specifies, on the basis of the ranks of the set distances, the criterion for classifying input data into each cluster.

The clustering system according to claim 1, wherein the cluster classification unit detects, from the ranks of the set distances, which cluster the input data belongs to, and detects the cluster having the largest number of top-ranked set distances as the cluster to which the input data belongs.

The clustering system according to claim 2, wherein the cluster classification unit holds a threshold for the number of top-ranked set distances, and detects a cluster as the cluster to which the input data belongs when its number of top-ranked set distances is equal to or greater than the threshold.

The clustering system according to any one of claims 1 to 3, wherein the distance calculation unit multiplies each set distance by a correction coefficient set for the corresponding feature amount set, thereby standardizing the set distances across the feature amount sets.

The clustering system according to any one of claims 1 to 4, further comprising a feature amount set creation unit that creates the feature amount sets for each cluster,
wherein, for each of a plurality of combinations of feature amounts, the feature amount set creation unit takes the mean of the learning data in the population of a cluster as the origin, obtains the average of the distances between this origin and each item of learning data in the populations of the other clusters, and selects the combination of feature amounts giving the largest average as a feature amount set used to distinguish that cluster from the other clusters.
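The selection rule of the feature amount set creation unit can be sketched as below. The sketch assumes, as in the FIG. 19 example, that each feature amount set is a pair of features. It also assumes that the distance is the Mahalanobis distance under the target cluster's 2×2 sample covariance, which must be nonsingular. Combinations are ranked by the average distance from the target cluster's mean to the other clusters' learning data, and the top `top_n` are kept. The function names and the toy data are hypothetical.

```python
from itertools import combinations


def mean(rows):
    return [sum(c) / len(rows) for c in zip(*rows)]


def mahalanobis_to_origin(cluster_pts):
    """Distance function to the cluster mean using its 2x2 sample covariance."""
    mx, my = mean(cluster_pts)
    n = len(cluster_pts)
    sxx = sum((x - mx) ** 2 for x, _ in cluster_pts) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in cluster_pts) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in cluster_pts) / (n - 1)
    det = sxx * syy - sxy ** 2  # assumed nonzero

    def dist(p):
        dx, dy = p[0] - mx, p[1] - my
        # Quadratic form with the closed-form inverse of [[sxx, sxy], [sxy, syy]].
        return ((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det) ** 0.5
    return dist


def select_feature_sets(data, target, names, top_n=3):
    """data: {cluster: [feature vectors]}; returns (average distance, pair name), largest first."""
    scores = []
    for i, j in combinations(range(len(names)), 2):
        target_pts = [(r[i], r[j]) for r in data[target]]
        other_pts = [(r[i], r[j]) for c, rows in data.items() if c != target for r in rows]
        dist = mahalanobis_to_origin(target_pts)
        scores.append((sum(map(dist, other_pts)) / len(other_pts), names[i] + "·" + names[j]))
    scores.sort(reverse=True)
    return scores[:top_n]


data = {
    "1": [[0, 0, 0], [1, 0, 1], [0, 1, 1], [1, 1, 0]],
    "2": [[10, 0, 0], [11, 1, 3]],
}
for score, name in select_feature_sets(data, "1", ("a", "b", "c")):
    print(name, round(score, 2))
# a·c 17.59
# a·b 17.34
# b·c 2.82
```

Pairs that include feature a rank highest because cluster 2 lies far from cluster 1 along a. Keeping the top three pairs per cluster mirrors the three combinations chosen for each cluster in the FIG. 19 example.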

A defect type determination apparatus comprising the clustering system according to any one of claims 1 to 5,
wherein the input data is image data of a product defect, and the apparatus classifies defects in the image data by defect type on the basis of feature amounts representing the defects.

The defect type determination apparatus according to claim 6, wherein the product is a glass article, and defects of the glass article are classified by defect type.

A defect detection apparatus that detects the type of a product defect, comprising the defect type determination apparatus according to claim 6 or 7.

A manufacturing state determination apparatus comprising the defect type determination apparatus according to claim 6 or 7, which classifies product defects by type and detects the cause of a defect in the manufacturing process on the basis of the correspondence between each type and its cause.

A manufacturing state determination apparatus comprising the clustering system according to any one of claims 1 to 5,
wherein the input data are feature amounts representing manufacturing conditions in a product manufacturing process, and the apparatus classifies these feature amounts by the manufacturing state of each step of the manufacturing process.

The manufacturing state determination apparatus according to claim 10, wherein the product is a glass article, and the feature amounts in the manufacturing process of the glass article are classified by the manufacturing state of each step of the manufacturing process.

A manufacturing state detection apparatus that detects the type of manufacturing state at each step of a product manufacturing process, comprising the manufacturing state determination apparatus according to claim 10 or 11.

A product manufacturing management apparatus comprising the manufacturing state determination apparatus according to claim 10 or 11, which detects the type of manufacturing state at each step of a product manufacturing process and performs process control of that step on the basis of control items corresponding to the detected type.