JP2004252972A

JP2004252972A - Device, method, and program for determining input attribute condition, data analyzer, data analysis method, and data analysis program

Info

Publication number: JP2004252972A
Application number: JP2004024769A
Authority: JP
Inventors: Hiroaki Takeuchi; 博明竹内
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-01-31
Filing date: 2004-01-30
Publication date: 2004-09-09
Anticipated expiration: 2024-01-30
Also published as: JP4298531B2

Abstract

PROBLEM TO BE SOLVED: To provide a device, a method, and a program for determining input attribute conditions, as well as a data analyzer, a data analysis method, and a data analysis program, that make data analysis more efficient.

SOLUTION: The input attribute condition determining device and the data analyzer each include the following four parts:
- A frequency calculation part 6. The analysis data group is divided into first and second data groups according to the value of the output attribute. For each numerical value taken by an input attribute, this part calculates the proportion of data in the first data group whose input attribute is equal to or smaller than that value (the first frequency). It also calculates the corresponding proportion in the second data group (the second frequency).
- A frequency cumulative difference calculation part 7. It calculates the difference between the first and second frequencies for each numerical value taken by the input attribute.
- A threshold determination part 130. For each input attribute, it takes the value that gives the maximum difference as that attribute's threshold, so that at least one threshold is determined for at least one input attribute.
- An input attribute condition determination part 111. Based on the threshold, it determines an input attribute condition that divides the analysis data group in two so that the first and second data groups are each gathered together.

COPYRIGHT: (C)2004,JPO&NCIPI

Description

The present invention concerns data composed of two kinds of attributes. The first is an output attribute (target attribute) to be analyzed, such as a characteristic of a product made in a manufacturing process. The second is a set of input attributes (explanatory attributes) that affect the output attribute, such as manufacturing process conditions. For such data, the invention relates to an input attribute condition determining device, an input attribute condition determining method, and an input attribute condition determining program that determine input attribute conditions under which the values of the output attribute are grouped together. The invention also relates to a data analysis device, a data analysis method, and a data analysis program that use the input attribute condition determining device to analyze the causal relationship between the input attributes and the output attribute.

The decision tree method is a known, effective technique for analyzing the causal relationship between an output attribute and input attributes (see Patent Document 1). It builds a tree in which the data are split successively by the values of input attributes, so that the values of the output attribute are well grouped in the leaves.

FIG. 10 shows an example of a decision tree described in the prior-art section of Patent Document 1 (see paragraphs [0002] to [0005] and FIG. 22 of Patent Document 1). It analyzes the data group of Table 1 (see the section [Best Mode for Carrying Out the Invention] of this specification). The data group of Table 1 contains 12 records. Each record pairs the values of four input attributes x1, x2, x3, and x4 with the corresponding value of the output attribute y. The decision tree built by this method is referred to below as "conventional decision tree-1". As FIG. 10 shows, it cleanly separates the values X, Y, and Z of the output attribute y by the values of the input attributes x2, x3, and x1.

However, when conventional decision tree-1 of FIG. 10 classifies data, it splits the data into as many subsets as there are distinct values of the input attribute. For example, the input attribute x2 takes four values (a, b, c, d), so classification by x2 produces four subsets. As the number of values an input attribute takes grows, the decision tree can therefore become cluttered.

To solve this problem, Patent Document 1 proposes a decision tree that works label by label. For each attribute, the values that should be grouped together are represented by a single label, and the data are classified by these labels.

FIG. 11 shows the label hierarchy described in the embodiment of Patent Document 1 (see paragraphs [0010] to [0028] and FIG. 13 of Patent Document 1). Take attribute x3, which has four values (1, 2, 3, 4). In this embodiment, the values "1" and "2" receive the label "2.5 or less", and the values "3" and "4" receive the label "2.5 or more". Together these labels form a hierarchical structure.

FIG. 12 shows an example of a decision tree built with the label hierarchy of FIG. 11 described in Patent Document 1. It again analyzes the data group of Table 1 (see paragraphs [0010] to [0028] and FIG. 14 of Patent Document 1). In the decision tree of FIG. 12, the values X, Y, and Z of the output attribute y are cleanly separated by the values of the input attributes x2, x3, and x1. This tree is referred to below as "conventional decision tree-2". As FIG. 12 shows, it is far simpler than conventional decision tree-1 of FIG. 10.

To build such a tree, each node (t1, t2, t3, ... in FIG. 12) needs an evaluation index that determines the optimal branch condition. Known evaluation indices include the Gini index and the least-squares criterion. When the output attribute (target attribute) y is a qualitative variable, as in the data of Table 1, the Gini index is often used. The Gini index method, which determines the optimal branch condition with the Gini index, is described below.

As described on pp. 44 to 47 of Non-Patent Document 1, the Gini index is given by the following equation.

$$ i(t) = 1 - \sum_j \{ p(j \mid t) \}^2 \qquad (1) $$

Here, p(j|t) is the probability that the output attribute y takes the value j (y = j) at node t.
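Below is a minimal Python sketch of equation (1). It estimates p(j|t) as the relative frequency of each output-attribute value among the records at node t. The function name `gini_index` and the convention for an empty node are illustrative assumptions.

```python
from collections import Counter

def gini_index(labels):
    """Gini index i(t) = 1 - sum_j p(j|t)^2 for the output-attribute labels at node t."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumed convention: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

# Root node t1 of FIG. 12: four records each of X, Y and Z.
print(round(gini_index(["X"] * 4 + ["Y"] * 4 + ["Z"] * 4), 3))  # 0.667
```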

A small Gini index i(t) means that the values of the output attribute y are well grouped at node t.

For example, the Gini index at the root node t1 in FIG. 12 is

$$ \begin{aligned} i(t_1) &= 1 - \sum_j \{ p(j \mid t_1) \}^2 \\ &= 1 - \{ p(X \mid t_1)^2 + p(Y \mid t_1)^2 + p(Z \mid t_1)^2 \} \\ &= 1 - \{ (4/12)^2 + (4/12)^2 + (4/12)^2 \} \\ &= 0.667 \end{aligned} \qquad (2) $$

Likewise, the Gini index at the child node t2 of the root node t1, where x2 = a or b, is

$$ \begin{aligned} i(t_2) &= 1 - \sum_j \{ p(j \mid t_2) \}^2 \\ &= 1 - \{ p(X \mid t_2)^2 + p(Y \mid t_2)^2 + p(Z \mid t_2)^2 \} \\ &= 1 - \{ (4/10)^2 + (2/10)^2 + (4/10)^2 \} \\ &= 0.64 \end{aligned} \qquad (3) $$

and the Gini index at the child node t3 of the root node t1, where x2 = c or d, is

$$ \begin{aligned} i(t_3) &= 1 - \sum_j \{ p(j \mid t_3) \}^2 \\ &= 1 - \{ p(X \mid t_3)^2 + p(Y \mid t_3)^2 + p(Z \mid t_3)^2 \} \\ &= 1 - \{ (0/2)^2 + (2/2)^2 + (0/2)^2 \} \\ &= 0 \end{aligned} \qquad (4) $$

The value i(t3) = 0 shows that under the condition x2 = c or d, the values of the output attribute y are fully grouped: y takes only the value Y.

The Gini index above can also measure how much the grouping of the output attribute improves when the root node t1 is branched into the child nodes t2 and t3. The Gini index method uses the improvement Δi(t1), defined below, as this evaluation index.

$$ \Delta i(t_1) = i(t_1) - \{ p_{t2} \cdot i(t_2) + p_{t3} \cdot i(t_3) \} \qquad (5) $$

Here, pt2 and pt3 are the branch proportions. They apply when the root node t1 (12 records) is split into the child node t2 (x2 = a or b; 10 records) and the child node t3 (x2 = c or d; 2 records). Their values are pt2 = 10/12 and pt3 = 2/12.

In the example of FIG. 12, branching the root node t1 into the child nodes t2 and t3 therefore improves the grouping of the output attribute by

$$ \begin{aligned} \Delta i(t_1) &= i(t_1) - \{ p_{t2} \cdot i(t_2) + p_{t3} \cdot i(t_3) \} \\ &= 0.667 - \{ (10/12) \times 0.64 + (2/12) \times 0 \} \\ &= 0.134 \end{aligned} \qquad (6) $$
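The short Python sketch below checks the arithmetic of equations (5) and (6). The helper names are hypothetical. The child nodes are rebuilt from the label counts given in the text (t2: four X, two Y, four Z; t3: two Y). Without intermediate rounding the result is 0.1333. The value 0.134 above comes from rounding i(t1) to 0.667 first.

```python
from collections import Counter

def gini_index(labels):
    """Equation (1): 1 - sum_j p(j|t)^2, with p estimated from label counts."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def improvement(parent, children):
    """Equation (5): i(t) - sum_k p_k * i(t_k), where p_k = |t_k| / |t|."""
    n = len(parent)
    weighted = sum(len(child) / n * gini_index(child) for child in children)
    return gini_index(parent) - weighted

t2 = ["X"] * 4 + ["Y"] * 2 + ["Z"] * 4   # x2 = a or b: 10 records
t3 = ["Y"] * 2                           # x2 = c or d: 2 records
t1 = t2 + t3                             # root node: 12 records
print(round(improvement(t1, [t2, t3]), 4))  # 0.1333
```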

Patent Document 1 instead measures the improvement in the grouping of the output attribute with equation (7) below. The basic idea is the same as the improvement of the Gini index method (equations (5) and (6)).

$$ \Delta i'(t_1) = \sum_j \{ p(j \mid t_2) \}^2 + \sum_j \{ p(j \mid t_3) \}^2 \qquad (7) $$

The Gini indices (equations (3) and (4)) and the improvement (equation (6)) are calculated for every branch condition pattern that each input attribute can take. The condition with the largest improvement is then selected as the final branch condition. For the branch from the root node t1 in FIG. 12, the selected condition is t2: "x2 = a, b" and t3: "x2 = c, d", which maximizes the improvement Δi(t1).

The number of branch condition patterns for which the Gini index and the improvement must be calculated depends on how many values the input attribute can take. For example, the input attribute x2 can take four values (a, b, c, and d), which gives the following three patterns:

- t2: "x2 = a", t3: "x2 = b, c, d"
- t2: "x2 = a, b", t3: "x2 = c, d"
- t2: "x2 = a, b, c", t3: "x2 = d"

The number of patterns grows as the number of possible values of the input attribute increases.
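The three patterns listed above are the "prefix" splits of the ordered values a < b < c < d. The small Python sketch below enumerates them, and shows why an attribute with k distinct values yields k - 1 patterns. Each pattern needs its own Gini and improvement calculation. The function name is hypothetical.

```python
def ordered_binary_splits(values):
    """Return the (t2, t3) splits of an ordered value list.

    t2 receives a non-empty prefix and t3 the remaining suffix,
    so a list of k values yields k - 1 patterns.
    """
    return [(values[:k], values[k:]) for k in range(1, len(values))]

for t2, t3 in ordered_binary_splits(["a", "b", "c", "d"]):
    print("t2:", t2, "t3:", t3)
```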

Patent Document 1 simplifies the computation. It does not evaluate the branch-condition calculation (equation (7)) for every pattern each input attribute can take. Instead, it evaluates only the classification patterns defined by the labels of FIG. 11.

Patent Document 1: JP-A-8-314725 (published November 29, 1996).
Non-Patent Document 1: Atsushi Otaki, Yoji Horie, and Dan Steinberg, "Applied Binary Tree Analysis Method: Using CART", Nikkagiren, July 6, 1998, pp. 44-47.

The problems of the conventional Gini index method are explained below with an example. Here, the conventional way of determining optimal branch conditions by the Gini index method (Gini index and improvement) is applied to finding the causes of product characteristic defects in the manufacture of products such as devices.

Suppose the data of Table 1 have the following meaning:
- The input attributes x1, x2, x3, and x4 are process data and in-line inspection data from a product manufacturing process.
- The output attribute y is characteristic data of the manufactured product.
- The output attribute value y = Y corresponds to a product characteristic defect.

A process engineer uses the Gini index method to find the cause of the defect y = Y, that is, "which input attribute, lying in which range of values, makes the product characteristic poor?" Such an investigation seldom needs a strict decision tree with a deep hierarchy. What it usually needs is, for each input attribute, a clear optimal branch condition (threshold) that separates non-defective from defective products. From these conditions, the ones with a strong influence on defects are then extracted.

For this example, FIGS. 20 to 23 show the improvement of the Gini index method for each branch condition pattern of each input attribute. Each pattern separates non-defective from defective products, that is, branches the root node into child nodes. The Gini index and the improvement are calculated for all branch condition patterns of the input attributes x1, x2, x3, and x4. That is 12 conditions in total (three per input attribute). FIGS. 20 to 23 show that the following branch conditions best separate the non-defective products (y = X, Z) from the defective products (y = Y):
- for x1, the branch between x1 = A, B and x1 = C, D;
- for x2, the branch between x2 = a, b and x2 = c, d;
- for x3, the branch between x3 = 1, 2 and x3 = 3, 4;
- for x4, the branch between x4 = 10 and x4 = 20, 30, 40.

For the data group of Table 1, the Gini index method can therefore extract the conditions that cause the product characteristic defect (the optimal branch condition for each input attribute).

Real manufacturing sites for products such as devices (particularly semiconductor devices) are very different. Each process has roughly 10 to 100 attributes of process data and in-line inspection data. Their values are multi-valued numbers with many significant digits, so a single input attribute may take tens of thousands to hundreds of thousands of distinct values. In that case, a Gini index like equations (3) and (4) and an improvement like equation (6) must be calculated tens of thousands to hundreds of thousands of times per attribute, and this is repeated for every input attribute. Such a large-scale calculation takes an enormous amount of time. In some cases it exhausts the computer's memory, so the calculation cannot be completed at all. The conventional Gini index method thus carries a heavy computational load. As a result, determining optimal branch conditions by the Gini index method, and analyzing data with it, is inefficient.
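To make the cost concrete, here is a hypothetical Python sketch of the conventional exhaustive search on one numerical attribute. Every distinct value v is tried as the split x ≤ v versus x > v. For each candidate, the Gini indices and the improvement (equations (1) and (5)) are recomputed from scratch over all records. The function names and the example data are illustrative assumptions.

```python
from collections import Counter

def gini_index(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def naive_gini_threshold(x, y):
    """Exhaustive Gini search over thresholds 'x <= v' for one numerical attribute.

    Each candidate re-scans all n records, so the cost is O(k * n) for k
    distinct values. Returns (best_threshold, best_improvement).
    """
    n = len(x)
    root = gini_index(y)
    best_v, best_gain = None, float("-inf")
    for v in sorted(set(x))[:-1]:  # the largest value would leave t3 empty
        left = [label for xi, label in zip(x, y) if xi <= v]
        right = [label for xi, label in zip(x, y) if xi > v]
        gain = root - (len(left) / n * gini_index(left) + len(right) / n * gini_index(right))
        if gain > best_gain:
            best_v, best_gain = v, gain
    return best_v, best_gain

# Hypothetical measurements: "OK" = non-defective, "NG" = defective.
x = [1.0, 1.2, 1.3, 2.1, 2.4, 2.5]
y = ["OK", "OK", "OK", "NG", "NG", "NG"]
print(naive_gini_threshold(x, y))  # (1.3, 0.5)
```

With k distinct values and n records this costs O(k·n) per attribute. When k and n are both in the hundreds of thousands, that is roughly 10^10 label visits for a single attribute.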

The Gini index method has a second problem. Suppose the probability p(Y|t) of a defective product is extremely small compared with the probability p(XZ|t) of a non-defective product. Then p(Y|t) is barely reflected in the Gini index

$$ i(t) = 1 - \sum_j \{ p(j \mid t) \}^2 = 1 - \{ p(XZ \mid t)^2 + p(Y \mid t)^2 \} \qquad (8) $$

This lowers the accuracy of extracting the conditions that separate non-defective from defective products (the optimal branch condition for each input attribute). The problem is most pronounced when the root node (the whole sample) contains almost no defective products. Such cases are common once attention is focused on a specific defect category. Branch conditions determined by the Gini index method, and data analysis based on them, are therefore of low accuracy.
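One way to see this effect is to evaluate equation (8) for falling defect rates, as in the Python sketch below (the function name is hypothetical). For two classes the index reduces to 2p(1 - p), where p = p(Y|t). Child Gini values are non-negative and the branch proportions sum to 1, so no split can achieve an improvement larger than the parent's Gini index. At a 1% defect rate, every candidate split therefore competes within an improvement range of only about 0.02.

```python
def two_class_gini(p_defect):
    """Equation (8): Gini index with p(Y|t) = p_defect and p(XZ|t) = 1 - p_defect."""
    p_good = 1.0 - p_defect
    return 1.0 - (p_good ** 2 + p_defect ** 2)

for p in (0.5, 0.1, 0.01):
    print(p, round(two_class_gini(p), 4))
# 0.5 0.5
# 0.1 0.18
# 0.01 0.0198
```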

Another problem of the prior art is explained below with a second example. Here, the decision tree generation method of Patent Document 1 is applied to finding the causes of product characteristic defects in the manufacture of products such as devices.

As before, suppose the data of Table 1 have the following meaning:
- The input attributes x1, x2, x3, and x4 are process data and in-line inspection data from a product manufacturing process.
- The output attribute y is characteristic data of the manufactured product.
- The output attribute value y = Y corresponds to a product characteristic defect.

A process engineer looks for the cause of the defect y = Y with one of two trees. The first is decision tree-1 (FIG. 10), built by the method in the prior-art section of Patent Document 1. The second is conventional decision tree-2 (FIG. 12), built by the method of Patent Document 1 itself.

In decision tree-1, built by the prior-art method of Patent Document 1, the nodes of interest (y = Y) are scattered over several places in the tree, four in the example of FIG. 10. The tree is cluttered as a result, and the process engineer struggles to judge which input attribute, lying in which range of values, makes the product characteristic poor. In FIG. 10 there are only four input attributes, each taking only four values, so the engineer can still manage to identify the cause. Real manufacturing sites for products such as devices (particularly semiconductor devices) are very different. Each process has roughly 10 to 100 attributes of process data and in-line inspection data. Their values are multi-valued numbers with many significant digits and are spread over a very wide range, so a single input attribute may take tens of thousands to hundreds of thousands of distinct values. In addition, disturbances (attributes not captured as input attributes) often make the output attribute vary even when all input attribute values are the same. Applied in such cases, the prior-art method of Patent Document 1 pursues a strict analysis and splits the data into an effectively unlimited number of subsets. The process engineer can then no longer properly identify the cause of the product characteristic defect.

Decision tree-2 (FIG. 12), built by the method disclosed in Patent Document 1, is different. It classifies the data by the label hierarchy, so the tree is concise. This makes it easy for the process engineer to identify the cause of the product characteristic defect y = Y.

However, building the concise decision tree-2 of FIG. 12 requires the label hierarchy of FIG. 11 to be defined in advance. The decision tree method of Patent Document 1 therefore cannot be used when it is unknown which attribute values belong together. As described above, each process at a real manufacturing site has roughly 10 to 100 attributes of process data and in-line inspection data. Their values are multi-valued numbers with many significant digits, spread over a very wide range. Disturbances (attributes not captured as input attributes) also often make the output attribute vary even when all input attribute values are the same. Under these conditions, finding for each input attribute the values that belong under a single label is very difficult, even for an experienced process engineer. Data analysis by the method of Patent Document 1 is therefore inefficient.

The present invention was made in view of the above problems. Its object is to make data analysis more efficient. More specifically, the invention aims to provide two groups of products:
- an input attribute condition determining device, method, and program that speed up data analysis by greatly reducing the computational load of finding input attribute conditions that separate non-defective from defective products;
- a data analysis device, method, and program that speed up data analysis by deriving the causal relationship between the output attribute and the input attributes in a concise form, without predefining a label hierarchy.

Another object of the present invention is to provide an input attribute condition determining device, method, and program that determine, with high accuracy, input attribute conditions that separate non-defective from defective products.

To solve the above problems, the present invention provides an input attribute condition determining device. It works on an analysis data group, a set of data composed of at least one input attribute that is a numerical attribute and an output attribute. According to the value of the output attribute, the group is classified into a first data group and a second data group. The device determines an input attribute condition, that is, a condition on an input attribute that divides the analysis data group in two so that the first and second data groups are each gathered together. The device includes:
- frequency calculating means. For each numerical value taken by the input attribute, it calculates the proportion of data in the first data group whose input attribute is equal to or less than that value (the first frequency). It also calculates the corresponding proportion in the second data group (the second frequency).
- difference calculating means. It calculates the difference between the first frequency and the second frequency for each numerical value taken by the input attribute.
- threshold determining means. Among the numerical values taken by one input attribute, it takes the value with the largest difference as that attribute's threshold, so that at least one threshold is determined for at least one input attribute.
- input attribute condition determining means. It determines the input attribute condition from the threshold found by the threshold determining means.
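Below is a minimal Python sketch of the frequency, difference, and threshold means for one input attribute. It rests on these assumptions:
- The "difference value" is taken as the absolute difference |F1(v) - F2(v)|. The text does not say whether a signed or absolute difference is meant.
- Ties go to the smallest value.
- Both groups are non-empty.
- The function name and the data are hypothetical.

With the absolute difference, the maximum equals the two-sample Kolmogorov-Smirnov statistic between the two groups' empirical distributions.

```python
from bisect import bisect_right

def frequency_difference_threshold(first_group, second_group):
    """Return (threshold, max_difference) for one numerical input attribute.

    first_group / second_group: input-attribute values of the records in the
    first and second data groups (both assumed non-empty).
    """
    s1, s2 = sorted(first_group), sorted(second_group)
    n1, n2 = len(s1), len(s2)
    best_value, best_diff = None, -1.0
    # Every numerical value the input attribute takes is a threshold candidate.
    for v in sorted(set(s1) | set(s2)):
        f1 = bisect_right(s1, v) / n1  # first frequency: share of group 1 with x <= v
        f2 = bisect_right(s2, v) / n2  # second frequency: share of group 2 with x <= v
        diff = abs(f1 - f2)            # assumption: absolute difference
        if diff > best_diff:           # strict '>' keeps the smallest value on ties
            best_value, best_diff = v, diff
    return best_value, best_diff

# Hypothetical data: non-defective parts (group 1) vs defective parts (group 2).
print(frequency_difference_threshold([1.0, 2.0, 2.0, 2.0], [3.0, 4.0, 4.0, 5.0]))
# (2.0, 1.0)
```

Each candidate costs two binary searches on pre-sorted lists. There is no per-candidate re-scan of the records, unlike an exhaustive Gini search. In this example, the resulting condition would be x ≤ 2.0 versus x > 2.0.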

To solve the above problems, the present invention also provides an input attribute condition determining method that uses the above device. It works on the same kind of analysis data group: a set of data composed of at least one input attribute that is a numerical attribute and an output attribute, classified into a first and a second data group according to the value of the output attribute. The method determines an input attribute condition, that is, a condition on an input attribute that divides the group in two so that the first and second data groups are each gathered together. It includes the following steps:
- a frequency calculating step. For each numerical value taken by the input attribute, the frequency calculating means calculates the proportion of data in the first data group whose input attribute is equal to or less than that value (the first frequency). It also calculates the corresponding proportion in the second data group (the second frequency).
- a difference calculating step. The difference calculating means calculates the difference between the first and second frequencies for each numerical value taken by the input attribute.
- a threshold determining step. Among the numerical values taken by one input attribute, the threshold determining means takes the value with the largest difference as that attribute's threshold, so that at least one threshold is determined for at least one input attribute.
- an input attribute condition determining step. The input attribute condition determining means determines the input attribute condition from the threshold found by the threshold determining means.
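The four steps can be chained over several input attributes, as in the hypothetical Python sketch below. Its assumptions go beyond the text:
- The record layout is a list of dicts.
- The condition is expressed as the string "attr <= threshold", with the complementary side implied.
- Attributes are ranked by their maximum difference, which mirrors the goal of extracting high-influence conditions. The ranking is not a step stated in the method.

```python
from bisect import bisect_right

def frequency_difference_threshold(first_group, second_group):
    """Frequency, difference and threshold steps: the value v maximizing |F1(v) - F2(v)|."""
    s1, s2 = sorted(first_group), sorted(second_group)
    best_value, best_diff = None, -1.0
    for v in sorted(set(s1) | set(s2)):
        diff = abs(bisect_right(s1, v) / len(s1) - bisect_right(s2, v) / len(s2))
        if diff > best_diff:
            best_value, best_diff = v, diff
    return best_value, best_diff

def determine_input_attribute_conditions(records, input_attributes, in_second_group):
    """Condition-determining step applied per attribute (both groups assumed non-empty).

    in_second_group: predicate on a record's output attribute.
    Returns (condition, max_difference) pairs, largest difference first.
    """
    results = []
    for attr in input_attributes:
        first = [r[attr] for r in records if not in_second_group(r)]
        second = [r[attr] for r in records if in_second_group(r)]
        threshold, diff = frequency_difference_threshold(first, second)
        results.append((f"{attr} <= {threshold}", diff))
    results.sort(key=lambda item: item[1], reverse=True)  # assumed ranking
    return results

records = [  # hypothetical data
    {"x1": 1, "x2": 10, "y": "good"},
    {"x1": 2, "x2": 30, "y": "good"},
    {"x1": 2, "x2": 20, "y": "good"},
    {"x1": 3, "x2": 15, "y": "bad"},
    {"x1": 4, "x2": 40, "y": "bad"},
]
for condition, diff in determine_input_attribute_conditions(
        records, ["x1", "x2"], lambda r: r["y"] == "bad"):
    print(condition, diff)
# x1 <= 2 1.0
# x2 <= 30 0.5
```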

To solve the above problem, an input attribute condition determination program according to the present invention operates on an analysis data group, which is a set of data composed of at least one input attribute that is a numerical attribute and an output attribute, and which is classified into a first data group and a second data group according to the value of the output attribute. The program causes a computer to function as: frequency calculation means for calculating, for each numerical value taken by the input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that value as a second frequency; difference calculation means for calculating, for each numerical value taken by the input attribute, the difference value between the first frequency and the second frequency; threshold determination means for taking, among the numerical values taken by one input attribute, the value at which the difference value is largest as the threshold of that input attribute, thereby determining at least one threshold corresponding to at least one input attribute; and input attribute condition determination means for determining, based on the threshold determined by the threshold determination means, an input attribute condition, that is, a condition on the input attribute for dividing the analysis data group into two so that the first data group and the second data group are each gathered together.
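The frequency, difference, and threshold means above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `find_threshold` is hypothetical, the difference value is taken as the absolute value |F1 - F2| (consistent with the separate polarity check described below), and ties are assumed to go to the smallest value. The maximum of |F1 - F2| over the observed values is the same quantity as the two-sample Kolmogorov-Smirnov statistic.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

def find_threshold(first: List[float], second: List[float]) -> Tuple[float, float, float, float]:
    """Return (threshold, first_freq, second_freq, difference) for one input attribute.

    first, second: the attribute's values in the first and second data groups.
    """
    if not first or not second:
        raise ValueError("both data groups must be non-empty")
    a, b = sorted(first), sorted(second)
    best: Optional[Tuple[float, float, float, float]] = None
    # Evaluate every value the input attribute takes in the analysis data group.
    for x in sorted(set(a) | set(b)):
        f1 = bisect_right(a, x) / len(a)  # first frequency: share of group 1 with value <= x
        f2 = bisect_right(b, x) / len(b)  # second frequency: share of group 2 with value <= x
        diff = abs(f1 - f2)               # difference value (absolute, by assumption)
        if best is None or diff > best[3]:  # strict '>' keeps the smallest value on ties
            best = (x, f1, f2, diff)
    assert best is not None
    return best

# Usage: good parts (first group) vs defective parts (second group)
print(find_threshold([1, 2, 3, 4], [5, 6, 7]))  # (4, 1.0, 0.0, 1.0)
```

Sorting dominates the cost, so one attribute with n values needs roughly O(n log n) work, rather than recomputing an impurity measure for every candidate split.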

To solve the above problem, a computer-readable recording medium according to the present invention records the above input attribute condition determination program.

With the above apparatus, method, program, or recording medium, the difference value between the first frequency and the second frequency is calculated for each numerical value taken by the input attribute, and this difference value serves as the evaluation index for a threshold that divides the analysis data group into two so that the first data group and the second data group are each gathered together. Among the numerical values taken by one input attribute, the value with the largest evaluation index (difference value) becomes the threshold of that input attribute, and at least one threshold corresponding to at least one input attribute is determined. A threshold evaluation index equivalent to the improvement measure of the Gini index method is thus obtained by a very simple computation: calculating, for each numerical value taken by the input attribute, the difference between the first frequency and the second frequency. Unlike the Gini index method, this configuration does not require the enormous computation of the Gini index (equations (3) and (4)) and the improvement (equation (6)) for every branch condition pattern the input attribute can take; it only needs to compute a difference value for as many data points as there are values the input attribute can take. Consequently, even when a single input attribute can take on the order of tens of thousands to hundreds of thousands of values, as with manufacturing process data for actual devices (in particular semiconductor devices), processing places almost no computational load and finishes in a short time. In other words, the input attribute condition (the optimal branch condition for each input attribute) that separates the first data group from the second data group can be determined quickly and with little computational load, which makes data analysis more efficient.

Furthermore, with the above configuration, the first frequency and the second frequency are each the number of data in the corresponding data group whose input attribute is equal to or less than the value, normalized by the total number of data in that group. Their difference value therefore keeps its accuracy even when the proportions of the first and second data groups in the analysis data group differ extremely, and using it as the threshold evaluation index allows the input attribute condition (the optimal branch condition for each input attribute) that separates the first data group from the second data group to be determined with high accuracy.
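In symbols, with notation introduced here only for illustration ($G_1$ and $G_2$ for the first and second data groups, $a(d)$ for the input attribute value of datum $d$, $V$ for the set of values the attribute takes), the normalized frequencies, the difference value, and the threshold can be written as follows; the absolute value is an assumption suggested by the separate polarity check:

```latex
\begin{aligned}
F_1(x) &= \frac{|\{d \in G_1 : a(d) \le x\}|}{|G_1|}, \qquad
F_2(x) = \frac{|\{d \in G_2 : a(d) \le x\}|}{|G_2|},\\
D(x) &= |F_1(x) - F_2(x)|, \qquad
x^{*} = \operatorname*{arg\,max}_{x \in V} D(x).
\end{aligned}
```

For instance, with $|G_1| = 1000$ and $|G_2| = 10$, a value $x$ at or below which 900 good items and 1 defective item fall gives $F_1 = 0.9$, $F_2 = 0.1$ and $D = 0.8$. Raw counts (900 vs 1) would instead be dominated by the size of $G_1$.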

More preferably, the input attribute condition determination device according to the present invention further includes polarity determination means for determining the magnitude relationship between the first frequency and the second frequency at the threshold determined by the threshold determination means, and the input attribute condition determination means chooses the input attribute condition so that the second data group gathers in the data satisfying the condition and the first data group gathers in the data not satisfying it: when the polarity determination means judges the first frequency to be greater than the second frequency, the input attribute condition is set to "the input attribute exceeds the threshold"; when it judges the second frequency to be greater than the first frequency, the condition is set to "the input attribute is equal to or less than the threshold". In this way, a condition that gathers the second data group into the data satisfying it and the first data group into the data not satisfying it can be stated concretely as either "the input attribute exceeds the threshold" or "the input attribute is equal to or less than the threshold".
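The polarity rule above can be sketched as a small function that turns a threshold and its two frequencies into a condition. The name `decide_condition` and the string labels are hypothetical; sending the tie case (equal frequencies) to the "equal to or less than" branch is an assumption, since the document only covers the two strict cases.

```python
from typing import Callable, Tuple

def decide_condition(threshold: float, f1: float, f2: float) -> Tuple[str, Callable[[float], bool]]:
    """Turn a threshold and its two frequencies into an input attribute condition.

    The condition is chosen so that data satisfying it gather the second data group.
    """
    if f1 > f2:
        # Most of the first group lies at or below the threshold,
        # so the second group is gathered by values above it.
        return f"x > {threshold}", lambda x: x > threshold
    # Second frequency is larger (ties also land here, an assumption):
    # the second group is gathered at or below the threshold.
    return f"x <= {threshold}", lambda x: x <= threshold

label, cond = decide_condition(4, 1.0, 0.0)
print(label, cond(5), cond(4))  # x > 4 True False
```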

In one embodiment of the input attribute condition determination device according to the present invention, the input attribute is a manufacturing process condition and/or an in-line inspection result in a product manufacturing process, the output attribute is a product quality determination result, and the second data group is the group of data whose quality determination result is defective.

In this case, the manufacturing process conditions and the characteristics measured partway through manufacturing (in-line inspection results) that cause defective products (products whose quality determination result is defective) can be identified.

To solve the above problem, a data analysis device according to the present invention analyzes the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and extracts information indicating that causal relationship. The device includes: classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and assigning a classification flag; analysis data group extraction means for extracting, from the classified basic data group, the analysis data group to be analyzed; and the input attribute condition determination device according to claim 1 or 2. The frequency calculation means and the difference calculation means perform the above calculations for every numerical value taken by each input attribute of the analysis data group, and the threshold determination means determines a threshold for each input attribute of the analysis data group.
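The device above can be pictured as a classification pass that attaches a flag, followed by one threshold search per input attribute. This is a sketch under assumptions: records are plain dicts, the attribute names `temp`, `time`, `yield` and the defect rule "yield below 90" are hypothetical, and both groups are assumed non-empty.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]

def classify(records: List[Record], is_second: Callable[[Record], bool]) -> List[Dict]:
    """Classification means: copy each record and add flag 1 (second group) or 0 (first group)."""
    return [{**r, "flag": 1 if is_second(r) else 0} for r in records]

def find_threshold(first: List[float], second: List[float]) -> Tuple[float, float, float, float]:
    """Value maximizing |F1(x) - F2(x)|, returned with F1, F2 and the difference."""
    a, b = sorted(first), sorted(second)
    best = (a[0], 0.0, 0.0, -1.0)
    for x in sorted(set(a) | set(b)):
        f1, f2 = bisect_right(a, x) / len(a), bisect_right(b, x) / len(b)
        if abs(f1 - f2) > best[3]:
            best = (x, f1, f2, abs(f1 - f2))
    return best

def thresholds_per_attribute(flagged: List[Dict], input_attrs: List[str]) -> Dict[str, Tuple]:
    """Run the threshold search separately for every input attribute."""
    out = {}
    for attr in input_attrs:
        first = [r[attr] for r in flagged if r["flag"] == 0]
        second = [r[attr] for r in flagged if r["flag"] == 1]
        out[attr] = find_threshold(first, second)
    return out

records = [
    {"temp": 300, "time": 10, "yield": 95},
    {"temp": 310, "time": 12, "yield": 96},
    {"temp": 320, "time": 11, "yield": 94},
    {"temp": 350, "time": 10, "yield": 80},
    {"temp": 360, "time": 12, "yield": 85},
]
flagged = classify(records, lambda r: r["yield"] < 90)  # hypothetical defect rule
result = thresholds_per_attribute(flagged, ["temp", "time"])
print(result["temp"])  # (320, 1.0, 0.0, 1.0)
```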

To solve the above problem, a data analysis method according to the present invention uses the above data analysis device to analyze the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and to extract information indicating that causal relationship. The method includes: a classification step in which the classification means classifies the basic data group into a first data group and a second data group according to the value of the output attribute and assigns a classification flag; an analysis data group extraction step in which the analysis data group extraction means extracts, from the classified basic data group, the analysis data group to be analyzed; a frequency calculation step in which the frequency calculation means of the input attribute condition determination device calculates, for every numerical value taken by each input attribute of the analysis data group, the proportion of data in the first data group of the analysis data group whose input attribute is equal to or less than that value as the first frequency, and the proportion of data in the second data group of the analysis data group whose input attribute is equal to or less than that value as the second frequency; a difference calculation step in which the difference calculation means of the input attribute condition determination device calculates the difference value between the first frequency and the second frequency for every numerical value taken by each input attribute of the analysis data group; a threshold determination step in which the threshold determination means of the input attribute condition determination device determines, for each input attribute, the value at which the difference value is largest as the threshold of that input attribute; and an input attribute condition determination step in which the input attribute condition determination means of the input attribute condition determination device determines, based on the threshold determined by the threshold determination means, the input attribute condition for dividing the analysis data group into two so that the first data group and the second data group are each gathered together.

To solve the above problem, a data analysis program according to the present invention operates on a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and causes a computer to function as: classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and assigning a classification flag; analysis data group extraction means for extracting, from the classified basic data group, the analysis data group to be analyzed; frequency calculation means for calculating, for every numerical value taken by each input attribute of the analysis data group, the proportion of data in the first data group of the analysis data group whose input attribute is equal to or less than that value as the first frequency, and the proportion of data in the second data group of the analysis data group whose input attribute is equal to or less than that value as the second frequency; difference calculation means for calculating the difference value between the first frequency and the second frequency for every numerical value taken by each input attribute of the analysis data group; threshold determination means for determining, for each input attribute, the value at which the difference value is largest as the threshold of that input attribute; and input attribute condition determination means for determining, based on the threshold determined by the threshold determination means, the input attribute condition, that is, the condition on the input attribute for dividing the analysis data group into two so that the first data group and the second data group are each gathered together.

To solve the above problem, a computer-readable recording medium according to the present invention records the above data analysis program.

Because the above apparatus, method, program, or recording medium incorporates the input attribute condition determination apparatus, method, program, or recording medium described earlier, data analysis becomes more efficient, and the input attribute condition (the optimal branch condition for each input attribute) that separates the first data group from the second data group can be determined with high accuracy.

The data analysis device according to the present invention may further include numerical conversion means that converts input attributes into numerical values when the basic data group contains input attributes that are not numerical attributes. This makes it possible to determine input attribute conditions even for a basic data group composed of an output attribute and at least one non-numerical input attribute.
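The document does not say how the numerical conversion is done. One simple possibility, shown here purely as an assumption, is to give each distinct category (for example a hypothetical equipment ID) an integer code in sorted label order, after which the threshold search can run unchanged.

```python
from typing import Dict, List, Tuple

def to_numeric(values: List[str]) -> Tuple[List[int], Dict[str, int]]:
    """Hypothetical numerical conversion: map each distinct category to an integer code.

    Codes follow the sorted order of the category labels; the patent does not
    specify the conversion rule, so this ordering is an assumption.
    """
    codes = {v: i for i, v in enumerate(sorted(set(values)))}
    return [codes[v] for v in values], codes

numeric, codes = to_numeric(["EQ2", "EQ1", "EQ3", "EQ1"])
print(numeric, codes)  # [1, 0, 2, 0] {'EQ1': 0, 'EQ2': 1, 'EQ3': 2}
```

Because a condition such as "x <= 1" then groups categories by their arbitrary code order, an alternative worth considering is one 0/1 indicator attribute per category, so that a single condition isolates a single category.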

In the data analysis device according to the present invention, the input attribute condition determination device may determine a plurality of input attribute conditions, and the data analysis device may include: division rule evaluation means for calculating, for each input attribute condition determined by the input attribute condition determination device, a division rule evaluation value representing the likelihood of the association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group"; and division means for dividing the analysis data group, based on the input attribute condition with the largest division rule evaluation value among those determined by the input attribute condition determination device, into a factor data group that satisfies that condition and an other data group that does not.

The data analysis method according to the present invention may include: a division rule evaluation step in which the division rule evaluation means calculates, for each input attribute condition determined by the input attribute condition determination device, a division rule evaluation value representing the likelihood of the association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group"; and a division step in which the division means divides the analysis data group, based on the input attribute condition with the largest division rule evaluation value among those determined by the input attribute condition determination device, into a factor data group that satisfies that condition and an other data group that does not.

With the above device and method, one can derive, from among the input attribute conditions determined by the input attribute condition determination device, the factor data group satisfying the condition with the largest division rule evaluation value; that is, the data group carrying the largest factor (input attribute condition) behind the problem event corresponding to the second data group (for example, the occurrence of defective products).

In the data analysis device according to the present invention, it is preferable that the analysis data group extraction means extract at least one of the data groups produced by the division means as a new analysis data group, and that the series of processes performed by the analysis data group extraction means, the input attribute condition determination device, the division rule evaluation means, and the division means be executed repeatedly.

With this configuration, the repeated processing yields more detailed factor analysis results, and a tree structure can be built with the factors as nodes. Therefore, even for an analysis target in which several factors are intertwined and which is hard to express with a single association rule, the factors can be identified with sufficiently high accuracy.

Moreover, even if, without repetition, the threshold evaluation index (difference value) has low accuracy because of disturbances, performing the repeated processing resolves this problem.

Furthermore, even when, for a given input attribute, the factors of the output attribute condition corresponding to the second data group are of two types, "the input attribute is equal to or less than the threshold" and "the input attribute exceeds the threshold", the repeated processing can extract both factors.

In the data analysis device according to the present invention, it is preferable that the analysis data group extraction means extract only the other data group, among the data groups produced by the division means, as the new analysis data group.

With this configuration, because the repeated processing uses only the other data group from each division as the new analysis data group, a concise yet sufficient factor analysis result is obtained for the factors of the output attribute condition corresponding to the second data group.
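The repeated processing that keeps only the other data group can be sketched as a loop. This is a minimal sketch, not the patent's procedure: names such as `best_condition` and `extract_factors` are hypothetical; each candidate condition is scored with the ratio this document gives for the division rule evaluation value (share of the second group satisfying the condition divided by the share of the first group satisfying it), with $+\infty$ assumed when no first-group datum satisfies it; ties go to the earlier attribute; and the loop is assumed to stop once either group is empty or after `max_rounds` rounds.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Row = Dict[str, float]
Condition = Tuple[str, str, float, float]  # (attribute, operator, threshold, evaluation value)

def best_condition(rows: List[Row], attrs: List[str]) -> Condition:
    """Pick the input attribute condition with the largest division rule evaluation value."""
    first = [r for r in rows if r["flag"] == 0]
    second = [r for r in rows if r["flag"] == 1]
    best = None
    for attr in attrs:
        a = sorted(r[attr] for r in first)
        b = sorted(r[attr] for r in second)
        # Threshold: the value maximizing |F1 - F2|.
        t, f1, f2, d = a[0], 0.0, 0.0, -1.0
        for x in sorted(set(a) | set(b)):
            g1, g2 = bisect_right(a, x) / len(a), bisect_right(b, x) / len(b)
            if abs(g1 - g2) > d:
                t, f1, f2, d = x, g1, g2, abs(g1 - g2)
        # Polarity: choose the condition that gathers the second group.
        op = ">" if f1 > f2 else "<="
        p1 = 1 - f1 if op == ">" else f1  # share of first group satisfying the condition
        p2 = 1 - f2 if op == ">" else f2  # share of second group satisfying the condition
        score = p2 / p1 if p1 > 0 else float("inf")
        if best is None or score > best[3]:
            best = (attr, op, t, score)
    return best

def satisfies(row: Row, attr: str, op: str, t: float) -> bool:
    return row[attr] > t if op == ">" else row[attr] <= t

def extract_factors(rows: List[Row], attrs: List[str], max_rounds: int = 5) -> List[Condition]:
    """Repeat: find the best condition, record it, keep only the 'other' data group."""
    factors = []
    for _ in range(max_rounds):
        if {r["flag"] for r in rows} != {0, 1}:  # assumed stopping rule: a group is empty
            break
        attr, op, t, score = best_condition(rows, attrs)
        factors.append((attr, op, t, score))
        rows = [r for r in rows if not satisfies(r, attr, op, t)]
    return factors

rows = [
    {"temp": 300, "pressure": 5, "flag": 0},
    {"temp": 305, "pressure": 6, "flag": 0},
    {"temp": 310, "pressure": 5, "flag": 0},
    {"temp": 315, "pressure": 6, "flag": 0},
    {"temp": 350, "pressure": 5, "flag": 1},  # defect explained by high temperature
    {"temp": 355, "pressure": 6, "flag": 1},
    {"temp": 305, "pressure": 1, "flag": 1},  # defect explained by low pressure
]
for attr, op, t, score in extract_factors(rows, ["temp", "pressure"]):
    print(attr, op, t, score)
# temp > 315 inf
# pressure <= 1 inf
```

Dropping the factor data group after each round means the second factor is found among defects that the first factor does not explain, which is the effect the document attributes to reprocessing only the other data group.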

In addition, because the other data group is processed as the new analysis data group, the influence of factors (input attribute conditions) extracted in earlier iterations is excluded, and new factors of the output attribute condition corresponding to the second data group can be extracted with high accuracy.

Preferably, the division rule evaluation means calculates, for each input attribute condition determined by the input attribute condition determination device, the division rule evaluation value as the ratio of the proportion of data satisfying that condition within the second data group of the analysis data group to the proportion of data satisfying it within the first data group of the analysis data group. This makes the rule evaluation value easy to calculate.
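Written out, with $G_1$ and $G_2$ the first and second data groups of the analysis data group and $c$ an input attribute condition (notation assumed here for illustration), the division rule evaluation value is:

```latex
E(c) = \frac{|\{d \in G_2 : c(d)\}| \,/\, |G_2|}{|\{d \in G_1 : c(d)\}| \,/\, |G_1|}
```

For example, if 6 of 10 defective items and 100 of 1000 good items satisfy $c$, then $E(c) = 0.6 / 0.1 = 6$: the condition is six times as common among second-group data as among first-group data. The ratio is undefined when no first-group datum satisfies $c$; treating that case as $+\infty$ is one possible convention, not something the document specifies.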

本発明に係るデータ分析装置は、分類条件を設定する分類条件設定手段をさらに含み、上記分類手段は、分類条件設定手段で設定された分類条件に基づいて基本データ群を分類するようになっていてもよい。これにより、ユーザが分類条件を任意に設定してすることが可能となるので、それに対応した多様な入力属性条件（要因）を導き出すことができる。 The data analyzer according to the present invention further includes a classification condition setting means for setting a classification condition, and the classification means classifies the basic data group based on the classification condition set by the classification condition setting means. May be. As a result, the user can arbitrarily set the classification condition, so that various input attribute conditions (factors) corresponding to the classification condition can be derived.

In the data analysis device according to the present invention, the basic data group may include a plurality of output attributes, the classification condition setting means may set a classification condition for each of them, and the classification means may classify the basic data group by the logical OR or the logical AND of those classification conditions. This makes it possible to derive factors for which all of the output attribute conditions hold, or factors for which any one of them holds.
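Combining one classification condition per output attribute with OR or AND can be sketched as below. The output attributes `yield` and `leak` and their limits are hypothetical, chosen only to show that OR flags a record failing any condition while AND flags only records failing all of them.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Condition = Callable[[Record], bool]

def make_classifier(conditions: List[Condition], mode: str = "or") -> Condition:
    """Combine per-output-attribute classification conditions with logical OR or AND."""
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    combine = any if mode == "or" else all
    return lambda rec: combine(cond(rec) for cond in conditions)

# Hypothetical output attributes and limits.
conditions = [lambda r: r["yield"] < 90, lambda r: r["leak"] > 1.0]
records = [
    {"yield": 85, "leak": 0.5},
    {"yield": 95, "leak": 2.0},
    {"yield": 80, "leak": 3.0},
    {"yield": 99, "leak": 0.1},
]
is_second_or = make_classifier(conditions, "or")
is_second_and = make_classifier(conditions, "and")
print([int(is_second_or(r)) for r in records])   # [1, 1, 1, 0]
print([int(is_second_and(r)) for r in records])  # [0, 0, 1, 0]
```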

In the data analysis device according to the present invention, the input attribute condition determination device may determine a plurality of input attribute conditions, and the device may include: second-data-group separability calculation means for calculating, for each input attribute condition determined by the input attribute condition determination device, a second-data-group separability, that is, the proportion of second-data-group data among the data in the basic data group that satisfy the condition; and first factor extraction means for extracting, as information indicating a factor of the output attribute condition corresponding to the second data group, every input attribute condition whose second-data-group separability exceeds the second-data-group content rate, that is, the proportion of the second data group in the basic data group.
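The separability test can be sketched as follows: compute the content rate once, compute each condition's separability, keep the conditions that beat the content rate, and sort them, which also gives the priority order mentioned below. Function names, the data, and returning 0.0 for a condition that no datum satisfies are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]

def content_rate(rows: List[Row]) -> float:
    """Second-data-group content rate: share of flag-1 rows in the basic data group."""
    return sum(r["flag"] for r in rows) / len(rows)

def separability(rows: List[Row], cond: Callable[[Row], bool]) -> float:
    """Second-data-group separability: share of flag-1 rows among rows satisfying cond."""
    hits = [r for r in rows if cond(r)]
    return sum(r["flag"] for r in hits) / len(hits) if hits else 0.0

def first_factor_extraction(rows: List[Row], named_conditions) -> List[Tuple[str, float]]:
    """Keep conditions whose separability beats the content rate, highest first."""
    base = content_rate(rows)
    scored = [(name, separability(rows, cond)) for name, cond in named_conditions]
    return sorted((p for p in scored if p[1] > base), key=lambda p: p[1], reverse=True)

rows = [{"temp": t, "flag": f} for t, f in
        [(300, 0), (310, 0), (320, 0), (330, 0), (340, 1), (350, 1), (305, 1), (315, 0)]]
conds = [
    ("temp > 335", lambda r: r["temp"] > 335),
    ("temp <= 305", lambda r: r["temp"] <= 305),
    ("temp <= 320", lambda r: r["temp"] <= 320),
]
print(first_factor_extraction(rows, conds))
# [('temp > 335', 1.0), ('temp <= 305', 0.5)]
```

Here the content rate is 3/8 = 0.375, so "temp <= 320" (separability 0.2) is dropped even though it was a candidate.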

With this configuration, the device extracts not only the input attribute condition that is the largest factor of the output attribute condition corresponding to the second data group (the branch condition in the decision tree) but also every other input attribute condition with a high second-data-group separability. Even if factors compete with that largest factor (competing factors), they are captured reliably rather than missed. In addition, the determined factors (input attribute conditions) can be ranked by priority, that is, by their degree of influence on the output attribute condition corresponding to the second data group, using the second-data-group separability as the evaluation index.

Furthermore, with this configuration, the information indicating factors of the output attribute condition is extracted on the basis of a clear index, the second-data-group separability, which indicates how reliably the second data group is separated from the first data group. The factors behind the problem event can therefore be grasped clearly no matter how complex the decision tree becomes.

In the data analysis device according to the present invention, the input attribute condition determination device may determine a plurality of input attribute conditions, and the device may include second factor extraction means for extracting the input attribute condition with the largest division rule evaluation value among those determined, as information indicating a factor of the output attribute condition corresponding to the second data group.

With this configuration, the factors of the output attribute condition (result) corresponding to the second data group can be extracted in a concise form without defining a label hierarchy in advance, which makes data analysis more efficient.

To solve the above problem, a data analysis device according to the present invention analyzes the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and extracts information indicating that causal relationship. The device includes: classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and assigning a classification flag; analysis data group extraction means for extracting, from the classified basic data group, the analysis data group to be analyzed; first evaluation means for calculating, for every input attribute condition that each input attribute of the analysis data group can take, an input attribute condition evaluation index representing the likelihood of the first association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group, and if it does not, the data belongs to the first data group in the analysis data group"; input attribute condition determination means for determining, for each input attribute of the analysis data group, the input attribute condition with the largest input attribute condition evaluation index as the input attribute condition satisfying the first association rule; second evaluation means for calculating, for each input attribute condition determined by the input attribute condition determination means, a second evaluation index representing the likelihood of the second association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group"; and second factor extraction means for extracting the input attribute condition with the largest second evaluation index among those determined by the input attribute condition determination means, as information indicating a factor of the output attribute condition corresponding to the second data group.

To solve the above problem, a data analysis method according to the present invention uses the above data analysis device to analyze the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and to extract information indicating that causal relationship. The method includes: a classification step in which the classification means classifies the basic data group into a first data group and a second data group according to the value of the output attribute and assigns a classification flag; an analysis data group extraction step in which the analysis data group extraction means extracts, from the classified basic data group, the analysis data group to be analyzed; a first evaluation step in which the first evaluation means calculates, for every input attribute condition that each input attribute can take, an input attribute condition evaluation index representing the likelihood of the first association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group, and if it does not, the data belongs to the first data group in the analysis data group"; an input attribute condition determination step in which the input attribute condition determination means determines, for each input attribute, the input attribute condition with the largest input attribute condition evaluation index as the input attribute condition satisfying the first association rule; a second evaluation step in which the second evaluation means calculates, for each input attribute condition determined by the input attribute condition determination means, a second evaluation index representing the likelihood of the second association rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group"; and a second factor extraction step in which the second factor extraction means extracts the input attribute condition with the largest second evaluation index among those determined by the input attribute condition determination means, as information indicating a factor of the output attribute condition corresponding to the second data group.

In order to solve the above problem, the data analysis program according to the present invention operates on a basic data group, which is a set of data each consisting of a plurality of input attributes and an output attribute, and causes a computer to function as: classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and assigning a classification flag; analysis data group extraction means for extracting, from the classified basic data group, the analysis data group to be analyzed; first evaluation means for calculating, for every input attribute condition that each input attribute can take, an input attribute condition evaluation index representing the likelihood of the first correlation rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group; if it does not, the data belongs to the first data group in the analysis data group"; input attribute condition determining means for determining, for each input attribute, the input attribute condition having the maximum input attribute condition evaluation index as the input attribute condition satisfying the first correlation rule; second evaluation means for calculating, for each input attribute condition determined by the input attribute condition determining means, a second evaluation index representing the likelihood of the second correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group in the analysis data group"; and second factor extraction means for extracting, from among the input attribute conditions determined by the input attribute condition determining means, the input attribute condition with the maximum second evaluation index as information indicating the factor of the output attribute condition corresponding to the second data group.

According to the above device, method, program, or recording medium, the factor of the output attribute condition (result) corresponding to the second data group can be extracted in a simple form without defining a label hierarchy in advance. Therefore, if the second data group corresponds to a bad result (for example, the occurrence of defective products), the user can easily grasp the factor of that bad result. Conversely, if the second data group corresponds to a good result (for example, the production of products with excellent characteristics), the user can easily grasp the factor of that good result. Data analysis can thus be made more efficient.

In the data analysis device, it is more preferable that the first evaluation means calculates, as the input attribute condition evaluation index, a threshold evaluation index for every numerical value of each input attribute, the index indicating how well dividing the data into data whose input attribute is equal to or less than that value and data whose input attribute exceeds it separates the first data group from the second data group; and that the second evaluation means calculates, as the second evaluation index for each input attribute condition determined by the input attribute condition determining means, the ratio of the proportion of data satisfying that condition within the second data group of the analysis data group to the proportion of data satisfying that condition within the first data group of the analysis data group.

In order to solve the above problem, a data analysis device according to the present invention takes as its analysis target a basic data group, which is a set of data each consisting of a plurality of input attributes and an output attribute, analyzes the causal relationship between the input attributes and the output attribute, and extracts information indicating that causal relationship. The device comprises: classification means for classifying the basic data group into a first data group and a second data group according to the output attribute; first evaluation means for calculating, for every numerical value of each input attribute, a threshold evaluation index representing the degree to which data whose input attribute is equal to or less than that value is biased toward one of the first and second data groups; threshold determining means for determining, on the basis of the threshold evaluation indices calculated by the first evaluation means, the numerical value having the maximum threshold evaluation index for each input attribute as the threshold of that input attribute; second evaluation means for calculating for each input attribute, on the basis of the threshold determined by the threshold determining means, a first rule evaluation value representing the likelihood of the correlation rule "if the input attribute is equal to or less than the threshold, the data is included in the second data group" and a second rule evaluation value representing the likelihood of the correlation rule "if the input attribute exceeds the threshold, the data is included in the second data group"; and second factor extraction means for extracting, from among the correlation rules for all input attributes whose rule evaluation values were calculated by the second evaluation means, data indicating the input attribute condition of the correlation rule with the highest rule evaluation value, as information indicating the factor of the output attribute condition corresponding to the second data group.

It is more preferable that the data analysis device according to the present invention further comprises dividing means for dividing the analysis data group, on the basis of the input attribute condition extracted by the second factor extraction means, into a factor data group that satisfies the input attribute condition and an other data group that does not; that the analysis data group extraction means extracts at least one of the data groups produced by the dividing means as a new analysis data group; and that the series of processes performed by the analysis data group extraction means, the first evaluation means, the input attribute condition determining means, the second evaluation means, the second factor extraction means, and the dividing means is executed repeatedly.

Furthermore, because the other data group is processed as the new analysis data group, the influence of factors (input attribute conditions) extracted in earlier iterations is excluded, so that new factors of the output attribute condition corresponding to the second data group can be extracted with high accuracy.

It is more preferable that the first evaluation means includes: frequency calculation means for calculating, for every numerical value of each input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that value as a second frequency; and difference calculation means for calculating the difference between the first frequency and the second frequency for every numerical value of each input attribute. This makes the threshold evaluation index easy to calculate: the input attribute condition that separates the first data group from the second data group (the optimal branch condition for each input attribute) can be determined in a short time with little computational load. Moreover, even when the proportions of the first and second data groups in the analysis data group differ greatly, that input attribute condition can still be determined with high accuracy.
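A minimal sketch of the frequency calculation and difference calculation for a single input attribute follows, assuming the attribute's values in the first and second data groups are given as plain lists; the function names are hypothetical. Each candidate value v is scored by the magnitude of F1(v) − F2(v). Taking the absolute value is an assumption, chosen because the later polarity check implies that either frequency may be the larger one. Scored this way, the maximum is the same quantity as the two-sample Kolmogorov–Smirnov statistic of the two groups.

```python
from bisect import bisect_right


def cumulative_frequency_table(first_values, second_values):
    """Rows (v, F1(v), F2(v), |F1(v) - F2(v)|) for every distinct value v.

    F1(v): fraction of first-group values <= v (first frequency).
    F2(v): fraction of second-group values <= v (second frequency).
    """
    first = sorted(first_values)
    second = sorted(second_values)
    rows = []
    for v in sorted(set(first) | set(second)):
        # bisect_right on a sorted list = number of values <= v
        f1 = bisect_right(first, v) / len(first)
        f2 = bisect_right(second, v) / len(second)
        rows.append((v, f1, f2, abs(f1 - f2)))
    return rows


def choose_threshold(first_values, second_values):
    """Row with the largest difference (ties resolved to the smallest v)."""
    return max(cumulative_frequency_table(first_values, second_values),
               key=lambda row: row[3])


th, f1, f2, diff = choose_threshold([1, 1, 2, 2, 3], [3, 4, 4])
print(th, f1, f2, diff)  # 2 0.8 0.0 0.8
```

Sorting once and using binary search keeps the cost at O(n log n) per attribute. This is consistent with the text's point that the index needs only simple arithmetic.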

It is more preferable that the second evaluation means calculates, as the first rule evaluation value, a first ratio: the proportion of data in the second data group whose input attribute is equal to or less than the threshold, divided by the proportion of data in the first data group whose input attribute is equal to or less than the threshold; and, as the second rule evaluation value, a second ratio: the proportion of data in the second data group whose input attribute exceeds the threshold, divided by the proportion of data in the first data group whose input attribute exceeds the threshold. This makes the first and second rule evaluation values easy to calculate.
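The two ratios can be sketched as follows for one input attribute whose threshold has already been chosen. The function name is hypothetical. Returning infinity for a zero denominator (or 0.0 when both proportions are zero) is an assumed convention, because the text does not say how that case is handled.

```python
import math


def rule_evaluation_values(first_values, second_values, threshold):
    """Return (first_ratio, second_ratio) for one input attribute.

    first_ratio  = P(x <= th | second group) / P(x <= th | first group)
    second_ratio = P(x >  th | second group) / P(x >  th | first group)
    """
    n1, n2 = len(first_values), len(second_values)
    p1_le = sum(v <= threshold for v in first_values) / n1
    p2_le = sum(v <= threshold for v in second_values) / n2
    p1_gt = sum(v > threshold for v in first_values) / n1
    p2_gt = sum(v > threshold for v in second_values) / n2

    def ratio(num, den):
        # Assumed convention: no first-group data meet the rule.
        if den == 0:
            return math.inf if num > 0 else 0.0
        return num / den

    return ratio(p2_le, p1_le), ratio(p2_gt, p1_gt)


print(rule_evaluation_values([1, 1, 2, 2, 3], [3, 4, 4], 2))  # (0.0, 5.0)
```

In this toy case the second rule ("x > 2 implies the second data group") scores higher, so it would be the correlation rule retained for this attribute.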

It is more preferable that the data analysis device according to the present invention further comprises end condition determining means for determining whether an end condition is satisfied, the end condition being that the number of data items in the second data group of the analysis data group extracted by the analysis data group extraction means is 0, and that execution of the above series of processes ends when the end condition determining means determines that the end condition is satisfied. This avoids unnecessary processing.
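The repeated processing described above can be sketched as a loop. Each pass splits the analysis data group into a factor data group and an other data group, continues with the other data group, and stops once the analysis data group contains no second-group data. This is a minimal sketch under several assumptions:

* Records are dicts with numeric input attributes and a `flag` of 1 (first data group) or 2 (second data group).
* The ratio-style second evaluation index is used as the score.
* The extra guards are additions not stated in the text: stop when no first-group data remain, stop when the chosen condition matches nothing, and cap the number of iterations.

```python
from bisect import bisect_right

GOOD, DEFECTIVE = 1, 2  # hypothetical classification-flag values


def satisfies(value, op, th):
    return value > th if op == ">" else value <= th


def best_condition(group, attrs):
    """(score, attr, op, th) of the best condition; group needs both flags."""
    good = [r for r in group if r["flag"] == GOOD]
    bad = [r for r in group if r["flag"] == DEFECTIVE]
    best = None
    for a in attrs:
        g = sorted(r[a] for r in good)
        b = sorted(r[a] for r in bad)
        # Threshold: value with the largest |F1 - F2|.
        th, f1, f2 = max(
            ((v, bisect_right(g, v) / len(g), bisect_right(b, v) / len(b))
             for v in sorted(set(g) | set(b))),
            key=lambda t: abs(t[1] - t[2]),
        )
        # Polarity: F1 > F2 means defectives concentrate above th.
        op = ">" if f1 > f2 else "<="
        p1 = sum(satisfies(v, op, th) for v in g) / len(g)
        p2 = sum(satisfies(v, op, th) for v in b) / len(b)
        score = p2 / p1 if p1 else (float("inf") if p2 else 0.0)
        if best is None or score > best[0]:
            best = (score, a, op, th)
    return best


def extract_factors(records, attrs, max_iter=20):
    """Repeatedly extract a factor, then continue with the other data group."""
    analysis = list(records)
    factors = []
    for _ in range(max_iter):
        n_bad = sum(r["flag"] == DEFECTIVE for r in analysis)
        if n_bad == 0:              # end condition stated in the text
            break
        if n_bad == len(analysis):  # guard (assumption): no first-group data
            break
        score, a, op, th = best_condition(analysis, attrs)
        factor_group = [r for r in analysis if satisfies(r[a], op, th)]
        if not factor_group:        # guard (assumption): no progress possible
            break
        factors.append((a, op, th, score))
        analysis = [r for r in analysis if not satisfies(r[a], op, th)]
    return factors
```

Continuing only with the other data group mirrors the statement that factors found in earlier iterations are excluded from later ones.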

According to the input attribute condition determining device, input attribute condition determining method, input attribute condition determining program, and recording medium storing the program of the present invention, the difference between the first frequency and the second frequency is used as the evaluation index for determining the input attribute condition that separates the first data group from the second data group (the optimal branch condition for each input attribute). Because this evaluation index (the difference value) is obtained by nothing more than a subtraction, the input attribute condition can be determined in a short time with little computational load.

In addition, because the first and second frequencies are each normalized by the total number of data items in their respective data groups, their difference remains an accurate evaluation index of the input attribute condition (the optimal branch condition for each input attribute) even when the proportion of the first data group in all the data and the proportion of the second data group in all the data are extremely different. The input attribute condition separating the first and second data groups can therefore be determined with high accuracy.
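A small comparison illustrates why the normalization matters when the groups are very unequal in size (95 non-defective items against 5 defective items). The `normalize=False` branch, which compares raw counts, is a hypothetical strawman for contrast only and is not part of the source. The function name and data are likewise illustrative.

```python
def best_split(first_values, second_values, normalize=True):
    """Value v maximizing the gap between the counts of values <= v.

    normalize=True divides each count by its group size (the first and
    second frequencies of the text); normalize=False uses raw counts,
    a hypothetical variant shown only for comparison.
    """
    def score(v):
        n1 = sum(x <= v for x in first_values)
        n2 = sum(x <= v for x in second_values)
        if normalize:
            return abs(n1 / len(first_values) - n2 / len(second_values))
        return abs(n1 - n2)

    return max(sorted(set(first_values) | set(second_values)), key=score)


first = [3] * 50 + [4] * 45   # 95 non-defective items
second = [1] * 4 + [3] * 1    # 5 defective items
print(best_split(first, second))                   # 1
print(best_split(first, second, normalize=False))  # 4
```

With raw counts the majority group dominates, and the chosen split "x ≤ 4" contains every item, which separates nothing. The normalized frequencies instead pick "x ≤ 1", which isolates most of the defective items.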

Because the data analysis device, data analysis method, data analysis program, and recording medium storing the program of the present invention include the above input attribute condition determining device, method, program, or recording medium, data analysis can be made more efficient. In addition, the input attribute condition separating the first data group from the second data group (the optimal branch condition for each input attribute) can be determined with high accuracy, which improves the reliability of the analysis.

Furthermore, without defining a label hierarchy in advance, the factor that causes a specific output attribute condition (the problem event) can be derived in a very simple form. Such a factor reads "the input attribute satisfies the input attribute condition", for example "the input attribute is equal to or less than the threshold" or "the input attribute exceeds the threshold".

As described above, the present invention makes data analysis more efficient. In addition, the input attribute determining device, input attribute determining method, input attribute determining program, and recording medium storing the program of the present invention can determine, with high accuracy, the input attribute condition that separates non-defective products from defective products.

[Embodiment 1]
Next, an embodiment of the present invention is described.

First, the data analysis device 100 of the present embodiment and its component, the input attribute condition determining device 100A, are described with reference to FIG. 13. As shown in FIG. 13, the data analysis device 100 includes the following units:
- basic data group storage unit 102
- character-to-numeric data conversion unit (numeric conversion means) 1
- classification condition setting unit (classification condition setting means) 103
- data classification unit (classification means) 104
- classified basic data group storage unit 105
- analysis data group extraction unit (analysis data group extraction means) 106
- data row separation unit 107
- data column extraction unit 5
- frequency calculation unit (frequency calculation means) 6
- frequency cumulative difference calculation unit (difference calculation means) 7
- threshold determination unit (threshold determination means) 130
- polarity determination unit (polarity determination means) 131
- input attribute condition determination unit (input attribute condition determination means) 111
- defective product separation degree calculation unit (second data group separation degree calculation means) 112
- first factor extraction unit (first factor extraction means) 109
- frequency cumulative ratio calculation unit (division rule evaluation means) 16
- data division unit (division means) 115
- end condition determination unit (end condition determination means) 11
- factor determination unit 117
- composite factor defect count calculation unit 118
- numeric-to-character data conversion unit 119
- analysis result data storage unit 14
- output unit 15

Within the data analysis device 100 (FIG. 13), seven units together constitute the input attribute condition determining device 100A: the data row separation unit 107, the data column extraction unit 5, the frequency calculation unit (frequency calculation means) 6, the frequency cumulative difference calculation unit (difference calculation means) 7, the threshold determination unit (threshold determination means) 130, the polarity determination unit (polarity determination means) 131, and the input attribute condition determination unit (input attribute condition determination means) 111.

The basic data group storage unit 102 has the same function as the analysis target data storage unit 2 in the data analysis device of Embodiment 2. It is a storage device, such as a hard disk, that stores the basic data group DA.

As in Embodiment 2, the character-to-numeric data conversion unit 1 operates on the basic data group DA, a set of data consisting of non-numeric input attributes and an output attribute. It converts the input attributes into numerical values so as to obtain the basic data group DA0, a set of data consisting of the numerical input attributes x1, x2, x3, x4 and the output attribute y.
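The concrete conversion rule belongs to Embodiment 2 and is not reproduced in this excerpt. The following is only a minimal sketch: it assumes integer codes are assigned in sorted order of the distinct characters, and it uses hypothetical function names. It also includes the inverse mapping, which is the kind of step the numeric-to-character data conversion unit 119 performs on thresholds.

```python
def encode_column(values):
    """Replace character values with integer codes.

    Assumption: codes follow the sorted order of the distinct values
    (A -> 1, B -> 2, ...). Returns (codes, codebook).
    """
    codebook = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    return [codebook[v] for v in values], codebook


def decode(code, codebook):
    """Inverse mapping, e.g. to turn a numeric threshold back into text."""
    inverse = {c: v for v, c in codebook.items()}
    return inverse[code]


codes, book = encode_column(["C", "A", "D", "B", "A"])
print(codes)            # [3, 1, 4, 2, 1]
print(decode(3, book))  # C
```

The code order determines which groups of characters a single threshold can express. Under this coding, "x1 ≤ 2" means {A, B}, so a different ordering may suit real data better.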

The classification condition setting unit 103 replaces the threshold setting unit 3 of the data analysis device of Embodiment 2. Instead of a threshold y_th for the output attribute y, it sets a classification condition for y.

The data classification unit 104, the classified basic data group storage unit 105, and the data row separation unit 107 together replace the data classification unit 4 of the data analysis device of Embodiment 2.

The data classification unit 104 works on the basis of the classification condition set by the classification condition setting unit 103. According to the value of the output attribute y, it classifies the basic data group DA0 into a first data group DA1 of non-defective products and a second data group DA2 of defective products, and attaches to each data item a classification flag indicating the result.
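A minimal sketch of the classification step follows. The flag values 1 and 2, the record layout, and the example condition "y equal to Z counts as defective" are all hypothetical; in the device, the actual condition comes from the classification condition setting unit 103.

```python
FIRST_GROUP, SECOND_GROUP = 1, 2  # hypothetical flag values (DA1 / DA2)


def classify(records, is_defective):
    """Return copies of the records with a classification flag added.

    is_defective: predicate on the output attribute y, standing in for
    the classification condition set by unit 103.
    """
    return [dict(r, flag=SECOND_GROUP if is_defective(r["y"]) else FIRST_GROUP)
            for r in records]


da0 = [{"id": 1, "y": "X"}, {"id": 2, "y": "Z"}, {"id": 3, "y": "Y"}]
da00 = classify(da0, lambda y: y == "Z")  # hypothetical condition
print([r["flag"] for r in da00])  # [1, 2, 1]
```

The function returns flagged copies and leaves DA0 untouched. This mirrors the separate storage of the flagged group DA00 in unit 105.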

The classified basic data group storage unit 105 is a storage device, such as a hard disk, that stores the flagged basic data group DA00.

The analysis data group extraction unit 106 extracts, from the classified basic data group DA00, the analysis data group DA00' to be analyzed. Of the data groups produced by the data division unit 115, it extracts the other data group as the next analysis data group DA00' (the new analysis data group). Alternatively, the analysis data group extraction unit 106 may extract all of the data groups produced by the data division unit 115 as next analysis data groups DA00'.

The data row separation unit 107 splits the data of the analysis data group DA00' in two according to their classification flags. It thereby extracts the first data group DA1 of non-defective products and the second data group DA2 of defective products.

From the first data group DA1 of non-defective products, the data column extraction unit 5 extracts the 1-xj data group, i.e. the data column of each input attribute xj. From the second data group DA2 of defective products, it likewise extracts the 2-xj data group, the data column of each input attribute xj.
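Data row separation and data column extraction amount to a filter by flag followed by a per-attribute projection. The sketch below assumes records are dicts carrying a `flag` of 1 or 2; the function names are hypothetical.

```python
def separate_rows(analysis_group):
    """Data row separation: split by classification flag (1 = DA1, 2 = DA2)."""
    da1 = [r for r in analysis_group if r["flag"] == 1]
    da2 = [r for r in analysis_group if r["flag"] == 2]
    return da1, da2


def extract_columns(group, attrs):
    """Data column extraction: one value list per input attribute xj."""
    return {a: [r[a] for r in group] for a in attrs}


group = [{"x1": 1, "flag": 1}, {"x1": 3, "flag": 2}, {"x1": 2, "flag": 1}]
da1, da2 = separate_rows(group)
print(extract_columns(da1, ["x1"]))  # {'x1': [1, 2]}  (the 1-x1 data group)
print(extract_columns(da2, ["x1"]))  # {'x1': [3]}     (the 2-x1 data group)
```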

The frequency calculation unit 6 and the frequency cumulative difference calculation unit 7 have the same functions as the units with the same names and reference numerals in the data analysis device of Embodiment 2.

The frequency calculation unit 6 uses the 1-xj and 2-xj data groups extracted by the data column extraction unit 5. For each numerical value of the input attribute xj, it calculates a first frequency (1-xj cumulative frequency %), the proportion of data in the first data group of non-defective products whose input attribute xj is equal to or less than that value. It also calculates a second frequency (2-xj cumulative frequency %), the corresponding proportion in the second data group of defective products.

For each value of the input attribute xj, the frequency cumulative difference calculation unit 7 calculates the xj cumulative frequency difference %, the difference between the 1-xj cumulative frequency % and the 2-xj cumulative frequency %.

The threshold determination unit 130 has the same function as the input attribute threshold determination unit 8 of the data analysis device of Embodiment 2. For each input attribute xj, it takes the value of xj at which the xj cumulative frequency difference % is largest and uses it as the threshold xj-th of xj. This threshold divides the analysis data group DA00' in two so that the first data group DA1 and the second data group DA2 each fall together on one side.

The polarity determination unit 131 determines, at the threshold xj-th determined by the threshold determination unit 130, which is larger: the first frequency (1-xj cumulative frequency %) or the second frequency (2-xj cumulative frequency %).

The input attribute condition determination unit 111 chooses the input attribute condition so that, within the analysis data group DA00', the second data group DA2 of defective products falls among the data satisfying the condition and the first data group DA1 of non-defective products falls among the data not satisfying it. If the polarity determination unit 131 finds that the first frequency is greater than the second frequency, the condition is set to "the input attribute xj exceeds the threshold xj-th". If it finds that the second frequency is greater than the first, the condition is set to "the input attribute xj is equal to or less than the threshold xj-th".
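The polarity rule above can be written as a small decision function. The function name and the tuple representation of a condition are hypothetical. Returning `None` when the two frequencies are equal is an assumption, because the text covers only the two strict cases.

```python
def decide_condition(attr, threshold, f1, f2):
    """Turn a threshold and its polarity into an input attribute condition.

    f1, f2: first and second cumulative frequencies at the threshold.
    Returns (attr, op, threshold) with op in {">", "<="}, or None when
    f1 == f2 (not covered by the text; returning None is an assumption).
    """
    if f1 > f2:
        # Non-defective data concentrate at or below the threshold,
        # so defective data concentrate above it.
        return (attr, ">", threshold)
    if f2 > f1:
        # Defective data concentrate at or below the threshold.
        return (attr, "<=", threshold)
    return None


print(decide_condition("x1", 2, 1.0, 0.4))  # ('x1', '>', 2)
print(decide_condition("x2", 1, 0.2, 1.0))  # ('x2', '<=', 1)
```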

For each input attribute condition determined by the input attribute condition determination unit 111, the defective product separation degree calculation unit 112 calculates a defective product separation degree. This is the proportion of data belonging to the second data group DA2 of defective products among the data in the classified basic data group DA00 that satisfy the condition.

The defective product content rate is the proportion of the second data group DA2 in the classified basic data group DA00. From among the input attribute conditions determined by the input attribute condition determination unit 111, the first factor extraction unit 109 extracts those whose defective product separation degree is larger than this content rate, as information indicating factors of the output attribute condition corresponding to the second data group DA2 of defective products.
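The separation degree and the comparison against the content rate can be sketched as follows, assuming records are dicts with a `flag` of 1 or 2 and conditions are `(attr, op, threshold)` tuples. Both layouts and all names are hypothetical. Following the text, the records passed in would be the whole classified basic data group DA00, not just the current analysis data group.

```python
def satisfies(value, op, threshold):
    return value > threshold if op == ">" else value <= threshold


def separation_degree(records, condition):
    """Share of second-group (flag 2) items among records meeting condition."""
    attr, op, th = condition
    hits = [r for r in records if satisfies(r[attr], op, th)]
    if not hits:
        return 0.0  # assumption: a condition nobody meets separates nothing
    return sum(r["flag"] == 2 for r in hits) / len(hits)


def first_factor_extraction(records, conditions):
    """Keep conditions whose separation degree exceeds the content rate."""
    content_rate = sum(r["flag"] == 2 for r in records) / len(records)
    return [c for c in conditions
            if separation_degree(records, c) > content_rate]


rows = ([{"x1": a, "x2": b, "flag": 1}
         for a, b in [(1, 2), (2, 3), (1, 3), (2, 2), (1, 1)]]
        + [{"x1": a, "x2": b, "flag": 2}
           for a, b in [(3, 3), (4, 2), (3, 2), (1, 1), (2, 1)]])
conds = [("x1", ">", 2), ("x2", "<=", 1), ("x2", ">", 2)]
print(first_factor_extraction(rows, conds))  # [('x1', '>', 2), ('x2', '<=', 1)]
```

Here the content rate is 0.5. "x2 > 2" is dropped because only one of the three records meeting it is defective.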

As in Embodiment 2, the frequency cumulative ratio calculation unit 16 calculates a rule evaluation value for each input attribute condition determined by the input attribute condition determination unit 111. Unlike Embodiment 2, however, it does not calculate two kinds of rule evaluation values. It calculates only the frequency cumulative lower ratio or the frequency cumulative upper ratio (described later), as a division rule evaluation value representing the likelihood of the correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group DA2 in the analysis data group DA00'".

The data division unit 115 corresponds to the factor-undiscovered data extraction unit 10 of the data analysis device of Embodiment 2. From among the input attribute conditions determined by the input attribute condition determination unit 111, it selects the one with the largest division rule evaluation value. It then divides the analysis data group DA00' into a factor data group that satisfies this condition and an other data group that does not.

Over the repeated processing, the first factor extraction unit 109 may extract several input attribute conditions on the same input attribute. From among these, the factor determination unit 117 selects only the input attribute condition with the highest priority.

The composite factor defect count calculation unit 118 calculates the number of defects attributable to a composite factor formed by two of the input attribute conditions selected by the factor determination unit 117.

The numeric-to-character data conversion unit 119 converts information representing the determined factors into character data. This includes, for example, the numerical values of the input attribute thresholds xj-th in the determined factor list table and the composite factor table described later.

Next, the data analysis method and the input attribute condition determination method of this embodiment are described with reference to FIGS. 14 and 15. The example uses the data group DA of Table 1 as the basic data group. FIG. 14 is a flowchart of the data analysis method of this embodiment. FIG. 15 is a flowchart of the input attribute condition determination method of this embodiment, which corresponds to the processing of step 107 in FIG. 14 (described later).

The basic data group DA of Table 1 is stored in the basic data group storage unit 102, such as a hard disk. It consists of twelve records with ids (identifiers) 1 to 12. In Table 1, x1, x2, x3, and x4 are the input attributes:

- x1 is a character attribute taking one of the four characters A, B, C, and D.
- x2 is a character attribute taking one of the four characters a, b, c, and d.
- x3 is a discrete attribute taking one of the four values 1, 2, 3, and 4.
- x4 is a discrete attribute taking one of the four values 10, 20, 30, and 40.

In general, an input attribute may be a character attribute, a discrete numeric attribute, or a continuous numeric attribute.

In Table 1, y is the output attribute. An output attribute may likewise be a character attribute, a discrete numeric attribute, or a continuous numeric attribute. Here it is a character attribute taking one of the three characters X, Y, and Z.

In the data analysis method of this embodiment, the case y = Y is treated as the problem event. The method analyzes the factors that cause the output attribute y to become Y.

One example of the basic data group DA is manufacturing data with the following structure:

- The input attributes are manufacturing process conditions and/or in-line inspection results (inspections made partway along the production line) of a product manufacturing process.
- The output attribute is the product quality judgment result.
- The problem event y = Y is a defective quality judgment.

In this case, the data analysis method of this embodiment analyzes the causal relationship between the input and output attributes and identifies the factors behind the problem event y = Y. This makes it easy to take measures against defective products. It therefore also makes it easier to improve the manufacturing process, for example by raising the yield.

A more concrete example of the basic data group DA comes from plasma CVD. The input attributes x1, x2, x3, and x4 are process data such as gas flow rate, gas pressure, input power, and deposition time, and the output attribute y is the thickness of the deposited thin film. The values of these input and output attributes may be continuous numeric, discrete numeric, or character attributes. As a character attribute, the film thickness output attribute would be expressed, for example, as 'large', 'medium', or 'small'.

[Step 100]
First, the character-to-numeric data conversion unit 1 converts the character attributes of the basic data group DA of Table 1 into numeric attributes (numeric data), using the conversion rules below (S100). The data group DA is stored in the basic data group storage unit 102, such as a hard disk. This processing is the same as step 0 of the data analysis method of the second embodiment.
(x1) A → 1, B → 2, C → 3, D → 4
(x2) a → 1, b → 2, c → 3, d → 4
(x3) no conversion
(x4) no conversion
(y) X → 1, Y → 2, Z → 3
If the input and output attributes of the basic data group DA are numeric from the outset, this processing is skipped, and the character-to-numeric data conversion unit 1 can be omitted.
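The conversion of Step 100 is a per-attribute lookup. In the minimal sketch below, records are assumed to be dicts keyed by attribute name. `to_numeric` and `invert_rules` are hypothetical names. The inverse mapping is only one plausible way for the numeric-to-character data conversion unit 119 to restore character labels, and the source does not specify it.

```python
from typing import Dict, List

# Step 100 conversion rules; attributes without a rule (id, x3, x4) pass through.
CONVERSION_RULES: Dict[str, Dict[str, int]] = {
    "x1": {"A": 1, "B": 2, "C": 3, "D": 4},
    "x2": {"a": 1, "b": 2, "c": 3, "d": 4},
    "y": {"X": 1, "Y": 2, "Z": 3},
}


def to_numeric(rows: List[dict], rules: Dict[str, Dict[str, int]] = CONVERSION_RULES) -> List[dict]:
    """Return the numeric data group (DA0) for a character data group (DA)."""
    return [
        {attr: rules[attr][value] if attr in rules else value for attr, value in row.items()}
        for row in rows
    ]


def invert_rules(rules: Dict[str, Dict[str, int]]) -> Dict[str, Dict[int, str]]:
    """Inverse mapping, e.g. to turn a numeric threshold back into a character."""
    for attr, mapping in rules.items():
        # The rules only need to be unambiguous, i.e. one-to-one.
        if len(set(mapping.values())) != len(mapping):
            raise ValueError(f"rule for {attr} is not one-to-one")
    return {attr: {v: k for k, v in mapping.items()} for attr, mapping in rules.items()}
```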

This processing converts every record into numeric data. The character-to-numeric data conversion unit 1 then sends the converted data group DA0 to the data classification unit 104.

Where possible, the conversion rules are preferably chosen so that larger converted input attribute values go with larger output attribute values, or consistently with smaller ones. Strictly, however, the rules need only be unambiguous (one-to-one), and they are not limited to the example above. Table 2 shows the data group DA0 obtained with these rules.

The data group DA0 obtained by this conversion is a set of records. Each record consists of several input attributes (explanatory attributes) and an output attribute (target attribute), all of them numeric. Hereinafter, the data group DA0 is also called the basic data group.

[Step 101]
The classification condition setting unit 103 sets a classification condition on the output attribute y of the basic data group DA0 and outputs it to the data classification unit 104 (S101). This condition corresponds to the problem event y = Y of the basic data group DA. It is set either from predetermined setting information or in response to the user entering the problem-event attribute value y = Y through an input unit (not shown) such as a keyboard or mouse. In this example, the classification condition on y in DA0 that corresponds to the problem event y = Y in DA is y = 2. Step 101 is the same as step 1 of the data analysis method of the second embodiment, except that it sets a classification condition on the output attribute y rather than an output attribute threshold y_th.

[Step 102]
Next, the data classification unit 104 classifies the basic data group DA0 into a first data group DA1 and a second data group DA2 (S102). The classification uses the value of the output attribute y of DA0 and the classification condition output by the classification condition setting unit 103, expressed as comparison logic (1) and (2) below.

(1) y ≠ 2 → DA1
(2) y = 2 → DA2
Here, comparison logic (1), "y ≠ 2", is the classification condition for the event y ≠ Y of the basic data group DA. This is an event that is not the problem event, hereinafter called a "non-problem event". Comparison logic (2), "y = 2", is the classification condition for the event y = Y of DA, hereinafter simply called the "problem event".

As shown in Table 3, each record then receives a classification flag ("DA1" or "DA2") identifying the data group it belongs to (S102). The data group of Table 3 is hereinafter called the classified basic data group DA00.

The classified basic data group DA00 is stored in the classified basic data group storage unit 105, such as a hard disk.

The second data group DA2 represents the problem event (for example, a device characteristic failure). DA2 contains the records whose output attribute y takes the value that represents the problem event (2). DA1 contains the records whose output attribute y takes a value that does not (1 or 3).

Step 102 is the same as step 2 of the data analysis method of the second embodiment, with one difference. After classifying DA0 into DA1 and DA2, it attaches the corresponding classification flags to the records instead of physically splitting DA0 into two groups.
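Flagging instead of splitting can be sketched as a single pass that copies each record and adds a flag field. This is a minimal illustration that assumes dict records and the problem value y = 2 from this example. `attach_classification_flags` and the `"flag"` key are hypothetical names.

```python
from typing import List


def attach_classification_flags(
    rows: List[dict], problem_value: int = 2, output_attr: str = "y"
) -> List[dict]:
    """Step 102 with logic (1)/(2): flag 'DA2' when y equals the problem value,
    otherwise 'DA1'. Records are copied, not split, giving the classified
    basic data group DA00."""
    return [
        {**row, "flag": "DA2" if row[output_attr] == problem_value else "DA1"}
        for row in rows
    ]
```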

The classification by the data classification unit 104 is not limited to the logic above. It may also be based on an output attribute threshold y_th, for example using the following logic (1′) and (2′).

(1′) y > y_th → DA1
(2′) y ≤ y_th → DA2
In this case, comparison logic (1′), "y > y_th", is the classification condition for the non-problem event. Comparison logic (2′), "y ≤ y_th", is the classification condition for the problem event.

The classification by the data classification unit 104 may also be based on the logical OR or logical AND of several conditions, for example the following logic (1″) and (2″). Here y_th1 and y_th2 are output attribute thresholds satisfying y_th1 < y_th2.

(1″) y_th1 < y ≤ y_th2 → DA1 (that is, y_th1 < y AND y ≤ y_th2)
(2″) y ≤ y_th1 OR y > y_th2 → DA2
In this case, comparison logic (1″), "y_th1 < y ≤ y_th2", is the classification condition for the non-problem event. Comparison logic (2″), "y ≤ y_th1 OR y > y_th2", is the classification condition for the problem event.
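The threshold logic (1′)/(2′) and the range logic (1″)/(2″) can each be written as a small predicate. This sketch assumes a numeric y. The function names are hypothetical. Each function returns the flag of the group the record falls into.

```python
def classify_by_threshold(y: float, y_th: float) -> str:
    """Logic (1')/(2'): y > y_th -> DA1 (non-problem); y <= y_th -> DA2 (problem)."""
    return "DA1" if y > y_th else "DA2"


def classify_by_range(y: float, y_th1: float, y_th2: float) -> str:
    """Logic (1'')/(2''): y_th1 < y <= y_th2 -> DA1; otherwise
    (y <= y_th1 OR y > y_th2) -> DA2."""
    if not y_th1 < y_th2:
        raise ValueError("the thresholds must satisfy y_th1 < y_th2")
    return "DA1" if y_th1 < y <= y_th2 else "DA2"
```

Because (2″) is the complement of (1″), the range version needs only one test, and every record lands in exactly one group.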

When the basic data group DA0 contains several output attributes (for example, the results of several kinds of inspection), the data classification unit 104 may classify on the logical OR or logical AND of several classification conditions. These are the conditions that the classification condition setting unit 103 sets for the individual output attributes y_1 and y_2. An example is the following logic (1‴) and (2‴).

(1‴) y_1 ≤ y_th1 OR y_2 > y_th2 → DA1
(2‴) y_1 > y_th1 AND y_2 ≤ y_th2 → DA2
In this case, comparison logic (1‴), "y_1 ≤ y_th1 OR y_2 > y_th2", is the classification condition for the non-problem event. Comparison logic (2‴), "y_1 > y_th1 AND y_2 ≤ y_th2", is the classification condition for the problem event.
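For two output attributes, condition (1‴) is the logical negation of (2‴) by De Morgan's law. The sketch below therefore evaluates only the problem-event condition. It is a minimal illustration assuming numeric y_1 and y_2, and `classify_two_outputs` is a hypothetical name.

```python
def classify_two_outputs(y1: float, y2: float, y_th1: float, y_th2: float) -> str:
    """Logic (1''')/(2''') for two output attributes y_1 and y_2."""
    # (2'''): y_1 > y_th1 AND y_2 <= y_th2 marks the problem event.
    problem = y1 > y_th1 and y2 <= y_th2
    # (1'''): y_1 <= y_th1 OR y_2 > y_th2 is exactly the negation of (2'''),
    # so every record lands in exactly one group.
    return "DA2" if problem else "DA1"
```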

When there are several output attributes, the data classification unit 104 may also select one output attribute y from y_1 and y_2 and classify on it alone. It can then use logic (1)(2), (1′)(2′), or (1″)(2″) above, for example.

[Steps 103 to 105]
The analysis data group extraction unit 106 extracts the analysis data group DA00′ to be analyzed from the classified basic data group DA00. It sends DA00′ to the data row separation unit 107.

In the first pass, the analysis data group DA00′ is simply a copy of the classified basic data group DA00 (S105). In the second and subsequent passes of the repeated processing described later, DA00′ is instead the other data group output by the data division unit 115 (S104). In steps 103 to 105, the analysis data group extraction unit 106 first determines whether this is the first pass, that is, whether no DA00′ has been extracted yet (S103). It then extracts DA00 as DA00′ on the first pass, and the other data group on later passes (S104, S105).
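The repetition around Steps 103 to 105 can be pictured as a loop that feeds each round's other data group back in as the next analysis data group. In this sketch, the `divide` callback stands in for Steps 106 to 115 and is a hypothetical interface. The stop rules (no condition found, empty other group, no progress, a round limit) are assumptions, since the source describes the stopping criterion elsewhere.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical interface: divide(group) returns (condition_label, factor_group,
# other_group), or None when no further input attribute condition is found.
DivideResult = Optional[Tuple[str, List[dict], List[dict]]]


def run_iterations(
    da00: Sequence[dict],
    divide: Callable[[List[dict]], DivideResult],
    max_rounds: int = 10,
) -> List[str]:
    """Repeat the analysis, using each round's other data group as the next DA00'."""
    found: List[str] = []
    analysis = list(da00)  # first round (S103 -> S105): DA00' is DA00 itself
    for _ in range(max_rounds):
        result = divide(analysis)
        if result is None:
            break
        label, _factor, other = result
        found.append(label)
        # Assumed stop rules: nothing left, or the split made no progress.
        if not other or len(other) == len(analysis):
            break
        analysis = other  # later rounds (S103 -> S104): DA00' is the other group
    return found
```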

[Step 106]
The data row separation unit 107 splits the analysis data group DA00′ in two, using the classification flags of the first data group DA1 and the second data group DA2 (S106). In the first pass, DA00′ is the classified basic data group DA00 of Table 3. The unit extracts each of the two groups, DA1 and DA2. Table 4 shows the first data group DA1 output by the data row separation unit 107, and Table 5 shows the second data group DA2.

Hereinafter, where convenient, the first data group DA1 is called the non-defective (OK) data group and the second data group DA2 the defective (NG) data group.

Steps 102 to 105 above replace step 2 of the data analysis method of the second embodiment.

[Step 107]
Next, the input attribute condition determination device 100A determines an input attribute condition (S107). This is a condition on the input attributes that splits the analysis data group DA00′ in two, so that the non-defective data group DA1 (first data group) and the defective data group DA2 (second data group) each gather together. Specifically, DA2 should be concentrated in the data that satisfies the input attribute condition, and DA1 in the data that does not. As shown in FIG. 15, step S107 comprises steps S203 to S208.

[Step 203]
In step 203, the data column extraction unit 5 first extracts the data column of each input attribute xj (1 ≤ j ≤ 4) from the non-defective data group DA1 (Table 4) (S203). Each such column is called a 1-xj data group.

Likewise, the data column extraction unit 5 extracts the data column of each input attribute xj (1 ≤ j ≤ 4) from the defective data group DA2 (Table 5) (S203). Each such column is called a 2-xj data group.

Tables 6 to 9 show the 1-xj data groups, and Tables 10 to 13 show the 2-xj data groups. Step 203 is identical to step 3 of the data analysis method of the second embodiment.
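Steps 106 and 203 together amount to partitioning the flagged records and then pulling out one column per input attribute. The sketch assumes dict records carrying a `"flag"` key, as in the Step 102 illustration. `split_and_extract` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple


def split_and_extract(
    analysis_group: Sequence[dict], attrs: Sequence[str]
) -> Tuple[Dict[str, List], Dict[str, List]]:
    """Step 106: split DA00' by classification flag into DA1 and DA2.
    Step 203: extract each input attribute column (1-xj and 2-xj data groups)."""
    ok = [r for r in analysis_group if r["flag"] == "DA1"]  # non-defective DA1
    ng = [r for r in analysis_group if r["flag"] == "DA2"]  # defective DA2
    cols_1 = {a: [r[a] for r in ok] for a in attrs}  # 1-xj data groups
    cols_2 = {a: [r[a] for r in ng] for a in attrs}  # 2-xj data groups
    return cols_1, cols_2
```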

[Step 204]
The frequency calculation unit 6 sorts the rows of each 1-xj data group and each 2-xj data group from step 203 in ascending order of the input attribute xj (sorting process 1). The 1-xj groups come from DA1 and the 2-xj groups from DA2. Then, for each value of xj, it calculates two percentages (S204):

- the 1-xj frequency cumulative % (first frequency): the percentage of records in the non-defective data group DA1 whose xj is less than or equal to that value;
- the 2-xj frequency cumulative % (second frequency): the same percentage for the defective data group DA2.

For DA1, the calculation uses Tables 14 to 17, which are Tables 6 to 9 sorted in ascending order of xj. For the record in each row (id), the number of records from the top of the table down to and including that row is divided by the total number of records in the first data group (8). The result is the 1-xj frequency cumulative %. Likewise, the calculation for DA2 uses Tables 18 to 21, which are Tables 10 to 13 sorted in ascending order of xj. For each row, the number of records from the top down to and including that row is divided by the total number of records in the second data group (4). The result is the 2-xj frequency cumulative %.

Tables 14 to 21 show the resulting 1-xj and 2-xj frequency cumulative % values.

The frequency calculation unit 6 then joins each 1-xj table with the corresponding 2-xj table (join process). The 1-xj table holds the non-defective data with its 1-xj frequency cumulative %, and the 2-xj table holds the defective data with its 2-xj frequency cumulative %. Specifically (S204):

- Tables 14 and 18 are joined into the x1 frequency cumulative table of Table 22.
- Tables 15 and 19 are joined into the x2 frequency cumulative table of Table 23.
- Tables 16 and 20 are joined into the x3 frequency cumulative table of Table 24.
- Tables 17 and 21 are joined into the x4 frequency cumulative table of Table 25.

Next, the frequency calculation unit 6 post-processes each frequency cumulative table (Tables 22 to 25) in three steps:

1. It sorts the rows in ascending order of xj (sorting process 2).
2. Working downward from the top, it fills each blank cell in the 1-xj and 2-xj frequency cumulative % columns with the value directly above it, that is, the value in the preceding row (fill-in process).
3. Where several consecutive rows share the same xj value, it keeps only the last of those rows (de-duplication process).

The result, shown in Tables 26 to 29, gives two percentages for every value of xj (S204). The 1-xj frequency cumulative % (A; first frequency) is the percentage of records in the non-defective first data group DA1 whose xj is less than or equal to that value. The 2-xj frequency cumulative % (B; second frequency) is the same percentage for the defective second data group DA2. Step 204 is identical to step 4 of the data analysis method of the second embodiment.

In steps 203 and 204 above, the xj frequency cumulative tables of Tables 26 to 29 are built in the following sequence: data column extraction (Tables 6 to 13) → sorting process 1 → calculation of the 1-xj and 2-xj frequency cumulative % (Tables 14 to 21) → join (Tables 22 to 25) → sorting process 2 → fill-in → de-duplication (Tables 26 to 29). However, the tables of Tables 26 to 29 may also be calculated directly, without performing these individual processes.
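The direct calculation mentioned above can be sketched with binary search on the sorted columns. For each distinct value v, counting the records with xj ≤ v gives the same figure that the fill-in and de-duplication steps leave in the last row for v. This is one possible approach, not the method prescribed by the source. It assumes that a blank at the top of a column, with no value above it, stands for 0%. `frequency_cumulative_table` is a hypothetical name.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def frequency_cumulative_table(
    values_ok: Sequence[float], values_ng: Sequence[float]
) -> List[Tuple[float, float, float]]:
    """Rows (v, A, B) for every distinct xj value v in either group, where
    A = % of DA1 records with xj <= v and B = % of DA2 records with xj <= v."""
    if not values_ok or not values_ng:
        raise ValueError("both DA1 and DA2 must contain at least one record")
    ok, ng = sorted(values_ok), sorted(values_ng)
    table = []
    for v in sorted(set(ok) | set(ng)):
        # bisect_right on a sorted list returns the number of elements <= v.
        a = 100.0 * bisect_right(ok, v) / len(ok)
        b = 100.0 * bisect_right(ng, v) / len(ng)
        table.append((v, a, b))
    return table
```

With hypothetical columns `[1, 2, 2, 4]` (DA1) and `[3, 4]` (DA2), the rows are (1, 25, 0), (2, 75, 0), (3, 75, 50), and (4, 100, 100).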

[Step 205]
Next, for each value of xj, the frequency cumulative difference calculation unit 7 calculates the difference |A − B| between the non-defective 1-xj frequency cumulative % (A) and the defective 2-xj frequency cumulative % (B) (S205). This difference is called the xj frequency cumulative difference %. Tables 30 to 33 show the results.
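Given rows of the form (v, A, B), such as those in Tables 26 to 29, Step 205 is a single elementwise operation. The sketch assumes that row format. `cumulative_differences` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def cumulative_differences(
    table: Sequence[Tuple[float, float, float]]
) -> List[Tuple[float, float]]:
    """Map each (v, A, B) row to (v, |A - B|), the xj frequency cumulative difference %."""
    return [(v, abs(a - b)) for v, a, b in table]
```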

FIGS. 3 to 6 plot three quantities against the value of each input attribute xj: the non-defective 1-xj frequency cumulative % (A), the defective 2-xj frequency cumulative % (B), and the xj frequency cumulative difference % |A − B|.

The xj frequency cumulative difference % at each value of xj is a threshold evaluation index. It indicates how cleanly the non-defective first data group DA1 is separated from the defective second data group DA2 when xj is split into the range at or below that value and the range above it. It corresponds to the degree of improvement in the Gini index method.

In other words, the xj frequency cumulative difference % (= |A − B|) at each value of xj expresses the confidence of one of two association rules:

- "if xj is at most that value, the record belongs to the defective second data group, and if xj exceeds it, the record belongs to the non-defective first data group"; or
- "if xj exceeds that value, the record belongs to the defective second data group, and if xj is at most that value, the record belongs to the non-defective first data group".
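The quantities of Steps 204 to 206 can be summarized compactly. The symbols A_j(v), B_j(v), and D_j(v) are introduced here for illustration and are not taken from the source. Here v ranges over the values that xj takes in the analysis data group DA00′, and x_j(d) is the value of xj in record d:

$$
A_j(v) = 100 \cdot \frac{\left|\{\, d \in DA1 : x_j(d) \le v \,\}\right|}{|DA1|}, \qquad
B_j(v) = 100 \cdot \frac{\left|\{\, d \in DA2 : x_j(d) \le v \,\}\right|}{|DA2|}
$$

$$
D_j(v) = \left| A_j(v) - B_j(v) \right|, \qquad x_{j\text{-th}} = \arg\max_{v} D_j(v)
$$

As a hypothetical worked example, suppose xj takes the values {1, 2, 2, 4} in DA1 and {3, 4} in DA2. Then A_j(2) = 100 · 3/4 = 75 and B_j(2) = 0, so D_j(2) = 75, the largest value of D_j. Because A_j exceeds B_j at this point, the second association rule applies, and the defective data lies in xj > 2.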

The relationship between the value of xj and the xj frequency cumulative difference % |A − B| in this embodiment (FIGS. 3 to 6) follows the same trend as the relationship between the branch condition on xj and the degree of improvement in the Gini index method (FIGS. 20 to 23).

Step 205 is identical to step 5 of the data analysis method of the second embodiment.

[Steps 206 to 208]
For each input attribute xj, the threshold determination unit 130 extracts the value of xj at which the xj frequency cumulative difference % is largest (S206). Step 206 is identical to step 6 of the data analysis method of the second embodiment.

In Tables 30 to 33, the extracted values are shaded gray. Each extracted value of xj is called the input attribute threshold xj-th. As FIGS. 3 to 6 show, the split into the ranges xj ≤ xj-th and xj > xj-th separates the non-defective first data group DA1 from the defective second data group DA2 more easily than a split at any other value of xj.
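Step 206 selects the maximum of the difference column. In the sketch below, the rows are assumed to be in ascending order of v, as in Tables 30 to 33. Ties keep the first (smallest) v. This tie-breaking is an assumption, because the source does not say how ties are resolved. `select_threshold` is a hypothetical name.

```python
from typing import Sequence, Tuple


def select_threshold(diffs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (xj-th, max |A - B|) from rows (v, |A - B|) sorted by v.
    On ties the first row in table order wins (assumed tie-breaking)."""
    if not diffs:
        raise ValueError("empty difference table")
    best_v, best_d = diffs[0]
    for v, d in diffs[1:]:
        if d > best_d:  # strict '>' keeps the earlier row on ties
            best_v, best_d = v, d
    return best_v, best_d
```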

Next, at the threshold xj-th of each input attribute xj, the polarity determination unit 131 determines which is larger: the non-defective 1-xj frequency cumulative % (A) or the defective 2-xj frequency cumulative % (B) (S207). Tables 30 to 33 also list which frequency cumulative % (1-xj or 2-xj) is larger at xj-th.

The input attribute condition determination unit 111 then determines the input attribute condition that corresponds to the defective second data group DA2 (S208). It bases this decision on two results: the threshold xj-th determined (extracted) by the threshold determination unit 130, and the frequency cumulative % that the polarity determination unit 131 found to be larger. The chosen condition satisfies the specific association rule "if the input attribute satisfies the input attribute condition, the record belongs to the defective second data group; if it does not, the record belongs to the non-defective first data group".

If the polarity determination unit 131 finds the non-defective 1-xj frequency cumulative % (A) to be larger, the condition "xj > xj-th" corresponds to the defective second data group DA2. This is because A > B means that a larger share of the non-defective data than of the defective data lies at or below xj-th, so the defective data is concentrated above it. Conversely, if the defective 2-xj frequency cumulative % (B) is larger, the condition "xj ≤ xj-th" corresponds to DA2. Accordingly, when A is larger, the input attribute condition determination unit 111 sets the condition for DA2 to "xj > xj-th" (the input attribute exceeds the threshold). When B is larger, it sets the condition to "xj ≤ xj-th" (the input attribute is at most the threshold).
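Steps 207 and 208 reduce to one comparison at the threshold. The sketch below returns the condition as a string for readability. `decide_condition` is a hypothetical name. The source does not cover the case A = B, which means no separation at all; assigning it to the "≤" branch is an assumption.

```python
def decide_condition(attr: str, threshold: float, a: float, b: float) -> str:
    """Steps 207-208: a and b are the 1-xj and 2-xj frequency cumulative %
    at the threshold xj-th. Returns the condition associated with DA2."""
    if a > b:
        # More of the non-defective data lies at or below the threshold,
        # so the defective data concentrates above it.
        return f"{attr} > {threshold}"
    # b > a: the defective data concentrates at or below the threshold.
    # a == b is not covered by the source; sending it here is an assumption.
    return f"{attr} <= {threshold}"
```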

Table 34 shows the input attribute conditions that the input attribute condition determination unit 111 determined in this way for the defective second data group DA2.

For example, the condition "x2 > 2" is determined for input attribute x2. This condition identifies the second data group DA2 (defective products) with high accuracy while separating it from the first data group DA1 (non-defective products). Conversely, its complement, "x2 ≤ 2", identifies the first data group DA1 with high accuracy while separating it from DA2. This is easier to see with reference to FIG. 4.

In the above description, steps 203 to 208 are performed for all input attributes at once. Alternatively, j may be incremented from 1 to 4 and steps 203 to 208 repeated for each attribute in turn.

Steps 203 to 208 above correspond to the input attribute condition determination method recited in the claims. This method obtains the threshold evaluation index (corresponding to the degree of improvement in the Gini index method) with a very simple computation: the xj cumulative frequency difference %. This is the difference between the 1-xj cumulative frequency % of non-defective products and the 2-xj cumulative frequency % of defective products. The Gini index method computes the Gini index (equations (3) and (4)) and the degree of improvement (equation (6)) for every possible branch condition of an input attribute, which requires a large amount of computation. This method, by contrast, only computes cumulative frequency differences over a table with one row per distinct value of the input attribute (Tables 30 to 33). In manufacturing-process data for real devices (particularly semiconductor devices), a single input attribute may take tens or hundreds of thousands of distinct values. Even then, only the number of rows in Tables 30 to 33 grows, so the computational load stays small and the processing completes quickly.

For each value of input attribute xj, the 1-xj cumulative frequency % of non-defective products and the 2-xj cumulative frequency % of defective products are both normalized. Each is the number of data in its own data group whose input attribute is at or below that value, divided by the total number of data in that group. Their difference, the xj cumulative frequency difference % (threshold evaluation index), therefore keeps its accuracy even when the proportions of non-defective products (first data group) and defective products (second data group) in the analysis data group differ extremely. As a result, the input attribute condition that separates non-defective from defective products (the optimal branch condition for each input attribute) can be determined with high accuracy. In the above example using the data group of Table 1, defective products make up 4/12 of the data and non-defective products 8/12, so there is no extreme (order-of-magnitude) imbalance. Accordingly, the defective-product conditions extracted in this embodiment (Table 34) agree with the branch conditions extracted by the Gini index method (the conditions described in [Problems to be Solved by the Invention]).
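The effect of normalization can be checked with a toy, deliberately imbalanced data set. The data are hypothetical, not Table 1. The sketch below compares the normalized cumulative frequency difference with a raw-count difference, which is included only for contrast.

```python
from bisect import bisect_right


def best_threshold(good, bad, normalize=True):
    """Return the distinct value maximising |cum(good) - cum(bad)|."""
    candidates = sorted(set(good) | set(bad))
    g, b = sorted(good), sorted(bad)

    def score(t):
        cg, cb = bisect_right(g, t), bisect_right(b, t)
        if normalize:  # cumulative frequency %, as in the 1-xj / 2-xj columns
            return abs(100.0 * cg / len(g) - 100.0 * cb / len(b))
        return abs(cg - cb)  # raw counts, shown only for contrast

    return max(candidates, key=score)


# Hypothetical data: 1000 non-defective items spread evenly over 1..10,
# and 10 defective items at values 9 and 10.
good = [v for v in range(1, 11) for _ in range(100)]
bad = [9] * 5 + [10] * 5
print(best_threshold(good, bad))                   # 8  -> condition "x > 8"
print(best_threshold(good, bad, normalize=False))  # 10 -> "x > 10" selects nothing
```

With raw counts, the large non-defective group dominates the difference and pushes the threshold to the top of the range. The normalized version finds the value above which the defective items concentrate.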

As described above, the input attribute condition determination method of this embodiment achieves both the first and the second objects of the present invention.

The processing from step 109 onward is a preferred data analysis method that makes use of the input attribute conditions determined by the above method. Its details are described below.

[Step 109]
The defective product separation degree calculation unit 112 processes each input attribute condition determined by the input attribute condition determination unit 111 (Table 34). Within the classified basic data group DA00 (not the analysis data group DA00'), it counts two things. The first is the number of data satisfying the condition ("DA1+DA2" column of Table 35). The second is the number of data that satisfy the condition and belong to the second data group DA2 of defective products ("DA2" column of Table 35). It then computes the defective product separation degree by dividing the "DA2" value by the "DA1+DA2" value (S109). The separation degree of a condition expresses how accurately that condition isolates defective products. It is the defect rate when the data in DA00 satisfying the condition are taken as the population, and it corresponds to the second data group separation degree in the claims.

Table 35 shows the results computed by the defective product separation degree calculation unit 112. Its "Total" row gives three values for the classified basic data group DA00: the total number of data ("DA1+DA2" column = 12), the number of data in the second data group DA2 of defective products ("DA2" column = 4), and the defective product content rate ("Defective product separation degree" column = 4/12 = 0.333). The defective product content rate is the defect rate when all data in DA00 are taken as the population. It corresponds to the second data group content rate in the claims.
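Step 109 amounts to two ratios. The sketch below assumes that each row of DA00 is a dict with keys such as "x2" and "y" (y = 2 for defective) and that a condition is passed as a predicate. Both representations are hypothetical.

```python
def separation_degree(rows, condition, bad=2):
    """Defective product separation degree: the "DA2" count divided by
    the "DA1+DA2" count among the rows that satisfy `condition`."""
    hit = [r for r in rows if condition(r)]
    if not hit:
        return None  # undefined when no row satisfies the condition
    return sum(1 for r in hit if r["y"] == bad) / len(hit)


def content_rate(rows, bad=2):
    """Defective product content rate over the whole group ("Total" row)."""
    return sum(1 for r in rows if r["y"] == bad) / len(rows)
```

A condition is worth keeping as a factor only if its separation degree exceeds the content rate, i.e. it selects defective data more often than a random draw would.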

The meaning of each column of Table 35 is easier to understand with reference to FIGS. 16(a) to 16(d), which express it as Venn diagrams. Each of FIGS. 16(a) to 16(d) shows, for one input attribute condition in Table 35, the relationship between the set of data satisfying that condition and the set of the second data group DA2 of defective products.

[Step 110]
The first factor extraction unit 109 extracts, as information indicating the factors behind the second data group DA2 of defective products, the input attribute conditions in Table 35 whose defective product separation degree exceeds the defective product content rate of the classified basic data group DA00. The content rate is the value in the "Defective product separation degree" column of the "Total" row (0.333). The unit then stores the result in the analysis result data storage unit 14.

In the example of Table 35, every input attribute condition for x1 to x4 has a defective product separation degree higher than the defective product content rate of DA00, so all of them are extracted (Table 36).
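The step 110 filter can be sketched from a Table 35-style summary. The dict layout and the counts in the test are hypothetical.

```python
def extract_factors(table, total_n, total_bad):
    """Step 110 sketch: keep conditions whose separation degree beats the
    content rate of the classified basic data group.

    `table` maps a condition label to (count satisfying it, defective count
    among those), i.e. the "DA1+DA2" and "DA2" columns of Table 35.
    """
    content = total_bad / total_n  # "Total" row, e.g. 4/12
    return [cond for cond, (n, n_bad) in table.items()
            if n and n_bad / n > content]  # skip empty conditions
```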

Each input attribute condition in Table 36 captures defective products (the second data group DA2) at a higher rate than a sample drawn at random from DA00. Each therefore indicates a factor of the output attribute condition corresponding to the second data group.

In this way, the input attribute conditions "x1 > 2", "x2 > 2", "x3 > 2", and "x4 ≤ 10" were extracted as factors of the problem event (the second data group DA2 of defective products).

Steps 101 to 110 above extract the factors of the problem event (the second data group DA2 of defective products).

However, this process has two limitations. First, the threshold evaluation index (xj cumulative frequency difference %) computed in step 204 for each value of each input attribute also contains, as a disturbance, the influence of the other input attributes. In some cases this may reduce the accuracy of the index. Second, the factors of the problem event for an input attribute xj may need to be of two types, "xj ≤ xj-th1" and "xj > xj-th2"; in that case steps 203 to 208 alone extract only one of them. To address these points, it is preferable to perform the following additional steps.

[Step 111]
The frequency cumulative ratio calculation unit 16 processes each input attribute threshold xj-th determined (extracted) by the threshold determination unit 130 (step 206; see Tables 30 to 33). For each one, it computes one of two ratios as the division rule evaluation value. The first is the ratio of the 2-xj cumulative frequency % of defective products (B) to the 1-xj cumulative frequency % of non-defective products (A), i.e. B/A, hereinafter the lower cumulative frequency ratio. The second is the ratio of (100 - B) to (100 - A), i.e. (100 - B)/(100 - A), hereinafter the upper cumulative frequency ratio.

When the input attribute condition determined by the input attribute condition determination unit 111 is of the type "xj ≤ xj-th", the lower cumulative frequency ratio (B/A) is computed as the division rule evaluation value. This is the case in which the type judged larger by the polarity determination unit 131 is the 2-xj cumulative frequency of defective products. The lower ratio expresses how well the condition "xj ≤ xj-th" detects the second data group of defective products while separating it from the first data group of non-defective products.

When the input attribute condition determined by unit 111 is of the type "xj > xj-th", the upper cumulative frequency ratio ((100 - B)/(100 - A)) is computed as the division rule evaluation value. This is the case in which the type judged larger by the polarity determination unit 131 is the 1-xj cumulative frequency of non-defective products. The upper ratio expresses how well the condition "xj > xj-th" detects the second data group of defective products while separating it from the first data group of non-defective products.

In other words, for each input attribute condition, the division rule evaluation value (lower or upper cumulative frequency ratio) expresses the likelihood of the association rule "if input attribute xj satisfies the input attribute condition, the data belongs to the second data group in the analysis data group."
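A sketch of the step 111 division rule evaluation value follows. It assumes A and B are given in percent. Returning infinity when the denominator is 0 mirrors the ∞ entry for "x2 > 2" in Table 37. Returning NaN for 0/0 is an added assumption.

```python
import math


def division_rule_value(a, b, op):
    """Step 111 sketch.

    a, b: cumulative frequency % (0-100) of the non-defective and defective
    groups at the threshold; op: '<=' or '>', the type of the condition.
    """
    if op == '<=':
        num, den = b, a                   # lower cumulative frequency ratio B/A
    else:
        num, den = 100.0 - b, 100.0 - a   # upper cumulative frequency ratio
    if den == 0:
        # No non-defective data satisfy the condition: perfect separation.
        return math.inf if num > 0 else math.nan
    return num / den
```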

Table 37 shows the division rule evaluation value (lower or upper cumulative frequency ratio) for each input attribute condition.

The processing in step 111 is the same as that in step 7 of the data analysis method of Embodiment 2, with one exception. Instead of computing two kinds of rule evaluation values (the first and second rule evaluation values), step 111 computes either the lower or the upper cumulative frequency ratio as the division rule evaluation value. The choice depends on the type of the input attribute condition determined by unit 111, i.e. the type of xj cumulative frequency judged larger by the polarity determination unit 131.

[Step 112]
Next, from among the input attribute conditions determined by the input attribute condition determination unit 111, the data division unit 115 extracts the one with the largest division rule evaluation value from step 111 (lower or upper cumulative frequency ratio; Table 37) (S112).

Referring to Table 37, the condition "x2 > 2" has the largest division rule evaluation value of all the input attribute conditions: division rule evaluation value = upper cumulative frequency ratio = ∞. This indicates that the condition "x2 > 2" detects the second data group DA2 of defective products in complete separation from the first data group DA1 of non-defective products.

Viewed another way, the condition "x2 > 2" corresponds to the second data group DA2 whatever the values of the other input attributes (x1, x3, x4). It may therefore act as a disturbance factor when determining the input attribute conditions of those other attributes (steps 203 to 206) or when computing the threshold evaluation index (xj cumulative frequency difference %) (step 205). In such a case, it is preferable to determine the conditions for the other input attributes (x1, x3, x4) after excluding the data satisfying "x2 > 2" from the analysis data group DA00'.

Accordingly, based on the extracted condition "x2 > 2", the data division unit 115 divides the analysis data group DA00' into two parts: a factor data group satisfying "x2 > 2", and an other data group not satisfying it (i.e. satisfying "x2 ≤ 2"). Table 38 shows the factor data group and Table 39 the other data group.
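Step 112 and the subsequent division might be sketched as follows. Ties on the evaluation value are broken in favor of the smaller attribute index j, which is the fixed rule the text gives as an example for the second iteration. The tuple layout and the dict-based rows are hypothetical.

```python
def pick_condition(candidates):
    """candidates: list of (j, op, threshold, evaluation value).

    The largest evaluation value wins (math.inf is allowed); ties go to the
    smallest attribute index j. NaN values are assumed to be filtered out.
    """
    return max(candidates, key=lambda c: (c[3], -c[0]))


def split(rows, j, op, th):
    """Divide rows into the factor data group and the other data group."""
    def satisfies(r):
        x = r[f"x{j}"]
        return x > th if op == '>' else x <= th

    factor = [r for r in rows if satisfies(r)]
    other = [r for r in rows if not satisfies(r)]
    return factor, other
```

Only the other data group is carried into the next iteration, so the dominant condition no longer disturbs the thresholds of the remaining attributes.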

Step 112 corresponds to step 9 of the data analysis method of Embodiment 2. Here the lower or upper cumulative frequency ratio is used as the division rule evaluation value. Another evaluation index, such as the Gini index or the cumulative frequency difference described above, may be used instead.

[Step 113]
Next, the analysis data group extraction unit 106 takes the other data group from the groups divided in step 112 as the next analysis data group DA00' (S104). Steps 106 to 113 are then repeated until the termination condition determination unit 11 determines that the termination condition is satisfied. That is, after step 106 of the second iteration, processing moves to step 113, where unit 11 checks whether the termination condition is satisfied (S113). If it is not, steps 107 to 112 and step 104 are performed again. If it is, processing moves to step 114. The termination check in step 113 is the same as that in step 10 of the data analysis method of Embodiment 2.

The termination condition determination unit 11 of this embodiment treats it as the termination condition when the second data group DA2 of defective products contains no data in step 106 of an iteration. Repeating the processing until DA2 is empty yields a detailed factor analysis of DA2.

Other termination conditions may also be used. Examples include the following:
(1) the number of data in DA2 becomes equal to or less than a predetermined number in step 106 of an iteration;
(2) the ratio of the number of data in DA2 to that in the first data group DA1 becomes equal to or less than a predetermined ratio in step 106 of an iteration;
(3) the division rule evaluation value of the input attribute condition extracted in step 112 of an iteration falls below a predetermined value.
Such termination conditions yield a more concise yet sufficient factor analysis result. When a concise result takes priority, the termination condition may simply be that the processing has been repeated a predetermined number of times. Alternatively, the termination condition determination unit 11 may be omitted and the processing repeated as many times as possible.
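The iteration of steps 104 to 113 can be outlined as a loop. The sketch below is a skeleton under stated assumptions. `find_conditions` is a hypothetical callback standing in for steps 107 to 111. The default stop rule is "DA2 is empty", and an optional fixed number of rounds is also supported. Only the chosen branch conditions are recorded; factor extraction is omitted.

```python
def iterate_branches(rows, find_conditions, max_iterations=None, bad=2):
    """Skeleton of the step 104-113 loop.

    find_conditions(group) (hypothetical) returns a list of
    (label, predicate, division rule evaluation value) for the current
    analysis data group. Returns the branch condition chosen per iteration.
    """
    branches = []
    group = list(rows)
    while any(r["y"] == bad for r in group):  # default rule: stop when DA2 is empty
        if max_iterations is not None and len(branches) >= max_iterations:
            break  # alternative rule: fixed number of rounds
        conds = find_conditions(group)
        if not conds:
            break
        label, pred, _ = max(conds, key=lambda c: c[2])
        remaining = [r for r in group if not pred(r)]  # the "other" data group
        if len(remaining) == len(group):
            break  # guard: the split removed nothing, avoid looping forever
        branches.append(label)
        group = remaining
    return branches
```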

Table 40 (corresponding to Table 35 in the first iteration) shows the results computed by the defective product separation degree calculation unit 112 in step 109 of the second iteration.

In the example of Table 40, the input attribute conditions for x1, x3, and x4 have a defective product separation degree (second data group separation degree) higher than the defective product content rate (second data group content rate) of DA00. These conditions are therefore extracted (step 110 of the second iteration; Table 41).

On the other hand, the defective product separation degree (second data group separation degree) of the condition for x2 (x2 ≤ 1) is lower than the defective product content rate of DA00, so the condition for x2 is not extracted.

Thus, the second iteration extracted the input attribute conditions "x1 > 2", "x3 > 2", and "x4 ≤ 10" as factors of the problem event (the second data group DA2 of defective products) (Table 41). This iteration used as its analysis data group the data in DA00 satisfying "x2 ≤ 2".

Table 42 (corresponding to Table 37 in the first iteration) shows the division rule evaluation values (lower or upper cumulative frequency ratio) computed in step 111 of the second iteration.

In this example, the division rule evaluation value reaches its maximum of 4 for both "x1 > 2" and "x4 ≤ 10", and the data division unit 115 selects one of them. Any fixed rule suffices for this selection. For example, priority may be given to the input attribute xj with the smaller index j, so that "x1 > 2" is selected (step 112 of the second iteration).

Of the data groups divided by the data division unit 115, the analysis data group extraction unit 106 extracts the other data group as the third analysis data group (Table 43). This is the data in the second analysis data group satisfying "x1 ≤ 2".

However, the third analysis data group in Table 43 contained no defective-product data (second data group DA2; y = 2). The repetition therefore ended at this point, after the second factor extraction.

[Step 114]
Table 44 shows the extracted factor list table, which collects the input attribute conditions extracted in each iteration of step 110 (Tables 35 and 41).

The extracted factor list table of Table 44 shows every input attribute condition produced by the repeated processing of the first factor extraction unit 109 (step 110), including multiple conditions for the same input attribute.

From among the multiple input attribute conditions for the same input attribute (Table 44), the factor determination unit 117 selects only those with high priority (S114).

Specifically, for each input attribute it selects two conditions. The first is the condition with the largest defective product separation degree (second data group separation degree) among conditions of the first pattern, "input attribute is at or below the threshold". The second is the condition with the largest separation degree among conditions of the second pattern, "input attribute exceeds the threshold".
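The step 114 selection can be sketched as a per-(attribute, pattern) maximum. The entry layout is hypothetical. Sorting the result by separation degree is an added convenience that reflects the prioritization the text describes.

```python
def determine_factors(extracted):
    """Step 114 sketch.

    extracted: list of (attribute, op, threshold, separation degree),
    with op '<=' (first pattern) or '>' (second pattern). Keeps, for each
    attribute and pattern, the entry with the largest separation degree.
    """
    best = {}
    for attr, op, th, sep in extracted:
        key = (attr, op)
        if key not in best or sep > best[key][3]:
            best[key] = (attr, op, th, sep)
    # Highest separation degree first (countermeasure priority).
    return sorted(best.values(), key=lambda e: e[3], reverse=True)
```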

In the example of Table 44, the four conditions shown in Table 45 were finally selected as the factors of the problem event (the second data group DA2 of defective products).

Table 45 lists the input attribute conditions that the factor determination unit 117 determined (selected) as factors of the problem event. This table is called the determined factor list table and is stored in the analysis result data storage unit 14.

FIG. 17 shows the above two iterations in decision-tree form (the same format as FIG. 12). Referring to FIG. 17, at each branch of the decision tree this embodiment does more than find the final branch condition. For every input attribute, it obtains the input attribute condition that is a factor of the problem event (processing by the input attribute condition determination unit 111 in step 208). From these, it keeps only the conditions with a high defective product separation degree (processing by the first factor extraction unit 109 in step 110). Finally, from all the conditions accumulated over the branches (i.e. over the iterations), it narrows the set down to those with an even higher separation degree and determines them as the final defect factors (processing by the factor determination unit 117 in step 114).

Because every condition with a high defective product separation degree is extracted, including conditions other than the branch conditions of the decision tree, factors are captured reliably even when a competing factor exists for a branch condition. Both the per-branch factor extraction (first factor extraction unit 109) and the final factor determination (factor determination unit 117) rely on one clear index, the defective product separation degree. The factors of the problem event can therefore be identified clearly however complex the decision tree becomes. Using this index for evaluation also makes it possible to prioritize the multiple factors (input attribute conditions) that are determined.

[Step 115]
The composite factor defect count calculation unit 118 calculates, for each pair of input attribute conditions in the determined factor list table (Table 45), the number of defects attributable to the combination of the two conditions (Table 46).

In Table 46, the header row and header column each list the input attribute conditions of the determined factor list table. Each cell at their intersection gives the number of defects (number of data in the second data group DA2) attributable to the combination of the two conditions. For example, the cell at row "x1 > 2" and column "x2 > 2" gives the number of data (= 1) that satisfy both "x1 > 2" and "x2 > 2" and belong to the second data group DA2 of defective products. Hereinafter, Table 46 is called the composite factor table.

[Step 116]
If necessary, the numerical value-character data conversion unit 119 converts the numerical input attribute thresholds xj-th in the determined factor list table (Table 45) and the composite factor table (Table 46) into character data. The conversion rules are the inverse of the conversion in step 100, as follows:
(x1) 1 → A, 2 → B, 3 → C, 4 → D
(x2) 1 → a, 2 → b, 3 → c, 4 → d
(x3) No conversion
(x4) No conversion
Table 47 shows the factor list table obtained by converting the input attribute thresholds xj-th in the determined factor list table of Table 45 into character data.

[Step 117]
This completes the data analysis. The extracted factor list table (Table 44), the determined factor list tables (Tables 45 and 47), the composite factor table (Table 46), and various information produced during the analysis are finally stored as analysis result data in the analysis result data storage unit 14, such as a hard disk. As appropriate, these data are sent from unit 14 to an output unit 15, such as a display or a printer. There they can be displayed or printed as tables (e.g. Table 47), decision trees (e.g. FIG. 17), or graphs.

As an example, FIG. 18 shows the determinant list table (Table 47) displayed as a factor breakdown Pareto diagram. In FIG. 18, the number of defectives (the number of data in the second data group DA2) attributable to each input attribute condition of the determinant list table (Table 47) is shown as a bar graph, and the defective product separation degree (second data group separation degree) is shown as a line graph.

By referring to FIG. 18, the user can see at a glance the causes of the product characteristic failure, that is, "in which value range of each of the input attributes x1 to x4 does the product characteristic become bad?". The order (priority) in which countermeasures should be taken can also be determined from the defective product separation degree (second data group separation degree). Furthermore, the number of defects (the number of data in the second data group DA2) indicates how far the number of defects can be expected to fall if countermeasures are taken against each input attribute condition shown in FIG. 18.

In the example of FIG. 18, the input attribute x2 (“x2 > 2”, that is, “x2 = c or d”), which has the highest defective product separation degree (second data group separation degree), should be addressed first; this countermeasure is expected to eliminate two of the four defects (50% of all defects).
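The prioritization just described can be sketched in Python. This is a minimal illustration, not the patent's implementation: the `Factor` record, the `prioritize` function, and all values are hypothetical, and only the fact that “x2 > 2” accounts for two of the four defects is taken from the text. Factors are ranked by separation degree, and each factor's share of all defects estimates the reduction expected if it alone were addressed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Factor:
    condition: str          # input attribute condition, e.g. "x2 > 2"
    separation: float       # defective product separation degree, 0..1
    defect_ids: frozenset   # ids of DA2 data that satisfy the condition


def prioritize(factors, total_defects):
    """Rank factors by separation degree (highest first) and give, for each,
    the share of all defects it covers if addressed on its own."""
    ranked = sorted(factors, key=lambda f: f.separation, reverse=True)
    return [(f.condition, len(f.defect_ids) / total_defects) for f in ranked]


# Hypothetical values; only "x2 > 2 covers 2 of 4 defects" follows the text.
factors = [
    Factor("x1 > 2", 0.50, frozenset({9, 11})),
    Factor("x2 > 2", 0.67, frozenset({5, 9})),
]
for condition, share in prioritize(factors, total_defects=4):
    print(f"{condition}: {share:.0%} of defects")
```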

What to address second can be determined by using the composite factor table (Table 46) to examine how far the first factor (“x2 > 2”, that is, “x2 = c or d”) is compounded with each of the other factors. FIG. 19 shows the bar graph (number of defects) of each factor (input attribute condition) of FIG. 18 with the number of defects it shares with the first factor (“x2 > 2”, that is, “x2 = c or d”) hatched. FIG. 19 shows that “x1 > 2” (that is, “x1 = C or D”) has a high defective product separation degree (second data group separation degree) and many defects that do not overlap with the first factor (“x2 > 2”, that is, “x2 = c or d”). It is therefore likely to be a factor independent of the first factor, and can be read as the item to be addressed second.

FIG. 19 can also be used to extract factors compounded with (or dependent on) the first factor (“x2 > 2”, that is, “x2 = c or d”). In this example, “x4 ≤ 10”, which has a large hatched proportion, is extracted.
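The overlap check behind FIG. 19 amounts to set intersection on the ids of defective data. A minimal sketch, assuming each factor is represented by the set of DA2 ids that satisfy it; the function name `overlap_with_first` and all ids are hypothetical and are not taken from Table 46.

```python
def overlap_with_first(first_ids, factors):
    """For each other factor, split its defect count into the part shared
    with the first factor (the hatched part) and the part not shared."""
    result = {}
    for condition, ids in factors.items():
        shared = len(ids & first_ids)
        result[condition] = (shared, len(ids) - shared)
    return result


first = {5, 9}  # hypothetical DA2 ids satisfying the first factor "x2 > 2"
others = {"x1 > 2": {9, 11, 12}, "x4 <= 10": {5, 9}}
print(overlap_with_first(first, others))
# {'x1 > 2': (1, 2), 'x4 <= 10': (2, 0)}
```

A factor with many unshared defects (here “x1 > 2”) is a candidate independent factor, while one whose defects are mostly shared (here “x4 <= 10”) is a candidate compound or dependent factor.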

In addition to the effects of the input attribute condition determination method described above (the processing of steps 203 to 208, i.e., step 107), the data analysis method of the present embodiment (steps 100 to 117) has the following effects.

That is, the causes of defects are determined on the basis of a clear index, the defective product separation degree (second data group separation degree), which indicates how accurately defective products are isolated (the defect rate when the data in the classified basic data group DA00 that satisfy the input attribute condition are taken as the population). The extracted factors can therefore be prioritized, and the causes of the problem event can be derived in a very concise form, as in the determinant list table of Table 47 (or Table 45) or the factor breakdown Pareto diagram of FIG. 18. This also gives, for each factor (input attribute condition) of the problem event, its defective product separation degree (second data group separation degree) and its number of defects.
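The separation degree defined above is a conditional defect rate. A minimal sketch, assuming rows are dicts and that the condition and the "is defective" test are passed as predicates; the name `separation_degree`, the sample rows, and returning 0.0 when no row satisfies the condition are all assumptions, not details from the text.

```python
def separation_degree(rows, condition, is_defective):
    """Defect rate among the rows that satisfy `condition`."""
    selected = [r for r in rows if condition(r)]
    if not selected:
        return 0.0  # assumption: an empty population gives 0
    return sum(1 for r in selected if is_defective(r)) / len(selected)


rows = [  # hypothetical (x2, y) records; y == 2 marks a defective product
    {"x2": 1, "y": 1}, {"x2": 3, "y": 2}, {"x2": 4, "y": 2},
    {"x2": 4, "y": 3}, {"x2": 2, "y": 1},
]
d = separation_degree(rows, lambda r: r["x2"] > 2, lambda r: r["y"] == 2)
print(d)  # 2 of the 3 rows with x2 > 2 are defective
```

Comparing this value with the overall defect rate of the population (here 2 of 5) shows whether the condition concentrates defects.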

In the embodiment described above, a decision tree with multiple branches is generated by repetition; if only a single branch is required, the process may end at step 110.

In the above description, in step 113 the analysis data group extraction unit 106 extracts only the other data group from among the divided data groups as the next analysis data group. Alternatively, the factor data group may also be extracted as an analysis data group and the processing of steps 106 to 113 repeated, which allows a more detailed analysis.

The above description also shows an example of data analysis in which the second data group DA2 is the data group of defective products and the causes of defects are extracted. Alternatively, the second data group DA2 may be the data group of non-defective products, in which case the analysis extracts the conditions for obtaining non-defective products.

The input attribute condition determination method and the data analysis method described above can each be implemented by a computer executing a program that includes processes corresponding to S203 to S208 (steps 203 to 208) and to S100 to S117 (steps 100 to 117) in FIGS. 14 and 15, respectively.

Accordingly, the input attribute condition determination device 100A in FIG. 13 can be implemented by an input attribute condition determination program that causes a computer to function as the data row separation unit 107, the data string extraction unit 5, the frequency calculation unit 6, the frequency cumulative difference calculation unit 7, the threshold determination unit 130, the polarity determination unit 131, and the input attribute condition determination unit 111.

Likewise, the data analysis device 100 of FIG. 13 can be implemented by a data analysis program that causes a computer to function as the character-numerical data conversion unit 1, the classification condition setting unit 103, the data classification unit 104, the analysis data group extraction unit 106, the data row separation unit 107, the data string extraction unit 5, the frequency calculation unit 6, the frequency cumulative difference calculation unit 7, the threshold determination unit 130, the polarity determination unit 131, the input attribute condition determination unit 111, the defective product separation degree calculation unit 112, the first factor extraction unit 109, the frequency cumulative ratio calculation unit 16, the data division unit 115, the end condition determination unit 11, the factor determination unit 117, the composite factor defect count calculation unit 118, and the numerical value-character data conversion unit 119.

The above programs can be stored in a computer-readable recording medium and provided to users. The recording medium may be a built-in medium incorporated in the computer main body, or a removable medium that can be separated from it. Examples of built-in media include ROM; rewritable nonvolatile memory such as flash memory; and hard disks. Examples of removable media include optical recording media such as CD-ROM and DVD; magneto-optical recording media such as MO; magnetic recording media such as floppy (registered trademark) disks, cassette tapes, and removable hard disks; media with built-in rewritable nonvolatile memory, such as memory cards; and media with built-in ROM, such as ROM cassettes.

The programs may be executed by the CPU accessing them directly, or a program stored on a recording medium may be read out and transferred to a program storage area of a built-in medium, after which the program on the built-in medium is executed by the CPU. The programs are not limited to being sold stored on a computer-readable recording medium; they may also be sold in a form transferred to the user's computer over a communication network such as the Internet.

The present invention is not limited to the embodiments described above; various modifications are possible within the scope of the claims, and embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention.
[Modification of First Embodiment]
In the first embodiment, the first factor extraction unit 109 (step 110) extracted, from among the input attribute conditions determined by the input attribute condition determination unit 111, those whose defective product separation degree is greater than the defective product content rate of the classified basic data group DA00, as information indicating the factors of the defective second data group DA2. That is, for each branch of the decision tree, input attribute conditions causing the problem event (the defective second data group DA2) were obtained not only for the final branch condition but for all input attributes, and those with a high defective product separation degree were extracted.

However, depending on the purpose of the data analysis, simplicity may be preferred over such a detailed analysis. In that case, only the branch conditions of the decision tree need to be extracted.

In this modification, a second factor extraction unit (not shown) is provided downstream of the data division unit 115 of the first embodiment (FIG. 13). It extracts the input attribute condition with the largest division rule evaluation value found by the data division unit 115 (that is, the branch condition of the decision tree) as the input attribute condition that causes the problem event (the defective second data group DA2). In this case, the first factor extraction unit 109 can be omitted.

[Embodiment 2]
Another embodiment of the present invention is described below. For convenience of explanation, members having the same functions as those of the first embodiment are given the same reference numerals, and their description is omitted.

First, the data analyzer of the present embodiment will be described with reference to FIG. 1.

As shown in FIG. 1, the data analyzer includes a character-numerical data conversion unit 1, an analysis target data storage unit 2, a threshold setting unit (threshold setting means) 3, a data classification unit (classification means) 4, a data string extraction unit 5, a frequency calculation unit (frequency calculation means within the first evaluation means) 6, a frequency cumulative difference calculation unit (difference calculation means within the first evaluation means) 7, an input attribute threshold determination unit (threshold determination means) 8, a frequency cumulative ratio calculation unit (second evaluation means) 16, a second factor extraction unit (second factor extraction means) 9, a factor-undiscovered data extraction unit (division means) 10, an end condition determination unit (end condition determination means) 11, an input attribute threshold table creation unit 12, a contribution ratio calculation unit 13, an analysis result data storage unit 14, and an output unit 15.

Next, the data analysis method of the present embodiment will be described with reference to FIG. 2, taking as an example the case where the data group DA of Table 48 below is analyzed. The data group DA of Table 48 is stored in the analysis target data storage unit 2, such as a hard disk.

The data group DA of Table 48 consists of 12 data items with ids (identifiers) 1 to 12. In Table 48, x1, x2, x3, and x4 are input attributes. Input attribute x1 is a character attribute that takes one of the four characters A, B, C, and D. Input attribute x2 is a character attribute that takes one of the four characters a, b, c, and d. Input attribute x3 is a discrete attribute that takes one of the four discrete values 1, 2, 3, and 4. Input attribute x4 is a discrete attribute that takes one of the four discrete values 10, 20, 30, and 40. An input attribute may also be a continuous attribute taking continuous numerical values.

In Table 48, y is the output attribute. The output attribute may be a character attribute, a discrete attribute, or a continuous attribute; here it is a character attribute that takes one of the three characters X, Y, and Z.

As an example of data to be analyzed, the input attributes may be manufacturing process conditions and/or in-line inspection results (inspection results obtained partway through the manufacturing line) in a product manufacturing process, the output attribute may be the product quality determination result, and the problem event y = Y may be a defective quality determination. In this case, by analyzing the causal relationship between the input attributes and the output attribute with the data analysis method of the present embodiment and deriving the causes of the problem event y = Y, countermeasures against defective products such as device characteristic defects can easily be devised. Improvements to the manufacturing process, such as higher yield, therefore become easier to achieve.

As a more specific example, the input attributes x1, x2, x3, and x4 may be process data of a plasma CVD process, such as gas flow rate, gas pressure, input power, and film formation time, and the output attribute y may be the thickness of the thin film formed by the plasma CVD process. The values of these input and output attributes may be continuous, discrete, or character attributes. With a character attribute, an output attribute such as film thickness might be expressed as ‘large’, ‘medium’, or ‘small’.

[Step 0]
First, the character-numerical data conversion unit 1 converts the character attributes in the data group DA of Table 48, stored in the analysis target data storage unit 2 such as a hard disk, into numerical attributes (numerical data) according to the following conversion rules (S0). Each data item is thereby converted into numerical data. The character-numerical data conversion unit 1 then sends the converted data group to the data classification unit 4.
(x1) A → 1, B → 2, C → 3, D → 4
(x2) a → 1, b → 2, c → 3, d → 4
(x3) No conversion
(x4) No conversion
(y) X → 1, Y → 2, Z → 3
The conversion rules are preferably set, as far as possible, so that the numerical value of the output attribute increases as the converted numerical value of the input attribute increases, or in the reverse order. The conversion rules only need to be unambiguous and are not limited to the above example.
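Step 0 and its inverse can be sketched with plain dictionaries. This assumes records are dicts keyed by attribute name; `RULES`, `to_numeric`, `invert`, and the sample record are hypothetical. The `invert` check illustrates why the rule must be unambiguous: only a one-to-one rule can be reversed to report results in the original characters.

```python
# Conversion rules of step 0 (character -> number); x3 and x4 are not converted.
RULES = {
    "x1": {"A": 1, "B": 2, "C": 3, "D": 4},
    "x2": {"a": 1, "b": 2, "c": 3, "d": 4},
    "y": {"X": 1, "Y": 2, "Z": 3},
}


def to_numeric(record):
    """Convert one record of data group DA into a record of DA0."""
    return {k: RULES[k][v] if k in RULES else v for k, v in record.items()}


def invert(rules):
    """Build number -> character rules; fails if a rule is not one-to-one."""
    inverse = {}
    for attr, mapping in rules.items():
        back = {num: ch for ch, num in mapping.items()}
        if len(back) != len(mapping):
            raise ValueError(f"rule for {attr} is not one-to-one")
        inverse[attr] = back
    return inverse


rec = {"x1": "C", "x2": "a", "x3": 2, "x4": 30, "y": "Y"}  # hypothetical record
num = to_numeric(rec)
print(num)  # {'x1': 3, 'x2': 1, 'x3': 2, 'x4': 30, 'y': 2}
```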

The data group DA0 obtained by converting the data into numerical data with the above conversion rules is shown in Table 49.

Through this conversion, the resulting data group DA0 is a set of data, each composed of a plurality of input attributes (description attributes) and an output attribute (object attribute), all of which are numerical attributes. Hereinafter, the data group DA0 is referred to as the basic data group.

[Step 1]
The threshold setting unit 3 sets the threshold (output attribute threshold) y_th of the output attribute y of the basic data group DA0 that corresponds to the problem event y = Y of the data group DA, and outputs it to the data classification unit 4 (S1). It does so according to predetermined setting information, or in response to the user entering the problem-event attribute value y = Y through an input unit (not shown) such as a keyboard or mouse. In this example, the threshold y_th corresponding to the problem event y = Y is y_th = 2.

[Step 2]
Next, the data classification unit 4 divides (classifies) the basic data group DA0 into a first data group DA1 and a second data group DA2 (S2) based on the following comparison logic (1) and (2) between the value of the output attribute y of the basic data group DA0 and the output attribute threshold y_th output from the threshold setting unit 3.

(1) y > y_th or y < y_th → DA1
(2) y = y_th → DA2
In other words, the data classification unit 4 classifies the basic data group DA0 into the first data group DA1, whose output attribute does not match the output attribute threshold y_th (that is, is 1 or 3), and the second data group DA2, whose output attribute matches the output attribute threshold y_th (= 2). The second data group DA2 is the data group of problem events (for example, poor device characteristics). That is, DA2 is the data group whose output attribute y has the attribute value (2) representing the problem event, and DA1 is the data group whose output attribute y has an attribute value (1 or 3) that does not represent the problem event.
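Step 2 is a simple partition on y. A minimal sketch, assuming the numeric records of DA0 are dicts with a `y` key; `classify` and the sample records are hypothetical and are not the data of Table 49.

```python
def classify(da0, y_th):
    """Step 2: rule (1) y != y_th -> DA1, rule (2) y == y_th -> DA2."""
    da1 = [r for r in da0 if r["y"] != y_th]
    da2 = [r for r in da0 if r["y"] == y_th]
    return da1, da2


# Hypothetical numeric records; y_th = 2 as set in step 1.
da0 = [{"id": i, "y": y} for i, y in enumerate([1, 2, 3, 2, 1], start=1)]
da1, da2 = classify(da0, y_th=2)
print([r["id"] for r in da1], [r["id"] for r in da2])  # [1, 3, 5] [2, 4]
```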

Table 50 shows the first data group DA1, and Table 51 shows the second data group DA2.

[Step 3]
Next, the data string extraction unit 5 extracts the data string of each input attribute xj (1 ≤ j ≤ 4) from the non-defective data group DA1 (Table 50) (S3). These data strings are called 1-xj data groups.

Similarly, the data string extraction unit 5 extracts the data string of each input attribute xj (1 ≤ j ≤ 4) from the defective data group DA2 (Table 51) (S3). These data strings are called 2-xj data groups.

Tables 52 to 55 show the 1-xj data groups, and Tables 56 to 59 show the 2-xj data groups.

[Step 4]
The frequency calculation unit 6 sorts each of the 1-xj data groups extracted from the non-defective data group DA1 in step 3, and each of the 2-xj data groups extracted from the defective data group DA2 in step 3, in ascending order of the value of the input attribute xj (sorting process 1). Then, for each numerical value of the input attribute xj, it calculates the 1-xj frequency cumulative % (first frequency), the percentage of data in the non-defective data group DA1 whose input attribute xj is equal to or less than that value, and the 2-xj frequency cumulative % (second frequency), the corresponding percentage in the defective data group DA2 (S4).

Here, Tables 60 to 63, obtained by sorting Tables 52 to 55 in ascending order of xj, are used: for each row (id), the number of rows at or above that row's position in the table, as a percentage of the total number of data in the first data group (= 8), is calculated as the 1-xj frequency cumulative %. Similarly, Tables 64 to 67, obtained by sorting Tables 56 to 59 in ascending order of xj, are used: for each row (id), the number of rows at or above that row's position, as a percentage of the total number of data in the second data group (= 4), is calculated as the 2-xj frequency cumulative %. The resulting values of the 1-xj and 2-xj frequency cumulative % are shown in Tables 60 to 67.
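The row-based cumulative percentage of Tables 60 to 67 can be sketched as below, assuming one data string is given as a plain list of xj values (ids omitted); `cumulative_percent` is a hypothetical name. For repeated values, only the last row holds the true percentage of data with xj at or below that value, which is why the duplicate process later keeps only the last row.

```python
def cumulative_percent(values):
    """Sort one data string (1-xj or 2-xj) ascending and attach, to each row,
    the percentage of rows at or above its position in the sorted table."""
    ordered = sorted(values)
    n = len(ordered)
    return [(v, 100.0 * (i + 1) / n) for i, v in enumerate(ordered)]


print(cumulative_percent([3, 1, 3, 2]))
# [(1, 25.0), (2, 50.0), (3, 75.0), (3, 100.0)]
```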

In steps 3 and 4 above, the 1-xj and 2-xj frequency cumulative % values are calculated after the data strings are extracted and sorted; they may instead be calculated directly, without extracting or sorting the data strings.

Further, the frequency calculation unit 6 joins the table of each 1-xj data group (non-defective data, with the 1-xj frequency cumulative % calculated) with the table of the corresponding 2-xj data group (defective data, with the 2-xj frequency cumulative % calculated). Specifically, for input attribute x1, Tables 60 and 64 are joined into the x1 frequency cumulative table of Table 68; for x2, Tables 61 and 65 are joined into the x2 frequency cumulative table of Table 69; for x3, Tables 62 and 66 are joined into the x3 frequency cumulative table of Table 70; and for x4, Tables 63 and 67 are joined into the x4 frequency cumulative table of Table 71.

Further, the frequency calculation unit 6 sorts each of the frequency cumulative tables of Tables 68 to 71 in ascending order of the value of the input attribute xj (sorting process 2). After sorting process 2, each blank in the 1-xj and 2-xj frequency cumulative % columns is filled, from the top down, with the value in the row directly above it (substitution process). Then, among consecutive rows sharing the same value of xj, only the last row is kept (duplicate process). In this way, as shown in Tables 72 to 75, the frequency calculation unit 6 calculates, for each value of the input attribute xj, both the 1-xj frequency cumulative % (A; first frequency), the percentage of data in the non-defective first data group DA1 whose xj is equal to or less than that value, and the 2-xj frequency cumulative % (B; second frequency), the corresponding percentage in the defective second data group DA2 (S4).
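The join, sorting process 2, substitution process, and duplicate process can be sketched as follows. This is an illustration under stated assumptions: the inputs are the (value, cumulative %) rows for one attribute, the name `merge_cumulative` is hypothetical, and a blank in the very first row (with no row above it) is taken as 0 %, which the text does not specify. The result gives A and B for each distinct value of xj.

```python
def merge_cumulative(rows_a, rows_b):
    """Join the row-based cumulative tables of DA1 (A) and DA2 (B) for one xj.

    rows_a, rows_b: lists of (value, cumulative %) sorted by value.
    Returns [(value, A, B)] with one row per distinct value of xj.
    """
    # Join: each row fills only its own group's column; the other is blank.
    joined = [(v, a, None) for v, a in rows_a] + [(v, None, b) for v, b in rows_b]
    joined.sort(key=lambda r: r[0])  # sorting process 2 (stable sort)
    filled, last_a, last_b = [], 0.0, 0.0  # assumption: top blank counts as 0 %
    for v, a, b in joined:
        # Substitution process: a blank takes the value of the row above.
        last_a = a if a is not None else last_a
        last_b = b if b is not None else last_b
        filled.append((v, last_a, last_b))
    # Duplicate process: keep only the last row of each run of equal values.
    return [row for i, row in enumerate(filled)
            if i + 1 == len(filled) or filled[i + 1][0] != row[0]]


print(merge_cumulative([(1, 25.0), (2, 50.0), (3, 75.0), (3, 100.0)],
                       [(3, 50.0), (4, 100.0)]))
# [(1, 25.0, 0.0), (2, 50.0, 0.0), (3, 100.0, 50.0), (4, 100.0, 100.0)]
```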

[Step 5]
Next, for each value of the input attribute xj, the frequency cumulative difference calculation unit 7 calculates the difference |A − B| between the non-defective 1-xj frequency cumulative % (A) and the defective 2-xj frequency cumulative % (B) (S5). This difference is called the xj frequency cumulative difference %. The calculated values are shown in Tables 72 to 75.
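Because A and B are simply the percentages of DA1 and DA2 data with xj at or below each value, step 5 can also be computed directly, as the text permits for steps 3 and 4. A minimal sketch using the standard-library `bisect` module; the function name `cumulative_difference` and the sample values are hypothetical.

```python
from bisect import bisect_right


def cumulative_difference(da1_values, da2_values):
    """Return [(v, A, B, |A - B|)] for every distinct value v of xj."""
    s1, s2 = sorted(da1_values), sorted(da2_values)
    table = []
    for v in sorted(set(s1) | set(s2)):
        a = 100.0 * bisect_right(s1, v) / len(s1)  # % of DA1 with xj <= v
        b = 100.0 * bisect_right(s2, v) / len(s2)  # % of DA2 with xj <= v
        table.append((v, a, b, abs(a - b)))
    return table


for row in cumulative_difference([1, 2, 3, 3], [3, 4]):
    print(row)
```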

FIGS. 3 to 6 show the relationship between the value of the input attribute xj and the non-defective 1-xj frequency cumulative % (A), the defective 2-xj frequency cumulative % (B), and the xj frequency cumulative difference % |A − B|.

The xj frequency cumulative difference % |A − B| at each value of xj is an index of how well splitting the data into the range where xj is at most that value and the range where xj exceeds it separates the non-defective first data group DA1 from the defective second data group DA2. In other words, |A − B| is a threshold evaluation index that expresses, as a whole, the degree to which data whose input attribute is at most that value are concentrated in one of the first and second data groups, and the degree to which data whose input attribute exceeds it are concentrated in the other.

That is, for the input attribute condition “xj is equal to or less than the value” or “xj exceeds the value”, the xj frequency cumulative difference % at each value of xj can be regarded as an input attribute condition evaluation index expressing the likelihood of the first correlation rule: “if the input attribute satisfies the input attribute condition, the data belongs to the second data group of the analysis data group; if it does not, the data belongs to the first data group of the analysis data group.”

In the above, the xj frequency cumulative difference % |A − B| is used as the threshold evaluation index for each value; however, any index that evaluates the degree of data bias, such as information gain, information gain ratio, the Gini index, or mean squared error, may be used instead.
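As one example of the alternative indices listed above, a Gini-based evaluation can be sketched as the decrease in weighted Gini impurity when the data are split at xj ≤ t. The text names the Gini index but gives no formula, so this CART-style definition, the function names, and the sample values are assumptions. On this sample, |A − B| ties at t = 2 and t = 3, whereas the Gini decrease prefers t = 3, so the choice of index can change the threshold.

```python
def gini(n_good, n_bad):
    """Gini impurity of a node with n_good DA1 rows and n_bad DA2 rows."""
    n = n_good + n_bad
    if n == 0:
        return 0.0
    p = n_bad / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2 for two classes


def gini_gain(da1_values, da2_values, t):
    """Decrease in weighted Gini impurity when splitting at xj <= t."""
    g_le = sum(v <= t for v in da1_values)
    b_le = sum(v <= t for v in da2_values)
    g_gt, b_gt = len(da1_values) - g_le, len(da2_values) - b_le
    n = len(da1_values) + len(da2_values)
    parent = gini(len(da1_values), len(da2_values))
    children = ((g_le + b_le) * gini(g_le, b_le)
                + (g_gt + b_gt) * gini(g_gt, b_gt)) / n
    return parent - children


good, bad = [1, 2, 3, 3], [3, 4]  # hypothetical xj values of DA1 and DA2
best_t = max(sorted(set(good) | set(bad)), key=lambda t: gini_gain(good, bad, t))
print(best_t, gini_gain(good, bad, best_t))
```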

[Step 6]
For each input attribute xj, the input attribute threshold determination unit 8 extracts the value of xj at which the xj frequency cumulative difference % |A − B| is largest (S6). This value is called the input attribute threshold xj-th.

As FIGS. 3 to 6 show, the input attribute threshold xj-th is the value of xj at which splitting the data into the ranges xj ≤ xj-th and xj > xj-th separates the non-defective first data group DA1 from the defective second data group DA2 most easily.

Here, steps 3 to 6 are performed collectively for all of the input attributes; alternatively, steps 3 to 6 may be repeated while incrementing j sequentially from 1 to N.
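Steps 4 to 6 for a single input attribute can be sketched as follows. This is a minimal illustration, not the implementation described in the specification: the function names are hypothetical, the inputs are plain lists of numeric xj values from DA1 and DA2 (both assumed non-empty), and ties in |A−B| are assumed to resolve to the smallest value, since the text does not specify tie handling.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def cumulative_percent(sorted_values: Sequence[float], x: float) -> float:
    """Share (%) of entries <= x; sorted_values must be sorted ascending."""
    return 100.0 * bisect_right(sorted_values, x) / len(sorted_values)


def input_attribute_threshold(good: Sequence[float],
                              bad: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (xj-th, A, B, |A - B|) for one input attribute xj.

    good: xj values in the first (non-defective) data group DA1
    bad:  xj values in the second (defective) data group DA2
    """
    g, b = sorted(good), sorted(bad)
    best = None
    for x in sorted(set(good) | set(bad)):   # every value xj takes in either group
        a = cumulative_percent(g, x)          # 1-xj frequency cumulative % (A)
        bb = cumulative_percent(b, x)         # 2-xj frequency cumulative % (B)
        diff = abs(a - bb)                    # xj frequency cumulative difference %
        if best is None or diff > best[3]:    # strict '>' keeps the smallest value on ties
            best = (x, a, bb, diff)
    return best
```

For the hypothetical data good = [1, 1, 2, 2] and bad = [3, 4, 3], the call returns (2, 100.0, 0.0, 100.0): every non-defective value is at most 2 and no defective value is.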

[Step 7]
Next, at xj = xj-th, the frequency cumulative ratio calculation unit 16 calculates the ratio of the 2-xj frequency cumulative % (B) of defective products to the 1-xj frequency cumulative % (A) of non-defective products (S7). This ratio is referred to as the 2-xjth lower ratio (= B/A). The unit also calculates the ratio of the value obtained by subtracting B from 100 (= 100 − B) to the value obtained by subtracting A from 100 (= 100 − A) (S7). This ratio is referred to as the 2-xjth upper ratio (= (100 − B)/(100 − A)). The larger of the two ratios is then extracted as the 2-xjth ratio.

Here, the 2-xjth lower ratio indicates the rate at which the input attribute condition “xj ≤ xj-th” detects the second data group of defective products separately from the first data group of non-defective products. Likewise, the 2-xjth upper ratio indicates the rate at which the input attribute condition “xj > xj-th” detects the second data group of defective products separately from the first data group of non-defective products.

In other words, the 2-xjth lower ratio is an evaluation value (first rule evaluation value) representing the certainty of the association rule “if the input attribute xj is less than or equal to the input attribute threshold xj-th, the data belongs to the second data group.” The 2-xjth upper ratio is an evaluation value (second rule evaluation value) representing the certainty of the association rule “if the input attribute xj exceeds the input attribute threshold xj-th, the data belongs to the second data group.”
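The step 7 ratios can be sketched as below, assuming A and B are given in percent. A zero denominator is assumed to yield ∞ when the numerator is positive (consistent with the ∞ reported for x2 in Table 76) and 0 when the numerator is also 0; the 0/0 convention, the preference for “lower” on ties, and the function names are assumptions, not taken from the specification.

```python
import math
from typing import Tuple


def safe_div(num: float, den: float) -> float:
    # Assumed convention: x/0 = inf for x > 0, and 0/0 = 0 (no data on that side).
    if den == 0:
        return math.inf if num > 0 else 0.0
    return num / den


def xjth_ratios(a: float, b: float) -> Tuple[float, float, float, str]:
    """Return (lower ratio, upper ratio, 2-xjth ratio, kind) at xj = xj-th.

    a: 1-xj frequency cumulative % (A) of non-defective data at xj-th
    b: 2-xj frequency cumulative % (B) of defective data at xj-th
    """
    lower = safe_div(b, a)                  # B / A: rule "xj <= xj-th -> DA2"
    upper = safe_div(100.0 - b, 100.0 - a)  # (100 - B) / (100 - A): rule "xj > xj-th -> DA2"
    if upper > lower:
        return lower, upper, upper, "upper"
    return lower, upper, lower, "lower"     # ties assumed to prefer "lower"
```

For example, A = 100 and B = 0 give (0.0, inf, inf, "upper"): no non-defective data lies above the threshold, so the “upper” rule is adopted.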

Table 76 lists, for each input attribute xj, the input attribute threshold xj-th determined (extracted) by the input attribute threshold determination unit 8, together with the following values at xj = xj-th: the 1-xj frequency cumulative % (A) of non-defective products, the 2-xj frequency cumulative % (B) of defective products, the xj frequency cumulative difference % |A−B|, the 2-xjth lower ratio B/A, the 2-xjth upper ratio (100 − B)/(100 − A), and the 2-xjth ratio.

[Step 8]
The second factor extraction unit 9 extracts, from among the input attributes x1 to x4, the input attribute with the largest 2-xjth ratio obtained in step 7 (S8). The input attribute with the largest 2-xjth ratio, its threshold, and the type of ratio adopted (upper or lower) are thereby extracted as data indicating the factor (input attribute condition) behind the output attribute condition corresponding to the second data group. This is equivalent to extracting the input attribute condition of the association rule with the highest 2-xjth lower ratio or 2-xjth upper ratio among the association rules for all input attributes.

Here, the 2-xjth ratio is used as the index for extracting the input attribute of the association rule with the largest rule evaluation value. Other evaluation indices, such as support, confidence, information gain, the information gain ratio, the Gini index, or the mean squared error, may be used instead.

Referring to Table 76, for the input attribute x2 = x2-th = 2, the 2-x2th ratio equals the 2-x2th upper ratio, which is ∞. This means that the input attribute condition “x2 > 2” detects the second data group DA2 of defective products completely separated from the first data group DA1 of non-defective products (no non-defective data satisfies the condition). This is easier to see by referring to FIG. 4.

The extracted data, namely the input attribute (= x2), the input attribute threshold representing its value (= 2), and the type of ratio adopted (= upper), are stored in the analysis result data storage unit 14.
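The selection in step 8 amounts to an arg-max over the per-attribute results of step 7. In the sketch below, only the x2 entry (threshold 2, ratio ∞, upper) reflects the example in the text; the x1 and x3 values are hypothetical placeholders, and the dictionary layout and function names are assumptions.

```python
import math
from typing import Dict, Tuple


def extract_factor(results: Dict[str, Tuple[float, float, str]]) -> Tuple[str, float, str]:
    """Pick the attribute whose 2-xjth ratio is largest.

    results maps attribute name -> (xj-th, 2-xjth ratio, "upper" | "lower").
    Ties go to the attribute listed first (an assumption).
    """
    name = max(results, key=lambda k: results[k][1])
    threshold, _, kind = results[name]
    return name, threshold, kind


def as_condition(name: str, threshold: float, kind: str) -> str:
    """Render the extracted factor as an input attribute condition string."""
    op = ">" if kind == "upper" else "<="
    return f"{name} {op} {threshold:g}"


# Hypothetical step 7 results; only the x2 entry mirrors the example in the text.
results = {"x1": (2, 1.5, "upper"), "x2": (2, math.inf, "upper"), "x3": (1, 0.8, "lower")}
factor = extract_factor(results)
print(as_condition(*factor))  # x2 > 2
```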

In this way, the input attribute condition “x2 > 2” has been extracted as one factor of the problem event (the second data group DA2 of defective products).
[Step 9]
Because step 8 extracted the input attribute condition “x2 > 2” as one factor of the problem event, another factor is investigated next. To this end, the factor undiscovered data extraction unit 10 divides the basic data group DA0 (Table 49) into the data group satisfying the input attribute condition “x2 > 2” (factor data group) and the data group of DA0 in which the cause of the problem event has not yet been found (other data group), that is, the data group satisfying the input attribute condition “x2 ≤ 2” (not satisfying “x2 > 2”). It then extracts the latter data group, in which the cause of the problem event has not yet been found (S9; see Table 77).

The factor undiscovered data extraction unit 10 sends the extracted data group to the data classification unit 4 as the next (new) basic data group DA0.
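The division in step 9 can be sketched as a simple partition of the rows by the extracted condition. This assumes each row is a dictionary keyed by attribute name and that the ratio type from step 8 encodes the direction of the condition (“upper” for xj > xj-th, “lower” for xj ≤ xj-th); the function name and row format are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def split_by_condition(rows: List[Row], attr: str, threshold: float,
                       kind: str) -> Tuple[List[Row], List[Row]]:
    """Split rows into (factor data group, other data group).

    kind == "upper" means the condition is attr > threshold;
    kind == "lower" means attr <= threshold.
    """
    def satisfies(row: Row) -> bool:
        value = row[attr]
        return value > threshold if kind == "upper" else value <= threshold

    factor_group = [r for r in rows if satisfies(r)]
    other_group = [r for r in rows if not satisfies(r)]  # next basic data group DA0
    return factor_group, other_group
```

Only `other_group` is passed on as the next DA0 in this embodiment; the factor group is set aside.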

[Step 10]
Then, with the data group extracted in step 9 as the next basic data group DA0, steps 2 to 9 above are repeated until the termination condition determination unit 11 determines that the termination condition is satisfied. The termination condition determination unit 11 of the present embodiment determines that the termination condition is met when the number of data in the second data group DA2 of defective products reaches 0 in step 2 of an iteration. Repeating the processing until DA2 contains no data in this way yields a detailed factor analysis result.

The termination condition may instead be another condition, for example: (1) the number of data in the second data group DA2 falls to or below a predetermined number in step 2 of an iteration; (2) the ratio of the number of data in DA2 to the number of data in the first data group DA1 falls to or below a predetermined ratio in step 2 of an iteration; or (3) the rule evaluation value of the input attribute condition extracted in step 8 of an iteration falls below a predetermined value. Termination conditions such as these yield a more concise yet sufficient factor analysis result. Furthermore, when a concise factor analysis result is the priority, the termination condition may simply be that the iteration has been performed a predetermined number of times, or the termination condition determination unit 11 may be omitted and the processing repeated as many times as possible.
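The termination conditions listed above can be combined into a single check, sketched below. The keyword names are hypothetical; with only the defaults (`min_bad=0`), the function reproduces the embodiment's condition of stopping when DA2 is empty. `iteration` is assumed to count the iterations already completed, and a DA1 of size 0 is treated as an infinite DA2/DA1 ratio.

```python
import math
from typing import Optional


def should_stop(n_good: int, n_bad: int, *, iteration: int = 0,
                last_rule_value: Optional[float] = None,
                min_bad: int = 0,
                max_bad_ratio: Optional[float] = None,
                min_rule_value: Optional[float] = None,
                max_iterations: Optional[int] = None) -> bool:
    """Return True if any enabled termination condition holds.

    n_good, n_bad: sizes of DA1 and DA2 found in step 2 of the current iteration.
    """
    if n_bad <= min_bad:                                    # embodiment / condition (1)
        return True
    if max_bad_ratio is not None:                           # condition (2)
        ratio = n_bad / n_good if n_good else math.inf
        if ratio <= max_bad_ratio:
            return True
    if (min_rule_value is not None and last_rule_value is not None
            and last_rule_value < min_rule_value):          # condition (3)
        return True
    if max_iterations is not None and iteration >= max_iterations:  # fixed count
        return True
    return False
```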

In this example, the data group with x1 ≤ 2 for which no factor had yet been found, extracted in step 9 of the second iteration, contained no defective data (second data group DA2; y = 2). The iterative processing therefore ended after the second iteration (i.e., after the second factor extraction).

[Step 11]
The input attribute threshold table creation unit 12 creates an input attribute threshold table that stores, for each iteration of step 10, the extracted input attribute xj, the input attribute threshold xj-th, and the type of ratio adopted (S11; see Table 78).

Where necessary, the input attribute threshold table creation unit 12 converts the numerical values of the input attribute thresholds xj-th in the input attribute threshold table into character data. The conversion rules to character data are the inverse of the conversion in step 0, as follows:
(x1) 1 → A, 2 → B, 3 → C, 4 → D
(x2) 1 → a, 2 → b, 3 → c, 4 → d
(x3) No conversion
(x4) No conversion
Table 79 shows the input attribute threshold table obtained by converting the input attribute thresholds xj-th in the input attribute threshold table of Table 78 into character data.
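The inverse conversion above is a per-attribute lookup table. The sketch encodes exactly the rules listed for x1 and x2 and leaves x3 and x4 numeric; representing each threshold-table entry as an (attribute, xj-th, ratio type) tuple is an assumption, as are the function names.

```python
from typing import Dict, List, Tuple, Union

# Inverse of the step 0 conversion, as listed above; x3 and x4 are not converted.
INVERSE_RULES: Dict[str, Dict[int, str]] = {
    "x1": {1: "A", 2: "B", 3: "C", 4: "D"},
    "x2": {1: "a", 2: "b", 3: "c", 4: "d"},
}


def to_character(attr: str, threshold: int) -> Union[int, str]:
    # Attributes without a rule (x3, x4) keep their numeric threshold.
    return INVERSE_RULES.get(attr, {}).get(threshold, threshold)


def convert_table(rows: List[Tuple[str, int, str]]) -> List[Tuple[str, Union[int, str], str]]:
    """rows: (input attribute, xj-th, ratio type) entries of the threshold table."""
    return [(attr, to_character(attr, th), kind) for attr, th, kind in rows]
```

For the two conditions extracted in the example, “x2 > 2” and “x1 > 2”, `convert_table([("x2", 2, "upper"), ("x1", 2, "upper")])` yields `[("x2", "b", "upper"), ("x1", "B", "upper")]`.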

This input attribute threshold table corresponds to the classification conditions of the conventional decision tree-2 (FIG. 12) described in Patent Document 1 when attention is focused on isolating the output attribute y = Y (y = 2).

[Step 12]
Next, the contribution rate calculation unit 13 uses the input attribute threshold table of Table 78 to obtain the contribution rate of each extracted input attribute condition to the problem event (y = 2: the original second data group DA2, i.e., the defective data group), that is, the proportion of all defects that is attributable to that input attribute condition.

Table 80 shows the original second data group DA2 (Table 51), which constitutes the problem event (defective products), with “*” attached to the data satisfying either the input attribute condition “x2 > 2” extracted as a factor in the first iteration or the input attribute condition “x1 > 2” extracted in the second iteration.

From Table 80, the contribution rates of the input attribute conditions “x1 > 2” and “x2 > 2” to the problem event (the original second data group DA2) are obtained as shown in Table 81.

In Table 81, the contribution rate at the intersection of “x1 > 2” with “x1 > 2” and that at the intersection of “x2 > 2” with “x2 > 2” represent the contribution rates of “x1 > 2” alone and “x2 > 2” alone, respectively. The contribution rates at the intersections of “x1 > 2” with “x2 > 2” both represent the contribution rate of the composite factor formed by “x1 > 2” and “x2 > 2”. Table 81 can also be expressed as shown in FIG. 7.
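A minimal sketch of the contribution rate calculation, assuming that a defective row satisfying exactly one extracted condition counts toward that condition's single-factor rate (a diagonal cell of Table 81) and a row satisfying exactly two conditions counts toward their composite rate (an off-diagonal cell). Rows satisfying no condition, or more than two, are left out. Rates are percentages of all defects in the original DA2. The row format, predicate dictionary, and function name are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def contribution_rates(defects: List[Row],
                       conditions: Dict[str, Callable[[Row], bool]]
                       ) -> Dict[Tuple[str, str], float]:
    """Contribution rate (%) of each extracted condition to the original DA2.

    (c, c) holds the single-factor rate of condition c; (c1, c2) holds the
    composite rate of rows that satisfy exactly c1 and c2.
    """
    labels = list(conditions)
    counts: Dict[Tuple[str, str], int] = {}
    for row in defects:
        hit = tuple(c for c in labels if conditions[c](row))
        if len(hit) == 1:
            key = (hit[0], hit[0])       # single factor
        elif len(hit) == 2:
            key = (hit[0], hit[1])       # composite of two factors
        else:
            continue                     # no factor found, or more than two (not tabulated)
        counts[key] = counts.get(key, 0) + 1
    return {k: 100.0 * v / len(defects) for k, v in counts.items()}
```

With four hypothetical defective rows, one satisfying both “x1 > 2” and “x2 > 2”, one satisfying only “x2 > 2”, and two satisfying only “x1 > 2”, the result is 50 % for x1 alone, 25 % for x2 alone, and 25 % for the composite factor.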

[Step 13]
This completes the data analysis. The input attribute threshold table created by the input attribute threshold table creation unit 12 and the contribution rate data are stored as analysis result data in the analysis result data storage unit 14, such as a hard disk. As appropriate, the analysis result data are sent from the analysis result data storage unit 14 to the output unit 15, such as a display device or a printer, where they can be displayed or printed as a decision tree or a table.

According to the present embodiment, unlike the conventional decision tree-2 (FIG. 12) described in Patent Document 1, the causes of the problem event can be derived in the very concise form shown in the input attribute threshold table of Table 79 (or Table 78) without defining a label hierarchy (FIG. 11) in advance. This table can then be used to obtain the contribution rate of each factor (input attribute) to the problem event.

When the input attribute threshold table of the present embodiment shown in Table 79 (or Table 78) is expressed as a decision tree, it appears as shown in FIG. 8. Expressing the contribution rate of each factor to the problem event y = Y (= 2) according to the conventional decision tree-2 (FIG. 12) in the same format as FIG. 7 yields FIG. 9.

Comparing the decision tree derived from the present embodiment (FIG. 8) with the conventional decision tree-2 (FIG. 12) shows that the present embodiment does not represent the contribution of input attribute x3. As a comparison of FIG. 7 and FIG. 9 shows, this is because the problem event y = Y (y = 2) does not arise from input attributes x1 and x3 each acting as an independent single factor, and because in step 9 of the second iteration the iterative processing (step 10) was not applied to the data group with x1 > 2.

To pursue the factors in detail, the contribution of input attribute x3 would also need to be extracted. If the goal is to eliminate (improve) the problem event y = Y (y = 2), however, extracting input attribute x1 alone is sufficient. The present embodiment focuses on this point and extracts the main factors to be addressed in countering the problem event, so input attribute x3 is not extracted. When a detailed analysis is required, the iterative processing (step 10) may be applied to both of the data groups obtained by the division in step 9.

In the embodiment described above, multiple factors are derived to generate a decision tree. If only a single factor needs to be extracted, however, the processing may end at step 8.

The data analysis method described above can be implemented by a computer executing a data analysis program that includes processes corresponding to S0 to S12 (steps 0 to 13) in FIG. 2. Accordingly, the data analysis device of FIG. 1 can be realized by a data analysis program that causes a computer to function as the character-numerical data conversion unit 1, the analysis target data storage unit 2, the threshold setting unit 3, the data classification unit 4, the data string extraction unit 5, the frequency calculation unit 6, the frequency cumulative difference calculation unit 7, the input attribute threshold determination unit 8, the frequency cumulative ratio calculation unit 16, the second factor extraction unit 9, the factor undiscovered data extraction unit 10, the termination condition determination unit 11, the input attribute threshold table creation unit 12, and the contribution rate calculation unit 13.

In the present embodiment, the data classification unit 4 classifies the data by comparing the output attribute with the output attribute threshold. If the output attribute is a character attribute, however, the character-numerical data conversion unit 1 may leave the output attribute unconverted, and the data classification unit 4 may instead classify the data by comparing the output attribute with the output attribute value subject to factor analysis (the character Y).

As described above, the data analysis method according to the present embodiment takes as its analysis target a basic data group consisting of N columns of input attribute data (N attributes, where N is an integer of 2 or more) and one column of output attribute data (one attribute), and analyzes the causal relationship between the output attribute and the input attributes. The method includes: a first step of setting an output attribute threshold; a second step of dividing the basic data group into a first data group and a second data group based on a comparison between the value of the output attribute and the output attribute threshold; a third step of extracting, from the first data group and the second data group respectively, a 1-J data string and a 2-J data string, each representing the data string of the J-th input attribute (J is an integer with 1 ≤ J ≤ N); a fourth step of calculating, for each individual value of the J-th input attribute in the 1-J data string, the 1-J frequency accumulation (%), i.e., the proportion of data less than or equal to that value, and likewise calculating, for each individual value of the J-th input attribute in the 2-J data string, the 2-J frequency accumulation (%); a fifth step of calculating, for every individual value of the J-th input attribute across both the 1-J and 2-J data strings, the J-th frequency cumulative difference, i.e., the absolute value of the difference between the 1-J frequency accumulation (%) and the 2-J frequency accumulation (%); a sixth step of extracting, as the J-th input attribute threshold, the value of the J-th input attribute at which the J-th frequency cumulative difference is largest; a seventh step of calculating, with the J-th input attribute equal to the J-th input attribute threshold, the 2-J lower ratio (the ratio of the 2-J frequency accumulation (%) to the 1-J frequency accumulation (%)) and the 2-J upper ratio (the ratio of 100 minus the 2-J frequency accumulation (%) to 100 minus the 1-J frequency accumulation (%)), and extracting the larger of the two as the 2-J ratio; an eighth step of repeating the third to seventh steps while incrementing J sequentially from 1 to N, and extracting and storing, from among the 2-J ratios of the first to N-th input attributes obtained in the seventh step, the input attribute with the largest 2-J ratio, the input attribute threshold representing the value of that input attribute, and the type of ratio adopted; a ninth step of dividing the basic data group in two based on the input attribute extracted in the eighth step; and a tenth step of repeating the second to ninth steps, using at least one of the two data groups obtained in the ninth step as a new basic data group, until a predetermined termination condition is satisfied.
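The ten steps above can be chained into one loop, sketched below under stated assumptions: rows are dictionaries keyed by attribute name, rows whose output value exceeds a hypothetical `output_threshold` of 1.5 form the second (defective) data group, only the group not satisfying the extracted condition is carried forward (as in the embodiment), and the embodiment's termination condition (DA2 empty) is backed by a hypothetical iteration cap. Ties are resolved in favor of the first candidate. This is an illustrative sketch, not the specification's implementation.

```python
import math
from bisect import bisect_right
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def _cum_percent(sorted_vals: List[float], x: float) -> float:
    # Step 4: % of entries <= x; an empty group contributes 0 %.
    if not sorted_vals:
        return 0.0
    return 100.0 * bisect_right(sorted_vals, x) / len(sorted_vals)


def _div(num: float, den: float) -> float:
    # Assumed convention: x/0 = inf for x > 0, 0/0 = 0.
    if den == 0:
        return math.inf if num > 0 else 0.0
    return num / den


def analyze(rows: Sequence[Row], inputs: Sequence[str], output: str = "y",
            output_threshold: float = 1.5,
            max_iterations: int = 10) -> List[Tuple[str, float, str]]:
    """Return the extracted factors as (attribute, xj-th, 'upper' | 'lower')."""
    factors: List[Tuple[str, float, str]] = []
    data = list(rows)
    for _ in range(max_iterations):
        # Steps 1-2: DA1 = output <= threshold, DA2 = output > threshold (assumed).
        good = [r for r in data if r[output] <= output_threshold]
        bad = [r for r in data if r[output] > output_threshold]
        if not bad:                          # step 10: embodiment's termination condition
            break
        best = None                          # (2-J ratio, attribute, threshold, kind)
        for attr in inputs:                  # steps 3-7 for J = 1..N
            g = sorted(r[attr] for r in good)
            b = sorted(r[attr] for r in bad)
            # Steps 5-6: value of the attribute maximizing |A - B|.
            th, a, bb = max(((x, _cum_percent(g, x), _cum_percent(b, x))
                             for x in sorted(set(g) | set(b))),
                            key=lambda t: abs(t[1] - t[2]))
            lower = _div(bb, a)                      # step 7: 2-J lower ratio
            upper = _div(100.0 - bb, 100.0 - a)      # step 7: 2-J upper ratio
            ratio, kind = (upper, "upper") if upper > lower else (lower, "lower")
            if best is None or ratio > best[0]:      # step 8
                best = (ratio, attr, th, kind)
        _, attr, th, kind = best
        factors.append((attr, th, kind))
        # Step 9: keep only the rows NOT satisfying the extracted condition.
        if kind == "upper":
            data = [r for r in data if not r[attr] > th]
        else:
            data = [r for r in data if not r[attr] <= th]
    return factors
```

On a small hypothetical data set in which “x2 > 2” covers most defectives and “x1 > 2” the remainder, the loop extracts `[("x2", 2, "upper"), ("x1", 2, "upper")]` and stops once no defective rows remain.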

With the above method, multiple causes of a problem event can be derived in a very concise form without defining a label hierarchy in advance. These can then be used to create a decision tree representing the causal relationships or to obtain the contribution rate of each factor (input attribute) to the problem event (output attribute).

The present invention can be applied to data consisting of an output attribute (target attribute) to be analyzed, such as characteristics of products manufactured in a manufacturing process, and input attributes (explanatory attributes) that affect the output attribute, such as manufacturing process conditions, in order to determine input attribute conditions under which the output attribute values are grouped together, or to analyze the causal relationship between the input attributes and the output attribute. The present invention can therefore be used, for example, to improve manufacturing processes in the manufacturing industry.

FIG. 1 is a block diagram showing the configuration of a data analysis device according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a data analysis method according to an embodiment of the present invention.
FIG. 3 is a graph showing an example of the output of the frequency cumulative difference calculation unit 7 in the data analysis device according to the embodiment of the present invention, showing the relationship between input attribute x1 and the 1-x1 frequency accumulation (A) of non-defective products, the 2-x1 frequency accumulation (B) of defective products, and the x1 frequency cumulative difference |A−B|.
FIG. 4 is a graph showing an example of the output of the frequency cumulative difference calculation unit 7 in the data analysis device according to the embodiment of the present invention, showing the relationship between input attribute x2 and the 1-x2 frequency accumulation (A) of non-defective products, the 2-x2 frequency accumulation (B) of defective products, and the x2 frequency cumulative difference |A−B|.
FIG. 5 is a graph showing an example of the output of the frequency cumulative difference calculation unit 7 in the data analysis device according to the embodiment of the present invention, showing the relationship between input attribute x3 and the 1-x3 frequency accumulation (A) of non-defective products, the 2-x3 frequency accumulation (B) of defective products, and the x3 frequency cumulative difference |A−B|.
FIG. 6 is a graph showing an example of the output of the frequency cumulative difference calculation unit 7 in the data analysis device according to the embodiment of the present invention, showing the relationship between input attribute x4 and the 1-x4 frequency accumulation (A) of non-defective products, the 2-x4 frequency accumulation (B) of defective products, and the x4 frequency cumulative difference |A−B|.
FIG. 7 shows an example of the data output by the contribution rate calculation unit 13 (step 12) in the data analysis device according to an embodiment of the present invention, indicating the contribution rates of the input attribute conditions “x1 > 2” and “x2 > 2” to the output attribute condition y = 2 (= Y), which is the problem event.
FIG. 8 is a diagram expressing the input attribute threshold table of the embodiment of the present invention as a decision tree.
FIG. 9 is a diagram expressing the conventional decision tree-2 in the same format as FIG. 7.
FIG. 10 is a diagram showing the conventional decision tree-1.
FIG. 11 is a diagram showing the label hierarchy of the conventional decision tree-2, in which (a) shows the x1 attribute, (b) the x2 attribute, (c) the x3 attribute, and (d) the x4 attribute.
FIG. 12 is a diagram showing the conventional decision tree-2.
FIG. 13 is a block diagram showing the configuration of an input attribute condition determining device and a data analysis device according to another embodiment of the present invention.
FIG. 14 is a flowchart showing a data analysis method according to another embodiment of the present invention.
FIG. 15 is a flowchart showing an input attribute condition determining method according to another embodiment of the present invention.
FIG. 16 is a Venn diagram representing an example (Table 35) of the data output by the defective product separation degree calculation unit 112 (step 109) in a data analysis device according to another embodiment of the present invention.
FIG. 17 is a diagram expressing, as a decision tree, the process of factor extraction (step 110) and factor determination (step 114) in a data analysis method according to another embodiment of the present invention.
FIG. 18 is a diagram in which, for an example (Table 47) of the determined-factor list table output by the factor determination unit 117 (step 114) in a data analysis device according to another embodiment of the present invention, the number of defects for each input attribute condition is shown as a bar graph and the defective product separation degree (second data group separation degree) as a line graph.
FIG. 19 is a diagram in which, using an example (Table 46) of the data output by the composite factor defect count calculation unit 118 (step 115) in a data analysis device according to another embodiment of the present invention, the number of defects for each input attribute condition is shown as a bar graph and the defective product separation degree (second data group separation degree) as a line graph; the number of defects due to composite factors with the first factor (“x2 > 2”, i.e., “x2 = c or d”) is hatched.
FIG. 20 is a graph explaining the Gini index method, a conventional technique described in Non-Patent Document 1, showing, for the data group of Table 1, the relationship between the branch condition of input attribute x1 and the degree of improvement of the Gini index method.
FIG. 21 is a graph explaining the Gini index method, a conventional technique described in Non-Patent Document 1, showing, for the data group of Table 1, the relationship between the branch condition of input attribute x2 and the degree of improvement of the Gini index method.
FIG. 22 is a graph explaining the Gini index method, a conventional technique described in Non-Patent Document 1, showing, for the data group of Table 1, the relationship between the branch condition of input attribute x3 and the degree of improvement of the Gini index method.
FIG. 23 is a graph explaining the Gini index method, a conventional technique described in Non-Patent Document 1, showing, for the data group of Table 1, the relationship between the branch condition of input attribute x3 and the degree of improvement of the Gini index method.

符号の説明 (Explanation of Reference Numerals)

１ 文字−数値データ変換部（数値変換手段）
３ 閾値設定部（閾値設定手段）
４ データ分類部（分類手段）
５ データ列抽出部
６ 頻度演算部（第１の評価手段の中の頻度演算手段）
７ 頻度累積差演算部（第１の評価手段の中の差分演算手段）
８ 入力属性閾値決定部（閾値決定手段）
９ 第２の要因抽出部（第２の要因抽出手段）
１０ 要因未発見データ抽出部（分割手段）
１１ 終了条件判定部（終了条件判定手段）
１４ 分析結果データ格納部
１５ 出力部
１６ 頻度累積比率演算部（第２の評価手段、分割ルール評価手段）
１０２ 基本データ群格納部
１０３ 分類条件設定部（分類条件設定手段）
１０４ データ分類部（分類手段）
１０５ 分類後基本データ群格納部
１０６ 分析データ群抽出部（分析データ群抽出手段）
１０７ データ行分離部
１０９ 第１の要因抽出部（第１の要因抽出手段）
１１１ 入力属性条件決定部（入力属性条件決定手段）
１１２ 不良品分離度演算部（第２データ群分離度演算手段）
１１５ データ分割部（分割手段）
１１７ 要因決定部（要因決定手段）
１１８ 複合要因不良数計算部
１１９ 数値−文字データ変換部
１３０ 閾値決定部（閾値決定手段）
１３１ 極性判定部（極性判定手段）

1 Character-to-numerical data conversion unit (numerical conversion means)
3 Threshold setting unit (threshold setting means)
4 Data classification unit (classification means)
5 Data string extraction unit
6 Frequency calculation unit (frequency calculation means within the first evaluation means)
7 Cumulative frequency difference calculation unit (difference calculation means within the first evaluation means)
8 Input attribute threshold determination unit (threshold determination means)
9 Second factor extraction unit (second factor extraction means)
10 Undiscovered-factor data extraction unit (dividing means)
11 End condition determination unit (end condition determination means)
14 Analysis result data storage unit
15 Output unit
16 Cumulative frequency ratio calculation unit (second evaluation means, division rule evaluation means)
102 Basic data group storage unit
103 Classification condition setting unit (classification condition setting means)
104 Data classification unit (classification means)
105 Classified basic data group storage unit
106 Analysis data group extraction unit (analysis data group extraction means)
107 Data row separation unit
109 First factor extraction unit (first factor extraction means)
111 Input attribute condition determination unit (input attribute condition determination means)
112 Defective product separation degree calculation unit (second data group separation degree calculation means)
115 Data division unit (dividing means)
117 Factor determination unit (factor determination means)
118 Compound-factor defect count calculation unit
119 Numerical-to-character data conversion unit
130 Threshold determination unit (threshold determination means)
131 Polarity determination unit (polarity determination means)

Claims

数値属性である少なくとも１つの入力属性と、出力属性とで構成されるデータの集合であり、出力属性の値に依り第１データ群と第２データ群とに分類される分析データ群に対して、第１データ群と第２データ群とがそれぞれまとまるように上記分析データ群を２分化するための入力属性の条件である、入力属性条件を決定する入力属性条件決定装置であって、
入力属性がとる全ての数値の各々について、第１データ群中において、入力属性がその数値以下であるデータの割合を第１の頻度として演算すると共に、第２データ群中において、入力属性がその数値以下であるデータの割合を第２の頻度として演算する頻度演算手段と、
入力属性がとる全ての数値の各々について、第１の頻度と第２の頻度との差分値を演算する差分演算手段と、
１つの入力属性がとる各数値の中で、上記差分値が最大となる数値を、該入力属性における閾値とし、少なくとも１つの入力属性に対応する少なくとも１つの閾値を決定する閾値決定手段と、
上記閾値決定手段で決定された閾値に基づいて、上記入力属性条件を決定する入力属性条件決定手段とを含むことを特徴とする入力属性条件決定装置。 An input attribute condition determining device for determining an input attribute condition for an analysis data group, the analysis data group being a set of data composed of at least one input attribute that is a numerical attribute and an output attribute and being classified into a first data group and a second data group according to the value of the output attribute, and the input attribute condition being a condition on an input attribute for dividing the analysis data group into two so that the first data group and the second data group are each gathered together, the device comprising:
frequency calculating means for calculating, for each of all the numerical values taken by the input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that value as a second frequency;
difference calculating means for calculating, for each of all the numerical values taken by the input attribute, the difference value between the first frequency and the second frequency;
threshold determining means for taking, as the threshold for one input attribute, the value at which the difference value is largest among the numerical values taken by that input attribute, thereby determining at least one threshold corresponding to at least one input attribute; and
input attribute condition determining means for determining the input attribute condition based on the threshold determined by the threshold determining means.
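The threshold search in claim 1 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `determine_threshold` is hypothetical, both groups are assumed non-empty, and the difference is compared as an absolute value (an assumption suggested by the separate polarity step of claim 2). Maximizing the gap between the two cumulative frequencies over the observed values is the same quantity as the two-sample Kolmogorov-Smirnov statistic.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def determine_threshold(first: Sequence[float],
                        second: Sequence[float]) -> Tuple[float, float, float]:
    """Return (threshold, first_frequency, second_frequency) for one input attribute.

    first  -- the attribute's values in the first data group (assumed non-empty)
    second -- the attribute's values in the second data group (assumed non-empty)
    """
    a, b = sorted(first), sorted(second)
    best = None
    # Candidate thresholds: every value the attribute takes in either group.
    for x in sorted(set(a) | set(b)):
        f1 = bisect_right(a, x) / len(a)  # first frequency: share of group 1 with value <= x
        f2 = bisect_right(b, x) / len(b)  # second frequency: share of group 2 with value <= x
        diff = abs(f1 - f2)               # assumption: compare magnitudes of the difference
        if best is None or diff > best[0]:  # strict '>' keeps the smallest tied value
            best = (diff, x, f1, f2)
    _, threshold, f1, f2 = best
    return threshold, f1, f2


print(determine_threshold([1, 2, 3, 4], [5, 6, 7]))  # (4, 1.0, 0.0)
```

Here every first-group value and no second-group value lies at or below 4, so the gap reaches 1.0 there. When several values tie for the largest gap, this sketch keeps the smallest; floating-point rounding can also break near-ties, so a production version might compare with a tolerance.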

上記閾値決定手段で決定された閾値における、第１の頻度と第２の頻度との大小関係を判定する極性判定手段をさらに含み、
上記入力属性条件決定手段は、
入力属性条件を満たすデータ群に第２データ群がまとまり、入力属性条件を満たさないデータ群に第１データ群がまとまるように、
上記極性判定手段により第１の頻度が第２の頻度より大きいと判定された場合には、上記入力属性条件を「入力属性が閾値を超える」という条件に決定し、
上記極性判定手段により第２の頻度が第１の頻度より大きいと判定された場合には、上記入力属性条件を「入力属性が閾値以下」という条件に決定することを特徴とする請求項１に記載の入力属性条件決定装置。 The input attribute condition determining device according to claim 1, further comprising polarity determining means for determining the magnitude relationship between the first frequency and the second frequency at the threshold determined by the threshold determining means,
wherein the input attribute condition determining means,
so that the second data group is gathered into the data group satisfying the input attribute condition and the first data group is gathered into the data group not satisfying it,
determines the input attribute condition to be "the input attribute exceeds the threshold" when the polarity determining means determines that the first frequency is greater than the second frequency, and
determines the input attribute condition to be "the input attribute is equal to or less than the threshold" when the polarity determining means determines that the second frequency is greater than the first frequency.
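The polarity rule of claim 2 reduces to a comparison of the two frequencies at the chosen threshold. A hedged sketch: the function name and the string form of the condition are illustrative assumptions, and the equal-frequency case, which the claim does not address, simply raises an error here.

```python
def input_attribute_condition(name: str, threshold: float,
                              f1: float, f2: float) -> str:
    """Choose the side of the threshold on which the second data group gathers."""
    if f1 > f2:
        # A larger share of the first group than of the second lies at or
        # below the threshold, so the second group is concentrated above it.
        return f"{name} > {threshold}"
    if f2 > f1:
        # The second group is concentrated at or below the threshold.
        return f"{name} <= {threshold}"
    # Equal frequencies: not covered by the claim (assumption: treat as an error).
    raise ValueError("first and second frequencies are equal at the threshold")


print(input_attribute_condition("x1", 4, 1.0, 0.0))  # x1 > 4
```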

上記入力属性は、製品の製造工程における製造プロセス条件および／またはインライン検査結果であり、
上記出力属性は、製品の品質判定結果であり、
上記第２データ群は、品質判定結果が不良のデータ群であることを特徴とする請求項１に記載の入力属性条件決定装置。 The input attribute condition determining device according to claim 1, wherein the input attribute is a manufacturing process condition and/or an in-line inspection result in a manufacturing process of a product,
the output attribute is a quality determination result for the product, and
the second data group is the data group whose quality determination result is defective.

複数の入力属性と、出力属性とで構成されるデータの集合である基本データ群に対して、入力属性と出力属性との因果関係を分析し、因果関係を示す情報を抽出するデータ分析装置であって、
上記基本データ群を、出力属性の値に依って第１データ群と第２データ群とに分類し、分類フラグを付与する分類手段と、
上記分類後の基本データ群の中から、分析の対象とする分析データ群を抽出する分析データ群抽出手段と、
請求項１または２に記載の入力属性条件決定装置とを含み、
上記頻度演算手段および差分演算手段は、分析データ群の各々の入力属性がとる全ての数値の各々について上記演算を行い、
上記閾値決定手段は、分析データ群の各々の入力属性について、それぞれ、閾値を決定することを特徴とするデータ分析装置。 A data analysis device that analyzes the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and extracts information indicating the causal relationship, the device comprising:
classifying means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and attaching a classification flag;
analysis data group extracting means for extracting, from the classified basic data group, an analysis data group to be analyzed; and
the input attribute condition determining device according to claim 1 or 2,
wherein the frequency calculating means and the difference calculating means perform the above calculations for each of all the numerical values taken by each input attribute of the analysis data group, and
the threshold determining means determines a threshold for each input attribute of the analysis data group.
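The classifying step of claim 4 attaches a flag to each row, after which the per-attribute frequency and difference calculations of claim 1 need, for every input attribute, its values in each group. The sketch below assumes rows are dictionaries and that flag 1 marks the first data group and flag 2 the second; the names `classify`, `group_values`, `x1`, `x2`, and `judge` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


def classify(rows: List[Row],
             is_second: Callable[[Row], bool]) -> List[Tuple[Row, int]]:
    """Attach a classification flag: 2 for the second data group, 1 for the first."""
    return [(row, 2 if is_second(row) else 1) for row in rows]


def group_values(flagged: List[Tuple[Row, int]],
                 input_attributes: List[str]) -> Dict[str, Tuple[List[float], List[float]]]:
    """For each input attribute, collect its values in the first and second data groups."""
    result = {}
    for attr in input_attributes:
        first = [row[attr] for row, flag in flagged if flag == 1]
        second = [row[attr] for row, flag in flagged if flag == 2]
        result[attr] = (first, second)
    return result


rows = [
    {"x1": 1, "x2": 0.5, "judge": 0},
    {"x1": 5, "x2": 0.4, "judge": 1},  # judge == 1: hypothetical "defective" label
    {"x1": 2, "x2": 0.6, "judge": 0},
]
flagged = classify(rows, lambda r: r["judge"] == 1)
print(group_values(flagged, ["x1", "x2"]))
```

Each `(first, second)` pair is the input a per-attribute threshold search would consume.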

数値属性でない入力属性を含む基本データ群に対し、入力属性を数値に変換する処理を行う数値変換手段をさらに備えていることを特徴とする請求項４に記載のデータ分析装置。 The data analysis device according to claim 4, further comprising numerical conversion means for converting input attributes into numerical values when the basic data group includes an input attribute that is not a numerical attribute.
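Claim 5 does not state here how non-numerical attributes are converted. One simple possibility, assumed for illustration only, is to assign integer codes to the distinct labels in sorted order and keep the reverse mapping so that a condition on the code can later be reported as a label (the role the reference list gives to the numerical-to-character data conversion unit 119). The function name and the labels other than `cord` are hypothetical.

```python
from typing import Dict, List, Tuple


def encode_categories(values: List[str]) -> Tuple[List[int], Dict[int, str]]:
    """Replace labels with integer codes 1, 2, ... assigned in sorted label order.

    Returns the encoded values and a code -> label map for converting results back.
    """
    codes = {label: i for i, label in enumerate(sorted(set(values)), start=1)}
    decode = {i: label for label, i in codes.items()}
    return [codes[v] for v in values], decode


encoded, decode = encode_categories(["wire", "cord", "wire", "band"])
print(encoded)  # [3, 2, 3, 1]
print(decode)   # {1: 'band', 2: 'cord', 3: 'wire'}
```

Because a threshold condition such as `code > 2` groups labels by their code order, the chosen ordering determines which sets of labels a single threshold can express.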

上記入力属性条件決定装置で決定された入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に含まれるデータである」という相関ルールの確からしさを表す分割ルール評価値を演算する分割ルール評価手段と、
上記入力属性条件決定装置で決定された入力属性条件の中で、最大の分割ルール評価値を持つ入力属性条件に基づいて、上記分析データ群を、該入力属性条件を満たす要因データ群と、該入力属性条件を満たさない他データ群とに分割する分割手段とを含むことを特徴とする請求項４に記載のデータ分析装置。 The data analysis device according to claim 4, further comprising:
division rule evaluating means for calculating, for each input attribute condition determined by the input attribute condition determining device, a division rule evaluation value representing the certainty of the correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group of the analysis data group"; and
dividing means for dividing the analysis data group, based on the input attribute condition having the largest division rule evaluation value among the input attribute conditions determined by the input attribute condition determining device, into a factor data group that satisfies that input attribute condition and an other-data group that does not.
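The dividing step of claim 6 picks the condition with the largest division rule evaluation value and partitions the analysis data group on it. In this sketch the conditions are Python predicates and the evaluation values are supplied precomputed (claim 9 gives one way to compute them); the names `split_on_best` and `x1` and the score values are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]


def split_on_best(rows: List[Row], conditions: Dict[str, Condition],
                  scores: Dict[str, float]) -> Tuple[str, List[Row], List[Row]]:
    """Split rows on the condition with the largest division rule evaluation value.

    Returns (condition name, factor data group, other-data group).
    """
    best = max(scores, key=scores.get)  # the first name wins if scores tie
    cond = conditions[best]
    factor = [r for r in rows if cond(r)]
    other = [r for r in rows if not cond(r)]
    return best, factor, other


rows = [{"x1": 1}, {"x1": 6}, {"x1": 7}]
conditions = {"x1 > 4": lambda r: r["x1"] > 4, "x1 <= 1": lambda r: r["x1"] <= 1}
scores = {"x1 > 4": 3.0, "x1 <= 1": 0.5}  # hypothetical precomputed values
print(split_on_best(rows, conditions, scores))
```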

上記分析データ群抽出手段は、上記分割手段で分割されたデータ群のうちの少なくとも一方を新たな分析データ群として抽出し、
分析データ群抽出手段による処理、入力属性条件決定装置による処理、分割ルール評価手段による処理、および、分割手段による処理からなる一連の処理が繰り返し実行されるようになっていることを特徴とする請求項６に記載のデータ分析装置。 The data analysis device according to claim 6, wherein the analysis data group extracting means extracts at least one of the data groups divided by the dividing means as a new analysis data group, and
a series of processes consisting of the processing by the analysis data group extracting means, the processing by the input attribute condition determining device, the processing by the division rule evaluating means, and the processing by the dividing means is executed repeatedly.

上記分析データ群抽出手段は、上記分割手段で分割されたデータ群のうち他データ群のみを、新たな分析データ群として抽出するものであることを特徴とする請求項７に記載のデータ分析装置。 The data analysis device according to claim 7, wherein the analysis data group extracting means extracts, of the data groups divided by the dividing means, only the other-data group as the new analysis data group.
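Claims 7 and 8 repeat the cycle on the other-data group only. The loop below is a sketch under stated assumptions: `choose` is a hypothetical callable standing in for the threshold, evaluation, and selection steps; the loop stops when no second-group data remain (the end condition later given in claim 20) or when a condition removes no rows; and `max_rounds` is an added safety limit that the claims do not mention.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
Condition = Callable[[Row], bool]
# Hypothetical chooser: returns (description, predicate) for the best condition, or None.
Chooser = Callable[[List[Row]], Optional[Tuple[str, Condition]]]


def extract_factors(rows: List[Row], is_second: Callable[[Row], bool],
                    choose: Chooser, max_rounds: int = 20) -> List[str]:
    """Repeatedly split off a factor data group and continue with the other-data group."""
    factors: List[str] = []
    current = rows
    for _ in range(max_rounds):  # safety limit (assumption, not in the claims)
        if not any(is_second(r) for r in current):
            break  # end condition: no second-group data left
        picked = choose(current)
        if picked is None:
            break
        name, cond = picked
        other = [r for r in current if not cond(r)]
        if len(other) == len(current):
            break  # the condition matched nothing, so another round would not progress
        factors.append(name)
        current = other  # claim 8: only the other-data group is analyzed next
    return factors


def choose(rows: List[Row]) -> Optional[Tuple[str, Condition]]:
    # Hypothetical chooser for the example data only.
    if any(r["ng"] and r["x1"] > 5 for r in rows):
        return "x1 > 5", lambda r: r["x1"] > 5
    if any(r["ng"] and r["x2"] > 5 for r in rows):
        return "x2 > 5", lambda r: r["x2"] > 5
    return None


data = [
    {"x1": 1, "x2": 1, "ng": 0},
    {"x1": 9, "x2": 1, "ng": 1},
    {"x1": 1, "x2": 9, "ng": 1},
    {"x1": 2, "x2": 2, "ng": 0},
]
print(extract_factors(data, lambda r: r["ng"] == 1, choose))  # ['x1 > 5', 'x2 > 5']
```

Continuing only with the other-data group means each later factor explains defects that earlier factors left unexplained.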

上記分割ルール評価手段は、上記入力属性条件決定装置で決定された入力属性条件の各々について、上記分析データ群の第１データ群中で該入力属性条件を満たすデータの割合に対する、上記分析データ群の第２データ群中で該入力属性条件を満たすデータの割合の比率を、分割ルール評価値として演算するものであることを特徴とする請求項６に記載のデータ分析装置。 The data analysis device according to claim 6, wherein the division rule evaluating means calculates, as the division rule evaluation value for each input attribute condition determined by the input attribute condition determining device, the ratio of the proportion of data satisfying that input attribute condition in the second data group of the analysis data group to the proportion of data satisfying it in the first data group of the analysis data group.
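Claim 9 defines the division rule evaluation value as the proportion of second-group data meeting the condition divided by the proportion of first-group data meeting it. Treating the second group as the positive class, this is the true-positive rate over the false-positive rate (a positive likelihood ratio). The sketch below is illustrative; the function name is hypothetical, and the claim does not say how a zero denominator is handled, so returning infinity or zero is an assumption.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def division_rule_value(first: List[Row], second: List[Row],
                        cond: Callable[[Row], bool]) -> float:
    """(share of second group satisfying cond) / (share of first group satisfying cond)."""
    p_second = sum(1 for r in second if cond(r)) / len(second)
    p_first = sum(1 for r in first if cond(r)) / len(first)
    if p_first == 0:
        # Assumption: no first-group row matches, so treat the rule as maximal
        # if it catches any second-group row, and as worthless otherwise.
        return float("inf") if p_second > 0 else 0.0
    return p_second / p_first


good = [{"x": 1}, {"x": 2}, {"x": 3}, {"x": 6}]
bad = [{"x": 7}, {"x": 8}]
print(division_rule_value(good, bad, lambda r: r["x"] > 5))  # 4.0
```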

分類条件を設定する分類条件設定手段をさらに含み、
上記分類手段は、分類条件設定手段で設定された分類条件に基づいて基本データ群を分類するようになっていることを特徴とする請求項４に記載のデータ分析装置。 The data analysis device according to claim 4, further comprising classification condition setting means for setting a classification condition,
wherein the classifying means classifies the basic data group based on the classification condition set by the classification condition setting means.

上記基本データ群は、複数の出力属性を含み、
上記分類条件設定手段は、上記複数の出力属性の各々に対して分類条件を設定し、
上記分類手段は、分類条件設定手段で設定された各々の分類条件の論理和または論理積に依って、基本データ群を分類するようになっていることを特徴とする請求項１０に記載のデータ分析装置。 The data analysis device according to claim 10, wherein the basic data group includes a plurality of output attributes,
the classification condition setting means sets a classification condition for each of the plurality of output attributes, and
the classifying means classifies the basic data group according to the logical OR or the logical AND of the classification conditions set by the classification condition setting means.
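Claims 10 and 11 combine per-output-attribute classification conditions with a logical OR or AND. A minimal sketch, assuming each condition is true for rows that belong to the second data group; the output attribute names `vth` and `leak` and the function name are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def classify_rows(rows: List[Row], conditions: List[Callable[[Row], bool]],
                  mode: str = "or") -> List[int]:
    """Flag each row 2 (second data group) or 1 (first data group).

    mode="or": a row is in the second group if any condition holds (logical OR).
    mode="and": it must satisfy every condition (logical AND).
    """
    if mode not in ("or", "and"):
        raise ValueError("mode must be 'or' or 'and'")
    combine = any if mode == "or" else all
    return [2 if combine(c(r) for c in conditions) else 1 for r in rows]


rows = [{"vth": 0.5, "leak": 1}, {"vth": 0.9, "leak": 1}, {"vth": 0.9, "leak": 8}]
conds = [lambda r: r["vth"] > 0.8, lambda r: r["leak"] > 5]
print(classify_rows(rows, conds, "or"))   # [1, 2, 2]
print(classify_rows(rows, conds, "and"))  # [1, 1, 2]
```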

上記入力属性条件決定装置で決定された入力属性条件の各々について、上記基本データ群中で該入力属性条件を満たすデータの中に第２データ群が含まれる割合を表す、第２データ群分離度を演算する第２データ群分離度演算手段と、
上記入力属性条件決定装置で決定された入力属性条件の中で、上記基本データ群中の第２データ群の割合を表す第２データ群含有率よりも大きい値の、第２データ群分離度をもつ入力属性条件を、第２データ群に対応する出力属性条件の要因を示す情報として抽出する第１の要因抽出手段とを含むことを特徴とする請求項４〜７のいずれか１項に記載のデータ分析装置。 The data analysis device according to any one of claims 4 to 7, further comprising:
second data group separation degree calculating means for calculating, for each input attribute condition determined by the input attribute condition determining device, a second data group separation degree representing the proportion of data belonging to the second data group among the data in the basic data group that satisfy that input attribute condition; and
first factor extracting means for extracting, from among the input attribute conditions determined by the input attribute condition determining device, each input attribute condition whose second data group separation degree is larger than the second data group content rate, which represents the proportion of the second data group in the basic data group, as information indicating a factor of the output attribute condition corresponding to the second data group.
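Claim 12 keeps an input attribute condition when the share of second-group data among the rows satisfying it (the second data group separation degree) exceeds the share of second-group data in the whole basic data group (the content rate). In association-rule terms this compares the rule's confidence with the base rate, that is, it keeps rules whose lift is above 1. The function names and the `ng` flag field are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]
Pred = Callable[[Row], bool]


def content_rate(rows: List[Row], is_second: Pred) -> float:
    """Share of second-group rows in the whole basic data group."""
    return sum(1 for r in rows if is_second(r)) / len(rows)


def separation_degree(rows: List[Row], is_second: Pred, cond: Pred) -> float:
    """Share of second-group rows among the rows that satisfy cond."""
    hits = [r for r in rows if cond(r)]
    if not hits:
        return 0.0  # assumption: a condition matching nothing separates nothing
    return sum(1 for r in hits if is_second(r)) / len(hits)


def first_factor_conditions(rows: List[Row], is_second: Pred,
                            conditions: Dict[str, Pred]) -> List[str]:
    """Keep conditions whose separation degree exceeds the content rate."""
    base = content_rate(rows, is_second)
    return [name for name, cond in conditions.items()
            if separation_degree(rows, is_second, cond) > base]


rows = [{"x": i, "ng": 1 if i >= 9 else 0} for i in range(1, 11)]
conditions = {"x > 8": lambda r: r["x"] > 8, "x <= 2": lambda r: r["x"] <= 2}
print(first_factor_conditions(rows, lambda r: r["ng"] == 1, conditions))  # ['x > 8']
```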

上記入力属性条件決定装置で決定された入力属性条件の中で、最大の分割ルール評価値を持つ入力属性条件を、第２データ群に対応する出力属性条件の要因を示す情報として抽出する第２の要因抽出手段とを含むことを特徴とする請求項６または７に記載のデータ分析装置。 The data analysis device according to claim 6 or 7, further comprising second factor extracting means for extracting, from among the input attribute conditions determined by the input attribute condition determining device, the input attribute condition having the largest division rule evaluation value as information indicating a factor of the output attribute condition corresponding to the second data group.

複数の入力属性と、出力属性とで構成されるデータの集合である基本データ群に対して、入力属性と出力属性との因果関係を分析し、因果関係を示す情報を抽出するデータ分析装置であって、
上記基本データ群を、出力属性の値に依って第１データ群と第２データ群とに分類し、分類フラグを付与する分類手段と、
上記分類後の基本データ群の中から、分析の対象とする分析データ群を抽出する分析データ群抽出手段と、
分析データ群の各々の入力属性が取り得る全ての入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に属するデータであり、入力属性が入力属性条件を満たさなければ、分析データ群中の第１データ群に属するデータである」という第１の相関ルールの確からしさを表す、入力属性条件評価指標を演算する第１の評価手段と、
分析データ群の各々の入力属性について、それぞれ、最大の入力属性条件評価指標を持つ入力属性条件を、上記第１の相関ルールを満たす入力属性条件として決定する入力属性条件決定手段と、
上記入力属性条件決定手段で決定された入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に含まれるデータである」という第２の相関ルールの確からしさを表す第２評価指標を演算する第２の評価手段と、
上記入力属性条件決定手段で決定された入力属性条件の中で、第２評価指標が最大となる入力属性条件を、第２データ群に対応する出力属性条件の要因を示す情報として抽出する第２の要因抽出手段とを含むことを特徴とするデータ分析装置。 A data analysis device that analyzes the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and extracts information indicating the causal relationship, the device comprising:
classifying means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute and attaching a classification flag;
analysis data group extracting means for extracting, from the classified basic data group, an analysis data group to be analyzed;
first evaluating means for calculating, for each of all the input attribute conditions that each input attribute of the analysis data group can take, an input attribute condition evaluation index representing the certainty of a first correlation rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group of the analysis data group, and if the input attribute does not satisfy the input attribute condition, the data belongs to the first data group of the analysis data group";
input attribute condition determining means for determining, for each input attribute of the analysis data group, the input attribute condition having the largest input attribute condition evaluation index as the input attribute condition satisfying the first correlation rule;
second evaluating means for calculating, for each input attribute condition determined by the input attribute condition determining means, a second evaluation index representing the certainty of a second correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group of the analysis data group"; and
second factor extracting means for extracting, from among the input attribute conditions determined by the input attribute condition determining means, the input attribute condition with the largest second evaluation index as information indicating a factor of the output attribute condition corresponding to the second data group.

複数の入力属性と、出力属性とで構成されるデータの集合である基本データ群を分析対象とし、入力属性と出力属性との因果関係を分析し、因果関係を示す情報を抽出するデータ分析装置であって、
基本データ群を出力属性に依って第１データ群と第２データ群とに分類する分類手段と、
各入力属性の全ての数値について、入力属性がその数値以下であるデータが第１データ群および第２データ群のうちの一方に偏っている度合いを表す閾値評価指標を演算する第１の評価手段と、
第１の評価手段で演算された閾値評価指標に基づいて、各入力属性について最大の閾値評価指標を持つ数値を各入力属性の閾値として決定する閾値決定手段と、
閾値決定手段で決定された各入力属性の閾値に基づいて、「入力属性が閾値以下であれば第２データ群に含まれるデータである」という相関ルールの確からしさを表す第１のルール評価値と、「入力属性が閾値を超えていれば第２データ群に含まれるデータである」という相関ルールの確からしさを表す第２のルール評価値とを各入力属性について演算する第２の評価手段と、
第２の評価手段でルール評価値が演算された、全ての入力属性に関する相関ルールのうちで最も高いルール評価値を持つ相関ルールの入力属性条件を示すデータを、第２データ群に対応する出力属性条件の要因を示す情報として抽出する第２の要因抽出手段とを含むことを特徴とするデータ分析装置。 A data analysis device that takes as its analysis target a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, analyzes the causal relationship between the input attributes and the output attribute, and extracts information indicating the causal relationship, the device comprising:
classifying means for classifying the basic data group into a first data group and a second data group according to the output attribute;
first evaluating means for calculating, for all numerical values of each input attribute, a threshold evaluation index representing the degree to which the data whose input attribute is equal to or less than that value are biased toward one of the first data group and the second data group;
threshold determining means for determining, based on the threshold evaluation indices calculated by the first evaluating means, the value having the largest threshold evaluation index for each input attribute as the threshold of that input attribute;
second evaluating means for calculating for each input attribute, based on the threshold determined by the threshold determining means, a first rule evaluation value representing the certainty of the correlation rule "if the input attribute is equal to or less than the threshold, the data is included in the second data group" and a second rule evaluation value representing the certainty of the correlation rule "if the input attribute exceeds the threshold, the data is included in the second data group"; and
second factor extracting means for extracting, as information indicating a factor of the output attribute condition corresponding to the second data group, data indicating the input attribute condition of the correlation rule having the highest rule evaluation value among the correlation rules for all input attributes whose rule evaluation values have been calculated by the second evaluating means.
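Claim 15 scores both directions of each attribute's threshold and keeps the best rule overall. The claim does not define the rule evaluation value here; this sketch assumes it is the confidence of the rule, that is, the share of rows satisfying the condition that belong to the second data group. The thresholds are taken as already determined, and the names `best_rule` and `ng` (1 for second-group rows, 0 otherwise) are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]


def best_rule(rows: List[Row], thresholds: Dict[str, float],
              flag: str = "ng") -> Tuple[str, float]:
    """Score 'attr <= t' and 'attr > t' for every attribute; return the best rule.

    Assumption: the rule evaluation value is the rule's confidence, i.e. the
    share of rows satisfying the condition whose flag marks the second group.
    """
    best_text, best_score = "", -1.0
    for attr, t in thresholds.items():
        below = [r for r in rows if r[attr] <= t]  # first rule: attr <= threshold
        above = [r for r in rows if r[attr] > t]   # second rule: attr > threshold
        for text, hits in ((f"{attr} <= {t}", below), (f"{attr} > {t}", above)):
            if not hits:
                continue  # a condition no row satisfies cannot be scored
            score = sum(r[flag] for r in hits) / len(hits)
            if score > best_score:
                best_text, best_score = text, score
    return best_text, best_score


rows = [
    {"x1": 1, "x2": 5, "ng": 0},
    {"x1": 2, "x2": 1, "ng": 0},
    {"x1": 8, "x2": 6, "ng": 1},
    {"x1": 9, "x2": 2, "ng": 1},
]
print(best_rule(rows, {"x1": 2, "x2": 2}))  # ('x1 > 2', 1.0)
```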

上記第２の要因抽出手段で抽出された入力属性条件に基づいて、分析データ群を、上記入力属性条件を満たす要因データ群と、上記入力属性条件を満たさない他データ群とに分割する分割手段をさらに含み、
上記分析データ群抽出手段は、上記分割手段で分割されたデータ群のうちの少なくとも一方を新たな分析データ群として抽出し、
分析データ群抽出手段による処理、第１の評価手段による処理、入力属性条件決定手段による処理、第２の評価手段による処理、第２の要因抽出手段による処理、および分割手段による処理からなる一連の処理が繰り返し実行されるようになっていることを特徴とする請求項１４に記載のデータ分析装置。 The data analysis device according to claim 14, further comprising dividing means for dividing the analysis data group, based on the input attribute condition extracted by the second factor extracting means, into a factor data group that satisfies the input attribute condition and an other-data group that does not,
wherein the analysis data group extracting means extracts at least one of the data groups divided by the dividing means as a new analysis data group, and
a series of processes consisting of the processing by the analysis data group extracting means, the first evaluating means, the input attribute condition determining means, the second evaluating means, the second factor extracting means, and the dividing means is executed repeatedly.

上記分析データ群抽出手段は、上記分割手段で分割されたデータ群のうち他データ群のみを、新たな分析データ群として抽出するものであることを特徴とする請求項１６に記載のデータ分析装置。 The data analysis device according to claim 16, wherein the analysis data group extracting means extracts, of the data groups divided by the dividing means, only the other-data group as the new analysis data group.

上記第１の評価手段は、
各入力属性の全ての数値について、第１データ群中において、入力属性がその数値以下であるデータの割合を第１の頻度として演算すると共に、第２データ群中において、入力属性がその数値以下であるデータの割合を第２の頻度として演算する頻度演算手段と、
各入力属性の全ての数値について、第１の頻度と第２の頻度との差分を演算する差分演算手段とを含むことを特徴とする請求項１４または１５に記載のデータ分析装置。 The data analysis device according to claim 14 or 15, wherein the first evaluating means includes:
frequency calculating means for calculating, for all numerical values of each input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that value as a second frequency; and
difference calculating means for calculating, for all numerical values of each input attribute, the difference between the first frequency and the second frequency.

上記入力属性は、製品の製造工程における製造プロセス条件および／またはインライン検査結果であり、
上記出力属性は、製品の品質判定結果であり、
上記第２データ群は、品質判定結果が不良のデータ群であることを特徴とする請求項１４または１５に記載のデータ分析装置。 The data analysis device according to claim 14 or 15, wherein the input attribute is a manufacturing process condition and/or an in-line inspection result in a manufacturing process of a product,
the output attribute is a quality determination result for the product, and
the second data group is the data group whose quality determination result is defective.

終了条件を満たしているか否かを判定する終了条件判定手段をさらに含み、
上記終了条件判定手段は、上記分析データ群抽出手段で抽出した分析データ群における第２データ群のデータ数が０であるかを終了条件として判定を行い、
上記終了条件判定手段において終了条件を満たしていると判定されると、上記一連の処理の実行を終了するようになっていることを特徴とする請求項７または１６に記載のデータ分析装置。 The data analysis device according to claim 7 or 16, further comprising end condition determining means for determining whether an end condition is satisfied,
wherein the end condition determining means uses, as the end condition, whether the number of data in the second data group of the analysis data group extracted by the analysis data group extracting means is zero, and
the execution of the series of processes ends when the end condition determining means determines that the end condition is satisfied.

請求項１に記載の入力属性条件決定装置を用いて、数値属性である少なくとも１つの入力属性と、出力属性とで構成されるデータの集合であり、出力属性の値に依り第１データ群と第２データ群とに分類される分析データ群に対して、第１データ群と第２データ群とがそれぞれまとまるように上記分析データ群を２分化するための入力属性の条件である、入力属性条件を決定する入力属性条件決定方法であって、
上記頻度演算手段により、入力属性がとる全ての数値の各々について、第１データ群中において、入力属性がその数値以下であるデータの割合を第１の頻度として演算すると共に、第２データ群中において、入力属性がその数値以下であるデータの割合を第２の頻度として演算する頻度演算ステップと、
上記差分演算手段により、入力属性がとる全ての数値の各々について、第１の頻度と第２の頻度との差分値を演算する差分演算ステップと、
上記閾値決定手段により、１つの入力属性がとる各数値の中で、上記差分値が最大となる数値を、該入力属性における閾値とし、少なくとも１つの入力属性に対応する少なくとも１つの閾値を決定する閾値決定ステップと、
上記入力属性条件決定手段により、上記閾値決定手段で決定された閾値に基づいて、上記入力属性条件を決定する入力属性条件決定ステップとを含むことを特徴とする入力属性条件決定方法。 An input attribute condition determining method that uses the input attribute condition determining device according to claim 1 to determine an input attribute condition for an analysis data group, the analysis data group being a set of data composed of at least one input attribute that is a numerical attribute and an output attribute and being classified into a first data group and a second data group according to the value of the output attribute, and the input attribute condition being a condition on an input attribute for dividing the analysis data group into two so that the first data group and the second data group are each gathered together, the method comprising:
a frequency calculating step in which the frequency calculating means calculates, for each of all the numerical values taken by the input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that value as a second frequency;
a difference calculating step in which the difference calculating means calculates, for each of all the numerical values taken by the input attribute, the difference value between the first frequency and the second frequency;
a threshold determining step in which the threshold determining means takes, as the threshold for one input attribute, the value at which the difference value is largest among the numerical values taken by that input attribute, thereby determining at least one threshold corresponding to at least one input attribute; and
an input attribute condition determining step in which the input attribute condition determining means determines the input attribute condition based on the threshold determined by the threshold determining means.

請求項４に記載のデータ分析装置を用いて、複数の入力属性と、出力属性とで構成されるデータの集合である基本データ群に対して、入力属性と出力属性との因果関係を分析し、因果関係を示す情報を抽出するデータ分析方法であって、
上記分類手段により、上記基本データ群を、出力属性の値に依って第１データ群と第２データ群とに分類し、分類フラグを付与する分類ステップと、
上記分析データ群抽出手段により、上記分類後の基本データ群の中から、分析の対象とする分析データ群を抽出する分析データ群抽出ステップと、
上記入力属性条件決定装置の上記頻度演算手段により、分析データ群の各々の入力属性がとる全ての数値の各々について、分析データ群の第１データ群中において、入力属性がその数値以下であるデータの割合を第１の頻度として演算すると共に、分析データ群の第２データ群中において、入力属性がその数値以下であるデータの割合を第２の頻度として演算する頻度演算ステップと、
上記入力属性条件決定装置の上記差分演算手段により、分析データ群の各々の入力属性がとる全ての数値の各々について、第１の頻度と第２の頻度との差分値を演算する差分演算ステップと、
上記入力属性条件決定装置の上記閾値決定手段により、各々の入力属性について、それぞれ、上記差分値が最大となる数値を該入力属性の閾値として決定する閾値決定ステップと、
上記入力属性条件決定装置の上記入力属性条件決定手段により、上記閾値決定手段で決定された閾値に基づいて、第１データ群と第２データ群とがそれぞれまとまるように上記分析データ群を２分化するための入力属性条件を決定する入力属性条件決定ステップとを含むことを特徴とするデータ分析方法。 A data analysis method that uses the data analysis device according to claim 4 to analyze the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and to extract information indicating the causal relationship, the method comprising:
a classifying step in which the classifying means classifies the basic data group into a first data group and a second data group according to the value of the output attribute and attaches a classification flag;
an analysis data group extracting step in which the analysis data group extracting means extracts, from the classified basic data group, an analysis data group to be analyzed;
a frequency calculating step in which the frequency calculating means of the input attribute condition determining device calculates, for each of all the numerical values taken by each input attribute of the analysis data group, the proportion of data in the first data group of the analysis data group whose input attribute is equal to or less than that value as a first frequency, and the proportion of data in the second data group of the analysis data group whose input attribute is equal to or less than that value as a second frequency;
a difference calculating step in which the difference calculating means of the input attribute condition determining device calculates, for each of all the numerical values taken by each input attribute of the analysis data group, the difference value between the first frequency and the second frequency;
a threshold determining step in which the threshold determining means of the input attribute condition determining device determines, for each input attribute, the value at which the difference value is largest as the threshold of that input attribute; and
an input attribute condition determining step in which the input attribute condition determining means of the input attribute condition determining device determines, based on the threshold determined by the threshold determining means, an input attribute condition for dividing the analysis data group into two so that the first data group and the second data group are each gathered together.

請求項６に記載の上記分割ルール評価手段により、上記入力属性条件決定装置で決定された入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に含まれるデータである」という相関ルールの確からしさを表す分割ルール評価値を演算する分割ルール評価ステップと、
請求項６に記載の上記データ分析装置の上記分割手段により、上記入力属性条件決定装置で決定された入力属性条件の中で、最大の分割ルール評価値を持つ入力属性条件に基づいて、上記分析データ群を、該入力属性条件を満たす要因データ群と、該入力属性条件を満たさない他データ群とに分割する分割ステップとを含むことを特徴とする請求項２２に記載のデータ分析方法。 The data analysis method according to claim 22, further comprising:
a division rule evaluating step in which the division rule evaluating means according to claim 6 calculates, for each input attribute condition determined by the input attribute condition determining device, a division rule evaluation value representing the certainty of the correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group of the analysis data group"; and
a dividing step in which the dividing means of the data analysis device according to claim 6 divides the analysis data group, based on the input attribute condition having the largest division rule evaluation value among the input attribute conditions determined by the input attribute condition determining device, into a factor data group that satisfies that input attribute condition and an other-data group that does not.

請求項１４に記載のデータ分析装置を用いて、複数の入力属性と、出力属性とで構成されるデータの集合である基本データ群に対して、入力属性と出力属性との因果関係を分析し、因果関係を示す情報を抽出するデータ分析方法であって、
上記分類手段により、上記基本データ群を、出力属性の値に依って第１データ群と第２データ群とに分類し、分類フラグを付与する分類ステップと、
上記分析データ群抽出手段により、上記分類後の基本データ群の中から、分析の対象とする分析データ群を抽出する分析データ群抽出ステップと、
上記第１の評価手段により、各々の入力属性が取り得る全ての入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に属するデータであり、入力属性が入力属性条件を満たさなければ、分析データ群中の第１データ群に属するデータである」という第１の相関ルールの確からしさを表す、入力属性条件評価指標を演算する第１の評価ステップと、
上記入力属性条件決定手段により、各々の入力属性について、それぞれ、最大の入力属性条件評価指標を持つ入力属性条件を、上記第１の相関ルールを満たす入力属性条件として決定する入力属性条件決定ステップと、
上記第２の評価手段により、上記入力属性条件決定手段で決定された入力属性条件の各々について、「入力属性が入力属性条件を満たせば、分析データ群中の第２データ群に含まれるデータである」という第２の相関ルールの確からしさを表す第２評価指標を演算する第２の評価ステップと、
第２の要因抽出手段により、上記入力属性条件決定手段で決定された入力属性条件の中で、第２評価指標が最大となる入力属性条件を、第２データ群に対応する出力属性条件の要因を示す情報として抽出する第２の要因抽出ステップとを含むことを特徴とするデータ分析方法。 A data analysis method that uses the data analysis device according to claim 14 to analyze the causal relationship between input attributes and an output attribute in a basic data group, which is a set of data composed of a plurality of input attributes and an output attribute, and to extract information indicating the causal relationship, the method comprising:
a classifying step in which the classifying means classifies the basic data group into a first data group and a second data group according to the value of the output attribute and attaches a classification flag;
an analysis data group extracting step in which the analysis data group extracting means extracts, from the classified basic data group, an analysis data group to be analyzed;
a first evaluating step in which the first evaluating means calculates, for each of all the input attribute conditions that each input attribute can take, an input attribute condition evaluation index representing the certainty of a first correlation rule "if the input attribute satisfies the input attribute condition, the data belongs to the second data group of the analysis data group, and if the input attribute does not satisfy the input attribute condition, the data belongs to the first data group of the analysis data group";
an input attribute condition determining step in which the input attribute condition determining means determines, for each input attribute, the input attribute condition having the largest input attribute condition evaluation index as the input attribute condition satisfying the first correlation rule;
a second evaluating step in which the second evaluating means calculates, for each input attribute condition determined by the input attribute condition determining means, a second evaluation index representing the certainty of a second correlation rule "if the input attribute satisfies the input attribute condition, the data is included in the second data group of the analysis data group"; and
a second factor extracting step in which the second factor extracting means extracts, from among the input attribute conditions determined by the input attribute condition determining means, the input attribute condition with the largest second evaluation index as information indicating a factor of the output attribute condition corresponding to the second data group.

For an analysis data group, which is a set of data each composed of at least one input attribute that is a numerical attribute and an output attribute, and which is classified into a first data group and a second data group according to the value of the output attribute,
an input attribute condition determination program for causing a computer to function as:
frequency calculation means for calculating, for each of all numerical values taken by an input attribute, the proportion of data in the first data group whose input attribute is equal to or less than that numerical value as a first frequency, and the proportion of data in the second data group whose input attribute is equal to or less than that numerical value as a second frequency;
difference calculation means for calculating, for each of all numerical values taken by the input attribute, a difference value between the first frequency and the second frequency;
threshold determination means for taking, among the numerical values taken by one input attribute, the numerical value at which the difference value is largest as the threshold for that input attribute, thereby determining at least one threshold corresponding to at least one input attribute; and
input attribute condition determination means for determining, based on the threshold determined by the threshold determination means, an input attribute condition, which is a condition on the input attributes for dividing the analysis data group into two parts such that the first data group and the second data group are each grouped together.
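The frequency, difference and threshold means above compare the empirical cumulative distributions of one input attribute in the two data groups. The largest |F1 - F2| over the attribute's values is the two-sample Kolmogorov-Smirnov statistic.

The Python sketch below is one way to compute the threshold, under these assumptions:
- Both groups are non-empty.
- It uses the largest absolute difference and lets the sign choose the direction of the condition. This is an interpretation, since the claim only says "difference value".
- The name `determine_threshold` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def determine_threshold(first: Sequence[float], second: Sequence[float]) -> Tuple[float, float, str]:
    """Return (threshold, largest |F1 - F2|, direction of the second-group condition).

    Hypothetical helper: F1/F2 are the proportions of the first/second data group
    whose attribute value is <= x, evaluated at every value the attribute takes.
    """
    if not first or not second:
        raise ValueError("both data groups must be non-empty")
    s1, s2 = sorted(first), sorted(second)
    best_t, best_abs, best_diff = None, -1.0, 0.0
    for x in sorted(set(s1) | set(s2)):          # all values the attribute takes
        f1 = bisect_right(s1, x) / len(s1)       # first frequency
        f2 = bisect_right(s2, x) / len(s2)       # second frequency
        diff = f1 - f2                           # difference value
        if abs(diff) > best_abs:                 # keep the first value with the largest gap
            best_t, best_abs, best_diff = x, abs(diff), diff
    # F1 > F2 means the first group piles up at or below the threshold,
    # so "x > t" is the condition that gathers the second group (and vice versa).
    direction = ">" if best_diff >= 0 else "<="
    return best_t, best_abs, direction


print(determine_threshold([1, 2, 3], [4, 5, 6]))  # (3, 1.0, '>')
```

Sorting each group once and using `bisect_right` makes each cumulative-frequency lookup logarithmic, rather than rescanning the data for every candidate value.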

A computer-readable recording medium on which the input attribute condition determination program according to claim 25 is recorded.

For a basic data group, which is a set of data each composed of a plurality of input attributes and an output attribute,
a data analysis program for causing a computer to function as:
classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute, and assigning a classification flag;
analysis data group extraction means for extracting, from the classified basic data group, an analysis data group to be analyzed;
frequency calculation means for calculating, for each of all numerical values taken by each input attribute of the analysis data group, the proportion of data in the first data group of the analysis data group whose input attribute is equal to or less than that numerical value as a first frequency, and the proportion of data in the second data group of the analysis data group whose input attribute is equal to or less than that numerical value as a second frequency;
difference calculation means for calculating, for each of all numerical values taken by each input attribute of the analysis data group, a difference value between the first frequency and the second frequency;
threshold determination means for determining, for each input attribute, the numerical value at which the difference value is largest as the threshold of that input attribute; and
input attribute condition determination means for determining, based on the threshold determined by the threshold determination means, an input attribute condition, which is a condition on the input attributes for dividing the analysis data group into two parts such that the first data group and the second data group are each grouped together.
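Putting the classification, extraction and per-attribute threshold means together, a minimal sketch of this data analysis program might look as follows. Everything beyond the claim wording is assumed for illustration:
- Records are Python dicts.
- The output attribute is split at a hypothetical `cutoff`: values above it go to the second data group (flag 2).
- The analysis data group is chosen by a hypothetical `extract` predicate.
- The threshold uses the largest absolute cumulative-frequency difference.

```python
from bisect import bisect_right
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]


def classify(records: List[Record], output_key: str, cutoff: float) -> List[Record]:
    # Classification means (hypothetical rule): flag 2 = second data group if the
    # output attribute exceeds `cutoff`, flag 1 = first data group otherwise.
    return [{**r, "flag": 2 if r[output_key] > cutoff else 1} for r in records]


def threshold(first: Sequence[float], second: Sequence[float]) -> Tuple[float, str, float]:
    # Largest |F1 - F2| over all values the attribute takes; the sign gives the direction.
    if not first or not second:
        raise ValueError("both data groups must be non-empty")
    s1, s2 = sorted(first), sorted(second)
    best_t, best_abs, best_diff = None, -1.0, 0.0
    for x in sorted(set(s1) | set(s2)):
        diff = bisect_right(s1, x) / len(s1) - bisect_right(s2, x) / len(s2)
        if abs(diff) > best_abs:
            best_t, best_abs, best_diff = x, abs(diff), diff
    return best_t, (">" if best_diff >= 0 else "<="), best_abs


def analyze(records: List[Record], input_keys: List[str], output_key: str,
            cutoff: float, extract: Callable[[Record], bool] = lambda r: True
            ) -> Dict[str, Tuple[float, str, float]]:
    flagged = classify(records, output_key, cutoff)
    analysis = [r for r in flagged if extract(r)]  # analysis data group extraction
    conditions = {}
    for key in input_keys:
        first = [r[key] for r in analysis if r["flag"] == 1]
        second = [r[key] for r in analysis if r["flag"] == 2]
        conditions[key] = threshold(first, second)  # (t, direction for group 2, gap)
    return conditions


records = [
    {"temp": 100, "time": 5, "defect_rate": 0.1, "line": 1},
    {"temp": 110, "time": 6, "defect_rate": 0.2, "line": 1},
    {"temp": 150, "time": 5, "defect_rate": 0.7, "line": 1},
    {"temp": 160, "time": 6, "defect_rate": 0.8, "line": 1},
    {"temp": 90, "time": 9, "defect_rate": 0.9, "line": 2},
]
print(analyze(records, ["temp", "time"], "defect_rate", 0.5,
              extract=lambda r: r["line"] == 1))
# {'temp': (110, '>', 1.0), 'time': (5, '>', 0.0)}
```

In this toy data "temp" separates the groups completely, while "time" shows no separation (gap 0.0). Returning the gap alongside the threshold makes that visible instead of silently reporting a meaningless condition.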

For a basic data group, which is a set of data each composed of a plurality of input attributes and an output attribute,
a data analysis program for causing a computer to function as:
classification means for classifying the basic data group into a first data group and a second data group according to the value of the output attribute, and assigning a classification flag;
analysis data group extraction means for extracting, from the classified basic data group, an analysis data group to be analyzed;
first evaluation means for calculating, for each of all input attribute conditions that each input attribute can take, an input attribute condition evaluation index representing the likelihood of a first correlation rule: "if the input attribute satisfies the input attribute condition, the data belongs to the second data group in the analysis data group; if the input attribute does not satisfy the input attribute condition, the data belongs to the first data group in the analysis data group";
input attribute condition determination means for determining, for each input attribute, the input attribute condition having the largest input attribute condition evaluation index as the input attribute condition satisfying the first correlation rule;
second evaluation means for calculating, for each of the input attribute conditions determined by the input attribute condition determination means, a second evaluation index representing the likelihood of a second correlation rule: "if the input attribute satisfies the input attribute condition, the data is included in the second data group in the analysis data group"; and
second factor extraction means for extracting, from among the input attribute conditions determined by the input attribute condition determination means, the input attribute condition with the largest second evaluation index as information indicating a factor of the output attribute condition corresponding to the second data group.
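The first and second evaluation means can be sketched as an exhaustive search over candidate conditions. The claims fix neither the candidate set nor the formulas, so this Python sketch makes several assumptions:
- Candidate conditions are "x > t" and "x <= t" for every value t the attribute takes.
- The input attribute condition evaluation index is the fraction of records that the two-sided first rule assigns to the correct group.
- The second evaluation index is the confidence of the one-sided second rule.

Function names are hypothetical. Flags follow the classification flag: 1 for the first data group, 2 for the second.

```python
from typing import Dict, List, Sequence, Tuple

Condition = Tuple[str, float, str]  # (input attribute, threshold t, ">" or "<=")


def holds(value: float, t: float, op: str) -> bool:
    return value > t if op == ">" else value <= t


def first_index(values: Sequence[float], flags: Sequence[int], t: float, op: str) -> float:
    # Assumed index: share of records for which "satisfies -> group 2, else group 1" is right.
    correct = sum(1 for v, f in zip(values, flags) if holds(v, t, op) == (f == 2))
    return correct / len(values)


def second_index(values: Sequence[float], flags: Sequence[int], t: float, op: str) -> float:
    # Assumed index: confidence of "satisfies -> group 2" (0.0 if nothing satisfies it).
    covered = [f for v, f in zip(values, flags) if holds(v, t, op)]
    return sum(1 for f in covered if f == 2) / len(covered) if covered else 0.0


def extract_second_factor(data: Dict[str, List[float]], flags: List[int]) -> Condition:
    per_attribute: List[Condition] = []
    for attr, values in data.items():
        candidates = [(t, op) for t in sorted(set(values)) for op in (">", "<=")]
        # Input attribute condition determination: best first index for this attribute.
        t, op = max(candidates, key=lambda c: first_index(values, flags, c[0], c[1]))
        per_attribute.append((attr, t, op))
    # Second factor extraction: the determined condition with the largest second index.
    return max(per_attribute, key=lambda c: second_index(data[c[0]], flags, c[1], c[2]))


data = {"temp": [100, 110, 150, 160], "pressure": [1.0, 3.0, 2.0, 4.0]}
flags = [1, 1, 2, 2]
print(extract_second_factor(data, flags))  # ('temp', 110, '>')
```

Python's `max` returns the first maximal item, so ties are broken by candidate order. Confidence alone can favour conditions that cover very few records, so a practical version might also require a minimum coverage. That requirement is a design suggestion, not something the claims state.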

A computer-readable recording medium on which the data analysis program according to claim 27 and/or the data analysis program according to claim 28 is recorded.