JP4275359B2

JP4275359B2 - Data analysis method

Info

Publication number: JP4275359B2
Application number: JP2002182064A
Authority: JP
Inventors: 英大白井; 英隆津田
Original assignee: Fujitsu Semiconductor Ltd
Current assignee: Fujitsu Semiconductor Ltd
Priority date: 2002-06-21
Filing date: 2002-06-21
Publication date: 2009-06-10
Anticipated expiration: 2022-06-21
Also published as: JP2004029971A

Description

【０００１】
【発明の属する技術分野】
この発明は、広く産業界で取り扱われるデータ間の関連を把握し、産業上優位な結果をもたらすための有意性のある結果を抽出するデータ解析方法に関する。
【０００２】
【従来の技術】
例えば、半導体製造工程において歩留りを向上させるため、製造段階で使用された装置の履歴、試験結果、設計情報、各種測定データ等に基づいて歩留りを低下させている要因をできるだけ速やかに見つけ出す作業が行われる。このためには、実際に物理解析を行うよりも事前に収集されたデータに基づいた統計解析を行っておくのが、経済性の面からも優れており、この統計解析を効率的に行うことが重要である。
【０００３】
本願発明者等は、先にこのようなデータを統計解析する装置および方法として特願２０００−２８４５７８号（特開２００１−３０６９９９号公報）を出願している。統計的データ解析により有意差の抽出を行う場合、どのようなデータをどのような解析手法で解析するかは、解析者の持っている経験、技術等により決定される。この場合、一度の解析結果で意思決定がなされるのは稀であり、一般に各解析結果を解釈した後に、次になすべき解析条件（データや解析手法等）が検討、決定されたうえで、解析処理がなされる。
【０００４】
図３８は、一般的なデータ解析処理の手順を示すフローチャートである。決定された解析条件の設定を行い（ステップＳ５０）、設定された解析条件に基づきデータの解析を実行し（ステップＳ５１）、得られた解析結果を解釈し（ステップＳ５２）、意思決定を行う（ステップＳ５３）。意志決定されれば（ステップＳ５３：Ｙｅｓ）、統計解析を終了し、今回の解析結果で意思決定できない場合には（ステップＳ５３：Ｎｏ）、解析条件の変更を行い（ステップＳ５４）、変更した解析条件に基づきデータの解析を実行する。
【０００５】
【発明が解決しようとする課題】
上記ステップＳ５４において変更される解析条件は、解析者によりその説明変数、目的変数、処理終了条件等が設定されたうえで実行プログラムに入力される。従って、この入力の際に操作ミスや、実行結果待ち等が生じるため解析効率が低下していた。ここで指定される解析条件は、ある程度パターン化できるものが多いが、解析結果を得た後でなければ最終的な解析条件を指定できない場合が多い。これらは解析自動化を阻害する要因の一つとなっている。特に、一回の解析に時間を要し、多くのパラメータを扱うデータマイニングのような処理に顕著に現れる。
【０００６】
データ解析を効率的に行うためには、解析者がどのような手順で何を解析したいか、その説明変数、目的変数を何にすべきかを常に意識して進める必要がある。一般には、各解析者が各解析ケース毎に説明変数のカテゴリを認識しておこなっているが、従来は入力されたデータがどのようなカテゴリのものであるか判別せずに処理結果を出していた。
【０００７】
特に、レコード数が少ないにもかかわらず、説明変数の数が多い半導体製造に係るプロセスデータについては、説明変数が複雑に絡み合い、解析対象、解析目的に合った処理手順、説明変数の選択を適切に行わないと、効率的にデータ解析を進めることができなかった。特に、時刻データはプロセスデータ解析において重要な役割を果たしており大量に取得されている。しかし、説明変数が増えすぎたり、交絡しやすくなる（独立でなくなる）ため、統計的有意差の抽出をより困難にしている。対応して計算時間をはじめとする計算機資源も多く必要となっていた。
【０００８】
この発明は、上記問題点に鑑みてなされたものであって、生産工程における生産装置や処理時刻のデータの中から所望するデータ解析に必要なデータのみを容易に抽出でき、歩留り向上に有効な解析結果を効率よく得ることができるデータ解析方法を提供することを目的とする。また、レコード数が少なく説明変数が多いデータのデータ解析を自動的に実行できるデータ解析方法の提供を本発明の目的に含めることができる。
【０００９】
【課題を解決するための手段】
上記目的を達成するため、本発明は、生産装置名や、処理時刻等のデータからデータ解析に必要なデータを抽出し、歩留りを低下させる問題の生産装置名や処理時刻及び出来映えや歩留り等のデータを絞り込みデータ解析を実行する。このデータ解析の実行に際して、データの説明変数に対しデータ項目のカテゴリを識別する付加文字列を付加することにより、解析処理時にカテゴリを認識できカテゴリに対応した解析手順を自動実行できる。この際、データの目的変数、説明変数を選択及び削除して所望する解析結果を得る。また、所望するデータ解析に不要なデータを説明変数の項目名によって削除し、異常値を有するレコードや項目を削除して必要なデータのみを用いたデータ解析を可能にする。
【００１０】
この発明によれば、一通りの解析結果を実行した後に解析条件を指定するといった従来のような手間のかかるデータ解析を行わずとも、解析対象となるデータの説明変数に付加された付加文字列によってデータ項目のカテゴリを認識して必要なデータ解析処理を自動的に順次実行していくことができ、かつ解析結果の信頼性を向上できるようになる。そして、レコード数が少なく説明変数の数が多い半導体製造に係るプロセスデータのように、説明変数が複雑に絡み合うものであっても、解析対象、解析目的に適合した説明変数及び処理手順を選択でき、データ解析を効率的に遂行できるようになる。
【００１１】
【発明の実施の形態】
（実施の形態１：データの自動解析）
以下に添付図面を参照して、この発明に係るデータ解析装置およびデータ解析方法の好適な実施の形態を詳細に説明する。この発明の実施の形態で扱う解析対象のオリジナルデータは、半導体製造に係るプロセスデータを例とし、このプロセスデータは時間変動を有しているものとする。このプロセスデータの解析を効率的に行うには、特に時間変動を示す時刻に関する項目が重要となる。以下に説明する各実施の形態では、歩留り要因等の解析のために各製造工程に配置された生産装置（工程装置）とその処理時刻を用い、低歩留り要因をデータマイニング技法（回帰木分析、決定木分析）の使用によって抽出し、さらに解析効率を向上させようとするものである。
【００１２】
図１は、本発明の実施の形態に係るデータ解析装置に用いられる計算機システムのハードウェア構成を示す図である。このデータ解析装置は、キーボード等の装置操作用の操作手段およびネットワーク等を介してデータを入力するための入力装置１、入力されたデータに対し後述する解析処理を実行するＣＰＵ等を備えた中央処理装置２、ＣＲＴ，ＬＣＤ等の表示手段やプリンタ等の印字手段からなる出力装置３、およびＨＤＤ等のデータを格納保持する記憶装置４によって構成される。
【００１３】
図２は、プロセスデータの流れを説明するための図である。半導体等の被製造対象の製造工程には、複数（Ｎ）の工程装置１０ａ〜１０ｎが配置される。各工程装置１０ａ〜１０ｎは、それぞれの製造工程におけるプロセスデータを管理サーバ１１に送出する。このプロセスデータは、各工程において製造対象を製造した処理時刻、製造に関わった使用装置の名称、歩留り等からなる。管理サーバ１１は、入力されたプロセスデータに基づき、製造情報データベースＤＢ１を作成する。この製造情報データベースＤＢ１は、図１に示す記憶装置４に対しネットワーク等を介して格納される。
【００１４】
図３は、図１に示すシステム構成により実現されるデータ解析装置の機能ブロック図である。このデータ解析装置３０には、図２に示す製造情報データベースＤＢ１に格納された各工程装置のプロセスデータが入力される。このデータ解析装置３０は、データ抽出手段３１，データクレンジング／特徴化手段３２，データ解析手段３３，解析結果評価手段３４，報告レポート出力手段３５を備え、各手段はそれぞれ設定ファイルＲＦ（ＲＦ１〜ＲＦ７）に記述された設定情報に従って処理を実行する。なお、各手段における処理を一連化し自動実行することを自動解析と称す。
【００１５】
このデータ解析装置３０は、回帰木分析等の解析処理プログラムを起動し、必要な入出力ファイル名及び目的変数と説明変数が指定された後、各種解析フロー設定ファイル群に従って自動的に▲１▼データベースからのデータ抽出、▲２▼データのクレンジング及び特徴化、▲３▼回帰木分析、▲４▼解析結果評価を順次行い、▲５▼工程履歴から問題工程と装置と時刻を抽出し、報告レポートＲＥＰ１を出力する。
【００１６】
この際、ロット番号（データ件数即ちレコード数に相当）と各工程の装置名と処理日時のデータに目的変数項目が付加されたデータを抽出及び加工してデータマイニング技法によるデータ解析を行い、自動的に注目すべき工程と装置と時刻を絞り込む。
【００１７】
図４は、本発明のデータ解析装置におけるデータの処理手順の概要を示すフローチャートである。製造情報データベースＤＢ１から抽出された解析対象の解析用データＤＡＴＡは、プログラム初期化ファイルＩＮＩ（後述する各設定ファイルＲＦ１〜ＲＦ７を含む）の設定に従い、解析プログラムの実行によりデータ解析される。ここで、プログラム初期化ファイルＩＮＩの設定に基づき、解析プログラムは、解析用データＤＡＴＡ内のデータ項目のカテゴリを識別する付加文字列＿ｘｘを設定する。この付加文字列＿ｘｘは解析プログラムにより認識される。
【００１８】
そして、解析プログラムで該当する処理モードが指定されると（ステップＳ１）、ｘｘで示すカテゴリ種に対応する処理（目的変数や説明変数等の変数自動選択（ステップＳ２），あるいは変数手動選択（ステップＳ３））の後、回帰木分析等による解析処理が実行される（ステップＳ４）。解析処理は、一回目の解析処理に基づきあらかじめ設定された解析対象、解析手順に合わせて再実行するか否か選択される（ステップＳ５）。再実行時には、実行結果を抽出し（ステップＳ６）、目的変数、説明変数等の変数を自動選択あるいは削除し（ステップＳ７）、ステップＳ４に復帰して次の解析処理を実行して最終的な解析結果を得る。
【００１９】
（データ抽出手段３１について）
データ抽出手段３１は、
（１）データ抽出及び変換の抽出条件設定と処理
（２）データマイニング条件設定と処理
（３）装置名及び時刻設定と処理
をそれぞれ実行する。
【００２０】
（１）データ抽出及び変換の抽出条件設定と処理について
データ抽出及び変換は、抽出条件設定ファイルＲＦ１データ抽出のプログラムに従い、製造情報データベースＤＢ１に貯えられたプロセスデータを、決められた時間に、あるいは定期的に、設定された条件（対象品種、期間、項目）でロット番号と工程名と装置名と処理日時のデータに目的変数項目が付加されたデータを抽出する。
【００２１】
（２）データマイニング条件設定と処理について
項目設定ファイルＲＦ２とマイニング条件設定のプログラムに従い、レコード識別名、説明変数名、目的変数名、説明変数値名を選択し、解析用データ１（ＤＡＴＡ１）を作成し出力する。具体的には、項目設定ファイルＲＦ２を用いて次の内容を設定する。
レコード名：ロット番号
説明変数項目：製造工程（大工程名＋小工程名）
説明変数値名：装置、時刻
目的変数名：ＹＩＥＬＤ、特性値（歩留り）等
【００２２】
（３）装置名及び時刻設定処理について
この解析装置の製造工程履歴解析に必要となる説明変数値として装置名及び時刻を設定する。解析用データ１（ＤＡＴＡ１）は１列目にレコード名、２列目以降に説明変数と目的変数を整列させる。図５は、解析用データ１（ＤＡＴＡ１）の内容の一部を示す図表である。図示のように、説明変数項目名には、説明変数値名にその値の名前が付加され、識別できるようになっている。この例では、製造工程名に「＿装置」又は「＿時刻」が付加され、名前から工程とその値の内容が識別できる。この項目設定で設定された説明変数項目及び説明変数値名により、以降のデータクレンジング／特徴化手段３２及びデータ解析手段３３において製造工程と時刻又は装置を関連付けて識別できる。なお、生産ラインの各工程には複数台の装置（例えば、図示の工程０１＿装置における６ｎｗ２と６ｎｗ４）が配置され並列に動作して生産を行っている状態が示されている。
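[0023]
(Data cleansing/characterization means 32)
The data cleansing/characterization means 32 executes the following data cleansing and characterization processing in accordance with the data cleansing setting file RF3, the characterization setting file RF4, and the data cleansing/characterization program.
(1) Abnormal value processing condition setting and processing
(2) Device name change setting and processing based on changes over time
(3) Item name change for single-device processes
(4) Unnecessary item deletion setting and processing
(5) Item deletion and record deletion setting and processing based on the abnormal value ratio
(6) Time data analysis condition setting and processing
[0024]
Each of the processes (1) to (6) above is described in detail below.
(1) Abnormal value processing condition setting and processing
Based on the settings in the data cleansing setting file RF3, the data cleansing/characterization means 32 replaces any missing explanatory variable item value in analysis data 1 (DATA1) with a specific value. The content of this data cleansing is as follows. FIG. 6 is a table showing part of analysis data 2 (DATA2) after data cleansing. Wherever an explanatory variable item value in analysis data 1 (DATA1) shown in FIG. 5 is Null (a missing value), it is replaced with a specific value as shown in FIG. 6 (in the illustrated example, 99999 for a missing numeric value and nop for a missing string value).
[0025]
Based on the settings in the data cleansing setting file RF3, the data cleansing/characterization means 32 then sets whether this specific value is analyzed as one of the ordinary values or as a missing value. For the abnormality criterion as well, the definition of an abnormal value and its replacement value are set, and abnormal values are processed according to those settings.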
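The replacement in (1) can be sketched as follows. This is a minimal illustration, assuming each record is a Python dictionary in which a missing value is stored as None. The placeholders 99999 and nop are the ones given for FIG. 6; the function name fill_missing, the column names, and the numeric_columns argument are hypothetical and do not come from the patent.

```python
NUMERIC_FILL = 99999  # placeholder for a missing numeric value (example of FIG. 6)
STRING_FILL = "nop"   # placeholder for a missing string value (example of FIG. 6)


def fill_missing(records, numeric_columns):
    """Return new records in which None is replaced by a placeholder.

    records: list of dicts, one per lot.
    numeric_columns: names of the numeric items (a hypothetical way of
    telling numeric items from string items).
    """
    filled = []
    for record in records:
        row = {}
        for name, value in record.items():
            if value is None:
                row[name] = NUMERIC_FILL if name in numeric_columns else STRING_FILL
            else:
                row[name] = value
        filled.append(row)
    return filled


rows = [
    {"lot": "L001", "step01_equip": "6nw2", "step01_time": 1001},
    {"lot": "L002", "step01_equip": None, "step01_time": None},
]
print(fill_missing(rows, numeric_columns={"step01_time"})[1])
```

Whether the placeholder is afterwards analyzed as an ordinary value or as a missing value is the separate setting described in [0025].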
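[0026]
(2) Device name change setting and processing based on changes over time
Even the same device may, because of some trouble, suddenly turn into an abnormal device from a certain point in time. The data cleansing/characterization means 32 captures the fluctuation in the objective variable value caused by this sudden change and gives a different name to the device that has changed to an abnormal state, which allows the problem to be narrowed down further.
[0027]
For every device, the data cleansing/characterization means 32 checks the characteristics of the objective variable's transition over processing time by feature extraction such as a wavelet transform (for example, noise removal by filtering). Devices whose features are strong relative to a set criterion are characterized by the timing and duration of their sharp rises and sharp falls. For example, the program initialization file INI sets the criterion that the amount of abrupt change is at least 0.8 times the overall standard deviation. The device name is then converted into a device name to which the transition feature information is appended. In this embodiment, the transition feature information is appended to the device name as follows.
[0028]
① A delimiter (@) is appended.
② The period is divided into at most three parts according to when the objective variable value for that device rose or fell sharply, and a symbol denoting the period is appended (F: first part, M: middle part, L: last part, no symbol: the whole period).
③ A trend mark is appended as a symbol denoting the shape of the device's transition.
④ A one-digit number (0: weak to 9: strong) is appended as a symbol denoting the strength of the transition feature.
[0029]
FIG. 7 is a table showing the trend marks attached to device names. As illustrated, a trend mark divides the whole of analysis data 2 (DATA2) shown in FIG. 6 into a first, middle, and last part and indicates the state of the objective variable's transition in each period. The states obtained by feature extraction such as the wavelet transform are classified, as illustrated, from 1 (-^: low in the first half, high in the second half) to 5 (-: no feature).
[0030]
Taking 工程01_装置 (process 01, device) shown in FIG. 6 as an example, a device name with transition feature information appended is
6nw2@F-^7
In this example, the device name is 6nw2, the shape of the objective variable's transition in the first period (F) is low in the first half and high in the second half, and the strength of the transition feature is 7. When there is no transition feature at all, the name becomes 6nw2@-0. In this way, the data cleansing/characterization means 32 creates a device name with transition feature information appended for each process device. Because analysis data 2 (DATA2) shown in FIG. 6 is only part of the whole data, the transition feature information actually appended is the one obtained by feature extraction over the whole data.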
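The naming rule of [0028] to [0030] can be illustrated with a short sketch. It only composes and decomposes the tagged name and applies the 0.8 x standard deviation criterion of [0027]; the wavelet-based feature extraction itself is not shown. All function names are hypothetical, and the parser assumes that no trend mark begins with the letters F, M, or L, which cannot be confirmed here because FIG. 7 is not reproduced in the text.

```python
PERIOD_SYMBOLS = ("F", "M", "L", "")  # first part, middle, last part, whole period


def is_abrupt(change, overall_std, factor=0.8):
    """Criterion of [0027]: the change is at least factor x the overall std."""
    return abs(change) >= factor * overall_std


def tag_device_name(device, period, trend_mark, strength):
    """Compose device + '@' + period symbol + trend mark + strength digit."""
    if period not in PERIOD_SYMBOLS:
        raise ValueError("period must be 'F', 'M', 'L' or ''")
    if strength not in range(10):
        raise ValueError("strength must be a single digit 0-9")
    return f"{device}@{period}{trend_mark}{strength}"


def parse_tagged_name(tagged):
    """Split a tagged name back into (device, period, trend_mark, strength)."""
    device, _, tag = tagged.partition("@")
    period = tag[0] if tag[:1] in ("F", "M", "L") else ""
    trend_mark, strength = tag[len(period):-1], int(tag[-1])
    return device, period, trend_mark, strength


print(tag_device_name("6nw2", "F", "-^", 7))
print(parse_tagged_name("6nw2@-0"))
```

Keeping the original device name in front of the delimiter means the untagged name can always be recovered by splitting on "@".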
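[0031]
(3) Item name change for single-device processes
As noted above, in each process of manufacturing lines for semiconductors and other products, multiple devices are often installed and operated in parallel. In some cases, however, only one device is installed and operated in a given process. A process configured in this way is defined as a "single-device process". Because differences between devices cannot be checked in a single-device process, the problem factor is sought in the change over time of the one installed device.
[0032]
When a process serving as an explanatory variable has only one kind of device name, no between-device difference can be obtained for that process, so the device item is removed from the explanatory variables. The string "_一台装置" (single device) is appended to the name of the data item that indicates the processing time of that process. In the example of FIG. 6, the processing time item for 工程04_装置 is renamed "工程04_一台装置". The name is not changed when the corresponding processing time item does not exist or when that item also has only one kind of value. This appended string lets the analysis recognize the item as the time data item of a single-device process.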
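A minimal sketch of the renaming in [0032] follows, assuming the table is held as a dictionary from item name to a list of values and that device and time items are named with the suffixes "_装置" and "_時刻" (written here with an ASCII underscore). The function name is hypothetical. The sketch drops a single-valued device item even when the time item cannot be renamed, which is one reading of the paragraph rather than something it states outright.

```python
def mark_single_device_processes(columns, equip_suffix="_装置",
                                 time_suffix="_時刻", single_suffix="_一台装置"):
    """Return a copy of columns with single-device processes marked.

    columns: dict mapping item name -> list of values (one per lot).
    """
    result = dict(columns)
    for name, values in columns.items():
        if not name.endswith(equip_suffix) or len(set(values)) != 1:
            continue
        process = name[: -len(equip_suffix)]
        # Only one device in this process: no between-device difference.
        del result[name]
        time_name = process + time_suffix
        time_values = columns.get(time_name)
        # Rename the time item only if it exists and varies across lots.
        if time_values is not None and len(set(time_values)) > 1:
            result[process + single_suffix] = result.pop(time_name)
    return result


table = {
    "工程01_装置": ["6nw2", "6nw4", "6nw2"],
    "工程01_時刻": [1, 2, 3],
    "工程04_装置": ["V1", "V1", "V1"],
    "工程04_時刻": [3, 1, 2],
}
print(sorted(mark_single_device_processes(table)))
```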
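[0033]
(4) Unnecessary item deletion setting and processing
Explanatory variable items that are unnecessary for analysis may be mixed in when the data is acquired; in that case, settings are made to delete the unnecessary items before analysis. In this example, as the initial setting, multiple character strings that appear in explanatory variable item names, such as inspection process names, are set so that inspection processes and the like, which do not directly process the product, are excluded. Unnecessary items are deleted according to these settings.
[0034]
(5) Item deletion and record deletion setting and processing based on the abnormal value ratio
Items and records whose ratio of missing values and defined abnormal values exceeds the initial setting value are deleted. For example, items in which that ratio is 60% or more are deleted, and records in which it is 70% or more are deleted. Records whose objective variable value is missing or abnormal are deleted. In addition, nominal-scale explanatory variable items that have exactly 1 distinct value, or 100 or more distinct values, are excluded from analysis.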
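The thresholds in (5) can be sketched as follows. The 60%, 70%, and 100-level figures are the examples given in the text; the order in which the rules are applied, the record layout, and all names are assumptions made for illustration only.

```python
ABNORMAL = (None, 99999, "nop")  # missing marker plus the FIG. 6 placeholders


def is_abnormal(value):
    return value in ABNORMAL


def ratio(flags):
    flags = list(flags)
    return sum(flags) / len(flags)


def cleanse(records, target, explanatory, nominal,
            item_limit=0.6, record_limit=0.7, max_levels=100):
    """Return (kept records, kept explanatory item names)."""
    # Records whose objective variable is missing or abnormal are dropped first.
    rows = [r for r in records if not is_abnormal(r[target])]
    if not rows:
        return [], []
    # Items whose abnormal ratio is item_limit or more are dropped.
    items = [k for k in explanatory
             if ratio(is_abnormal(r[k]) for r in rows) < item_limit]
    # Records whose abnormal ratio over the kept items is record_limit or more are dropped.
    if items:
        rows = [r for r in rows
                if ratio(is_abnormal(r[k]) for k in items) < record_limit]

    def levels(name):
        return len({r[name] for r in rows if not is_abnormal(r[name])})

    # Nominal items with 1 level, or max_levels or more levels, are excluded.
    items = [k for k in items
             if k not in nominal or 1 < levels(k) < max_levels]
    return rows, items


data = [
    {"lot": "L1", "yield": 90.0, "s1_equip": "A", "s2_equip": "X", "s3_equip": None},
    {"lot": "L2", "yield": 85.0, "s1_equip": "B", "s2_equip": "X", "s3_equip": None},
    {"lot": "L3", "yield": None, "s1_equip": "A", "s2_equip": "X", "s3_equip": "P"},
    {"lot": "L4", "yield": 70.0, "s1_equip": "A", "s2_equip": "X", "s3_equip": "Q"},
]
cols = ["s1_equip", "s2_equip", "s3_equip"]
kept_rows, kept_items = cleanse(data, "yield", cols, nominal=set(cols))
print([r["lot"] for r in kept_rows], kept_items)
```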
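[0035]
(6) Time data analysis condition setting and processing
By initial setting, the apparatus treats time data among the explanatory variables as an ordinal scale (processed in units of time). As a second setting, it also treats time data as a nominal scale (processed in units of names) by dividing periods using a cycle present in the manufacturing process and converting each value to a name denoting its period. One example of such a cycle is the shift rotation of the manufacturing operators.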
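[0036]
(Analysis processing by the data analysis means 33)
The data analysis means 33 executes the following analysis processing by regression tree analysis in accordance with the analysis setting file RF5 and the analysis processing program.
[0037]
(1) Settings for device history + time data analysis
Different analysis processing is executed depending on how the manufacturing process handles lots, as follows.
① When the whole manufacturing process handles lots first-in, first-out
The lot processing order is basically the same in every process, so one processing time is sufficient as an explanatory variable, and an analysis is executed with all device items and the first-candidate time as explanatory variables. When the lot order is swapped to some extent, the number of time data items is increased and "device history + top-N candidate time data analysis" is executed. An appropriate range for N is 1 to 20.
[0038]
② When the whole manufacturing process does not handle lots first-in, first-out
Using a test of independence for time data, the time data of processes that are not mutually independent are combined into a single representative time item, which narrows the time data down to a group of representative time items consisting only of independent time data (narrowing down of time data). "Device history + time data analysis" is then executed with all device name items and the narrowed-down group of representative time items as explanatory variables.
[0039]
(2) Analysis end condition setting
The end condition for regression tree analysis is set, for example, as the point at which the standard deviation of a split set falls to 0.5 times that of the whole or less.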
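[0040]
(3) Execution of analysis
The data analysis means 33 executes the analysis in accordance with the analysis setting file RF5 and the analysis processing program. When multiple objective variables are set, the set items are selected and analyzed one after another.
[0041]
① Processing of "device history + top-N candidate time data analysis"
1. Regression tree analysis is performed on the specified objective variable with only the device name items and the processing time items of single-device processes as explanatory variables.
2. Next, the time data of the top N candidate processes in that analysis result are added to the explanatory variables and regression tree analysis is performed again. If the item ranked as the k-th candidate has no time data, the (k+1)-th and later candidates are searched for an item that does have time data, and that time data is added. The analysis result states explicitly that none was found if there was none, or which candidate rank it had if one was found (1 ≤ k ≤ N).
[0042]
② Processing of "device history + time data analysis"
Regression tree analysis is performed with all device name items and the narrowed-down group of representative time items as explanatory variables.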
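Step 2 of the top-N procedure in [0041] can be sketched as a selection over the candidate ranking produced by the first regression tree run. The sketch reads the fallback rule as "walk down the ranking and take the first N processes that have time data", which is an interpretation of the text. The function name, the "_time" suffix, and the input layout are hypothetical.

```python
def pick_time_items(ranked_processes, available_columns, n, time_suffix="_time"):
    """Pick time items for the top candidates of the first analysis.

    ranked_processes: process names ordered from the 1st candidate down.
    available_columns: set of item names present in the data.
    Returns a list of (candidate rank, time item name), at most n long,
    so the result records which rank each time item came from.
    """
    picked = []
    for rank, process in enumerate(ranked_processes, start=1):
        if len(picked) == n:
            break
        column = process + time_suffix
        already = any(column == name for _, name in picked)
        if column in available_columns and not already:
            picked.append((rank, column))
    return picked


ranking = ["step02", "step04", "step01", "step03"]
columns = {"step02_time", "step01_time", "step03_time"}
print(pick_time_items(ranking, columns, n=2))
```

An empty result corresponds to the case where the analysis result must state that no time data was found.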
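[0043]
(Extraction and evaluation of analysis results by the analysis result evaluation means 34)
The automatic analysis by the data analysis means 33 described above does not end with a single run. Analysis is repeated while changing the analysis target period or the target lots and while varying the setting values of the data cleansing/characterization means 32 and the data analysis means 33 from their initial values, which yields analysis result data (DATA3). The multiple results represented by this analysis result data (DATA3) are then evaluated to obtain a more reliable analysis result.
[0044]
Regression tree analysis and the t-test are outlined here. Regression tree analysis operates on a set of records, each consisting of explanatory variables that represent multiple attributes and an objective variable affected by them, and identifies the attribute and attribute value that most affect the objective variable. The analysis result evaluation means 34 outputs rules that express features and regularities in the data.
[0045]
Regression tree analysis is carried out by repeatedly splitting a set in two based on the parameter value (attribute value) of each explanatory variable (attribute). At each split, let S0 be the sum of squares of the objective variable before the split, and let S1 and S2 be the sums of squares of the objective variable in the two sets after the split. The explanatory variable and parameter value by which the records are split are chosen so that ΔS given by equation (1) below is maximized.
[0046]
ΔS = S0 - (S1 + S2) ... (1)
[0047]
The explanatory variable and parameter value obtained here correspond to a branch point in the regression tree. The same processing is then repeated on the split sets to examine the influence of the explanatory variables on the objective variable. This is the generally well-known regression tree method. To grasp the clarity of a split in more detail, the following parameters (a) to (d) are used in addition to ΔS, for several of the top split candidates, as quantitative evaluations of the regression tree result.
[0048]
(a) S ratio:
The reduction ratio of the sum of squares achieved by the split, that is, a parameter indicating how far the split reduced the sum of squares. The smaller this value, the greater the effect of the split and the more clearly the set is split, so the significant difference is large.
[0049]
S ratio = ((S1 + S2) / 2) / S0 ... (2)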
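[0050]
(b) t value:
Running the regression tree analysis splits a set in two; the t value is the value used to test the difference between the means (/X1, /X2) of the two split sets. Here "/" denotes an overline. The statistical t-test provides a criterion for the significance of the difference between the mean values of the objective variable in the split sets. For the same degrees of freedom, that is, the same number of data points, a larger t means that the set is split more clearly and the significant difference is large.
[0051]
When the variances of the split sets do not differ significantly, the t value is obtained from equation (3) below; when they do differ significantly, it is obtained from equation (4) below. Here N1 and N2 are the numbers of elements in split set 1 and split set 2, respectively. /X1 and /X2 are the means of the respective sets after the split. S1 and S2 are the sums of squares of the objective variable in the respective sets after the split.
[0052]
[Equation 1]

[0053]
[Equation 2]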
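The images for equations (3) and (4) did not survive in this text. A reconstruction consistent with the definitions in [0051] is given below, assuming that equation (3) is the pooled-variance (Student) t statistic and equation (4) is the unequal-variance (Welch) t statistic. The exact form printed in the patent, including whether an absolute value is taken in the numerator, is an assumption here.

```latex
% Equation (3): variances of the two split sets do not differ significantly
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\dfrac{S_1 + S_2}{N_1 + N_2 - 2}
                \left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}
\qquad (3)

% Equation (4): variances of the two split sets differ significantly
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\dfrac{S_1}{N_1 (N_1 - 1)} + \dfrac{S_2}{N_2 (N_2 - 1)}}}
\qquad (4)
```

In both forms $S_i/(N_i - 1)$ plays the role of the sample variance of set $i$, which is why sums of squares rather than variances appear in the denominators.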

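[0054]
(c) Difference between the mean values of the objective variable in the split sets:
The larger this value, the larger the significant difference.
[0055]
(d) Number of data points in each split set:
The smaller the difference between the two, the smaller the influence of abnormal values (noise).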
（１）解析結果の評価
評価は、探索設定ファイルＲＦ６及び解析結果評価プログラムに基づき行う。この評価情報は各解析結果ごとに算出するが、各解析結果間で比較できる。
（２）信頼性の高い解析結果の探索
解析結果評価手段３４は、探索設定ファイルＲＦ６の設定に基づき、解析結果データ（ＤＡＴＡ３）に対し信頼性の高い解析結果の探索の判断を行う。例えば、「回帰木第一分岐のｔ検定値で比較する。ただし、２分割グループの各データ数が設定基準以上であること」を条件とした比較評価値を用いる。この比較評価値によって、より信頼できる解析結果を探索することができる。そして、各設定ファイルＲＦ１〜ＲＦ５の各設定値を変化させる範囲を限定すること又は自動解析時間を限定することにより、探索を終了させ、得られた複数の解析結果と各総合評価値と順位を得て、行った解析の中で最も信頼できる解析結果を抽出する。設定値を変化させる範囲と方法の設定の一部を図１４（ｂ）に示す。この設定に従って、２分割交絡度及び各項目の異常値割合の活用による多角的な分析を実行し、より明確な分析結果の探索を行っている。
【００５７】
図８は、解析結果データ（ＤＡＴＡ３）に基づく評価処理の内容を説明するための図表である。図示のように、以下の評価処理によって、項目名として各工程装置と各評価値（以下に説明するｔ検定値、低いグループと高いグループの装置名及び件数、平均値等）を算出し、ｔ検定の検定値の値が大きい順に問題が大であるとして、順位Ｎｏを付与する。
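The comparison rule quoted in [0056] and the ranking of FIG. 8 can be sketched as a filter followed by a sort. The field names item, t, n_low, and n_high are hypothetical stand-ins for the columns of FIG. 8, which is not reproduced in the text, and the sample values are invented purely to exercise the function.

```python
def rank_results(results, min_group_size):
    """Rank candidates by t value, largest first.

    Candidates in which either split group has fewer than min_group_size
    data points are left out, following the condition quoted in [0056].
    """
    eligible = [r for r in results
                if min(r["n_low"], r["n_high"]) >= min_group_size]
    ordered = sorted(eligible, key=lambda r: r["t"], reverse=True)
    return [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]


candidates = [
    {"item": "step01_equip", "t": 2.0, "n_low": 5, "n_high": 6},
    {"item": "step02_equip", "t": 9.0, "n_low": 1, "n_high": 10},
    {"item": "step03_equip", "t": 4.0, "n_low": 4, "n_high": 4},
]
for row in rank_results(candidates, min_group_size=3):
    print(row["rank"], row["item"], row["t"])
```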
【００５８】
（報告レポート出力手段３５について）
報告レポート出力手段３５は、最も信頼できる解析結果について報告レポートを作成し、出力する。
（１）報告レポート作成について
解析結果にはファイルに回帰木のルール情報と、第一分岐の所定数候補（例えば上位２０候補）の評価用統計値、及び各候補間の２分割交絡度、並びにレポート情報ファイルとしてＨＴＭＬファイルに回帰木図と主要２分岐及び上位２候補の有意差を簡潔な文章、及び箱髭図又は相関図をデータと関連付けて出力する。
【００５９】
（２）具体的報告方法について
解析結果の報告は、報告条件設定ファイルＲＦ７の設定及び報告処理プログラムに基づき処理される。例えば、報告内容は、画面表示及びあらかじめ設定した電子メールアドレスにアラーム通知し、報告レポートと報告レポートＷＥＢアドレスを報告する。
【００６０】
図９は、報告レポートの内容の一例を示す図である。報告レポートＲＥＰ１は、解析結果に基づき、▲１▼総合判定内容（主要２分岐及び上位２候補の有意差を説明する簡潔な文章）Ａ１，▲２▼統計的情報Ａ２，▲３▼回帰木図Ａ３，▲４▼回帰木図Ａ３に対応する箱髭図Ａ４，又は相関図Ａ５を表示する。回帰木図Ａ３は例えば、ＨＴＭＬ等の記述形式で表示され、所望する工程装置をクリックすることにより、リンクされた該当する▲４▼箱髭図Ａ４あるいは相関図Ａ５を選択的に表示可能となっている。
【００６１】
図１０は、データ解析処理の処理手順を示すフローチャートである。図示のように、データ抽出手段３１は製造情報データベースＤＢ１からデータの抽出及び変換を行い解析用データ（ＤＡＴＡ１）を得る（ステップＳ１１）。次に、データクレンジング／特徴化手段３２は解析用データ（ＤＡＴＡ１）のデータをクレンジング及び特徴化した解析用データ（ＤＡＴＡ２）を得る（ステップＳ１２）。
【００６２】
次に、データ解析手段３３は、クレンジング及び特徴化後の解析用データ（ＤＡＴＡ２）を回帰木分析の手法によりデータマイニングし解析結果データ（ＤＡＴＡ３）を得る（ステップＳ１３）。次に、解析結果評価手段３４は、解析結果データ（ＤＡＴＡ３）を用いて解析結果を評価する（ステップＳ１４）。この評価時、信頼性の高い結果を探索する。例えば、設定した解析終了条件を満たすか否かを判断する（ステップＳ１５）。満たしていない場合には（ステップＳ１５：Ｎｏ）、解析対象期間又は対象ロットを変化させ、かつ、データクレンジング／特徴化手段３２、及びデータ解析手段３３の各種設定値を初期値から変化させながらくり返し解析を行いより信頼できる解析結果データ（ＤＡＴＡ３）を得る。
【００６３】
データ解析手段３３の解析終了条件を満たす解析結果が得られると（ステップＳ１５：Ｙｅｓ）、報告レポート出力手段３５による報告レポートＲＥＰ１を出力し（ステップＳ１６）、データ解析処理を終了する。以上のデータ解析処理は、週単位や月単位で自動実行され、半導体製造工程等の製造段階で使用された装置の履歴、試験結果、設計情報、各種測定データ等のデータ解析によって得られた報告レポートＲＥＰ１によって、歩留りを低下させている要因を容易に見つけ出すことができ、歩留りの向上が図れるようになる。
【００６４】
図１１〜図１４は、上記解析処理を設定する設定ファイルの設定内容の一例を示す図である。図１１は、データクレンジング設定ファイルＲＦ３の一部を示す図表である。解析用データ１（ＤＡＴＡ１）の説明変数項目値の欠損時の置換や特定値への置き換え、項目及びレコードを削除する異常値割合等が設定される。図１２は、項目設定ファイルＲＦ２の一部を示す図表である。日付型項目と装置型項目の識別文字例についてそれぞれ設定される。図１３は、データクレンジング設定ファイルＲＦ３の一部を示す図表である。特定文字列が使われている項目名を削除する場合の検索文字列が設定される。
【００６５】
図１４は、解析設定ファイルＲＦ５と探索設定ファイルＲＦ６の一部を示す図表である。図１４（ａ）は解析設定ファイルＲＦ５の一部であり、時刻データ解析の設定内容及び解析条件の設定内容が示され、前述した「装置履歴＋上位Ｎ候補時刻データ解析」、あるいは「装置履歴＋独立時刻データ解析」のいずれかが設定され、目的変数、説明変数の指定、解析処理の終了条件等が設定される。図１４（ｂ）は探索設定ファイルＲＦ６の一部であり、多角的な分析を行うための設定がなされ、２分割交絡度の基本的活用方法や項目の異常値割合による選別方法等が設定されている。
【００６６】
上記各設定内容は、この解析装置を適用する対象に固有の条件があった場合には、解析フロー設定ファイル群（設定ファイルＲＦ１〜ＲＦ７）に対し必要な設定を追加又は変更することで容易に対応できる。
【００６７】
（実施の形態２：データ解析のモード指定選択処理について）
本発明の実施の形態２は、既にクレンジング済みの解析対象データが得られている場合に、解析処理以降を自動的に行う構成である。データ解析装置３０の構成は実施の形態１と同様であり説明を省略する。なお、実施の形態１で説明したデータ解析手段３３における具体的なデータ解析処理についてこの実施の形態２で補足説明する。
【００６８】
図１５は、本発明の実施の形態２における解析処理の選択画面を示す図である。入出力データそれぞれのファイル名を選択すると図示のように表示項目４０が表示される。表示項目４０は４つの処理解析モードが表示されており、１〜４のいずれかを選択できる。１「装置履歴データ解析」，２「装置履歴＋上位（ｎ）候補時刻データ解析」，３「装置履歴＋第１候補時刻データ解析」，４「Ｍａｎｕａｌ（手動解析）」である。
【００６９】
図１６は、入力データの内容を説明するための図表である。同図は、前述した図５（製造情報データベースＤＢ１）に相当する。入力データは、図１６に示すように、各ロット番号毎に、説明変数とする各工程の使用装置名と処理時刻、目的変数とする歩留り値等からなるＣＳＶ形式のファイルである。
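[0070]
The data extraction means 31 of the data analysis apparatus 30 takes in the manufacturing information database DB1 described above and automatically analyzes the data to narrow down the device names and processing times of the processes that deserve attention. In the item setting file RF2 described with reference to FIG. 12, the used-device name of an explanatory variable is formed by appending the additional character string "_e" to the process name, and the processing time by appending the additional character string "_t" to the process name. That is, the used-device name and the processing time for process A are "A_e" and "A_t", respectively.
[0071]
In addition to "_t", the strings "_時刻" and "_DAY_TIME" are defined as additional character strings denoting date-type items; the latter two are converted to the representative identification string "_t" before being handled. The same applies to the identification strings denoting device items. This allows data collected from different data sources to be handled as the same category of data.
[0072]
With the above settings, during analysis processing, items bearing the additional character string "_t", "_時刻", or "_DAY_TIME" are regarded as time data, while items bearing the additional character string "_e", "_装置", or "_EQUIP" are regarded as device data, and the analysis processing corresponding to each processing mode is executed automatically. Automatic execution takes place when a mode other than 4, "Manual (manual analysis)", is selected.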
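The suffix handling of [0070] to [0072] can be sketched as a small lookup. The alias strings are the ones listed in the text, written with an ASCII underscore; the dictionary layout and the function names classify and normalize are hypothetical, and the item setting file RF2 itself is not modeled.

```python
# Representative suffix -> every suffix treated as the same category.
ALIASES = {
    "_t": ("_t", "_時刻", "_DAY_TIME"),  # time (date-type) items
    "_e": ("_e", "_装置", "_EQUIP"),     # device items
}


def classify(column):
    """Return (process name, representative suffix), or (column, None)."""
    for canonical, aliases in ALIASES.items():
        for alias in aliases:
            if column.endswith(alias) and len(column) > len(alias):
                return column[: -len(alias)], canonical
    return column, None


def normalize(column):
    """Rewrite an item name so that it carries the representative suffix."""
    process, suffix = classify(column)
    return process + suffix if suffix else column


for name in ["A_e", "A_DAY_TIME", "工程1_装置", "YIELD"]:
    print(name, classify(name), normalize(name))
```

Because classify returns the process name separately from the category, the device item and time item of the same process can be paired by comparing the first element of the result.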
【００７３】
また、プログラム初期化ファイルＩＮＩの一部を構成している抽出条件設定ファイルＲＦ１，項目設定ファイルＲＦ２の設定内容に基づき、どのようなカテゴリのデータを説明変数として有しているかを判断する。そして、図１２に例示した項目設定ファイルＲＦ２の設定に基づき、工程の使用装置と処理時刻を有していると判断する。ここで、プログラム初期化ファイルＩＮＩの設定内容と一致する識別文字列がデータ内の項目にない場合は、全てマニュアル解析となる（図４における処理モード指定時ステップＳ１の時期の判断）。
【００７４】
図１７は、解析用データ２（ＤＡＴＡ２）の一例を示す図表である。データ解析手段３３はこの解析用データに対する回帰木分析を行う。その結果、全８ロットの歩留りに影響を及ぼすのは工程１〜４の使用装置とその処理時刻であるとの解析結果を得る。図１８は、解析処理結果の一例であるトレンドグラフを示す図である。横軸はロット番号、縦軸は歩留りである。同図には、工程２＿ｅを構成する装置Ｖ２０１，Ｖ２０２について、ある期間に処理した４ロット（Ｌｏｔ２〜Ｌｏｔ５）の歩留りが高い状態と、工程２の装置Ｖ２０２で処理したロットの歩留りが低い状態が解析結果として得られる。
【００７５】
次に、上記各処理モード別の解析処理を説明する。
（１）Ｍａｎｕａｌ（手動解析）
図１５に示した解析処理の選択画面で「Ｍａｎｕａｌ（手動解析）」を選択すると、変数選択の項目リストＬ１に必要な項目が表示される。図１９は、手動解析時の項目設定時の画面を示す図である。項目リストＬ１には、目的変数とする数値項目Ｋ１，説明変数とする文字項目Ｋ２が一覧表示されており、必要な数値項目Ｋ１，文字項目Ｋ２の選択と、条件設定を手動で行った後に解析処理を実行する。
【００７６】
図２０は、手動解析時の項目設定後の画面を示す図である。図示の例は、目的変数として選択した歩留りが目的変数リストＳ１に表示される。説明変数としては、全工程の使用装置と処理時刻（工程１＿ｅ〜工程４＿ｅ，工程１＿ｔ〜工程４＿ｔ）を選択した状態が説明変数リストＳ２に表示される。データ解析手段３３は、この設定に基づき解析処理プログラムを実行する。上記データ解析の結果を図２１に示す。図２１は、全工程の使用装置と処理時刻を指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【００７７】
（２）装置履歴データ解析
図２２は、装置履歴データ解析時における項目設定画面を示す図である。図１５に示した解析処理の選択画面で「装置履歴データ解析」を選択すると、項目リストＬ１の文字項目Ｋ２には、各工程での使用装置を示す”＿ｅ”が付加文字列であるものだけが説明変数として自動的に一覧表示される。この後、手動で数値項目Ｋ１，文字項目Ｋ２に一覧表示された目的変数、説明変数を選択し、目的変数リストＳ１，説明変数リストＳ２に表示させる。
【００７８】
データ解析手段３３は、この設定に基づき説明変数が装置履歴であるものだけを用いて回帰木分析を実行する。上記データ解析の結果を図２３に示す。図２３は、全工程の履歴を指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【００７９】
図２３によれば、工程２による処理装置差が最も有意（工程２が第一候補）であることが分かる。ただし、これだけでは時系列で見た場合の有意差が不明である。歩留りの低いロットが多い装置Ｖ２０２は、偶然に悪い期間（例えば、プロセス条件の一時的な変更等の使用装置以外の要因による変動があった期間等）に多くのロットを処理していた可能性もある。
【００８０】
（３）装置履歴＋上位候補時刻データ解析
図２４は、装置履歴データ解析時における項目設定画面を示す図である。図１５に示した解析処理の選択画面で「装置履歴＋上位候補時刻データ解析」を選択すると、項目リストＬ１に時刻項目以外の全項目（目的変数とする数値項目Ｋ１，説明変数とする文字項目Ｋ２）が自動的に一覧表示される。この後、手動で数値項目Ｋ１，文字項目Ｋ２に一覧表示された目的変数、説明変数を選択し、目的変数リストＳ１，説明変数リストＳ２に表示させる。
【００８１】
データ解析手段３３は、これら設定された目的変数と説明変数に基づき、回帰木分析を実行する。以上の処理は「（２）装置履歴データ解析」と同じ処理を実行しこの状態での結果は図２３と同じである。
【００８２】
次に、データ解析手段３３は、得られたデータ解析結果を自動的に抽出し、回帰木図内および回帰木図の最上階層での集合分割の評価候補であるＥｖａｌｕａｔｉｏｎＤａｔａに挙げられた工程の処理時刻を新たに説明変数として追加する処理を行い、再度回帰木分析を自動的に実行する。
【００８３】
この際、全４工程の処理時刻が説明変数として追加される。ここで、工程の使用装置名に対応する工程処理時刻の項目は、付加文字列を除いたものが同一であることに基づき抽出する。具体的に説明すると、２回目の回帰木分析では１回目で指定された説明変数に対して、時刻項目工程１＿ｔ，工程２＿ｔ，工程３＿ｔ，工程４＿ｔが追加される。上記データ解析の結果を図２５に示す。図２５は、全工程の履歴と時刻データを指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。なお、この図２５に示す結果は前述した図２１と同じになる。２回目のデータ解析の処理結果は新たに”ＥＱ＿Ｔｉｍｅ”という名称のフォルダに保存される。
【００８４】
上記結果によれば、工程１の時刻による差が最も有意であることが確認できるが、ＥｖａｌｕａｔｉｏｎＤａｔａによる上位３候補（工程２，４の処理時刻）の差を確認すると、差がまったく同じで交絡していると想定される。実際にこれらの各工程でロットを処理した時刻の並び（順番）はまったく同じで、次の工程３についてもほとんど同様の推移（トレンド）が得られる。
【００８５】
１回目の回帰木分析で出力されるＥｖａｌｕａｔｉｏｎＤａｔａを３項目目までとした場合は、図２３に示したＥｖａｌｕａｔｉｏｎＤａｔａに挙がっている項目のうち工程１＿ｅは除外され、かつ回帰木図内にも存在しない。このため、２回目の回帰木分析では工程１＿ｔは説明変数として追加されないこととなる。図２６は、工程を限定して再度データ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【００８６】
（４）装置履歴＋第１候補時刻データ解析
図２７は、装置履歴＋第１候補時刻データ解析時における項目設定画面を示す図である。図１５に示した解析処理の選択画面で「装置履歴＋第１候補時刻データ解析」を選択すると、項目リストＬ１には、目的変数とする数値項目Ｋ１，説明変数とする文字項目Ｋ２には時刻項目以外の全項目が自動的に一覧表示される。この後、手動で数値項目Ｋ１，文字項目Ｋ２に一覧表示された目的変数、説明変数を選択し、目的変数リストＳ１，説明変数リストＳ２に表示させる。
【００８７】
データ解析手段３３は、この設定に基づき説明変数が装置履歴であるデータだけを用いて回帰木分析を実行する。以上の解析処理は「（２）装置履歴データ解析」と同じ解析処理を行う。解析処理の結果は自動的に抽出され、ＥｖａｌｕａｔｉｏｎＤａｔａに時刻データとして第一候補に挙がった項目のみを説明変数に追加し、再度回帰木分析を実行するまでを自動的に行う。本実施の形態では工程２の処理時刻が説明変数として追加される。図２８は、工程を限定して再度データ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【００８８】
前述した「（３）装置履歴＋上位候補時刻データ解析」の処理モードで行った解析では交絡している各工程の時刻データ項目がまとまってＥｖａｌｕａｔｉｏｎＤａｔａの上位に挙げられる。この場合、実際の数百工程の時刻項目を説明変数にすると、回帰木図内を含めて各工程の時刻データ項目だけが出力される。これら各工程の時刻データ項目は交絡している場合が多く、代表になる時刻データは１項目だけで良い場合が多い。そこで、あらかじめ各装置単位のデータを説明変数として歩留りに関する回帰木分析を行い、その結果で第１候補になった工程を用いて、第１候補工程の時刻データのみを時刻データの代表として説明変数に追加し、再度、歩留りに関する回帰木分析を行う。
【００８９】
２回目の解析の処理結果は、新たに”ＥＱ＿ＴｉｍｅＸＸ”（ＸＸ：最初に時刻データが挙がった順位）という名称のフォルダに保存される。図２９は、あるロット単位の不良率トレンドグラフを示す図である。図中横軸は各ロット、縦軸は各ロットの不良率である。図示の例では、非常に稀にしか発生しない不良が特定の８ロット（ＬＯＴ３，４，６，７，８，Ｋ，Ｌ，Ｍ）で高い割合で生じていることが示されている。
【００９０】
図３０は、説明変数と目的変数を指定した決定木を示す図である。説明変数として工程の使用装置名、処理時刻を用い、目的変数として前記不良発生の８ロットが”Ｈ”，他の１５ロットが”Ｌ”を指定して決定木分析を実行した結果が示されている。決定木図内の集合分岐は全て時刻項目である。そして、半導体製造工程においては、ほぼロット番号順に処理されていくので、時刻データは全てがほぼ交絡し、説明変数としてはほぼ等価であるといえる。従って、どの時刻データを採用しても大差がないので時刻データとしては最も有意とされた１項目に絞っても大差がなく、解析結果がむしろ解釈しやすいものとなる。
【００９１】
図３１は、説明変数と目的変数をさらに指定した決定木を示す図である。この図３１には、図３０の決定木図においてＡ工程の処理時刻Ａ＿ｔが最も目的変数に対して有意とされ、次に説明変数からＡ＿ｔ以外の時刻データを全て削除した説明変数による決定木分析結果が示されている。同図に示す最上階層での集合分岐は、図３０と同様にＡ工程の処理時刻Ａ＿ｔによるが、以下はほとんどこのＡ工程の処理時刻Ａ＿ｔと等価である時刻データが除去されているため、その背後に隠れていた使用装置による差が現れる。
【００９２】
これによると、工程ＤでＤＭ２号機を使用し、工程ＥでＥＭ４号機を使用した場合に高不良率となることが示されている。図３２は、図３１の結果を処理時刻別のトレンドグラフで表した状態を示す図である。図には工程ＥのＥＭ４号機の時間変動が高不良率の時間変動に影響を及ぼしていたことが明確に現れている。
【００９３】
以上説明した実施の形態２によれば、実施の形態１で説明したデータ解析の自動化では行えない作業、特にオペレータが注目したい事項を特定してデータ解析を行う場合や、生産設備の各生産工程に配置される生産装置の入れ替え等に対応したデータ解析を適切に行えるようになる。
【００９４】
（実施の形態３：一台装置工程に関する他の処理例）
前述した実施の形態１において説明したデータ解析手段３３の解析処理「（１）装置履歴＋時刻データ解析の設定」によれば、
▲１▼製造工程全体が先入れ先出しでロットを処理する場合、基本的にどの工程においてもロットの処理順番が同じであり、各工程の処理時刻は説明変数としては一つあれば十分である。
▲２▼一台装置工程は、装置間差を確認することができない工程であるため、問題となる要因をその一台装置の経時変化に求める。
▲３▼さらに、装置台数が多い工程ほど、回帰木分析で２分割集合間の有意差が出やすい。
【００９５】
これら▲１▼〜▲３▼の３つの事柄を応用して、本実施の形態では一台装置工程を含む全ての工程処理時刻は一つの説明変数で代表でき、かつ処理時刻による変動要因としては一台装置工程が他の複数台装置工程と同等以上に疑わしいと判断して、自動的に問題要因として疑わしい項目を絞り込むものである。
【００９６】
この実施の形態３では、製造工程全体が先入れ先出しでロットを処理する場合、データクレンジング／特徴化手段３２は、実施の形態１で説明した「（２）経時変化に基く装置名の変更設定及び処理」及び「（３）一台装置工程における項目名変更」で説明した各処理と異なり、説明変数名及び説明変数値名を変えずに一台装置工程に関する処理を行う。これにより、実施の形態１に比べて処理を単純にでき、データ件数、即ちレコード数（ロット数）が少ない場合における目的変数の外れ値による悪影響を抑えるようにしたものである。
【００９７】
図３３は、実施の形態３によるデータ解析手順を示すフローチャートである。処理内容を説明すると、実施の形態１で説明した「（１）異常値処理条件設定及び処理」を実行後の解析用データ１（ＤＡＴＡ１）に対し、
【００９８】
▲１▼一台装置工程抽出（リスト作成）
▲２▼代表項目として全一台装置工程の中で工程順の中間に位置する工程を選び、値は時刻（間隔尺度）とする。項目名は”一台装置工程時刻”とする。
▲３▼”一台装置工程時刻”を解析用データの説明変数に加え、他の全ての工程時刻項目は説明変数から除外する各処理を行い解析用データ２（ＤＡＴＡ２）を得る（ステップＳ２０）。図３４は、抽出された一台装置工程リストＲＥＰ２の一例を示す図表である。
【００９９】
この後、解析用データ２（ＤＡＴＡ２）に対しては、実施の形態１で説明した「（４）不要項目削除設定及び処理，（５）異常値割合による項目削除及びレコード削除設定及び処理，（６）時刻データの解析条件設定及び処理」、データ解析手段３３でのデータ解析処理（回帰木分析実施）、解析結果評価手段３４での解析結果評価を実行し（ステップＳ２１）、解析終了条件を満たすと（ステップＳ２２：Ｙｅｓ）、一台装置工程リストを含む報告レポートを作成し、これら報告レポートＲＥＰ１と一台装置工程リストＲＥＰ２を出力する（ステップＳ２３）。
【０１００】
次に、上記解析用データ２（ＤＡＴＡ２）に対する解析処理について説明する。図３５は、実施の形態３において用いる解析用データ２（ＤＡＴＡ２）の一例を示す図表である。図示のように、工程５の使用装置が全てＶ５０１であり、この工程５が一台装置工程とされる。図３６は、図３５に示すロット番号別の歩留りを示す図である。
【０１０１】
図３７は、図３５に示す各工程の使用装置名と処理時刻を説明変数、歩留り値を目的変数とした場合の回帰木と結果評価情報一覧を示す図である。図３７に示すように、最上階層での集合分岐の候補を示すＥｖａｌｕａｔｉｏｎＤａｔａの上位３項目は交絡しており、そのうちの一つの工程５＿ｔは、工程５の処理時刻である。工程５の使用装置工程５＿ｅは、種類数が１であるため回帰木分析実行時には説明変数から削除される。
【０１０２】
そして、ＥｖａｌｕａｔｉｏｎＤａｔａに挙げられたうち、目的変数に対してその時刻変動が効いているとされた工程（ここでは、工程１，工程２，工程３，工程４，工程５）について対応する処理時刻を示す項目が抽出され、各項目の処理装置を示す項目でその種類数が１であるものが抽出される。
【０１０３】
上記のように、実施の形態３によれば、一台装置工程時刻に代表される経時変化による有意差が大きいことを抽出することができる。また、一台装置工程リストＲＥＰ２の記載によって、一台装置工程に該当する工程を容易に把握できるようになる。図３７に示す例における一台装置工程は、「工程５であり、この一つの工程」となる。
【０１０４】
以上説明したデータ解析処理に係る方法は、あらかじめ用意されたプログラムをパーソナル・コンピュータやワークステーション等のコンピュータで実行することにより実現することができる。このプログラムは、各種記録媒体に記録され、コンピュータによって記録媒体から読み出されることによって実行される。またこのプログラムは、インターネット等のネットワークを介して配布することが可能な伝送媒体であってもよい。
【０１０５】
（付記１）所望するデータ解析に必要なデータをオリジナルデータの中から抽出するデータ抽出工程と、
前記データ抽出工程により抽出されたデータの異常値をデータクレンジングするデータクレンジング工程と、
前記データクレンジング工程によりデータクレンジングされたデータの特徴情報を求める特徴化工程と、
前記特徴化工程により求められた特徴情報を用いてデータの解析を行うデータ解析工程と、
を含むことを特徴とするデータ解析方法。
【０１０６】
（付記２）生産工程の品質の変動を示す目的変数と、該目的変数の変動を説明する説明変数とを含むプロセスデータのデータ解析を行うデータ解析方法において、
所望するデータ解析に必要なデータを前記プロセスデータの中から抽出するデータ抽出工程と、
前記データ抽出工程により抽出されたデータの説明変数の異常値をデータクレンジングするデータクレンジング工程と、
前記データクレンジング工程によりデータクレンジングされたデータの目的変数の変動を表す特徴情報を求める特徴化工程と、
前記特徴化工程により求められた特徴情報を用いて前記目的変数の変動要因を探索するためのデータ解析を行うデータ解析工程と、
を含むことを特徴とするデータ解析方法。
【０１０７】
（付記３）前記データ抽出工程は、前記データの説明変数の項目名にカテゴリを識別するための付加文字列を付加し、
前記データ解析工程は、前記付加文字列に基づき説明変数のカテゴリを識別したデータ解析を行うことを特徴とする付記２に記載のデータ解析方法。
【０１０８】
（付記４）前記データ抽出工程は、前記生産工程が備える生産装置を示す説明変数の項目名に対しては製造工程名に装置のカテゴリを意味する付加文字列を付加し、該生産装置が生産対象を生産した処理時刻を示す説明変数の項目名に対しては製造工程名に時刻のカテゴリを意味する付加文字列を付加することを特徴とする付記３に記載のデータ解析方法。
【０１０９】
（付記５）前記データ抽出工程による付加文字列を付加する指示と、及び前記データ解析工程におけるカテゴリを識別したデータ解析の指示とをあらかじめ設定ファイルに設定する設定工程を含み、
前記データ抽出工程及び前記データ解析工程は、それぞれの処理実行時に前記設定ファイルを読み出し、該設定ファイルに設定された指示に基づく処理を実行することを特徴とする付記３または４に記載のデータ解析方法。
【０１１０】
（付記６）前記特徴化工程は、時刻経過による目的変数の変動に関する特徴を求め、前記データの説明変数として用いられる前記生産装置の装置名に対し、前記求められた特徴に対応する所定の記号を付加することを特徴とする付記２〜５のいずれか一つに記載のデータ解析方法。
【０１１１】
（付記７）ある一つの生産工程に生産装置が一台のみ設けられる場合、
前記データクレンジング工程は、前記一台の生産装置に相当する装置名の説明変数をデータ解析対象から外すとともに、前記生産工程の処理時刻に相当する説明変数の項目名に前記一台の生産装置のみによって構成された工程であることを示す付加文字列を付加し、
前記データ解析工程は、前記データクレンジング工程にて付加文字列が付加された前記一台の生産装置のみによって構成された工程に対しては処理時刻の説明変数を用いたデータ解析を行うことを特徴とする付記２〜６のいずれか一つに記載のデータ解析方法。
【０１１２】
（付記８）前記データクレンジング工程は、前記プロセスデータのうち生産装置に関係するデータ以外のデータを説明変数の項目名に基づいて削除することを特徴とする付記２〜７のいずれか一つに記載のデータ解析方法。
【０１１３】
（付記９）前記データクレンジング工程は、前記プロセスデータのうち項目及びレコードに対して所定の異常値の割合を超えた項目及びレコードと、目的変数の値が欠損あるいは異常なレコードとを削除することを特徴とする付記２〜８のいずれか一つに記載のデータ解析方法。
【０１１４】
（付記１０）前記データクレンジング工程は、前記生産装置の説明変数が処理時刻を示すデータを処理時刻データとして扱う設定と、生産工程における所定の周期を用いて期間を区切った際に該期間名のデータとして扱う設定とを選択可能なことを特徴とする付記２〜９のいずれか一つに記載のデータ解析方法。
【０１１５】
（付記１１）前記生産工程の全てが生産対象のロットを先入れ先出しにより順次処理する場合、
前記データ解析工程は、全ての生産工程の装置項目と、全ての生産工程における上位数Ｎの候補となる処理時刻を解析対象の説明変数として用いることを特徴とする付記２〜１０のいずれか一つに記載のデータ解析方法。
【０１１６】
（付記１２）前記生産工程が生産対象のロットを先入れ先出しせずに独立処理する場合、
前記データ解析工程は、各生産工程の処理時刻についてそれぞれが独立した処理時刻であるか否かを判別する独立時刻判別工程と、
前記独立時刻判別工程によって独立していないと判別された生産工程の処理時刻をまとめて一つの代表時刻の項目を作成する代表時刻項目作成工程と、
全ての生産工程の生産装置の項目と、前記代表時刻項目作成工程により作成された代表時刻の項目をデータ解析対象の説明変数として用いることを特徴とする付記２〜１０のいずれか一つに記載のデータ解析方法。
【０１１７】
（付記１３）前記データ解析工程は、解析すべきデータの特徴性や規則性を表すルールをデータマイニング技法により抽出することを特徴とする付記２〜１２のいずれか一つに記載のデータ解析方法。
【０１１８】
（付記１４）前記データ解析工程により前記データのロットあるいは処理時刻を変化させて得た複数の解析結果を用いて所定の総合評価値を得る評価工程を含むことを特徴とする付記２〜１３のいずれか一つに記載のデータ解析方法。
【０１１９】
（付記１５）前記評価工程は、前記ルールの信頼度を表す情報として、前記データ解析工程により解析して得たデータの集合を２分割する際の分割の明確度を表す集合分割評価値を求めることを特徴とする付記１４に記載のデータ解析方法。
【０１２０】
（付記１６）前記評価工程は、前記集合分割評価値として次の式で表されるｔの値を用いることを特徴とする付記１５に記載のデータ解析方法。
【数３】

【０１２１】
（付記１７）前記評価工程により得られた評価結果に基づき、前記生産設備の問題となる前記生産工程、前記生産装置、あるいは前記処理時刻のいずれかを絞り込んだ報告レポートを出力するレポート出力工程を含むことを特徴とする付記１６に記載のデータ解析方法。
【０１２２】
（付記１８）前記生産工程の全てが生産対象のロットを先入れ先出しにより順次処理し、ある一つの生産工程に生産装置が一台のみ設けられる場合、
前記特徴化工程は、前記一台の生産装置からなる製造工程のリストを作成することを特徴とする付記２〜１７のいずれか一つに記載のデータ解析方法。
【０１２３】
（付記１９）生産工程の品質の変動を示す目的変数と、該目的変数の変動を説明する説明変数とを含むプロセスデータのデータ解析を行うデータ解析装置において、
所望するデータ解析に必要なデータを前記プロセスデータの中から抽出するデータ抽出手段と、
前記データ抽出手段により抽出されたデータの説明変数の異常値をデータクレンジングするデータクレンジング手段と、
前記データクレンジング手段によりデータクレンジングされたデータの目的変数の変動を表す特徴情報を求める特徴化手段と、
前記特徴化手段により求められた特徴情報を用いて前記目的変数の変動要因を探索するためのデータ解析を行うデータ解析手段と、
を備えたことを特徴とするデータ解析装置。
【０１２４】
（付記２０）前記データクレンジング手段によりデータクレンジングされたデータの説明変数に対し、データ項目のカテゴリを示す付加文字列を付加する説明変数変換手段を備え、
前記データ解析手段は、前記付加文字列に基づきデータ項目のカテゴリを認識しカテゴリ別の解析処理を実行することを特徴とする付記１９に記載のデータ解析装置。
【０１２５】
（付記２１）前記説明変数変換手段は、データに含まれる説明変数及び目的変数の一覧を抽出し、データ解析に用いる説明変数及び目的変数を手動選択可能なことを特徴とする付記２０に記載のデータ解析装置。
【０１２６】
（付記２２）生産工程の品質の変動を示す目的変数と、該目的変数の変動を説明する説明変数とを含むプロセスデータのデータ解析を行うデータ解析プログラムであって、該プログラムは、コンピュータに対し、
所望するデータ解析に必要なデータを前記プロセスデータの中から抽出させ、
前記抽出されたデータの説明変数の異常値をデータクレンジングさせ、
前記データクレンジングされたデータの目的変数の変動を表す特徴情報を求めさせ、
前記特徴情報を用いて前記目的変数の変動要因を探索するためのデータ解析を行わせ、
前記データ解析によって得られた解析結果に対する所定の評価を行わせることを特徴とするデータ解析プログラム。
【０１２７】
【発明の効果】
本発明によれば、解析対象のデータを適切に抽出し、データクレンジング及び特徴化を行ってデータ解析を実行するものであり、特に、データカテゴリに対応する付加文字列をデータの説明変数に付加することにより、データの種別やデータカテゴリ間の関連性を明確にして所望するデータ解析処理を自動実行できるようになり、省資源で効率的に行えるという効果を奏する。加えて、手動によりデータ解析の条件設定等を行う場合においても、操作ミスを防ぐことができ、所望する解析処理と解析結果を得ることができるという効果を奏する。
【図面の簡単な説明】
【図１】本発明の実施の形態に係るデータ解析装置に用いられる計算機システムのハードウェア構成を示す図である。
【図２】プロセスデータの流れを説明するための図である。
【図３】図１に示すシステム構成により実現されるデータ解析装置の機能ブロック図である。
【図４】本発明のデータ解析装置におけるデータの処理手順の概要を示すフローチャートである。
【図５】解析用データ１（ＤＡＴＡ１）の内容の一部を示す図表である。
【図６】データクレンジング後の解析用データ２（ＤＡＴＡ２）の内容の一部を示す図表である。
【図７】装置名に付与されるトレンドマークを示す図表である。
【図８】解析結果データ（ＤＡＴＡ３）に基づく評価処理の内容を説明するための図表である。
【図９】報告レポートの内容の一例を示す図である。
【図１０】データ解析処理の処理手順を示すフローチャートである。
【図１１】データクレンジング設定ファイルＲＦ３の一部を示す図表である。
【図１２】項目設定ファイルＲＦ２の一部を示す図表である。
【図１３】データクレンジング設定ファイルＲＦ３の一部を示す図表である。
【図１４】解析設定ファイルＲＦ５と探索設定ファイルＲＦ６の一部を示す図表である。
【図１５】本発明の実施の形態２における解析処理の選択画面を示す図である。
【図１６】入力データの内容を説明するための図表である。
【図１７】解析用データ２（ＤＡＴＡ２）の一例を示す図表である。
【図１８】解析処理結果の一例であるトレンドグラフを示す図である。
【図１９】手動解析時の項目設定時の画面を示す図である。
【図２０】手動解析時の項目設定後の画面を示す図である。
【図２１】全工程の使用装置と処理時刻を指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【図２２】装置履歴データ解析時における項目設定画面を示す図である。
【図２３】全工程の履歴を指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【図２４】装置履歴データ解析時における項目設定画面を示す図である。
【図２５】全工程の履歴と時刻データを指定してデータ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【図２６】工程を限定して再度データ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【図２７】装置履歴＋第１候補時刻データ解析時における項目設定画面を示す図である。
【図２８】工程を限定して再度データ解析を行った結果の回帰木と結果評価情報一覧を示す図である。
【図２９】あるロット単位の不良率トレンドグラフを示す図である。
【図３０】説明変数と目的変数を指定した決定木を示す図である。
【図３１】説明変数と目的変数をさらに指定した決定木を示す図である。
【図３２】図３１の結果を処理時刻別のトレンドグラフで表した状態を示す図である。
【図３３】本発明の実施の形態３によるデータ解析手順を示すフローチャートである。
【図３４】抽出された一台装置工程リストＲＥＰ２の一例を示す図表である。
【図３５】実施の形態３において用いる解析用データ２（ＤＡＴＡ２）の一例を示す図表である。
【図３６】図３５に示すロット番号別の歩留りを示す図である。
【図３７】図３５に示す各工程の使用装置名と処理時刻を説明変数、歩留り値を目的変数とした場合の回帰木と結果評価情報一覧を示す図である。
【図３８】一般的なデータ解析処理の手順を示すフローチャートである。
【符号の説明】
１入力装置
２中央処理装置
３出力装置
４記憶装置
１０ａ〜１０ｎ工程装置
１１管理サーバ
ＤＢ１製造情報データベース
３０データ解析装置
３１データ抽出手段
３２データクレンジング／特徴化手段
３３データ解析手段
３４解析結果評価手段
３５報告レポート出力手段
４０表示項目
Ａ１総合判定内容
Ａ２統計的情報
Ａ３回帰木図
Ａ４箱髭図
Ａ５相関図
ＤＡＴＡ１解析用データ１
ＤＡＴＡ２解析用データ２
Ｋ１目的変数とする数値項目
Ｋ２説明変数とする文字項目
Ｌ１項目リスト
ＲＦ設定ファイル
ＲＦ１抽出条件設定ファイル
ＲＦ２項目設定ファイル
ＲＦ３データクレンジング設定ファイル
ＲＦ４特徴化設定ファイル
ＲＦ５解析設定ファイル
ＲＦ６探索設定ファイル
ＲＦ７報告条件設定ファイル
ＲＥＰ１報告レポート
ＲＥＰ２一台装置工程リスト
Ｓ１目的変数リスト
Ｓ２説明変数リスト
[0001]
[Technical field of the invention]
The present invention relates to a data analysis method that grasps the relationships among data handled widely across industry and extracts significant results that yield an industrial advantage.
[0002]
[Prior art]
For example, to improve yield in a semiconductor manufacturing process, work is carried out to find, as quickly as possible, the factors that lower the yield, based on the history of the devices used at the manufacturing stage, test results, design information, various measurement data, and so on. For this purpose, performing statistical analysis on data collected in advance is more economical than carrying out actual physical analysis, and it is important to perform this statistical analysis efficiently.
[0003]
The present inventors previously filed Japanese Patent Application No. 2000-284578 (Japanese Patent Laid-Open No. 2001-306999) covering an apparatus and method for statistically analyzing such data. When significant differences are extracted by statistical data analysis, which data to analyze and which analysis technique to use are determined by the analyst's experience, skills, and so on. A decision is rarely made from a single analysis result. In general, each analysis result is interpreted, the analysis conditions to be used next (data, analysis technique, and so on) are examined and decided, and the analysis is then run again.
[0004]
FIG. 38 is a flowchart showing a general data analysis procedure. The decided analysis conditions are set (step S50), the data is analyzed under the set conditions (step S51), the resulting analysis results are interpreted (step S52), and a decision is attempted (step S53). If a decision is reached (step S53: Yes), the statistical analysis ends. If no decision can be reached from the current analysis result (step S53: No), the analysis conditions are changed (step S54) and the data is analyzed under the changed conditions.
[0005]
[Problems to be solved by the invention]
The analysis conditions changed in step S54 are entered into the execution program after the analyst has set the explanatory variables, objective variables, processing end conditions, and so on. Operating errors during this entry and time spent waiting for execution results therefore lowered analysis efficiency. Many of the analysis conditions specified here can be patterned to some extent, but in many cases the final analysis conditions cannot be specified until analysis results have been obtained. This is one of the factors hindering the automation of analysis. The problem is especially pronounced in processing such as data mining, where a single analysis takes a long time and many parameters are handled.
[0006]
To analyze data efficiently, the analyst must proceed while staying aware of what is to be analyzed, by what procedure, and what the explanatory and objective variables should be. In general, each analyst recognizes the categories of the explanatory variables for each analysis case, but conventionally the processing results were output without determining what category the input data belonged to.
[0007]
In particular, for process data related to semiconductor manufacturing, which has many explanatory variables despite a small number of records, the explanatory variables are intricately intertwined, and data analysis could not proceed efficiently unless the processing procedure and explanatory variables suited to the analysis target and purpose were selected appropriately. Time data in particular plays an important role in process data analysis and is acquired in large quantities. However, it makes extracting statistically significant differences harder, because the explanatory variables become too numerous and tend to be confounded (no longer independent). Correspondingly, more computer resources, starting with computation time, were required.
[0008]
The present invention has been made in view of the above problems. An object of the invention is to provide a data analysis method that can easily extract, from production device and processing time data in a production process, only the data needed for a desired data analysis, and that can efficiently obtain analysis results effective for improving yield. A further object of the invention is to provide a data analysis method that can automatically execute data analysis of data having few records and many explanatory variables.
[0009]
[Means for Solving the Problems]
To achieve the above object, the present invention extracts the data needed for data analysis from data such as production device names and processing times, narrows the data down to the production device names, processing times, workmanship, yield, and so on associated with the problem that lowers the yield, and executes data analysis. In executing this data analysis, an additional character string identifying the category of the data item is appended to each explanatory variable of the data, so that the category can be recognized during analysis processing and the analysis procedure corresponding to the category can be executed automatically. In doing so, the objective and explanatory variables of the data are selected and deleted to obtain the desired analysis result. In addition, data unnecessary for the desired data analysis is deleted by explanatory variable item name, and records and items having abnormal values are deleted, which enables data analysis that uses only the necessary data.
[0010]
According to the present invention, time-consuming operations such as specifying the analysis conditions only after each analysis result in a series has been obtained are unnecessary. The additional character string added to the explanatory variables of the data to be analyzed allows the category of each data item to be recognized, so that the necessary data analysis processes can be executed automatically and in sequence, and the reliability of the analysis result can be improved. Even when the explanatory variables are intricately intertwined, as in process data related to semiconductor manufacturing with a small number of records and a large number of explanatory variables, the explanatory variables and processing procedures suited to the analysis target and analysis purpose can be selected, and data analysis can be performed efficiently.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1: Automatic analysis of data)
Exemplary embodiments of a data analysis apparatus and a data analysis method according to the present invention will be explained below in detail with reference to the accompanying drawings. The original data to be analyzed in the embodiments of the present invention is, as an example, process data related to semiconductor manufacturing, and this process data varies over time. In order to analyze this process data efficiently, items relating to time, which indicate this variation, are particularly important. In each embodiment described below, the production devices (process devices) arranged in each manufacturing process and their processing times are used to analyze yield factors and the like, and the factors of low yield are extracted using data mining techniques (regression tree analysis, decision tree analysis) to further improve the analysis efficiency.
[0012]
FIG. 1 is a diagram showing the hardware configuration of a computer system used in the data analysis apparatus according to the embodiment of the present invention. This data analysis apparatus comprises an input device 1 including operation means such as a keyboard and means for inputting data via a network or the like; a central processing device 2 including a CPU that performs the analysis processing described later on the input data; an output device 3 including display means such as a CRT or LCD and printing means such as a printer; and a storage device 4, such as an HDD, for storing and holding data.
[0013]
FIG. 2 is a diagram for explaining the flow of process data. A plurality (N) of process apparatuses 10a to 10n are arranged in a manufacturing process of a manufacturing target such as a semiconductor. Each process apparatus 10a to 10n sends process data in each manufacturing process to the management server 11. This process data includes the processing time at which the manufacturing object is manufactured in each process, the name of the device used for manufacturing, the yield, and the like. The management server 11 creates the manufacturing information database DB1 based on the input process data. This manufacturing information database DB1 is stored in the storage device 4 shown in FIG. 1 via a network or the like.
[0014]
FIG. 3 is a functional block diagram of the data analysis apparatus realized by the system configuration shown in FIG. The data analysis apparatus 30 receives the process data of each process apparatus stored in the manufacturing information database DB1 shown in FIG. The data analysis apparatus 30 includes data extraction means 31, data cleansing / characterizing means 32, data analysis means 33, analysis result evaluation means 34, and report output means 35, each of which executes its processing according to the setting information described in the setting files RF (RF1 to RF7). The series of processes executed automatically by these means is referred to as automatic analysis.
[0015]
The data analysis apparatus 30 starts an analysis processing program such as regression tree analysis and, after the necessary input / output file names, objective variables, and explanatory variables have been specified, automatically performs the following according to the group of analysis flow setting files: (1) data extraction from the database, (2) data cleansing and characterization, (3) regression tree analysis, (4) evaluation of the analysis results, and (5) extraction of the problem process, device, and time from the process history. It then outputs the report REP1.
[0016]
At this time, data in which the objective variable item is added to the lot number (the number of lots corresponds to the number of data items, that is, the number of records), the device name of each process, and the processing date / time is extracted and processed, and data analysis by the data mining technique narrows down the process, device, and time that deserve attention.
[0017]
FIG. 4 is a flowchart showing an outline of a data processing procedure in the data analysis apparatus of the present invention. The analysis data DATA to be analyzed extracted from the manufacturing information database DB1 is analyzed by executing the analysis program according to the settings of the program initialization file INI (including setting files RF1 to RF7 described later). Here, based on the setting of the program initialization file INI, the analysis program sets the additional character string _xx that identifies the category of the data item in the analysis data DATA. This additional character string _xx is recognized by the analysis program.
[0018]
When the corresponding processing mode is designated in the analysis program (step S1), processing corresponding to the category type indicated by xx is performed, namely automatic selection of variables such as the objective variable and explanatory variables (step S2) or manual selection of variables (step S3), after which analysis processing such as regression tree analysis is executed (step S4). After the first analysis process, whether to re-execute is selected in accordance with the analysis target and analysis procedure set in advance (step S5). On re-execution, the execution result is extracted (step S6), variables such as the objective variable and explanatory variables are automatically selected or deleted (step S7), and the process returns to step S4 to execute the next analysis process, finally yielding the analysis result.
[0019]
(About data extraction means 31)
The data extraction means 31 executes each of the following:
(1) Extraction condition setting and processing for data extraction and conversion
(2) Data mining condition setting and processing
(3) Device name and time setting and processing
[0020]
(1) About extraction condition setting and processing of data extraction and conversion
In the data extraction and conversion, according to the extraction condition setting file RF1 and the data extraction program, the process data stored in the manufacturing information database DB1 is extracted at a predetermined time or periodically under the set conditions (target product type, period, item). The extracted data consists of the lot number, process name, device name, and processing date / time, with the objective variable item added.
[0021]
(2) Data mining condition setting and processing
According to the item setting file RF2 and the mining condition setting program, a record identification name, explanatory variable names, an objective variable name, and explanatory variable value names are selected, and analysis data 1 (DATA1) is created and output. Specifically, the following contents are set using the item setting file RF2.
Record name: Lot number
Explanatory variable item: Manufacturing process (large process name + small process name)
Explanatory variable value name: device, time
Objective variable name: YIELD, characteristic value (yield), etc.
[0022]
(3) Device name and time setting process
The device name and time are set as the explanatory variable values required for the manufacturing process history analysis performed by this analysis apparatus. In the analysis data 1 (DATA1), the record name is arranged in the first column, and the explanatory variables and the objective variable are arranged in the second and subsequent columns. FIG. 5 is a chart showing a part of the contents of the analysis data 1 (DATA1). As shown in the figure, each explanatory variable item name is made identifiable by appending the explanatory variable value name to the explanatory variable item. In this example, “_device” or “_time” is appended to the manufacturing process name, so that the process and the content of its values can be identified from the name. Based on the explanatory variable items and explanatory variable value names set in this item setting, the subsequent data cleansing / characterizing means 32 and data analysis means 33 can identify each manufacturing process in association with its time or device. In each process of the production line, a plurality of devices (for example, 6nw2 and 6nw4 in the illustrated process 01_device) are arranged and operate in parallel to perform production.
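The layout of analysis data 1 described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_analysis_data`, the tuple layout of the history rows, the column name `lot`, and the sample process names and values are hypothetical and are not taken from FIG. 5.

```python
from collections import defaultdict

def build_analysis_data(history, yields):
    """Pivot per-process history rows into one record per lot.

    history: iterable of (lot, process, device, time) tuples (assumed layout).
    yields:  dict mapping lot -> objective variable value.
    """
    records = defaultdict(dict)
    for lot, process, device, time in history:
        # The appended suffix identifies the category of each explanatory variable.
        records[lot][f"{process}_device"] = device
        records[lot][f"{process}_time"] = time
    rows = []
    for lot in sorted(records):
        row = {"lot": lot}            # record name goes in the first column
        row.update(records[lot])      # explanatory variables follow
        row["YIELD"] = yields.get(lot)  # objective variable
        rows.append(row)
    return rows

# Hypothetical sample data.
history = [
    ("L001", "process01", "6nw2", "2002-06-01 09:00"),
    ("L001", "process02", "7ab1", "2002-06-02 10:30"),
    ("L002", "process01", "6nw4", "2002-06-01 11:15"),
]
rows = build_analysis_data(history, {"L001": 92.5, "L002": 88.0})
print(rows[0])
```

A lot that skipped a process simply lacks the corresponding items in this sketch; such gaps would be treated as missing values in the cleansing step.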
[0023]
(About data cleansing / characterizing means 32)
The data cleansing / characterizing means 32 executes the following data cleansing / characterizing process according to the data cleansing setting file RF3, the characteristic setting file RF4, and the data cleansing / characterizing program.
(1) Abnormal value processing condition setting and processing
(2) Device name change setting and processing based on changes over time
(3) Item name change for single-device processes
(4) Unnecessary item deletion setting and processing
(5) Item deletion and record deletion setting and processing by abnormal value ratio
(6) Time data analysis condition setting and processing
[0024]
Each of the processes (1) to (6) above is described in detail below.
(1) About abnormal value processing condition setting and processing
Based on the setting contents of the data cleansing setting file RF3, the data cleansing / characterizing means 32 replaces explanatory variable item values of the analysis data 1 (DATA1) with specific values. The details of the data cleansing process are as follows. FIG. 6 is a chart showing a part of the contents of analysis data 2 (DATA2) after data cleansing. Locations where the explanatory variable item value of analysis data 1 (DATA1) shown in FIG. 5 is Null (a missing value) are replaced with specific values as shown in FIG. 6 (in the illustrated example, 99999 for a missing numerical value and nop for a missing character string value).
[0025]
Then, the data cleansing / characterizing means 32 sets whether to analyze this specific value as one of values or as a missing value based on the setting contents of the data cleansing setting file RF3. As for the abnormality determination criteria, the abnormal value definition and replacement value are set, and the abnormal value is processed according to the setting.
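The replacement of missing values can be sketched as follows. This is a minimal illustration: the replacement values 99999 and nop come from the example above, while the function name `cleanse`, the representation of Null as `None` or an empty string, and the caller-supplied set of numeric items are assumptions.

```python
MISSING_NUMBER = 99999   # replacement for a missing numerical value
MISSING_STRING = "nop"   # replacement for a missing character string value

def cleanse(rows, numeric_items):
    """Return copies of the rows with missing values replaced.

    numeric_items: names of the items holding numerical values (assumed to be
    known to the caller; every other item is treated as a character string).
    """
    cleansed = []
    for row in rows:
        new_row = {}
        for item, value in row.items():
            if value is None or value == "":
                value = MISSING_NUMBER if item in numeric_items else MISSING_STRING
            new_row[item] = value
        cleansed.append(new_row)
    return cleansed

# Hypothetical sample record with two missing values.
sample = [{"lot": "L001", "process01_device": None, "process01_time": ""}]
print(cleanse(sample, numeric_items={"process01_time"}))
```

Whether the replaced value is then analyzed as an ordinary value or as a missing value is a separate setting, as described above.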
[0026]
(2) Device name change setting and processing based on changes over time
Even the same device may, owing to some trouble, suddenly become abnormal from a certain point in time. The data cleansing / characterizing means 32 captures the fluctuation of the objective variable value caused by this sudden change and assigns a different name to the device after it has changed to an abnormal state, which makes it possible to narrow down the problem further.
[0027]
The data cleansing / characterizing means 32 examines, for every device, the features of the transition of the objective variable over processing time by feature extraction such as a wavelet transform (with noise removal by filtering or the like), and, for devices whose features are strong relative to the set criteria, characterizes them by the timing and duration of sudden rises and falls. For example, the criterion for a sudden fluctuation, set in the program initialization file INI, is 0.8 times or more the overall standard deviation. The device name is then converted into a device name to which this transition feature information is added. In this embodiment, the transition feature information is added to the device name as follows.
[0028]
(1) Add a delimiter (@).
(2) The period is divided into at most three according to the time at which the objective variable value corresponding to the device suddenly rises or falls, and a symbol representing the period is attached.
(3) A trend mark is attached as a symbol representing the transition shape of the device.
(4) A one-digit number (0: weak to 9: strong) is added as a symbol representing the strength of the transition feature.
[0029]
FIG. 7 is a chart showing trend marks given to the device names. As shown in the figure, the trend mark divides the entire analysis data 2 (DATA2) shown in FIG. 6 into the first half, the middle stage, and the second half, and shows the transition state of the objective variable value for each period. The state obtained by feature extraction such as wavelet transform is divided into 1 (-^: the first half is low and the second half is high) to 5 (-: no feature) as shown in the figure.
[0030]
Taking the process 01_device shown in FIG. 6 as an example, a device name with the transition feature information added is obtained as follows:
6nw2@F-^7
This example shows that the device name is 6nw2, that the transition of the objective variable value in the first-half period (F) has a shape that is low in its first half and high in its second half, and that the strength of the transition feature is 7. When there is no transition feature, 6nw2@-0 is obtained. In this way, the data cleansing / characterizing means 32 creates, for each process device, a device name to which the transition feature information is added. Since the analysis data 2 (DATA2) shown in FIG. 6 is only a part of the entire data, the transition feature information actually added is that obtained after feature extraction using the entire data.
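The naming rule above can be sketched as a small formatting function. This is an illustration only: the function name `add_transition_feature` and its parameters are hypothetical, and only the two trend marks quoted in the text (1 and 5) are included because the remaining marks of FIG. 7 are not reproduced here.

```python
# Trend marks quoted in the text; marks 2 to 4 of FIG. 7 are not reproduced here.
TREND_MARKS = {1: "-^", 5: "-"}

def add_transition_feature(device, trend, strength, period=""):
    """Append '@' + period symbol + trend mark + strength digit to a device name."""
    if not 0 <= strength <= 9:
        raise ValueError("strength must be a single digit from 0 (weak) to 9 (strong)")
    return f"{device}@{period}{TREND_MARKS[trend]}{strength}"

print(add_transition_feature("6nw2", trend=1, strength=7, period="F"))
print(add_transition_feature("6nw2", trend=5, strength=0))
```

Because the feature is encoded in the device name itself, a device that turned abnormal appears to the subsequent analysis as a distinct value of the explanatory variable.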
[0031]
(3) About item name change for single-device processes
As described above, in each process of various production lines, including semiconductor lines, a plurality of devices are often arranged and operated in parallel. However, there are cases where only one device is arranged and operated in a process. A process having such a configuration is defined as a “single-device process”. In a single-device process, differences between devices cannot be examined, so the factor causing a problem is sought from the change over time of the single device arranged there.
[0032]
If a process used as an explanatory variable has only one device name, no difference between devices can be obtained for that process, so its device item is excluded from the explanatory variables. Instead, “_one device” is added to the name of the data item indicating the processing time of that process. In the example illustrated in FIG. 6, the processing time item for the process 04_device is changed to “process 04_one device”. If there is no corresponding processing time data item, or if the corresponding processing time data item also has only a single value, no change is made. This additional character string allows the item to be recognized, at the time of analysis, as a time data item of a single-device process.
[0033]
(4) Unnecessary item deletion setting and processing
Explanatory variable items that are unnecessary for analysis may be mixed in at the data acquisition stage. In that case, settings are made to delete the unnecessary items before analysis. In this example, as an initial setting, a plurality of character strings contained in explanatory variable item names, such as inspection process names, are set in order to exclude inspection processes, which do not directly process products. Unnecessary items are deleted based on these settings.
[0034]
(5) About item deletion and record deletion setting and processing by abnormal value ratio
Items and records in which the ratio of missing values and defined abnormal values exceeds the set value are deleted. For example, an item in which the ratio of missing values and defined abnormal values is 60% or more is deleted, and a record in which this ratio is 70% or more is deleted. Records whose objective variable value is missing or abnormal are also deleted. Among the explanatory variables, nominal-scale items having only one distinct value, or 100 or more distinct values, are not analyzed.
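The deletion by abnormal value ratio can be sketched as follows. The thresholds of 60% and 70% come from the example above. The function names, the set of values treated as abnormal, and the order of operations (items are dropped first, and the record ratio is then computed over the remaining items) are assumptions made for this sketch.

```python
ABNORMAL = {99999, "nop", None}   # values treated as missing or abnormal (assumed)

def abnormal_ratio(values):
    """Fraction of the values that are missing or abnormal."""
    return sum(v in ABNORMAL for v in values) / len(values)

def drop_by_abnormal_ratio(rows, explanatory, objective,
                           item_limit=0.6, record_limit=0.7):
    """Drop items, then records, whose abnormal ratio reaches the limit."""
    kept = [k for k in explanatory
            if abnormal_ratio([r[k] for r in rows]) < item_limit]
    dropped = set(explanatory) - set(kept)
    result = []
    for r in rows:
        if r[objective] in ABNORMAL:
            continue  # a record without a usable objective variable is deleted
        if kept and abnormal_ratio([r[k] for k in kept]) >= record_limit:
            continue  # too many abnormal explanatory values in this record
        result.append({k: v for k, v in r.items() if k not in dropped})
    return result

# Hypothetical sample data.
rows = [
    {"lot": "L1", "a_device": "x", "b_device": "nop", "YIELD": 90},
    {"lot": "L2", "a_device": "y", "b_device": "nop", "YIELD": 85},
    {"lot": "L3", "a_device": "x", "b_device": "z", "YIELD": None},
]
print(drop_by_abnormal_ratio(rows, ["a_device", "b_device"], "YIELD"))
```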
[0035]
(6) Time data analysis condition setting and processing
By default, the apparatus treats the time data among the explanatory variables as an ordinal scale (processed in units of time). As a second setting, the time span can be divided according to a cycle that exists in the manufacturing process, and the time data can also be handled as a nominal scale (processed in units of names), converted into names representing each period. An example of such a cycle is the shift change of manufacturing workers.
[0036]
(About the analysis process of the data analysis means 33)
The data analysis means 33 executes the following analysis processing by regression tree analysis according to the contents of the analysis setting file RF5 and the analysis processing program.
[0037]
(1) Device history + time data analysis settings
Different analysis processes are executed as follows according to the processing mode of the manufacturing process.
(1) When the entire manufacturing process processes lots on a first-in first-out basis
Since the processing order of lots is basically the same in every process, a single explanatory variable suffices for the processing times of the processes, and the analysis is run using all the device items and the time of the first candidate as explanatory variables. If the lot processing order varies slightly, the number of time data items is increased and “device history + top N candidate time data analysis” is executed. An appropriate range for N is 1 to 20.
[0038]
(2) When the entire manufacturing process does not process lots on a first-in first-out basis
Using a test of independence for time data, the time data of processes that are not independent of one another are combined into one representative time item, and the time data are thereby narrowed down to a group of representative time items consisting only of mutually independent time data (time data narrowing). Thereafter, “device history + time data analysis” is executed using all the device name items and the selected group of representative time items as explanatory variables.
[0039]
(2) Analysis end condition setting
The end condition of the regression tree analysis is set, for example, so that the analysis ends when the standard deviation of a divided set becomes 0.5 times or less that of the whole.
[0040]
(3) Execution of analysis
The data analysis means 33 performs analysis according to the contents of the analysis setting file RF5 and the analysis processing program. When a plurality of objective variables are set, the set items are sequentially selected and analyzed.
[0041]
(1) Processing contents of “device history + top N candidate time data analysis”
1. Only the device name items and the processing time items of single-device processes are used as explanatory variables, and a regression tree analysis is performed for the specified objective variable.
2. Next, the time data of the top N processes that were candidates in that analysis result are added to the explanatory variables, and the regression tree analysis is performed again. If there is no time data for the item listed as the k-th candidate, the (k + 1)-th and subsequent candidates are searched for an item that has time data, and that time data is added instead. The analysis result clearly indicates whether such an item was found and, if so, which candidate number it was (1 ≦ k ≦ N).
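The selection of time data in step 2 can be sketched as follows. This is one possible reading: the ranking is walked from the top, candidates without time data are skipped, and the walk stops once N time items have been found. The function name `select_time_items`, the `_time` suffix, and the sample names are assumptions.

```python
def select_time_items(ranked_processes, time_items, n):
    """Pick time items for the top-n candidate processes, skipping gaps.

    ranked_processes: process names ordered from the best candidate downward.
    time_items:       set of time item names available in the data.
    Returns (selected, missing), where selected is a list of
    (time item name, 1-based candidate rank) and missing is the number of
    time items that could not be found.
    """
    selected = []
    for rank, process in enumerate(ranked_processes, start=1):
        if len(selected) == n:
            break
        item = f"{process}_time"
        if item in time_items:
            # Keep the candidate rank so it can be shown in the analysis result.
            selected.append((item, rank))
    return selected, n - len(selected)

# Hypothetical ranking from the first regression tree analysis.
ranked = ["p03", "p01", "p07", "p02"]
available = {"p01_time", "p02_time", "p03_time"}
print(select_time_items(ranked, available, n=3))
```

Here the third candidate (p07) has no time data, so the fourth candidate supplies the third time item, and its rank is retained for reporting.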
[0042]
(2) Processing contents of “Device history + Time data analysis”
Regression tree analysis is performed with all device name items and the selected representative time items as explanatory variables.
[0043]
(About extraction and evaluation of analysis result of analysis result evaluation means 34)
The automatic analysis by the data analysis means 33 described above does not end with one execution. The analysis target period or the target lot is changed, and the analysis result data (DATA3) is obtained by performing repeated analysis while changing various setting values of the data cleansing / characterizing means 32 and the data analysis means 33 from the initial values. A plurality of obtained results indicated by the analysis result data (DATA3) are evaluated to obtain a more reliable analysis result.
[0044]
Here, an outline of regression tree analysis and the t-test is given. Regression tree analysis takes a set of records, each composed of explanatory variables indicating a plurality of attributes and an objective variable affected by them, and identifies the attributes and attribute values that most affect the objective variable. The analysis result evaluation means 34 outputs rules indicating the characteristics and regularities of the data.
[0045]
The regression tree analysis process is realized by repeatedly dividing the set into two based on the parameter value (attribute value) of each explanatory variable (attribute). When a set is divided, let S0 be the sum of squares of the objective variable before the division, and let S1 and S2 be the sums of squares of the objective variable of the two sets after the division. The explanatory variable and parameter value used for the division are chosen so that ΔS given by the following equation (1) is maximized.
[0046]
ΔS = S0− (S1 + S2) (1)
[0047]
The explanatory variables and parameter values obtained here correspond to branch points in the regression tree. Thereafter, the same processing is repeated for each divided set, and the influence of the explanatory variables on the objective variable is examined. The above is the generally well-known method of regression tree analysis. In order to assess the clarity of the set division in more detail, the following parameters (A) to (D) are used, in addition to ΔS, for a plurality of top division candidates as a quantitative evaluation of the regression tree analysis result.
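A single division step that maximizes ΔS in equation (1) can be sketched as follows. The sketch assumes that "sum of squares" means the sum of squared deviations from the set mean, and that a nominal explanatory variable is divided as "one value versus all other values"; both are assumptions, as are the function names and sample data.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (assumed meaning of S)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, explanatory, objective):
    """Return (delta_s, item, value) for the division that maximizes delta_s."""
    s0 = sum_sq([r[objective] for r in rows])
    best = None
    for item in explanatory:
        for value in sorted({r[item] for r in rows}):
            set1 = [r[objective] for r in rows if r[item] == value]
            set2 = [r[objective] for r in rows if r[item] != value]
            if not set1 or not set2:
                continue  # not a real division
            delta_s = s0 - (sum_sq(set1) + sum_sq(set2))  # equation (1)
            if best is None or delta_s > best[0]:
                best = (delta_s, item, value)
    return best

# Hypothetical sample: the device used in process p1 separates the yields.
rows = [
    {"p1_device": "A", "p2_device": "X", "YIELD": 90},
    {"p1_device": "A", "p2_device": "Y", "YIELD": 92},
    {"p1_device": "B", "p2_device": "X", "YIELD": 60},
    {"p1_device": "B", "p2_device": "Y", "YIELD": 62},
]
print(best_split(rows, ["p1_device", "p2_device"], "YIELD"))
```

Repeating this step on each resulting subset, until the end condition is met, yields the tree; keeping the runner-up candidates at each step gives the top division candidates that are evaluated further.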
[0048]
(A) S ratio:
This is a reduction rate of the sum of squares by the set division, and is a parameter indicating how much the sum of squares has been reduced by the set division. The smaller this value is, the greater the effect of set partitioning is, and the set partition is clearly performed, so the significant difference is large.
[0049]
S ratio = ((S1 + S2) / 2) / S0 (2)
[0050]
(B) t value:
Executing the regression tree analysis process divides the set into two, and the t value is used to test the difference between the means (/X1, /X2) of the two divided sets. Here, “/” denotes an overline. The statistical t-test serves as a criterion for a significant difference in the mean value of the objective variable between the divided sets. For the same degrees of freedom, that is, the same number of data, a larger t means that the set is more clearly divided and the significant difference is larger.
[0051]
At this time, if there is no significant difference between the variances of the divided sets, the t value is obtained by the following equation (3). If there is a significant difference between the variances of the divided sets, the t value is obtained by the following equation (4). Here, N1 and N2 are the numbers of elements of set 1 and set 2, respectively, /X1 and /X2 are the means of the respective sets after the division, and S1 and S2 are the sums of squares of the objective variable of the respective sets after the division.
[0052]
[Expression 1]

[0053]
[Expression 2]
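The images for Expressions 1 and 2 are not reproduced in this text. As a hedged reconstruction only, assuming that S1 and S2 are sums of squared deviations from the respective set means and that equations (3) and (4) are the standard pooled-variance and unequal-variance (Welch) t statistics, they would take the following form. The absolute value in the numerator is also an assumption, chosen so that a larger t always indicates a clearer division.

```latex
t = \frac{\left|\bar{X}_1 - \bar{X}_2\right|}
         {\sqrt{\dfrac{S_1 + S_2}{N_1 + N_2 - 2}\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)}}
\tag{3}

t = \frac{\left|\bar{X}_1 - \bar{X}_2\right|}
         {\sqrt{\dfrac{S_1}{N_1 (N_1 - 1)} + \dfrac{S_2}{N_2 (N_2 - 1)}}}
\tag{4}
```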

[0054]
(C) Difference in mean value of objective variables of divided sets:
The greater this value, the greater the difference.
[0055]
(D) Number of data in each divided set:
The smaller the difference between the two, the smaller the influence of abnormal values (noise).
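The four evaluation parameters (A) to (D) can be computed together as sketched below. The S ratio follows equation (2). The t value uses the standard pooled-variance and Welch forms, which is an assumption because the expressions for equations (3) and (4) are not reproduced in this text. S0, S1, and S2 are taken to be sums of squared deviations from the mean, and the function name is hypothetical.

```python
import math

def split_evaluation(set1, set2, equal_variance=True):
    """Evaluate a two-way division of objective variable values.

    Each set needs at least two values, and the values must not all be equal.
    """
    n1, n2 = len(set1), len(set2)
    m1, m2 = sum(set1) / n1, sum(set2) / n2
    s1 = sum((v - m1) ** 2 for v in set1)
    s2 = sum((v - m2) ** 2 for v in set2)
    both = list(set1) + list(set2)
    m0 = sum(both) / len(both)
    s0 = sum((v - m0) ** 2 for v in both)
    if equal_variance:
        # Pooled-variance form (assumed to correspond to equation (3)).
        se = math.sqrt((s1 + s2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
    else:
        # Welch form (assumed to correspond to equation (4)).
        se = math.sqrt(s1 / (n1 * (n1 - 1)) + s2 / (n2 * (n2 - 1)))
    return {
        "s_ratio": ((s1 + s2) / 2) / s0,   # (A), equation (2)
        "t": abs(m1 - m2) / se,            # (B)
        "mean_diff": abs(m1 - m2),         # (C)
        "counts": (n1, n2),                # (D)
    }

print(split_evaluation([90, 92], [60, 62]))
```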
[0056]
(1) Evaluation of analysis results
The evaluation is performed based on the search setting file RF6 and the analysis result evaluation program. The evaluation information is calculated for each analysis result and can be compared across analysis results.
(2) Search for highly reliable analysis results
Based on the settings of the search setting file RF6, the analysis result evaluation means 34 determines how to search for highly reliable analysis result data (DATA3). For example, a comparative evaluation value is used under the condition “comparison is performed using the t-test value of the first branch of the regression tree”. With this comparative evaluation value, a more reliable analysis result can be searched for. The search is terminated by limiting the range over which each setting value of the setting files RF1 to RF5 is changed, or by limiting the automatic analysis time. The plurality of analysis results obtained, their comprehensive evaluation values, and their ranks are then displayed, and the most reliable analysis result is obtained. FIG. 14B shows a part of the settings for the setting ranges and methods. In accordance with these settings, a multifaceted analysis is performed by utilizing the bipartite confounding degree and the abnormal value ratio of each item, and a clearer analysis result is searched for.
[0057]
FIG. 8 is a chart for explaining the contents of the evaluation process based on the analysis result data (DATA3). As shown in the figure, the evaluation process calculates, for each process device, the evaluation values as items (the t-test value, the names and number of devices in the low and high groups, their mean values, and so on), and assigns a rank number in descending order of the t-test value, on the assumption that a larger t-test value indicates a larger problem.
[0058]
(About report output means 35)
The report output means 35 creates and outputs a report for the most reliable analysis result.
(1) Report creation
As analysis results, the rule information of the regression tree, the evaluation statistics for a predetermined number of first-branch candidates (for example, the top 20 candidates), and the degree of confounding between the candidates are output to files. As the report information file, an HTML file is output that contains the regression tree diagram and simple sentences describing the significant differences for the main two branches and the top two candidates, with box plots or correlation diagrams associated with the data.
[0059]
(2) Specific reporting method
The analysis result is reported based on the settings of the report condition setting file RF7 and the report processing program. For example, the report content is displayed on the screen, and an alert is sent to a preset e-mail address, conveying the report and the web address of the report.
[0060]
FIG. 9 is a diagram illustrating an example of the content of a report. Based on the analysis result, the report REP1 displays (1) the comprehensive judgment (simple sentences explaining the significant differences for the main two branches and the top two candidates) A1, (2) statistical information A2, (3) a regression tree diagram A3, and (4) a box plot A4 or correlation diagram A5 corresponding to the regression tree diagram A3. The regression tree diagram A3 is displayed in a description format such as HTML. By clicking a desired process device, the linked box plot A4 or correlation diagram A5 can be selectively displayed.
[0061]
FIG. 10 is a flowchart showing the processing procedure of the data analysis processing. As shown in the figure, the data extraction means 31 extracts and converts data from the manufacturing information database DB1 to obtain analysis data (DATA1) (step S11). Next, the data cleansing / characterizing means 32 obtains analysis data (DATA2) obtained by cleansing and characterizing the data of analysis data (DATA1) (step S12).
[0062]
Next, the data analysis means 33 obtains analysis result data (DATA3) by performing data mining, using a regression tree analysis technique, on the analysis data (DATA2) after cleansing and characterization (step S13). Next, the analysis result evaluation means 34 evaluates the analysis result using the analysis result data (DATA3) (step S14). During this evaluation, a reliable result is searched for. For example, it is determined whether the set analysis end condition is satisfied (step S15). If it is not satisfied (step S15: No), the analysis target period or target lots are changed, and the analysis is repeated while the various setting values of the data cleansing / characterizing means 32 and the data analysis means 33 are changed from their initial values, so as to obtain more reliable analysis result data (DATA3).
[0063]
When an analysis result satisfying the analysis end condition of the data analysis means 33 is obtained (step S15: Yes), the report output means 35 outputs the report REP1 (step S16), and the data analysis process ends. The above data analysis process is executed automatically on a weekly or monthly basis. The report REP1, obtained by analyzing data such as the history of devices used in manufacturing stages such as the semiconductor manufacturing process, test results, design information, and various measurement data, makes it easy to find the factors that lower the yield and thus to improve the yield.
[0064]
FIG. 11 to FIG. 14 are diagrams showing examples of the contents of the setting files that configure the analysis processing. FIG. 11 is a chart showing a part of the data cleansing setting file RF3, in which the replacement of missing explanatory variable item values of the analysis data 1 (DATA1) with specific values, the abnormal value ratios for deleting items and records, and the like are set. FIG. 12 is a chart showing a part of the item setting file RF2, in which example identification character strings are set for date-type items and device-type items. FIG. 13 is a chart showing a part of the data cleansing setting file RF3, in which the search strings used to delete item names containing specific strings are set.
[0065]
FIG. 14 is a chart showing a part of the analysis setting file RF5 and the search setting file RF6. FIG. 14A is a part of the analysis setting file RF5 and shows the settings for time data analysis and for the analysis conditions. Either the above-mentioned “device history + top N candidate time data analysis” or “device history + independent time data analysis” is set, together with the objective variable, the explanatory variable designation, the analysis processing end condition, and the like. FIG. 14B is a part of the search setting file RF6 and contains the settings for performing multifaceted analysis, namely the basic method of using the bipartite confounding degree and the selection method based on the abnormal value ratio of items.
[0066]
When there are conditions specific to the target to which the analysis apparatus is applied, the above settings can easily be accommodated by adding or changing the necessary settings in the analysis flow setting file group (setting files RF1 to RF7).
[0067]
(Embodiment 2: Data analysis mode designation selection process)
The second embodiment of the present invention is configured to perform the analysis processing and subsequent steps automatically when analysis target data that has already been cleansed is available. The configuration of the data analysis device 30 is the same as that of the first embodiment, and its description is omitted. The second embodiment also supplements the description of the specific data analysis processing in the data analysis means 33 given in the first embodiment.
[0068]
FIG. 15 is a diagram showing a selection screen for analysis processing according to Embodiment 2 of the present invention. When the file name of each input / output data is selected, a display item 40 is displayed as shown. The display item 40 presents four analysis processing modes, one of which (1 to 4) can be selected: 1 “device history data analysis”, 2 “device history + top (n) candidate time data analysis”, 3 “device history + first candidate time data analysis”, and 4 “Manual (manual analysis)”.
[0069]
FIG. 16 is a chart for explaining the contents of the input data. This figure corresponds to FIG. 5 (manufacturing information database DB1) described above. As shown in FIG. 16, the input data is a CSV format file that includes, for each lot number, the device name and processing time of each process, used as explanatory variables, the yield value, used as the objective variable, and the like.
[0070]
The data extraction means 31 of the data analysis device 30 takes in the manufacturing information database DB1 and automatically analyzes the data to narrow down the device name and processing time of the process to be noted. In the item setting file RF2 described with reference to FIG. 12, the item name of the device used as an explanatory variable is the process name with the additional character string “_e” appended, and the item name of the processing time is the process name with the additional character string “_t” appended. That is, the device used in process A and its processing time are named “A_e” and “A_t”, respectively.
[0071]
In addition to “_t”, “_time” and “_DAY_TIME” are defined as an additional character string indicating a date type item, and the latter two are converted into a representative identification character string “_t” and handled. The same applies to the identification character string indicating the device item. As a result, even data collected from different data sources can be handled as the same category data type.
[0072]
With the above settings, at the time of analysis processing, data whose item names carry the additional character strings “_t”, “_time”, and “_DAY_TIME” are regarded as time data, while data whose item names carry the additional character strings “_e”, “_device”, and “_EQUIIP” are regarded as device data, and the analysis processing corresponding to each processing mode is executed automatically. Note that automatic execution takes place when a mode other than the above-described 4 “Manual (manual analysis)” is selected.
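The suffix handling described above can be illustrated with a minimal Python sketch. The suffix strings are taken from the text (with “_EQUIIP” spelled as it appears there); the function names `normalize_item` and `can_run_automatically` and the sample item names are assumptions introduced only for illustration and are not part of the described apparatus.

```python
# Suffixes named in the text; longer suffixes are checked first.
# "_EQUIIP" is spelled exactly as it appears in the description.
TIME_SUFFIXES = ("_DAY_TIME", "_time", "_t")
DEVICE_SUFFIXES = ("_EQUIIP", "_device", "_e")


def normalize_item(name):
    """Return (normalized_name, category) for one item name.

    The category is "time", "device", or None when no known suffix matches.
    Matching suffixes are rewritten to the representative "_t" or "_e".
    """
    for suffixes, representative, category in (
        (TIME_SUFFIXES, "_t", "time"),
        (DEVICE_SUFFIXES, "_e", "device"),
    ):
        for suffix in suffixes:
            if name.endswith(suffix):
                return name[: -len(suffix)] + representative, category
    return name, None


def can_run_automatically(item_names):
    """True when at least one item carries a known identification suffix."""
    return any(normalize_item(n)[1] is not None for n in item_names)


# Hypothetical item names from two different data sources.
print(normalize_item("A_DAY_TIME"))   # ('A_t', 'time')
print(normalize_item("B_device"))     # ('B_e', 'device')
print(normalize_item("yield"))        # ('yield', None)
```

Rewriting every variant to one representative suffix is what lets items collected from different data sources be treated as the same category in later steps.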
[0073]
Further, based on the contents of the extraction condition setting file RF1 and the item setting file RF2, which form part of the program initialization file INI, it is determined which categories of data are included as explanatory variables. Here, based on the settings of the item setting file RF2 illustrated in FIG. 12, it is determined that the devices used in the processes and the processing times are included. If none of the items in the data carries an identification character string that matches the settings in the program initialization file INI, all items are analyzed manually (this determination is made at the timing of step S1 in FIG. 4, when the processing mode is specified).
[0074]
FIG. 17 is a chart showing an example of analysis data 2 (DATA2). The data analysis means 33 performs regression tree analysis on this analysis data. As a result, an analysis result is obtained showing how the devices used in processes 1 to 4 and the processing times affect the yield of all 8 lots. FIG. 18 is a diagram illustrating a trend graph that is an example of the analysis processing result. The horizontal axis is the lot number, and the vertical axis is the yield. For the devices V201 and V202 that make up process 2_e, the figure shows as an analysis result that the yield of the four lots (Lot 2 to Lot 5) processed in a certain period is high and that the yield of the lots processed by device V202 of process 2 is low.
[0075]
Next, analysis processing for each processing mode will be described.
(1) Manual (manual analysis)
When “Manual (manual analysis)” is selected on the analysis processing selection screen shown in FIG. 15, the necessary items are displayed in the variable selection item list L1. FIG. 19 is a diagram illustrating the item setting screen during manual analysis. In the item list L1, numerical items K1 as objective variables and character items K2 as explanatory variables are displayed in a list. The analysis processing is executed after the necessary numerical items K1 and character items K2 have been selected manually and the conditions have been set.
[0076]
FIG. 20 is a diagram showing a screen after item setting during manual analysis. In the illustrated example, the yield selected as the objective variable is displayed in the objective variable list S1. The explanatory variable list S2 shows the state in which the devices used and the processing times of all processes (process 1_e to process 4_e, process 1_t to process 4_t) are selected as explanatory variables. The data analysis means 33 executes the analysis processing program based on this setting. The result of the data analysis is shown in FIG. 21. FIG. 21 is a diagram showing a regression tree and a result evaluation information list obtained as a result of performing data analysis by designating the devices used and the processing times of all processes.
[0077]
(2) Device history data analysis
FIG. 22 is a diagram showing an item setting screen at the time of device history data analysis. When “device history data analysis” is selected on the analysis processing selection screen shown in FIG. 15, only the items whose names include the additional character string “_e”, which indicates the device used in each process, are automatically listed as explanatory variables in the character items K2 of the item list L1. Thereafter, the objective variables and explanatory variables listed in the numerical items K1 and the character items K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
[0078]
Based on this setting, the data analysis means 33 performs regression tree analysis using only the device history as explanatory variables. The results of the data analysis are shown in FIG. 23. FIG. 23 is a diagram showing a regression tree and a list of result evaluation information obtained as a result of performing data analysis by designating the device histories of all processes.
[0079]
According to FIG. 23, the difference between the processing devices in process 2 is the most significant (process 2 is the first candidate). However, this alone does not reveal whether the difference is significant when viewed as a time series. It is also possible that device V202, which has many low-yield lots, happened to process many lots during a bad period (for example, a period affected by factors other than the device used, such as temporary changes in process conditions).
[0080]
(3) Device history + upper candidate time data analysis
FIG. 24 is a diagram showing an item setting screen at the time of device history data analysis. If “device history + higher candidate time data analysis” is selected on the analysis processing selection screen shown in FIG. 15, all items other than time items (numerical items K1 as objective variables and character items K2 as explanatory variables) are automatically listed in the item list L1. Thereafter, the objective variables and explanatory variables listed in the numerical items K1 and the character items K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
[0081]
The data analysis means 33 performs regression tree analysis based on the set objective variable and explanatory variables. The processing up to this point is the same as in “(2) Device history data analysis”, and the result in this state is the same as FIG. 23.
[0082]
Next, the data analysis means 33 automatically extracts, from the obtained data analysis results, the processes that appear in the regression tree diagram and the processes listed in the Evaluation Data as candidates for the set split at the top level of the regression tree diagram. It then adds the processing times of these processes as new explanatory variables and automatically executes the regression tree analysis again.
[0083]
At this time, the processing times of all four processes are added as explanatory variables. The processing time item corresponding to the name of the device used in a process is found on the basis that the item names are identical apart from the additional character string. Specifically, in the second regression tree analysis, the time items process 1_t, process 2_t, process 3_t, and process 4_t are added to the explanatory variables specified the first time. The results of the data analysis are shown in FIG. 25. FIG. 25 is a diagram illustrating a regression tree and a result evaluation information list as a result of performing data analysis by designating the history and time data of all processes. The result shown in FIG. 25 is the same as FIG. 21 described above. The processing result of the second data analysis is newly saved in a folder named “EQ_Time”.
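The matching of device items to time items by their shared process name can be sketched as follows. This is only an illustration under stated assumptions: the function name `time_items_for`, the `top_n` parameter, the item names, and the candidate order in the example are hypothetical, and the candidate list is assumed to have been extracted already from the first regression tree result.

```python
def time_items_for(candidate_device_items, all_items, top_n=None):
    """Return the "_t" items whose process name matches a candidate "_e" item.

    candidate_device_items: device items ranked by the first analysis
                            (assumed to be given; best candidate first).
    all_items:              every item name present in the analysis data.
    top_n:                  keep only the first top_n candidates when set.
    """
    if top_n is None:
        candidates = candidate_device_items
    else:
        candidates = candidate_device_items[:top_n]
    available = set(all_items)
    added = []
    for item in candidates:
        if not item.endswith("_e"):
            continue  # not a device item, nothing to match
        time_item = item[:-2] + "_t"  # same name apart from the suffix
        if time_item in available and time_item not in added:
            added.append(time_item)
    return added


items = [f"process{i}{s}" for i in range(1, 5) for s in ("_e", "_t")]
ranked = ["process2_e", "process4_e", "process3_e"]  # illustrative order
print(time_items_for(ranked, items))            # all listed candidates
print(time_items_for(ranked, items, top_n=1))   # first candidate only
```

With `top_n=1` the same helper covers the case in which only the first candidate's processing time is added.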
[0084]
According to the above result, the difference due to the processing time of process 1 is the most significant. However, checking the top three candidates in the Evaluation Data (which include the processing times of processes 2 and 4) shows that their differences are exactly the same, so they are presumed to be confounded. The order of the times at which the lots were actually processed is exactly the same in each of these processes, and almost the same transition (trend) is obtained for the next candidate, process 3.
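One simple way to check the condition mentioned above, that the order in which lots were processed is exactly the same in several processes, is sketched below. This is an assumed illustration, not the evaluation performed by the data analysis means 33; the names `lot_order` and `confounded_groups` and the sample values are hypothetical.

```python
def lot_order(times):
    """Return lot indices sorted by processing time (ties keep input order)."""
    return tuple(sorted(range(len(times)), key=lambda i: times[i]))


def confounded_groups(time_columns):
    """Group time items whose lots were processed in exactly the same order.

    time_columns maps an item name to its list of processing times, one per
    lot, in the same lot order for every item. Times may be numbers or any
    mutually comparable values such as ISO 8601 strings.
    """
    groups = {}
    for name, times in time_columns.items():
        groups.setdefault(lot_order(times), []).append(name)
    return [names for names in groups.values() if len(names) > 1]


columns = {
    "process1_t": [1, 2, 3, 4],
    "process2_t": [10, 20, 30, 40],   # same lot order as process1_t
    "process3_t": [10, 30, 20, 40],   # lots 2 and 3 swapped
}
print(confounded_groups(columns))  # [['process1_t', 'process2_t']]
```

Items in the same group split the lots identically at every time threshold, which is why a tree analysis cannot tell them apart.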
[0085]
When the Evaluation Data output in the first regression tree analysis is limited to the top three items, process 1_e is not among the items listed in the Evaluation Data shown in FIG. 23, nor does it appear in the regression tree diagram. For this reason, in the second regression tree analysis, process 1_t is not added as an explanatory variable. FIG. 26 is a diagram illustrating a regression tree and a result evaluation information list obtained as a result of performing data analysis again with the processes limited in this way.
[0086]
(4) Device history + first candidate time data analysis
FIG. 27 is a diagram illustrating an item setting screen when analyzing device history + first candidate time data. When “device history + first candidate time data analysis” is selected on the analysis processing selection screen shown in FIG. 15, all items other than time items (numerical items K1 as objective variables and character items K2 as explanatory variables) are automatically listed in the item list L1. Thereafter, the objective variables and explanatory variables listed in the numerical items K1 and the character items K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
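The way each processing mode decides which explanatory items are listed automatically can be summarized in a short sketch. The mode identifiers and the function name `listed_explanatory_items` are assumptions made for illustration; only the filtering rules (items with “_e” only, or all items other than time items) come from the description.

```python
def listed_explanatory_items(mode, explanatory_items):
    """Return the explanatory items listed automatically for a mode.

    Mode identifiers are hypothetical labels for the four modes in the text.
    """
    if mode == "manual":
        # Manual analysis: everything is shown and chosen by hand.
        return list(explanatory_items)
    if mode == "device_history":
        # Only items carrying the device suffix "_e" are listed.
        return [i for i in explanatory_items if i.endswith("_e")]
    if mode in ("device_history_top_candidates", "device_history_first_candidate"):
        # All items other than time items are listed; time items are
        # added automatically after the first regression tree analysis.
        return [i for i in explanatory_items if not i.endswith("_t")]
    raise ValueError(f"unknown mode: {mode}")


items = ["process1_e", "process1_t", "process2_e", "process2_t"]
print(listed_explanatory_items("device_history", items))
print(listed_explanatory_items("device_history_first_candidate", items))
```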
[0087]
Based on this setting, the data analysis means 33 performs regression tree analysis using only the device history as explanatory variables. This analysis processing is the same as “(2) Device history data analysis”. The result of the analysis is then extracted automatically, only the time data of the item listed as the first candidate in the Evaluation Data is added as an explanatory variable, and the steps up to re-executing the regression tree analysis are carried out automatically. In the present embodiment, the processing time of process 2 is added as an explanatory variable. FIG. 28 is a diagram showing a regression tree and a list of result evaluation information as a result of performing data analysis again with the processes limited in this way.
[0088]
In the analysis performed in the processing mode “(3) device history + higher candidate time data analysis” described above, the time data items of the confounded processes are listed together at the top of the Evaluation Data. If the time items of the several hundred processes of an actual line were used as explanatory variables, only time data items of the processes would be output, including in the regression tree diagram. In many cases the time data items of these processes are confounded, and a single time data item is often sufficient as a representative. Therefore, a regression tree analysis on the yield is first performed using the data of each device as explanatory variables. Then, taking the process that is the first candidate in that result, only the time data of that process is added as the representative variable for time data, and the regression tree analysis on the yield is performed again.
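The two-pass idea, ranking the device items first and then adding only the first candidate's time item, can be sketched as below. The scoring used here (the largest reduction in the sum of squared errors over “one value versus the rest” two-way splits) is a simplified stand-in chosen for illustration and is not claimed to be the evaluation used by the data analysis means 33. All function names and sample values are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split_gain(categories, target):
    """Largest SSE reduction over all 'one value vs. the rest' splits."""
    total = sse(target)
    best = 0.0
    for value in set(categories):
        left = [y for c, y in zip(categories, target) if c == value]
        right = [y for c, y in zip(categories, target) if c != value]
        if left and right:  # skip splits that leave one side empty
            best = max(best, total - sse(left) - sse(right))
    return best


def first_candidate_time_item(device_columns, target):
    """Rank device items (names ending in "_e") and return the matching
    "_t" item of the best one."""
    ranked = sorted(
        device_columns,
        key=lambda name: best_split_gain(device_columns[name], target),
        reverse=True,
    )
    return ranked[0][:-2] + "_t"


devices = {
    "process1_e": ["V101", "V102", "V101", "V102"],
    "process2_e": ["V201", "V201", "V202", "V202"],
}
yields = [90, 92, 60, 62]
print(first_candidate_time_item(devices, yields))  # process2_t
```

Only the returned time item would then be added to the explanatory variables for the second analysis, which keeps the number of confounded time variables at one.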
[0089]
The processing result of the second analysis is newly saved in a folder named “EQ_TimeXX” (XX: the time data item listed as the first candidate). FIG. 29 is a diagram showing a defect rate trend graph in units of lots. In the figure, the horizontal axis represents each lot, and the vertical axis represents the defect rate of each lot. The illustrated example shows that a defect that normally occurs very rarely occurs at a high rate in eight specific lots (LOT3, 4, 6, 7, 8, K, L, M).
[0090]
FIG. 30 is a diagram illustrating a decision tree in which explanatory variables and objective variables are specified. It shows the result of executing decision tree analysis with the device name and processing time of each process as explanatory variables and with an objective variable that assigns one label to the 8 lots in which the defect occurred and “L” to the other 15 lots. All set branches in the decision tree are time items. In the semiconductor manufacturing process, processing is performed almost in the order of lot numbers, so all the time data are almost confounded and the explanatory variables can be regarded as nearly equivalent. Whichever time data is adopted therefore makes little difference, so narrowing the time data down to the single most significant item loses little and actually makes the analysis result easier to interpret.
[0091]
FIG. 31 is a diagram showing a decision tree in which the explanatory variables and the objective variable are further specified. In the decision tree diagram of FIG. 30, the processing time A_t of process A is the most significant for the objective variable. FIG. 31 shows the result of a decision tree analysis in which all time data other than A_t have been deleted from the explanatory variables. The set branch at the top layer depends on the processing time A_t of process A, as in FIG. 30, but because the time data nearly equivalent to A_t have been removed, differences due to the devices used, which had been hidden behind the time data, now appear.
[0092]
This result shows that a high defect rate is obtained when the DM No. 2 machine is used in process D and the EM No. 4 machine is used in process E. FIG. 32 is a diagram in which the result of FIG. 31 is represented by a trend graph for each processing time. The figure clearly shows that the variation over time of the EM No. 4 machine in process E has influenced the variation over time of the high defect rate.
[0093]
According to the second embodiment described above, it becomes possible to carry out appropriately the work that cannot be handled by the automated data analysis described in the first embodiment, in particular data analysis that focuses on a matter the operator wants to examine, and data analysis that accommodates replacement of the production apparatuses arranged in each production process of a production facility.
[0094]
(Embodiment 3: Another example of processing related to a single device process)
According to the analysis process “(1) Device history + time data analysis setting” of the data analysis means 33 described in the first embodiment,
(1) When lots are processed in a first-in first-out manner throughout the manufacturing process, the lot processing order is basically the same in every process, so a single explanatory variable is sufficient to represent the processing time of all processes.
(2) Since a single-device process is a process in which differences between devices cannot be examined, the problem factor is sought in the change over time of that single device.
(3) Furthermore, the more devices a process has, the more likely a significant difference in the two-way set split is to appear in the regression tree analysis.
[0095]
By applying these three points (1) to (3), this embodiment represents the processing times of all processes, including single-device processes, by one explanatory variable. For variation factors that depend on processing time, it judges that single-device processes are, to a greater or lesser degree, more suspect than multi-device processes, and it automatically narrows down the suspect items as problem factors.
[0096]
In the third embodiment, when the manufacturing process as a whole processes lots in a first-in first-out manner, the processing related to single-device processes is performed without changing explanatory variable names or explanatory variable value names. This differs from “(2) Device name change setting and processing based on change over time” and “(3) Item name change in one machine process” performed by the data cleansing / characterizing means 32 described in the first embodiment. As a result, the processing is simpler than in the first embodiment, and the adverse effect of outliers in the objective variable when the number of records (number of lots) is small is suppressed.
[0097]
FIG. 33 is a flowchart showing a data analysis procedure according to the third embodiment. The processing contents will be described. For analysis data 1 (DATA1) after executing “(1) abnormal value processing condition setting and processing” described in the first embodiment,
[0098]
(1) The single-device processes are extracted (list creation).
(2) As a representative item, the process located in the middle of the process order is selected from all the single-device processes. Its value is a time (interval scale), and its item name is “one machine process time”.
(3) “One machine process time” is added to the explanatory variables of the analysis data, and all other process time items are excluded from the explanatory variables to obtain analysis data 2 (DATA2) (step S20). FIG. 34 is a chart showing an example of the extracted single-device process list REP2.
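Steps (1) to (3) above can be sketched as follows. The function names, the dictionary-based data layout, and the rule of taking the lower middle element when the number of single-device processes is even are assumptions made for illustration; the text only states that the process in the middle of the process order is chosen.

```python
def single_device_processes(process_order, device_columns):
    """(1) List the processes whose device item has exactly one value."""
    return [p for p in process_order if len(set(device_columns[p + "_e"])) == 1]


def representative_process(single_processes):
    """(2) Pick the process in the middle of the process order.

    Assumption: for an even count the lower of the two middle entries is used.
    """
    if not single_processes:
        return None
    return single_processes[(len(single_processes) - 1) // 2]


def build_data2(process_order, device_columns, time_columns):
    """(3) Keep the device items, add "one machine process time", and leave
    out every other process time item. Returns (data2, single_process_list)."""
    singles = single_device_processes(process_order, device_columns)
    representative = representative_process(singles)
    data2 = dict(device_columns)
    if representative is not None:
        data2["one machine process time"] = time_columns[representative + "_t"]
    return data2, singles


order = ["p1", "p2", "p3", "p4", "p5"]
devices = {
    "p1_e": ["A", "B"], "p2_e": ["C", "C"], "p3_e": ["D", "D"],
    "p4_e": ["E", "F"], "p5_e": ["G", "G"],
}
times = {p + "_t": [i, i + 10] for i, p in enumerate(order)}
data2, singles = build_data2(order, devices, times)
print(singles)                            # ['p2', 'p3', 'p5']
print(data2["one machine process time"])  # times of p3: [2, 12]
```

The returned list of single-device processes corresponds to what would be written to the single-device process list REP2.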
[0099]
Thereafter, the following steps described in the first embodiment are executed for analysis data 2 (DATA2): “(4) Unnecessary item deletion setting and processing”, “(5) Item deletion and record deletion setting and processing based on abnormal value ratio”, “(6) Analysis condition setting and processing of time data”, the data analysis processing by the data analysis means 33 (regression tree analysis execution), and the analysis result evaluation by the analysis result evaluation means 34 (step S21). If the analysis end condition is satisfied (step S22: Yes), a report including the single-device process list is created, and the report REP1 and the single-device process list REP2 are output (step S23).
[0100]
Next, an analysis process for the analysis data 2 (DATA2) will be described. FIG. 35 is a chart showing an example of analysis data 2 (DATA2) used in the third embodiment. As shown in the figure, the device used in process 5 is V501 for every lot, so process 5 is a single-device process. FIG. 36 is a diagram showing the yield for each lot number shown in FIG. 35.
[0101]
FIG. 37 is a diagram showing a regression tree and a list of result evaluation information in the case where the used device name and processing time of each process shown in FIG. 35 are the explanatory variables and the yield value is the objective variable. As shown in FIG. 37, the top three items of the Evaluation Data, which indicate candidates for the set branch at the highest level, are confounded, and one of them, process 5_t, is the processing time of process 5. The device item process 5_e of process 5 is deleted from the explanatory variables when the regression tree analysis is executed because it has only one type of value.
[0102]
Then, from the items listed in the Evaluation Data, the processes whose processing time variation is significant for the objective variable (here, process 1, process 2, process 3, process 4, and process 5) are extracted, and among the items indicating the processing device of each of these processes, those having only one type of value are extracted.
[0103]
As described above, according to the third embodiment, it is possible to extract the fact that the significant difference due to the change over time represented by the “one machine process time” item is large. In addition, the single-device process list REP2 makes it easy to identify which processes are single-device processes. In the example shown in FIG. 37, the single-device process is process 5, that one process alone.
[0104]
The method related to the data analysis processing described above can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. This program is recorded on various recording media and executed by being read from the recording media by a computer. The program may be a transmission medium that can be distributed via a network such as the Internet.
[0105]
(Appendix 1) A data extraction process for extracting data necessary for desired data analysis from the original data;
A data cleansing step of performing data cleansing of an abnormal value of the data extracted by the data extraction step;
A characterization step for obtaining feature information of data cleansed by the data cleansing step;
A data analysis step of analyzing data using the feature information obtained by the characterization step;
A data analysis method comprising:
[0106]
(Supplementary note 2) In a data analysis method for performing data analysis of process data including an objective variable indicating quality fluctuation of a production process and an explanatory variable explaining fluctuation of the objective variable,
A data extraction step for extracting data necessary for the desired data analysis from the process data;
A data cleansing step of performing data cleansing of abnormal values of explanatory variables of the data extracted by the data extraction step;
A characterization step for obtaining feature information representing a change in an objective variable of data cleansed by the data cleansing step;
A data analysis step for performing data analysis for searching for a variation factor of the objective variable using the feature information obtained by the characterization step;
A data analysis method comprising:
[0107]
(Appendix 3) In the data extraction step, an additional character string for identifying a category is added to the item name of the explanatory variable of the data,
The data analysis method according to appendix 2, wherein the data analysis step performs data analysis in which a category of an explanatory variable is identified based on the additional character string.
[0108]
(Appendix 4) The data analysis method according to appendix 3, wherein, in the data extraction step, for the item name of an explanatory variable indicating a production apparatus included in the production process, an additional character string indicating the apparatus category is added to the manufacturing process name, and for the item name of an explanatory variable indicating the processing time at which the production apparatus processed the production target, an additional character string indicating the time category is added to the manufacturing process name.
[0109]
(Appendix 5) The data analysis method according to appendix 3 or 4, including a setting step of setting in advance, in a setting file, an instruction for adding the additional character string in the data extraction step and a data analysis instruction for identifying the category in the data analysis step,
wherein the data extraction step and the data analysis step read the setting file at the time of each processing execution and execute processing based on the instructions set in the setting file.
[0110]
(Appendix 6) The data analysis method according to any one of appendices 2 to 5, wherein the characterization step obtains a characteristic of the variation of the objective variable over time and adds a predetermined symbol corresponding to the obtained characteristic to the device name of the production apparatus used as an explanatory variable of the data.
[0111]
(Appendix 7) The data analysis method according to any one of appendices 2 to 6, wherein, when only one production apparatus is provided in one production process,
in the data cleansing step, the explanatory variable of the device name corresponding to the one production apparatus is excluded from the data analysis target, and an additional character string indicating that the process consists of only the one production apparatus is added to the item name of the explanatory variable corresponding to the processing time of the production process, and
in the data analysis step, data analysis using the explanatory variable of the processing time is performed for the process consisting of only the one production apparatus, to which the additional character string was added in the data cleansing step.
[0112]
(Appendix 8) The data analysis method according to any one of appendices 2 to 7, wherein the data cleansing step deletes, from the process data, data other than the data relating to the production apparatus, based on the item names of the explanatory variables.
[0113]
(Appendix 9) The data analysis method according to any one of appendices 2 to 8, wherein the data cleansing step deletes, from the process data, items and records whose ratio of abnormal values exceeds a predetermined ratio, and records whose objective variable value is missing or abnormal.
[0114]
(Appendix 10) The data analysis method according to any one of appendices 2 to 9, wherein, in the data cleansing step, it is possible to select between a setting that handles the data indicating the processing time, as an explanatory variable of the production apparatus, as processing time data, and a setting that handles it as data divided into periods using a predetermined cycle in the production process.
[0115]
(Appendix 11) The data analysis method according to any one of appendices 2 to 10, wherein, when all the production processes process the production target lots sequentially in a first-in first-out manner,
the data analysis step uses, as the explanatory variables to be analyzed, the device items of all production processes and the processing times that are the top N candidates among all production processes.
[0116]
(Appendix 12) The data analysis method according to any one of appendices 2 to 10, wherein, when the production processes process the production target lots independently rather than in a first-in first-out manner,
the data analysis step includes an independent time determination step of determining, for each production process, whether or not its processing time is an independent processing time, and
a representative time item creating step of creating one representative time item by combining the processing times of the production processes determined not to be independent by the independent time determination step, and
the items of the production apparatuses in all production processes and the representative time item created by the representative time item creating step are used as the explanatory variables for data analysis.
[0117]
(Appendix 13) The data analysis method according to any one of appendices 2 to 12, wherein the data analysis step extracts, by a data mining technique, a rule representing a characteristic or regularity of the data to be analyzed.
[0118]
(Appendix 14) The data analysis method according to any one of appendices 2 to 13, further comprising an evaluation step of obtaining a predetermined comprehensive evaluation value using a plurality of analysis results obtained by changing the lots or the processing times of the data in the data analysis step.
[0119]
(Appendix 15) The data analysis method according to appendix 14, wherein the evaluation step obtains, as information indicating the reliability of the rule, a set split evaluation value representing the clarity of the split when the data set obtained by the analysis in the data analysis step is divided into two.
[0120]
(Appendix 16) The data analysis method according to appendix 15, wherein the value of t expressed by the following equation is used as the set split evaluation value.
[Equation 3]

[0121]
(Appendix 17) The data analysis method according to appendix 16, further comprising a report output step of outputting, based on the evaluation result obtained by the evaluation step, a report that narrows down the production process, the production apparatus, or the processing time that is a problem in the production facility.
[0122]
(Appendix 18) When all of the production processes sequentially process the production target lots in a first-in first-out manner, and only one production device is provided in one production process,
The data analysis method according to any one of appendices 2 to 17, wherein the characterization step creates a list of manufacturing steps including the one production apparatus.
[0123]
(Supplementary note 19) In a data analysis apparatus that performs data analysis of process data including an objective variable that indicates a variation in quality of a production process and an explanatory variable that describes the variation of the objective variable.
Data extraction means for extracting data necessary for desired data analysis from the process data;
Data cleansing means for data cleansing abnormal values of explanatory variables of the data extracted by the data extraction means;
Characterization means for obtaining feature information representing a change in an objective variable of data cleansed by the data cleansing means;
Data analysis means for performing data analysis for searching for a variation factor of the objective variable using the feature information obtained by the characterization means;
A data analysis apparatus comprising:
[0124]
(Supplementary Note 20) An explanatory variable conversion unit that adds an additional character string indicating a category of the data item to the explanatory variable of the data cleansed by the data cleansing unit,
The data analysis apparatus according to appendix 19, wherein the data analysis means recognizes a category of the data item based on the additional character string and executes an analysis process for each category.
[0125]
(Supplementary note 21) The data analysis apparatus, wherein the explanatory variable conversion means extracts a list of the explanatory variables and objective variables included in the data and allows the explanatory variables and objective variables used for data analysis to be selected manually.
[0126]
(Supplementary note 22) A data analysis program for performing data analysis of process data including an objective variable indicating a variation in quality of a production process and an explanatory variable for explaining the variation of the objective variable, the program:
extracting the data necessary for the desired data analysis from the process data;
performing data cleansing of abnormal values of the explanatory variables of the extracted data;
obtaining feature information representing a change in the objective variable of the cleansed data;
performing data analysis for searching for a variation factor of the objective variable using the feature information; and
performing a predetermined evaluation on the analysis result obtained by the data analysis.
[0127]
[Effect of the invention]
According to the present invention, the data to be analyzed is appropriately extracted, data cleansing and characterization are performed, and data analysis is then carried out. In particular, an additional character string corresponding to the data category is added to each explanatory variable of the data. This clarifies the relationship between the type of data and the data category, so that the desired data analysis processing can be executed automatically and efficiently while saving resources. In addition, even when data analysis conditions are set manually, operational errors can be prevented, and the desired analysis processing and analysis results can be obtained.
[Brief description of the drawings]
FIG. 1 is a diagram showing a hardware configuration of a computer system used in a data analysis apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the flow of process data.
FIG. 3 is a functional block diagram of a data analysis apparatus realized by the system configuration shown in FIG.
FIG. 4 is a flowchart showing an outline of a data processing procedure in the data analysis apparatus of the present invention.
FIG. 5 is a chart showing a part of the contents of analysis data 1 (DATA1).
FIG. 6 is a chart showing a part of the content of analysis data 2 (DATA2) after data cleansing.
FIG. 7 is a chart showing trend marks given to device names.
FIG. 8 is a table for explaining the contents of evaluation processing based on analysis result data (DATA3).
FIG. 9 is a diagram showing an example of the contents of a report.
FIG. 10 is a flowchart showing a processing procedure for data analysis processing.
FIG. 11 is a chart showing a part of a data cleansing setting file RF3.
FIG. 12 is a chart showing a part of an item setting file RF2.
FIG. 13 is a chart showing a part of a data cleansing setting file RF3.
FIG. 14 is a chart showing a part of an analysis setting file RF5 and a search setting file RF6.
FIG. 15 is a diagram showing a selection screen for analysis processing according to the second embodiment of the present invention.
FIG. 16 is a chart for explaining the contents of input data.
FIG. 17 is a chart showing an example of analysis data 2 (DATA2).
FIG. 18 is a diagram showing a trend graph as an example of an analysis processing result.
FIG. 19 is a diagram showing a screen when setting items during manual analysis.
FIG. 20 is a diagram showing a screen after setting items during manual analysis.
FIG. 21 is a diagram showing a regression tree and a list of result evaluation information obtained as a result of data analysis by designating the devices used in all processes and the processing times.
FIG. 22 is a diagram showing an item setting screen at the time of device history data analysis.
FIG. 23 is a diagram showing a regression tree as a result of data analysis by designating all process histories and a list of result evaluation information.
FIG. 24 is a diagram showing an item setting screen when analyzing device history data.
FIG. 25 is a diagram showing a regression tree as a result of performing data analysis by designating history and time data of all processes and a list of result evaluation information.
FIG. 26 is a diagram illustrating a regression tree and a result evaluation information list obtained as a result of performing data analysis again by limiting processes.
FIG. 27 is a diagram showing an item setting screen when analyzing device history + first candidate time data.
FIG. 28 is a diagram showing a regression tree and a list of result evaluation information obtained as a result of performing data analysis again by limiting processes.
FIG. 29 is a diagram showing a defect rate trend graph in a certain lot unit.
FIG. 30 is a diagram showing a decision tree in which explanatory variables and objective variables are specified.
FIG. 31 is a diagram showing a decision tree further specifying an explanatory variable and an objective variable.
FIG. 32 is a diagram illustrating a state in which the result of FIG. 31 is represented by a trend graph for each processing time.
FIG. 33 is a flowchart showing a data analysis procedure according to the third embodiment of the present invention.
FIG. 34 is a chart showing an example of an extracted single device process list REP2.
FIG. 35 is a chart showing an example of analysis data 2 (DATA2) used in Embodiment 3.
FIG. 36 is a diagram showing a yield for each lot number shown in FIG.
FIG. 37 is a diagram showing a regression tree and a result evaluation information list in the case where the used device name and processing time of each step shown in FIG. 35 are explanatory variables and the yield value is an objective variable.
FIG. 38 is a flowchart showing a procedure of general data analysis processing.
[Explanation of symbols]
1 Input device
2 Central processing unit
3 Output device
4 Storage device
10a-10n Process equipment
11 Management server
DB1 Manufacturing information database
30 Data analysis apparatus
31 Data extraction means
32 Data cleansing / characterizing means
33 Data analysis means
34 Analysis result evaluation means
35 Report output means
40 Display item
A1 Comprehensive judgment content
A2 Statistical information
A3 Regression tree diagram
A4 Box plot
A5 Correlation diagram
DATA1 Analysis data 1
DATA2 Analysis data 2
K1 Numeric item as objective variable
K2 Character item used as explanatory variable
L1 Item list
RF Setting file
RF1 Extraction condition setting file
RF2 Item setting file
RF3 Data cleansing setting file
RF4 Characterization setting file
RF5 Analysis setting file
RF6 Search setting file
RF7 Report condition setting file
REP1 Report
REP2 Single device process list
S1 Objective variable list
S2 Explanatory variable list

Claims

A data analysis method for analyzing process data that includes an objective variable indicating variation in the quality of a production process and explanatory variables explaining the variation of the objective variable, the method comprising:
an input step in which the process data stored in a storage device is input;
a reception step of accepting designation of the objective variable and the explanatory variables from among the items included in the process data input in the input step;
a data extraction step in which data extraction means extracts, from the process data input in the input step, the objective variable and the explanatory variables designated in the reception step;
a data cleansing step in which data cleansing/characterizing means cleanses the data extracted in the data extraction step by replacing abnormal values of the explanatory variables with a specific value;
a characterization step in which the data cleansing/characterizing means obtains feature information representing the variation of the objective variable in the data cleansed in the data cleansing step;
a data analysis step in which data analysis means performs data analysis to search for a factor causing variation in the objective variable, using the feature information obtained in the characterization step;
an evaluation step in which analysis result evaluation means obtains a set division evaluation value representing the clarity of the division when the set of data obtained by the analysis in the data analysis step is divided into two; and
a report output step of outputting a report based on the evaluation result obtained in the evaluation step,
wherein, when only one production apparatus is provided in a given production process,
the data cleansing step excludes the explanatory variable for the apparatus name corresponding to the one production apparatus from the data analysis target, and appends, to the item name of the explanatory variable corresponding to the processing time of that production process, an additional character string indicating that the process consists of only the one production apparatus,
the characterization step obtains feature information representing the variation of the objective variable for the one production apparatus, and
the data analysis step performs data analysis using the processing-time explanatory variable for the process that consists of only the one production apparatus and to which the additional character string was appended in the data cleansing step.
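The handling of a process served by a single production apparatus can be sketched as follows. The marker string `_SGL`, the `<step>_device` / `<step>_time` naming scheme, and the detection of single-apparatus processes by counting distinct apparatus names in the data are assumptions for illustration; the claim does not specify the actual character string or how such a process is identified.

```python
SINGLE_DEVICE_SUFFIX = "_SGL"  # hypothetical marker; the actual string is not given

def mark_single_device_steps(records, steps):
    """For each step served by one apparatus only, drop the apparatus-name
    item and rename the processing-time item with the marker suffix."""
    single = [s for s in steps
              if len({r[f"{s}_device"] for r in records}) == 1]
    out = []
    for r in records:
        r = dict(r)  # copy so the input records are left unchanged
        for s in single:
            del r[f"{s}_device"]
            r[f"{s}_time{SINGLE_DEVICE_SUFFIX}"] = r.pop(f"{s}_time")
        out.append(r)
    return out

def time_items_for_single_device_steps(record):
    """Items the analysis step would treat as time-only explanatory variables."""
    return [k for k in record if k.endswith(SINGLE_DEVICE_SUFFIX)]

lots = [
    {"yield": 92.0, "s1_device": "M1", "s1_time": 1.0, "s2_device": "M9", "s2_time": 5.0},
    {"yield": 80.0, "s1_device": "M2", "s1_time": 2.0, "s2_device": "M9", "s2_time": 6.0},
]
marked = mark_single_device_steps(lots, ["s1", "s2"])
print(time_items_for_single_device_steps(marked[0]))
```

Removing the constant apparatus-name item avoids offering the analysis a variable that cannot split the data, while the suffix lets a later stage find the time items that still carry information for that process.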

The data analysis method according to claim 1, wherein the data extraction step appends, to the item names of the explanatory variables of the data, an additional character string for identifying a category, and
the data analysis step performs data analysis in which the category of each explanatory variable is identified on the basis of the additional character string.
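Category identification from an appended character string can be sketched as a suffix lookup. The suffixes `_EQ` and `_TM` and the category names are hypothetical; the claim only requires that some additional character string identify the category.

```python
# Hypothetical suffix-to-category table; the real strings are not specified.
CATEGORY_SUFFIXES = {"_EQ": "equipment", "_TM": "time"}

def category_of(item_name):
    """Return the category encoded in the item name, or 'other' if none."""
    for suffix, category in CATEGORY_SUFFIXES.items():
        if item_name.endswith(suffix):
            return category
    return "other"

def group_by_category(item_names):
    """Group explanatory-variable item names by their identified category."""
    groups = {}
    for name in item_names:
        groups.setdefault(category_of(name), []).append(name)
    return groups

print(group_by_category(["etch_EQ", "etch_TM", "lot"]))
```

With the items grouped this way, an analysis stage can be run per category (for example, equipment items only) without the analyst re-entering the variable list.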

The data analysis method according to claim 1 or 2, wherein the characterization step obtains a feature relating to the variation of the objective variable over time and appends, to the apparatus name of the production apparatus used as an explanatory variable of the data, a predetermined symbol corresponding to the obtained feature.
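One way to attach a time-trend symbol to an apparatus name is sketched below. The symbols `+`, `-`, `=`, the half-versus-half comparison, and the tolerance are assumptions chosen for brevity; the claim does not define how the feature is computed or which symbols are used.

```python
from statistics import mean

def trend_mark(values, tolerance=1.0):
    """Return '+', '-' or '=' by comparing the later half with the earlier half."""
    if len(values) < 2:
        return "="
    half = len(values) // 2
    delta = mean(values[half:]) - mean(values[:half])
    if delta > tolerance:
        return "+"
    if delta < -tolerance:
        return "-"
    return "="

def mark_devices(records):
    """records: (device, time, objective) tuples. Returns the records with
    each device name carrying the trend mark computed for that device."""
    by_device = {}
    for device, _t, y in sorted(records, key=lambda r: r[1]):
        by_device.setdefault(device, []).append(y)
    marks = {d: trend_mark(v) for d, v in by_device.items()}
    return [(f"{d}{marks[d]}", t, y) for d, t, y in records]

history = [
    ("M1", 1, 90.0), ("M1", 2, 91.0), ("M1", 3, 80.0), ("M1", 4, 79.0),
    ("M2", 1, 88.0), ("M2", 2, 88.5),
]
print(mark_devices(history)[0][0], mark_devices(history)[4][0])
```

Encoding the trend in the name itself means the marked value can be used directly as a categorical explanatory variable in the subsequent analysis.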

The data analysis method according to any one of claims 1 to 3, wherein the data cleansing step deletes, on the basis of the item names of the explanatory variables, data in the process data other than data related to the production apparatuses.

The data analysis method according to any one of claims 1 to 4, wherein the data cleansing step deletes, from the process data, items and records whose proportion of abnormal values exceeds a predetermined ratio, as well as records in which the value of the objective variable is missing or abnormal.
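The ratio-based deletion can be sketched as three passes over the data. Here `None` stands in for a missing or abnormal value, a single threshold `max_ratio` is used for both items and records, and the order of the passes (objective first, then items, then records) is an assumption; the claim fixes none of these details.

```python
def drop_sparse(records, objective, max_ratio=0.5):
    """Remove records with a missing objective, then items and records whose
    share of missing/abnormal values (None) exceeds max_ratio."""
    # 1. Records without a usable objective value are dropped first.
    rows = [r for r in records if r[objective] is not None]
    if not rows:
        return []
    items = [k for k in rows[0] if k != objective]
    # 2. Keep only items whose missing ratio is within the limit.
    kept = [k for k in items
            if sum(r[k] is None for r in rows) / len(rows) <= max_ratio]
    # 3. Keep only records whose missing ratio over the kept items is within the limit.
    out = []
    for r in rows:
        bad = sum(r[k] is None for k in kept)
        if not kept or bad / len(kept) <= max_ratio:
            out.append({objective: r[objective], **{k: r[k] for k in kept}})
    return out

raw = [
    {"y": 90.0, "a": 1, "b": None, "c": 3},
    {"y": None, "a": 1, "b": 2, "c": 3},
    {"y": 85.0, "a": None, "b": None, "c": None},
    {"y": 88.0, "a": 2, "b": None, "c": 5},
]
print(drop_sparse(raw, "y"))
```

In this example the second record is removed for its missing objective, item `b` for being missing in every remaining record, and the third record for having no usable explanatory values.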

The data analysis method according to any one of claims 1 to 5, wherein, when all of the production processes sequentially process the production target lots on a first-in first-out basis, the data analysis step uses, as the explanatory variables to be analyzed, the apparatus items of all production processes and the processing times that are the top N candidates among all production processes.

The data analysis method according to any one of claims 1 to 6, wherein, when the production processes process the production target lots independently rather than on a first-in first-out basis, the data analysis step includes:
an independent time determination step of determining whether the processing time of each production process is an independent processing time; and
a representative time item creation step of creating a single representative time item by combining the processing times of the production processes determined not to be independent in the independent time determination step,
and uses the production apparatus items of all production processes and the representative time item created in the representative time item creation step as the explanatory variables for the data analysis.
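The creation of a representative time item can be sketched under one possible reading of "independent". In this sketch two processing-time items are treated as not independent when the absolute Pearson correlation of their values reaches a threshold, and the first item of each correlated group is used as the representative. The criterion, the threshold, and the `REP(...)` naming are all assumptions; the claim does not state how independence is determined or how the representative value is formed.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def representative_times(time_items, threshold=0.95):
    """time_items: {item name: processing times per lot}.
    Items correlated with a group's first member are merged into one item."""
    groups = []  # each group is a list of item names
    for name, values in time_items.items():
        for g in groups:
            if abs(pearson(time_items[g[0]], values)) >= threshold:
                g.append(name)
                break
        else:
            groups.append([name])  # independent of every existing group
    result = {}
    for g in groups:
        if len(g) == 1:
            result[g[0]] = time_items[g[0]]
        else:
            result["REP(" + "+".join(g) + ")"] = time_items[g[0]]
    return result

times = {
    "s1_time": [1, 2, 3, 4],
    "s2_time": [2, 3, 4, 5],
    "s3_time": [4, 1, 3, 2],
}
print(list(representative_times(times)))
```

Merging correlated time items reduces the number of explanatory variables and the confounding between them, which is the difficulty with time data described in paragraph [0007].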