JP2004029971A

JP2004029971A - Data analyzing method

Info

Publication number: JP2004029971A
Application number: JP2002182064A
Authority: JP
Inventors: Eidai Shirai; 白井　英大; Hidetaka Tsuda; 津田　英隆
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-06-21
Filing date: 2002-06-21
Publication date: 2004-01-29
Anticipated expiration: 2022-06-21
Also published as: JP4275359B2

Abstract

PROBLEM TO BE SOLVED: To provide a data analyzing method for easily extracting only data necessary for desired data analysis among the data of a production facility or processing time in a production process, and acquiring the analytic result effective for improving yield.
SOLUTION: A data extracting means 31 adds an additional character string for identifying the category of a data item to the explanation variables of data, and a data cleansing/characterizing means 32 executes data cleansing to replace or erase the abnormal value of data, and acquires characteristic information based on the fluctuation of the target variables of data. A data analyzing means 33 recognizes the category of data in analysis processing, and efficiently and automatically executes data analysis by condition setting and analytic procedure corresponding to the category.
COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
この発明は、広く産業界で取り扱われるデータ間の関連を把握し、産業上優位な結果をもたらすための有意性のある結果を抽出するデータ解析方法に関する。
【０００２】
【従来の技術】
例えば、半導体製造工程において歩留りを向上させるため、製造段階で使用された装置の履歴、試験結果、設計情報、各種測定データ等に基づいて歩留りを低下させている要因をできるだけ速やかに見つけ出す作業が行われる。このためには、実際に物理解析を行うよりも事前に収集されたデータに基づいた統計解析を行っておくのが、経済性の面からも優れており、この統計解析を効率的に行うことが重要である。
【０００３】
本願発明者等は、先にこのようなデータを統計解析する装置および方法として特願２０００−２８４５７８号（特開２００１−３０６９９９号公報）を出願している。統計的データ解析により有意差の抽出を行う場合、どのようなデータをどのような解析手法で解析するかは、解析者の持っている経験、技術等により決定される。この場合、一度の解析結果で意思決定がなされるのは稀であり、一般に各解析結果を解釈した後に、次になすべき解析条件（データや解析手法等）が検討、決定されたうえで、解析処理がなされる。
【０００４】
図３８は、一般的なデータ解析処理の手順を示すフローチャートである。決定された解析条件の設定を行い（ステップＳ５０）、設定された解析条件に基づきデータの解析を実行し（ステップＳ５１）、得られた解析結果を解釈し（ステップＳ５２）、意思決定を行う（ステップＳ５３）。意志決定されれば（ステップＳ５３：Ｙｅｓ）、統計解析を終了し、今回の解析結果で意思決定できない場合には（ステップＳ５３：Ｎｏ）、解析条件の変更を行い（ステップＳ５４）、変更した解析条件に基づきデータの解析を実行する。
【０００５】
【発明が解決しようとする課題】
上記ステップＳ５４において変更される解析条件は、解析者によりその説明変数、目的変数、処理終了条件等が設定されたうえで実行プログラムに入力される。従って、この入力の際に操作ミスや、実行結果待ち等が生じるため解析効率が低下していた。ここで指定される解析条件は、ある程度パターン化できるものが多いが、解析結果を得た後でなければ最終的な解析条件を指定できない場合が多い。これらは解析自動化を阻害する要因の一つとなっている。特に、一回の解析に時間を要し、多くのパラメータを扱うデータマイニングのような処理に顕著に現れる。
【０００６】
データ解析を効率的に行うためには、解析者がどのような手順で何を解析したいか、その説明変数、目的変数を何にすべきかを常に意識して進める必要がある。一般には、各解析者が各解析ケース毎に説明変数のカテゴリを認識しておこなっているが、従来は入力されたデータがどのようなカテゴリのものであるか判別せずに処理結果を出していた。
【０００７】
特に、レコード数が少ないにもかかわらず、説明変数の数が多い半導体製造に係るプロセスデータについては、説明変数が複雑に絡み合い、解析対象、解析目的に合った処理手順、説明変数の選択を適切に行わないと、効率的にデータ解析を進めることができなかった。特に、時刻データはプロセスデータ解析において重要な役割を果たしており大量に取得されている。しかし、説明変数が増えすぎたり、交絡しやすくなる（独立でなくなる）ため、統計的有意差の抽出をより困難にしている。対応して計算時間をはじめとする計算機資源も多く必要となっていた。
【０００８】
この発明は、上記問題点に鑑みてなされたものであって、生産工程における生産装置や処理時刻のデータの中から所望するデータ解析に必要なデータのみを容易に抽出でき、歩留り向上に有効な解析結果を効率よく得ることができるデータ解析方法を提供することを目的とする。また、レコード数が少なく説明変数が多いデータのデータ解析を自動的に実行できるデータ解析方法の提供を本発明の目的に含めることができる。
【０００９】
【課題を解決するための手段】
上記目的を達成するため、本発明は、生産装置名や、処理時刻等のデータからデータ解析に必要なデータを抽出し、歩留りを低下させる問題の生産装置名や処理時刻及び出来映えや歩留り等のデータを絞り込みデータ解析を実行する。このデータ解析の実行に際して、データの説明変数に対しデータ項目のカテゴリを識別する付加文字列を付加することにより、解析処理時にカテゴリを認識できカテゴリに対応した解析手順を自動実行できる。この際、データの目的変数、説明変数を選択及び削除して所望する解析結果を得る。また、所望するデータ解析に不要なデータを説明変数の項目名によって削除し、異常値を有するレコードや項目を削除して必要なデータのみを用いたデータ解析を可能にする。
【００１０】
この発明によれば、一通りの解析結果を実行した後に解析条件を指定するといった従来のような手間のかかるデータ解析を行わずとも、解析対象となるデータの説明変数に付加された付加文字列によってデータ項目のカテゴリを認識して必要なデータ解析処理を自動的に順次実行していくことができ、かつ解析結果の信頼性を向上できるようになる。そして、レコード数が少なく説明変数の数が多い半導体製造に係るプロセスデータのように、説明変数が複雑に絡み合うものであっても、解析対象、解析目的に適合した説明変数及び処理手順を選択でき、データ解析を効率的に遂行できるようになる。
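[0011]
[Embodiments of the Invention]
(Embodiment 1: Automatic data analysis)
Preferred embodiments of the data analysis apparatus and data analysis method according to the present invention are described in detail below with reference to the accompanying drawings. The original data to be analyzed in the embodiments is, by way of example, process data from semiconductor manufacturing, and this process data is assumed to vary over time. To analyze such process data efficiently, the items concerning time, which capture that variation, are particularly important. The embodiments described below use the production equipment (process equipment) placed at each manufacturing process and its processing times to analyze yield factors and the like, extract low-yield factors by data mining techniques (regression tree analysis and decision tree analysis), and further improve analysis efficiency.
[0012]
FIG. 1 shows the hardware configuration of the computer system used in the data analysis apparatus according to the embodiments. The apparatus comprises an input device 1 (operating means such as a keyboard, plus means for receiving data over a network or the like), a central processing unit 2 having a CPU or the like that executes the analysis processing described later on the input data, an output device 3 consisting of display means such as a CRT or LCD and printing means such as a printer, and a storage device 4 such as an HDD that stores and holds data.
[0013]
FIG. 2 illustrates the flow of process data. A plurality (N) of process equipment units 10a to 10n are placed along the manufacturing process for the product, such as a semiconductor. Each unit 10a to 10n sends the process data of its manufacturing process to a management server 11. The process data consists of the processing time at which the product was processed in each process, the name of the equipment used, the yield, and so on. The management server 11 builds a manufacturing information database DB1 from the received process data. The database DB1 is stored, via a network or the like, in the storage device 4 shown in FIG. 1.
[0014]
FIG. 3 is a functional block diagram of the data analysis apparatus realized by the system configuration of FIG. 1. The data analysis apparatus 30 receives the process data of each process equipment unit stored in the manufacturing information database DB1 of FIG. 2. It comprises data extraction means 31, data cleansing/characterization means 32, data analysis means 33, analysis result evaluation means 34, and report output means 35, each of which operates according to the settings written in the setting files RF (RF1 to RF7). Chaining the processing of these means and executing it automatically is referred to as automatic analysis.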
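[0015]
The data analysis apparatus 30 starts an analysis program such as regression tree analysis. Once the necessary input and output file names and the objective and explanatory variables are specified, it follows the group of analysis-flow setting files and automatically performs, in order, ① data extraction from the database, ② data cleansing and characterization, ③ regression tree analysis, and ④ evaluation of the analysis results, then ⑤ extracts the problem process, equipment, and time from the process history and outputs a report REP1.
[0016]
In doing so, it extracts and processes data in which objective-variable items are added to the lot number (corresponding to the number of data entries, that is, the number of records), the equipment name of each process, and the processing date and time, analyzes the data by data mining techniques, and automatically narrows down the processes, equipment, and times that deserve attention.
[0017]
FIG. 4 is a flowchart outlining the data processing procedure in the data analysis apparatus. The analysis data DATA extracted from the manufacturing information database DB1 is analyzed by running the analysis program according to the settings in the program initialization file INI (which includes the setting files RF1 to RF7 described later). Based on the settings in INI, the analysis program defines an additional character string _xx that identifies the category of each data item in DATA. This additional character string _xx is recognized by the analysis program.
[0018]
When the applicable processing mode is specified in the analysis program (step S1), the processing that corresponds to the category indicated by xx is carried out, namely automatic selection of variables such as objective and explanatory variables (step S2) or manual selection of variables (step S3), and analysis such as regression tree analysis is then executed (step S4). Based on the first analysis run, the program chooses whether to re-run the analysis in line with the preset analysis target and analysis procedure (step S5). When re-running, it extracts the execution results (step S6), automatically selects or deletes variables such as objective and explanatory variables (step S7), and returns to step S4 to execute the next analysis and obtain the final analysis result.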
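[0019]
(Data extraction means 31)
The data extraction means 31 performs each of the following:
(1) Extraction-condition setting and processing for data extraction and conversion
(2) Data mining condition setting and processing
(3) Equipment name and time setting and processing
[0020]
(1) Extraction-condition setting and processing for data extraction and conversion
Following the extraction condition setting file RF1 and the data extraction program, data extraction and conversion takes the process data stored in the manufacturing information database DB1 and, at a fixed time or periodically, extracts under the configured conditions (target product type, period, items) the data in which objective-variable items are added to the lot number, process name, equipment name, and processing date and time.
[0021]
(2) Data mining condition setting and processing
Following the item setting file RF2 and the mining-condition setting program, the record identifier, explanatory variable names, objective variable names, and explanatory variable value names are selected, and analysis data 1 (DATA1) is created and output. Specifically, the item setting file RF2 is used to set the following:
Record name: lot number
Explanatory variable item: manufacturing process (major process name + minor process name)
Explanatory variable value name: equipment, time
Objective variable name: YIELD, characteristic value (yield), etc.
[0022]
(3) Equipment name and time setting and processing
Equipment name and time are set as the explanatory variable values needed for the manufacturing process history analysis performed by this apparatus. Analysis data 1 (DATA1) places the record name in the first column and the explanatory and objective variables in the second and later columns. FIG. 5 is a table showing part of DATA1. As shown, each explanatory variable item name has the name of its value type appended so that it can be identified. In this example, "_装置" (equipment) or "_時刻" (time) is appended to the manufacturing process name, so both the process and the kind of value can be read from the name. With the explanatory variable items and value names set here, the subsequent data cleansing/characterization means 32 and data analysis means 33 can associate each manufacturing process with its time or equipment. The figure also shows that several equipment units (for example, 6nw2 and 6nw4 under 工程01_装置) are placed at each process of the production line and run in parallel.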
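[0023]
(Data cleansing/characterization means 32)
Following the data cleansing setting file RF3, the characterization setting file RF4, and the data cleansing/characterization program, the data cleansing/characterization means 32 performs the following cleansing and characterization processing.
(1) Abnormal-value handling condition setting and processing
(2) Equipment-name change setting and processing based on change over time
(3) Item renaming for single-equipment processes
(4) Unnecessary-item deletion setting and processing
(5) Item and record deletion setting and processing based on the abnormal-value ratio
(6) Analysis-condition setting and processing for time data
[0024]
Each of (1) to (6) is described in detail below.
(1) Abnormal-value handling condition setting and processing
Based on the settings in the data cleansing setting file RF3, the data cleansing/characterization means 32 replaces a missing explanatory-variable value in analysis data 1 (DATA1) with a specific value. FIG. 6 is a table showing part of analysis data 2 (DATA2) after data cleansing. Where an explanatory-variable value in the DATA1 of FIG. 5 is Null (missing), it is replaced with a specific value as shown in FIG. 6 (in the illustrated example, 99999 for a missing numeric value and nop for a missing string value).
[0025]
The data cleansing/characterization means 32 also sets, based on RF3, whether this specific value is analyzed as one of the ordinary values or treated as a missing value. For the abnormality criteria as well, the definition of an abnormal value and its replacement value are set, and abnormal values are processed according to those settings.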
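The missing-value replacement in item (1) can be sketched as follows. This is a minimal illustration, not the patented program: the placeholder values 99999 and nop come from the FIG. 6 example in the text, while the function name `fill_missing`, the row-as-dict layout, and the column names are hypothetical.

```python
NUMERIC_FILL = 99999  # placeholder for a missing numeric value (FIG. 6 example)
STRING_FILL = "nop"   # placeholder for a missing string value (FIG. 6 example)


def fill_missing(rows, numeric_columns):
    """Return a copy of rows in which None is replaced by the per-type placeholder.

    rows: list of dicts (one dict per lot); numeric_columns: names of numeric items.
    Both shapes are assumptions made for this sketch.
    """
    cleaned = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if value is None:
                new_row[name] = NUMERIC_FILL if name in numeric_columns else STRING_FILL
            else:
                new_row[name] = value
        cleaned.append(new_row)
    return cleaned


rows = [
    {"lot": "L001", "P01_equip": "6nw2", "P01_time": 20020601.5},
    {"lot": "L002", "P01_equip": None, "P01_time": None},
]
cleaned = fill_missing(rows, numeric_columns={"P01_time"})
```

Whether the placeholder is later analyzed as an ordinary value or skipped as missing is a separate setting in RF3, so the sketch only performs the substitution.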
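[0026]
(2) Equipment-name change setting and processing based on change over time
Even the same equipment can suddenly turn into abnormal equipment from some point in time because of some trouble. The data cleansing/characterization means 32 captures the variation in the objective variable caused by such a sudden change and gives the equipment that changed to an abnormal state a different name, which makes it possible to narrow down the problem further.
[0027]
For every equipment unit, the data cleansing/characterization means 32 examines the characteristics of the objective-variable trend over processing time by feature extraction such as a wavelet transform (noise removal by filtering and so on), and characterizes equipment whose feature is strong relative to the configured criterion by the timing and duration of its sharp rises and drops. For example, the program initialization file INI sets the criterion that the amount of sudden variation be at least 0.8 times the overall standard deviation. The equipment name is then converted into a name that carries this trend feature information. In this embodiment the trend feature information is appended to the equipment name as follows.
[0028]
① A delimiter (@) is appended.
② The data is divided into at most three periods according to when the objective variable for that equipment rose or dropped sharply, and a symbol for the period is appended (F: first part, M: middle, L: last part, no symbol: the whole).
③ A trend mark is appended as a symbol for the shape of the equipment's trend.
④ A one-digit number (0: weak to 9: strong) is appended as a symbol for the strength of the trend feature.
[0029]
FIG. 7 is a table of the trend marks given to equipment names. As shown, the trend mark divides the whole of analysis data 2 (DATA2) in FIG. 6 into three parts (first, middle, last) and indicates how the objective variable moves in each period. The states obtained by feature extraction such as the wavelet transform are classified from 1 (-^: low in the first half, high in the second half) to 5 (-: no feature).
[0030]
Using 工程01_装置 in FIG. 6, an example of an equipment name with trend feature information appended is
6nw2@F-^7
In this example the equipment name is 6nw2, the shape of the objective-variable trend in the first period (F) is low in the first half and high in the second half, and the trend feature strength is 7. When there is no trend feature at all, the name becomes 6nw2@-0. In this way the data cleansing/characterization means 32 creates, for each process equipment unit, an equipment name with trend feature information appended. Because the DATA2 shown in FIG. 6 is only part of the whole data, the trend feature information actually appended is the result of feature extraction on the whole data.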
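The naming scheme above (delimiter, period symbol, trend mark, strength digit) can be illustrated with a small encoder and decoder. This is a sketch under stated assumptions: only the two trend marks quoted in the text ("-^" and "-") are known here, so the pattern assumes that a trend mark never contains a digit or the letters F, M, L; the names `TrendName`, `encode`, and `decode` are hypothetical.

```python
import re
from typing import NamedTuple


class TrendName(NamedTuple):
    equipment: str
    period: str    # "F", "M", "L", or "" for the whole range
    mark: str      # trend mark such as "-^" or "-"
    strength: int  # 0 (weak) to 9 (strong)


def encode(equipment, period, mark, strength):
    """Build an equipment name that carries the trend feature information."""
    if period not in ("", "F", "M", "L"):
        raise ValueError("period must be '', 'F', 'M', or 'L'")
    if not 0 <= strength <= 9:
        raise ValueError("strength must be a single digit")
    return f"{equipment}@{period}{mark}{strength}"


# Assumption: marks contain no digits and none of the letters F, M, L.
_PATTERN = re.compile(
    r"^(?P<equipment>[^@]+)@(?P<period>[FML]?)(?P<mark>[^0-9FML]+)(?P<strength>[0-9])$"
)


def decode(name):
    """Split an annotated equipment name back into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an annotated equipment name: {name!r}")
    return TrendName(
        match["equipment"], match["period"], match["mark"], int(match["strength"])
    )


example = encode("6nw2", "F", "-^", 7)
```

Keeping the original equipment name as the prefix before "@" means the annotated value can still be grouped back to the physical unit when a report is written.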
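[0031]
(3) Item renaming for single-equipment processes
As noted above, in production lines for semiconductors and other products each process often has several equipment units running in parallel. In some cases, however, a process is operated with only one unit. A process configured this way is defined as a "single-equipment process". Because differences between units cannot be checked in a single-equipment process, the problem factor is sought in the change over time of the single unit.
[0032]
When the equipment name for a process used as an explanatory variable has only one distinct value, no equipment-to-equipment difference can be obtained for that process, so the item is removed from the explanatory variables. "_一台装置" (single equipment) is appended to the name of the data item holding that process's processing time. In the example of FIG. 6, the processing time of 工程04_装置 is renamed "工程04_一台装置". No change is made when the corresponding processing-time item does not exist or when it too has only one distinct value. The appended string lets the analysis recognize the item as the time data of a single-equipment process.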
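The renaming rule for single-equipment processes can be sketched like this. The suffixes "_装置", "_時刻", and "_一台装置" are the ones quoted in the text; the column-oriented dict layout and the function name are hypothetical. One assumption is made explicit in the code: the equipment item with a single distinct value is dropped even when the time item cannot be renamed, since the text only says that the rename itself is skipped in that case.

```python
EQUIP_SUFFIX = "_装置"
TIME_SUFFIX = "_時刻"
SINGLE_SUFFIX = "_一台装置"


def mark_single_equipment(columns):
    """columns maps an item name to its list of values (one value per lot).

    Returns a new dict in which single-equipment processes are marked.
    """
    result = dict(columns)
    for name, values in columns.items():
        if not name.endswith(EQUIP_SUFFIX) or len(set(values)) != 1:
            continue
        process = name[: -len(EQUIP_SUFFIX)]
        # Assumption: the equipment item is always removed, because a single
        # distinct value carries no equipment-to-equipment difference.
        del result[name]
        time_name = process + TIME_SUFFIX
        times = columns.get(time_name)
        # Rename only if the time item exists and has more than one value.
        if times is not None and len(set(times)) > 1:
            del result[time_name]
            result[process + SINGLE_SUFFIX] = times
    return result


columns = {
    "工程03_装置": ["a", "b", "a"],
    "工程03_時刻": [1, 2, 3],
    "工程04_装置": ["x", "x", "x"],
    "工程04_時刻": [4, 5, 6],
}
marked = mark_single_equipment(columns)
```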
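[0033]
(4) Unnecessary-item deletion setting and processing
Explanatory-variable items that are not needed for the analysis can get mixed in when the data is acquired; in that case, deletion of unnecessary items is configured before the analysis. In this example, to exclude inspection processes and the like that do not directly process the product, several character strings that appear in explanatory-variable item names, such as inspection process names, are set as the initial setting. Unnecessary items are deleted according to this setting.
[0034]
(5) Item and record deletion setting and processing based on the abnormal-value ratio
Items and records in which the ratio of missing values and defined abnormal values exceeds the initial setting are deleted. For example, items in which that ratio is 60% or more are deleted, and records in which it is 70% or more are deleted. Records whose objective-variable value is missing or abnormal are deleted. In addition, among the explanatory variables, nominal-scale items with 1 distinct value or with 100 or more distinct values are excluded from the analysis.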
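The ratio-based deletion in item (5) might look like the sketch below. The 60% and 70% limits and the "1 or 100 or more distinct values" rule come from the text. The order of operations (objective-variable check, then items, then records) is not stated in the source and is an assumption, as are the set of values treated as bad and all function names.

```python
BAD_VALUES = {None, 99999, "nop"}  # assumed markers for missing or abnormal values


def is_bad(value):
    return value in BAD_VALUES


def drop_by_bad_ratio(rows, explanatory, target, item_limit=0.6, record_limit=0.7):
    """Return (kept_items, kept_rows) after the ratio-based deletions."""
    # Records whose objective-variable value is missing or abnormal go first.
    rows = [r for r in rows if not is_bad(r[target])]
    if not rows:
        return [], []
    # Items whose bad-value ratio reaches item_limit are deleted.
    kept_items = [
        c for c in explanatory
        if sum(is_bad(r[c]) for r in rows) / len(rows) < item_limit
    ]
    dropped = set(explanatory) - set(kept_items)
    kept_rows = []
    for r in rows:
        bad = sum(is_bad(r[c]) for c in kept_items)
        # Records whose bad-value ratio reaches record_limit are deleted.
        if kept_items and bad / len(kept_items) >= record_limit:
            continue
        kept_rows.append({k: v for k, v in r.items() if k not in dropped})
    return kept_items, kept_rows


def usable_nominal(values):
    """Nominal items with 1 or with 100 or more distinct values are excluded."""
    return 1 < len(set(values)) < 100


rows = [
    {"lot": "L1", "A_e": "x", "B_e": None, "yield": 90.0},
    {"lot": "L2", "A_e": "y", "B_e": None, "yield": 85.0},
    {"lot": "L3", "A_e": "x", "B_e": "p", "yield": None},
    {"lot": "L4", "A_e": None, "B_e": None, "yield": 70.0},
]
items, kept = drop_by_bad_ratio(rows, ["A_e", "B_e"], "yield")
```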
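[0035]
(6) Analysis-condition setting and processing for time data
By default the apparatus treats time data in the explanatory variables as an ordinal scale (processed in units of time). As a second setting, it can also divide time into periods using a cycle that exists in the manufacturing process and treat the data as a nominal scale (processed in units of names) by converting each time into the name of its period. The shift rotation of manufacturing operators is one example of such a cycle.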
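The second setting, converting a processing time into the name of its period, can be sketched as below. The 8-hour shift length, the midnight start of the first shift, and the label format are all assumptions chosen for illustration; the text only says that a cycle such as the operator shift rotation is used.

```python
from datetime import datetime


def shift_label(ts, shift_hours=8):
    """Map a timestamp to the name of its shift period (nominal scale).

    Assumes shifts start at 00:00 and that shift_hours divides 24.
    """
    index = ts.hour // shift_hours
    return f"{ts:%Y-%m-%d}_shift{index + 1}"


morning = shift_label(datetime(2002, 6, 21, 9, 30))
night = shift_label(datetime(2002, 6, 21, 23, 0))
```

The raw timestamp keeps its order and can be split at a threshold, whereas the label is an unordered category, which is why the two settings lead to different split candidates in a tree analysis.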
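[0036]
(Analysis processing by the data analysis means 33)
Following the analysis setting file RF5 and the analysis program, the data analysis means 33 performs the following analysis by regression tree analysis.
[0037]
(1) Setting of equipment history + time data analysis
Different analysis processing is performed depending on how the manufacturing process handles lots, as follows.
① When the whole manufacturing process handles lots first-in, first-out
The processing order of lots is basically the same at every process, so one processing time is sufficient as an explanatory variable, and the analysis is run with all equipment items and the first-candidate time as explanatory variables. When the lot order is swapped to some extent, the number of time items is increased and "equipment history + top-N candidate time data analysis" is run. An appropriate range for N is 1 to 20.
[0038]
② When the whole manufacturing process does not handle lots first-in, first-out
Using a test of independence for time data, the time data of processes that are not mutually independent are merged into a single representative time item, narrowing the time data down to a group of mutually independent representative time items (narrowing of time data). "Equipment history + time data analysis" is then run with all equipment-name items and the narrowed-down representative time items as explanatory variables.
[0039]
(2) Setting of the analysis end condition
The end condition for regression tree analysis is set, for example, as the point at which the standard deviation of a split subset falls to 0.5 times that of the whole or less.
[0040]
(3) Execution of the analysis
The data analysis means 33 runs the analysis according to the analysis setting file RF5 and the analysis program. When several objective variables are set, the items are selected and analyzed one after another.
[0041]
① Processing in "equipment history + top-N candidate time data analysis"
1. Regression tree analysis is performed on the specified objective variable with only the equipment-name items and the processing-time items of single-equipment processes as explanatory variables.
2. The time data of the top N processes that became candidates in that result are then added to the explanatory variables and regression tree analysis is performed again. If the item ranked as the k-th candidate has no time data, the candidates from k+1 onward are searched for an item that does have time data, and that time data is added. The analysis result states explicitly that none was found or, if one was found, which candidate rank it had (1 ≤ k ≤ N).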
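Step 2 of the top-N procedure, picking the time items that match the best-ranked equipment items and falling through to later candidates when one has no time data, can be sketched as follows. The suffix pair "_e" / "_t" is the one used elsewhere in the document; the function name, the ranked-list input, and the example process names P1 to P4 are hypothetical.

```python
def pick_time_columns(ranked_equipment, available_columns, n,
                      equip_suffix="_e", time_suffix="_t"):
    """Return up to n (time_column, candidate_rank) pairs.

    ranked_equipment: equipment item names from the first analysis, best first.
    available_columns: set of item names present in the analysis data.
    """
    picked = []
    for rank, name in enumerate(ranked_equipment, start=1):
        if len(picked) == n:
            break
        if not name.endswith(equip_suffix):
            continue
        # The matching time item shares the process name once suffixes are removed.
        time_name = name[: -len(equip_suffix)] + time_suffix
        if time_name in available_columns:
            # Keep the rank so the report can state which candidate it was.
            picked.append((time_name, rank))
    return picked


ranked = ["P2_e", "P4_e", "P1_e", "P3_e"]
available = {"P2_t", "P1_t", "P3_t"}
top2 = pick_time_columns(ranked, available, n=2)
```

An empty result corresponds to the case where the report has to state that no time data was found.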
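[0042]
② Processing in "equipment history + time data analysis"
Regression tree analysis is performed with all equipment-name items and the narrowed-down representative time items as explanatory variables.
[0043]
(Extraction and evaluation of analysis results by the analysis result evaluation means 34)
The automatic analysis by the data analysis means 33 does not end with a single run. The analysis is repeated while changing the analysis period or the target lots and while varying the settings of the data cleansing/characterization means 32 and the data analysis means 33 from their initial values, yielding analysis result data (DATA3). The multiple results in DATA3 are then evaluated to obtain a more reliable analysis result.
[0044]
Regression tree analysis and the t-test are outlined here. Regression tree analysis takes a set of records, each consisting of explanatory variables representing several attributes and an objective variable affected by them, and determines the attribute and attribute value that most affect the objective variable. The analysis result evaluation means 34 outputs rules that express the features and regularities of the data.
[0045]
Regression tree analysis works by repeatedly splitting a set in two based on the parameter values (attribute values) of the explanatory variables (attributes). At each split, with S0 the sum of squares of the objective variable before the split and S1 and S2 the sums of squares of the objective variable in the two resulting subsets, the explanatory variable and parameter value used to split the records are chosen so that ΔS in equation (1) below is maximized.
[0046]
ΔS = S0 - (S1 + S2) ... (1)
[0047]
The explanatory variable and parameter value obtained here correspond to a branch point in the regression tree. The same processing is repeated on the split subsets to examine the influence of the explanatory variables on the objective variable. This is the generally known regression tree method; to grasp the clarity of a split in more detail, the following parameters (a) to (d) are also used, in addition to ΔS, for the top split candidates as quantitative evaluations of the regression tree result.
[0048]
(a) S ratio:
The reduction rate of the sum of squares achieved by the split, that is, a parameter showing how far the split reduced the sum of squares. The smaller this value, the larger the effect of the split and the more clearly the set is divided, so the significant difference is larger.
[0049]
S ratio = ((S1 + S2) / 2) / S0 ... (2)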
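Equations (1) and (2) can be checked numerically with a few lines of code. The sketch assumes that "sum of squares" means the sum of squared deviations from the subset mean, which is the usual convention for regression trees but is not spelled out in the text; the function names and the sample yields are hypothetical.

```python
def sum_sq(values):
    """Sum of squared deviations from the mean (assumed meaning of S0, S1, S2)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_scores(left, right):
    """Return (delta_s, s_ratio) for one candidate two-way split."""
    s0 = sum_sq(left + right)           # before the split
    s1, s2 = sum_sq(left), sum_sq(right)
    delta_s = s0 - (s1 + s2)            # equation (1)
    s_ratio = ((s1 + s2) / 2) / s0 if s0 else float("nan")  # equation (2)
    return delta_s, s_ratio


# Yields of lots processed on two different equipment units (made-up numbers).
delta_s, s_ratio = split_scores([90.0, 92.0, 91.0], [70.0, 72.0, 71.0])
```

Here S0 = 604 and S1 = S2 = 2, so ΔS = 600 and the S ratio is 2/604, a small value that indicates a clear split.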
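[0050]
(b) t value:
Regression tree analysis splits a set in two; the t value is used to test the difference between the means (/X1, /X2) of the two subsets. Here "/" denotes an overline. The statistical t-test provides a criterion for a significant difference between the mean objective-variable values of the subsets. For the same degrees of freedom, that is, the same number of data points, a larger t means the set is split more clearly and the significant difference is larger.
[0051]
When the variances of the two subsets do not differ significantly, the t value is obtained from equation (3) below; when they do differ significantly, it is obtained from equation (4) below. N1 and N2 are the numbers of elements in subsets 1 and 2, /X1 and /X2 are the means of the subsets after the split, and S1 and S2 are the sums of squares of the objective variable in the subsets after the split.
[0052]
[Equation 1]

[0053]
[Equation 2]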
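The images for equations (3) and (4) are not reproduced in this text. As an assumption only, the standard two-sample t statistics that match the surrounding definitions (pooled variance when the variances do not differ significantly, unpooled otherwise, with S1 and S2 taken as sums of squared deviations) would read as follows; the original may differ in details such as the use of an absolute value.

```latex
% Assumed form of equation (3): variances not significantly different (pooled)
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\left(\dfrac{1}{N_1} + \dfrac{1}{N_2}\right)\dfrac{S_1 + S_2}{N_1 + N_2 - 2}}}
\tag{3}

% Assumed form of equation (4): variances significantly different (unpooled)
t = \frac{\bar{X}_1 - \bar{X}_2}
         {\sqrt{\dfrac{S_1}{N_1 (N_1 - 1)} + \dfrac{S_2}{N_2 (N_2 - 1)}}}
\tag{4}
```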

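[0054]
(c) Difference between the mean objective-variable values of the two subsets:
The larger this value, the larger the significant difference.
[0055]
(d) Number of data points in each subset:
The smaller the difference between the two, the smaller the influence of abnormal values (noise).
[0056]
(1) Evaluation of analysis results
Evaluation is based on the search setting file RF6 and the analysis result evaluation program. The evaluation information is computed for each analysis result and can be compared across results.
(2) Search for a highly reliable analysis result
Based on the search setting file RF6, the analysis result evaluation means 34 judges the search for a highly reliable analysis result in the analysis result data (DATA3). For example, it uses a comparison value defined by the condition "compare by the t-test value of the first branch of the regression tree, provided that the number of data points in each of the two groups is at least the configured minimum". This comparison value makes it possible to search for a more reliable result. The search is ended by limiting the range over which the settings in RF1 to RF5 are varied or by limiting the automatic analysis time; the resulting analysis results, their overall evaluation values, and their ranks are obtained, and the most reliable result among the analyses performed is extracted. Part of the settings for the range and method of varying the values is shown in FIG. 14(b). Following these settings, a multi-angle analysis using the two-way-split confounding degree and the abnormal-value ratio of each item is carried out to search for a clearer result.
[0057]
FIG. 8 is a table explaining the evaluation based on the analysis result data (DATA3). As shown, the evaluation computes, for each process equipment item, the evaluation values (the t-test value described below, the equipment names and counts of the low and high groups, the means, and so on) and assigns a rank No. in descending order of t-test value, a larger value indicating a larger problem.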
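The ranking described for FIG. 8, combined with the minimum group size condition quoted above, can be sketched as follows. The dict keys, the function name, and the sample numbers are hypothetical, and comparing by the magnitude of t is an assumption (the text only says "in descending order of t-test value").

```python
def rank_candidates(candidates, min_group_size):
    """Rank first-branch candidates by |t|, skipping splits with a tiny group.

    Each candidate is a dict with keys "item", "t", "n_low", "n_high".
    Returns a list of (rank, item) pairs, rank 1 being the largest problem.
    """
    eligible = [
        c for c in candidates
        if min(c["n_low"], c["n_high"]) >= min_group_size
    ]
    ordered = sorted(eligible, key=lambda c: abs(c["t"]), reverse=True)
    return [(rank, c["item"]) for rank, c in enumerate(ordered, start=1)]


candidates = [
    {"item": "P2_e", "t": 5.1, "n_low": 4, "n_high": 4},
    {"item": "P1_e", "t": 7.9, "n_low": 1, "n_high": 7},  # one group too small
    {"item": "P3_e", "t": 2.0, "n_low": 3, "n_high": 5},
]
ranking = rank_candidates(candidates, min_group_size=2)
```

The group size filter keeps a split that isolates a single outlier lot from outranking a balanced split, which matches the remark under (d) that balanced groups are less affected by noise.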
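[0058]
(Report output means 35)
The report output means 35 creates and outputs a report on the most reliable analysis result.
(1) Report creation
For the analysis result, the rule information of the regression tree, the evaluation statistics for a predetermined number of first-branch candidates (for example, the top 20), and the two-way-split confounding degree between candidates are output to a file. As a report information file, an HTML file is output containing the regression tree diagram, a concise text on the significant differences of the two main branches and the top two candidates, and box-and-whisker plots or correlation diagrams, all linked to the data.
[0059]
(2) Reporting method
Reporting is processed according to the report condition setting file RF7 and the report program. For example, the report is shown on screen and an alarm is sent to a preset e-mail address, giving the report and its web address.
[0060]
FIG. 9 shows an example of the report. Based on the analysis result, the report REP1 shows ① the overall judgment A1 (a concise text explaining the significant differences of the two main branches and the top two candidates), ② statistical information A2, ③ a regression tree diagram A3, and ④ a box-and-whisker plot A4 or correlation diagram A5 corresponding to the regression tree diagram A3. The regression tree diagram A3 is presented in a format such as HTML; clicking a process equipment item selectively displays the linked box-and-whisker plot A4 or correlation diagram A5.
[0061]
FIG. 10 is a flowchart of the data analysis procedure. As shown, the data extraction means 31 extracts and converts data from the manufacturing information database DB1 to obtain analysis data (DATA1) (step S11). The data cleansing/characterization means 32 then cleanses and characterizes DATA1 to obtain analysis data (DATA2) (step S12).
[0062]
Next, the data analysis means 33 mines the cleansed and characterized DATA2 by regression tree analysis to obtain analysis result data (DATA3) (step S13). The analysis result evaluation means 34 then evaluates the result using DATA3 (step S14). During this evaluation it searches for a highly reliable result; for example, it judges whether the configured analysis end condition is satisfied (step S15). If not (step S15: No), the analysis is repeated while changing the analysis period or target lots and varying the settings of the data cleansing/characterization means 32 and the data analysis means 33 from their initial values, to obtain more reliable analysis result data (DATA3).
[0063]
When a result satisfying the analysis end condition of the data analysis means 33 is obtained (step S15: Yes), the report output means 35 outputs the report REP1 (step S16) and the data analysis ends. This processing is run automatically on a weekly or monthly basis. The report REP1, obtained by analyzing the history of the equipment used at the manufacturing stage of a semiconductor manufacturing process or the like, test results, design information, various measurement data, and so on, makes it easy to find the factors lowering the yield, so the yield can be improved.
[0064]
FIGS. 11 to 14 show examples of the setting files that configure the above analysis. FIG. 11 is a table showing part of the data cleansing setting file RF3; it sets the replacement of missing explanatory-variable values in analysis data 1 (DATA1) and replacement with specific values, the abnormal-value ratios for deleting items and records, and so on. FIG. 12 is a table showing part of the item setting file RF2; it sets the identifying strings for date-type items and equipment-type items. FIG. 13 is a table showing part of the data cleansing setting file RF3; it sets the search strings used to delete items whose names contain specific strings.
[0065]
FIG. 14 is a table showing part of the analysis setting file RF5 and the search setting file RF6. FIG. 14(a) is part of RF5 and shows the settings for time data analysis and the analysis conditions: either "equipment history + top-N candidate time data analysis" or "equipment history + independent time data analysis" is selected, and the objective variable, explanatory variables, analysis end condition, and so on are specified. FIG. 14(b) is part of RF6 and holds the settings for multi-angle analysis, such as the basic use of the two-way-split confounding degree and the method of selecting items by abnormal-value ratio.
[0066]
When the target to which this apparatus is applied has conditions of its own, they can be accommodated easily by adding or changing the necessary settings in the analysis-flow setting files (setting files RF1 to RF7).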
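[0067]
(Embodiment 2: Mode selection for data analysis)
Embodiment 2 is a configuration in which, when already-cleansed analysis data is available, the analysis and later steps are performed automatically. The configuration of the data analysis apparatus 30 is the same as in Embodiment 1 and is not described again. This embodiment also supplements the description of the concrete analysis processing in the data analysis means 33 of Embodiment 1.
[0068]
FIG. 15 shows the analysis selection screen in Embodiment 2. When the input and output file names are selected, display item 40 appears as shown. Display item 40 lists four analysis modes, any one of which (1 to 4) can be selected: 1 "equipment history data analysis", 2 "equipment history + top (n) candidate time data analysis", 3 "equipment history + first-candidate time data analysis", and 4 "Manual (manual analysis)".
[0069]
FIG. 16 is a table explaining the input data. It corresponds to FIG. 5 described above (manufacturing information database DB1). As shown in FIG. 16, the input data is a CSV file containing, for each lot number, the equipment name and processing time of each process as explanatory variables and the yield value and the like as objective variables.
[0070]
The data extraction means 31 of the data analysis apparatus 30 reads the manufacturing information database DB1, analyzes the data automatically, and narrows down the equipment names and processing times of the processes that deserve attention. In the item setting file RF2 described with FIG. 12, the additional string "_e" is appended to the process name for the equipment-name explanatory variable and "_t" is appended to the process name for the processing time. For process A, the equipment name and processing time are therefore "A_e" and "A_t".
[0071]
In addition to "_t", the strings "_時刻" and "_DAY_TIME" are defined as additional strings for date-type items; the latter two are converted to the representative identifier "_t" before use. The same applies to the identifying strings for equipment items. Data collected from different sources can thus be handled as the same category of data.
[0072]
With these settings, items carrying "_t", "_時刻", or "_DAY_TIME" are treated as time data during analysis, items carrying "_e", "_装置", or "_EQUIP" are treated as equipment data, and the analysis for each processing mode is executed automatically. Automatic execution takes place when any mode other than 4 "Manual (manual analysis)" is selected.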
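The suffix-based category recognition described above can be sketched as a small classifier. The six suffix strings are taken from the text; the function names `classify` and `normalize` and the tuple return shape are hypothetical.

```python
TIME_SUFFIXES = ("_t", "_時刻", "_DAY_TIME")
EQUIP_SUFFIXES = ("_e", "_装置", "_EQUIP")


def classify(column):
    """Return (process_name, category); category is 'time', 'equipment', or None."""
    for suffixes, category in ((TIME_SUFFIXES, "time"), (EQUIP_SUFFIXES, "equipment")):
        # Try longer suffixes first so the most specific one is stripped.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if column.endswith(suffix):
                return column[: -len(suffix)], category
    return column, None


def normalize(column):
    """Rewrite any recognized suffix to its representative form (_t or _e)."""
    process, category = classify(column)
    if category == "time":
        return process + "_t"
    if category == "equipment":
        return process + "_e"
    return column


normalized = [normalize(c) for c in ("A_DAY_TIME", "工程01_装置", "B_e", "YIELD")]
```

Normalizing to one representative suffix per category is what lets data from different sources be matched by process name alone.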
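[0073]
Based on the extraction condition setting file RF1 and the item setting file RF2, which form part of the program initialization file INI, the program determines which categories of data are present as explanatory variables. Based on the RF2 settings illustrated in FIG. 12, it determines that the data holds the equipment used and the processing time of each process. If no item in the data carries an identifying string that matches the INI settings, everything is handled by manual analysis (the judgment made at step S1 in FIG. 4, when the processing mode is specified).
[0074]
FIG. 17 is a table showing an example of analysis data 2 (DATA2). The data analysis means 33 performs regression tree analysis on this data. The result is that the yield of all 8 lots is affected by the equipment used in processes 1 to 4 and their processing times. FIG. 18 shows a trend graph as an example of the analysis result; the horizontal axis is the lot number and the vertical axis is the yield. For the equipment V201 and V202 that make up 工程2_e, the result shows that the four lots processed in a certain period (Lot2 to Lot5) have a high yield and that the lots processed on equipment V202 of process 2 have a low yield.
[0075]
The analysis for each processing mode is described next.
(1) Manual (manual analysis)
Selecting "Manual (manual analysis)" on the selection screen of FIG. 15 displays the necessary items in the variable-selection item list L1. FIG. 19 shows the item-setting screen during manual analysis. The item list L1 lists the numeric items K1, which serve as objective variables, and the character items K2, which serve as explanatory variables. The required numeric items K1 and character items K2 are selected and the conditions are set by hand, after which the analysis is run.
[0076]
FIG. 20 shows the screen after the items are set for manual analysis. In the illustrated example, the yield selected as the objective variable appears in the objective variable list S1, and the equipment used and processing times of all processes (工程1_e to 工程4_e and 工程1_t to 工程4_t) selected as explanatory variables appear in the explanatory variable list S2. The data analysis means 33 runs the analysis program with these settings. The result is shown in FIG. 21, which shows the regression tree and the list of evaluation information obtained when the equipment used and processing times of all processes are specified.
[0077]
(2) Equipment history data analysis
FIG. 22 shows the item-setting screen for equipment history data analysis. Selecting "equipment history data analysis" on the selection screen of FIG. 15 automatically lists, under character items K2 of the item list L1, only the items whose additional string is "_e" (the equipment used at each process) as explanatory variables. The objective and explanatory variables are then selected by hand from the numeric items K1 and character items K2 and displayed in the objective variable list S1 and explanatory variable list S2.
[0078]
With these settings the data analysis means 33 runs regression tree analysis using only equipment-history explanatory variables. The result is shown in FIG. 23, which shows the regression tree and the list of evaluation information obtained when the history of all processes is specified.
[0079]
FIG. 23 shows that the equipment difference at process 2 is the most significant (process 2 is the first candidate). This alone, however, does not reveal whether the difference is significant when viewed as a time series. Equipment V202, which has many low-yield lots, may simply have processed many lots during a bad period (for example, a period in which variation arose from factors other than the equipment, such as a temporary change in process conditions).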
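[0080]
(3) Equipment history + top-candidate time data analysis
FIG. 24 shows the item-setting screen for equipment history data analysis. Selecting "equipment history + top-candidate time data analysis" on the selection screen of FIG. 15 automatically lists in the item list L1 all items other than time items (numeric items K1 as objective variables, character items K2 as explanatory variables). The objective and explanatory variables are then selected by hand from the numeric items K1 and character items K2 and displayed in the objective variable list S1 and explanatory variable list S2.
[0081]
The data analysis means 33 runs regression tree analysis with the objective and explanatory variables thus set. Up to this point the processing is the same as in "(2) Equipment history data analysis", and the result at this stage is the same as FIG. 23.
[0082]
Next, the data analysis means 33 automatically extracts the analysis result, adds as new explanatory variables the processing times of the processes that appear in the regression tree diagram and in the Evaluation Data (the candidates evaluated for the split at the top level of the tree), and automatically runs regression tree analysis again.
[0083]
Here the processing times of all four processes are added as explanatory variables. The processing-time item that corresponds to a process's equipment-name item is found on the basis that the two names are identical once the additional strings are removed. Concretely, in the second regression tree analysis the time items 工程1_t, 工程2_t, 工程3_t, and 工程4_t are added to the explanatory variables specified in the first. The result is shown in FIG. 25, which shows the regression tree and the list of evaluation information obtained when the history and time data of all processes are specified. The result in FIG. 25 is the same as FIG. 21. The result of the second analysis is saved in a new folder named "EQ_Time".
[0084]
This result confirms that the difference by time at process 1 is the most significant. However, the differences for the top three candidates in the Evaluation Data (the processing times of processes 2 and 4) are exactly the same, which suggests that they are confounded. In fact the order of the times at which lots were processed at these processes is exactly the same, and process 3 shows almost the same trend.
[0085]
If the Evaluation Data output by the first regression tree analysis is limited to the top three items, 工程1_e is excluded from the items listed in the Evaluation Data of FIG. 23 and does not appear in the regression tree diagram either. 工程1_t is therefore not added as an explanatory variable in the second regression tree analysis. FIG. 26 shows the regression tree and the list of evaluation information obtained when the analysis is run again with the processes limited.
[0086]
(4) Equipment history + first-candidate time data analysis
FIG. 27 shows the item-setting screen for equipment history + first-candidate time data analysis. Selecting "equipment history + first-candidate time data analysis" on the selection screen of FIG. 15 automatically lists in the item list L1 all items other than time items, as numeric items K1 (objective variables) and character items K2 (explanatory variables). The objective and explanatory variables are then selected by hand from the numeric items K1 and character items K2 and displayed in the objective variable list S1 and explanatory variable list S2.
[0087]
With these settings the data analysis means 33 runs regression tree analysis using only data whose explanatory variables are equipment history. This is the same analysis as in "(2) Equipment history data analysis". The result is extracted automatically, only the item ranked first in the Evaluation Data is added to the explanatory variables as time data, and regression tree analysis is run again, all automatically. In this embodiment the processing time of process 2 is added as an explanatory variable. FIG. 28 shows the regression tree and the list of evaluation information obtained when the analysis is run again with the processes limited.
[0088]
In the analysis run in the mode "(3) Equipment history + top-candidate time data analysis", the time items of confounded processes appear together at the top of the Evaluation Data. If the time items of several hundred actual processes were used as explanatory variables, only process time items would be output, including inside the regression tree diagram. These time items are often confounded, and a single representative time item is often enough. Regression tree analysis on the yield is therefore first run with the per-equipment data as explanatory variables; using the process ranked first in that result, only the time data of that first-candidate process is added to the explanatory variables as the representative of the time data, and regression tree analysis on the yield is run again.
[0089]
The result of the second analysis is saved in a new folder named "EQ_TimeXX" (XX: the rank at which time data first appeared). FIG. 29 shows a defect-rate trend graph by lot. The horizontal axis is the lot and the vertical axis is the defect rate of each lot. In the illustrated example, a defect that occurs only very rarely appears at a high rate in eight specific lots (LOT3, 4, 6, 7, 8, K, L, M).
[0090]
FIG. 30 shows a decision tree for specified explanatory and objective variables. It shows the result of decision tree analysis with the equipment names and processing times of the processes as explanatory variables and, as the objective variable, "H" for the eight lots with the defect and "L" for the other 15 lots. Every branch in the decision tree is a time item. In semiconductor manufacturing, lots are processed almost in lot-number order, so nearly all time data are confounded and practically equivalent as explanatory variables. It therefore makes little difference which time item is used; narrowing the time data down to the single most significant item changes little, and the result actually becomes easier to interpret.
[0091]
FIG. 31 shows a decision tree with the explanatory and objective variables specified further. It shows the result of decision tree analysis after the processing time A_t of process A was found most significant for the objective variable in FIG. 30 and all time data other than A_t was then removed from the explanatory variables. The top-level branch is on A_t as in FIG. 30, but because the time data that were nearly equivalent to A_t have been removed, the differences by equipment that had been hidden behind them appear below it.
[0092]
According to this, the defect rate is high when unit DM2 is used at process D and unit EM4 is used at process E. FIG. 32 shows the result of FIG. 31 as a trend graph by processing time. It clearly shows that the variation over time of unit EM4 at process E influenced the variation over time of the high defect rate.
[0093]
Embodiment 2 as described above makes it possible to carry out properly the work that the automated analysis of Embodiment 1 cannot do, in particular analysis focused on matters the operator wishes to examine and analysis that accommodates replacement of the production equipment at each process of the production facility.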
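[0094]
(Embodiment 3: Another processing example for single-equipment processes)
According to "(1) Setting of equipment history + time data analysis" for the data analysis means 33 described in Embodiment 1:
① When the whole manufacturing process handles lots first-in, first-out, the processing order of lots is basically the same at every process, and one processing time is sufficient as an explanatory variable.
② A single-equipment process is one in which equipment-to-equipment differences cannot be checked, so the problem factor is sought in the change over time of the single unit.
③ Furthermore, the more units a process has, the more easily regression tree analysis yields a significant difference between the two subsets.
[0095]
Applying points ① to ③, this embodiment assumes that all process processing times, including those of single-equipment processes, can be represented by a single explanatory variable and that, as a source of variation by processing time, a single-equipment process is at least as suspect as the multi-equipment processes; it automatically narrows down the items suspected as problem factors.
[0096]
In Embodiment 3, when the whole manufacturing process handles lots first-in, first-out, the data cleansing/characterization means 32 handles single-equipment processes without changing explanatory variable names or value names, unlike the processing described in "(2) Equipment-name change setting and processing based on change over time" and "(3) Item renaming for single-equipment processes" of Embodiment 1. This makes the processing simpler than in Embodiment 1 and suppresses the adverse effect of outliers in the objective variable when the number of data entries, that is, records (lots), is small.
[0097]
FIG. 33 is a flowchart of the data analysis procedure in Embodiment 3. Analysis data 1 (DATA1), after "(1) Abnormal-value handling condition setting and processing" of Embodiment 1 has been applied, is processed as follows:
[0098]
① Single-equipment processes are extracted (a list is created).
② The process located in the middle, in process order, of all single-equipment processes is chosen as the representative item; its value is the time (interval scale), and the item is named "一台装置工程時刻" (single-equipment process time).
③ "一台装置工程時刻" is added to the explanatory variables of the analysis data and all other process time items are removed from the explanatory variables. These steps yield analysis data 2 (DATA2) (step S20). FIG. 34 is a table showing an example of the extracted single-equipment process list REP2.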
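Steps ① to ③ can be sketched as follows. The item name "一台装置工程時刻" is from the text; the "_e" / "_t" suffixes follow Embodiment 2. Two choices are assumptions not stated in the source: with an even number of single-equipment processes the lower of the two middle ones is taken, and when there are none the data is returned unchanged. The function name and the process names P1 to P5 are hypothetical.

```python
REPRESENTATIVE = "一台装置工程時刻"


def representative_single_equipment_time(process_order, columns,
                                         equip_suffix="_e", time_suffix="_t"):
    """Return (single_equipment_processes, new_columns).

    process_order: process names in manufacturing order.
    columns: dict mapping an item name to its list of values (one per lot).
    """
    # Step 1: list the processes whose equipment item has one distinct value.
    singles = [p for p in process_order
               if len(set(columns[p + equip_suffix])) == 1]
    if not singles:
        return singles, dict(columns)  # assumption: nothing to change
    # Step 2: take the middle one in process order (lower middle if even).
    middle = singles[(len(singles) - 1) // 2]
    # Step 3: drop every process time item and add the representative item.
    result = {name: values for name, values in columns.items()
              if not name.endswith(time_suffix)}
    result[REPRESENTATIVE] = columns[middle + time_suffix]
    return singles, result


order = ["P1", "P2", "P3", "P4", "P5"]
columns = {
    "P1_e": ["a", "a", "a"], "P1_t": [1, 2, 3],
    "P2_e": ["b", "c", "b"], "P2_t": [2, 3, 4],
    "P3_e": ["d", "d", "d"], "P3_t": [3, 4, 5],
    "P4_e": ["e", "f", "e"], "P4_t": [4, 5, 6],
    "P5_e": ["g", "g", "g"], "P5_t": [5, 6, 7],
}
singles, reduced = representative_single_equipment_time(order, columns)
```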
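[0099]
After that, analysis data 2 (DATA2) undergoes "(4) Unnecessary-item deletion setting and processing, (5) Item and record deletion setting and processing based on the abnormal-value ratio, (6) Analysis-condition setting and processing for time data" of Embodiment 1, the data analysis by the data analysis means 33 (regression tree analysis), and evaluation by the analysis result evaluation means 34 (step S21). When the analysis end condition is satisfied (step S22: Yes), a report including the single-equipment process list is created, and the report REP1 and the single-equipment process list REP2 are output (step S23).
[0100]
The analysis of DATA2 is described next. FIG. 35 is a table showing an example of the analysis data 2 (DATA2) used in Embodiment 3. As shown, the equipment used at process 5 is V501 for every lot, so process 5 is a single-equipment process. FIG. 36 shows the yield by lot number for FIG. 35.
[0101]
FIG. 37 shows the regression tree and the list of evaluation information obtained with the equipment names and processing times of the processes in FIG. 35 as explanatory variables and the yield as the objective variable. As FIG. 37 shows, the top three items of the Evaluation Data, which lists the candidates for the top-level split, are confounded, and one of them, 工程5_t, is the processing time of process 5. The equipment item 工程5_e of process 5 has only one distinct value and is therefore removed from the explanatory variables when regression tree analysis is run.
[0102]
Then, for the processes listed in the Evaluation Data whose variation over time was found to affect the objective variable (here, processes 1 to 5), the items holding the corresponding processing times are extracted, and among the items holding the equipment of those processes, those with only one distinct value are extracted.
[0103]
As described, Embodiment 3 can detect that the significant difference due to change over time, as represented by the single-equipment process time, is large. The single-equipment process list REP2 also makes it easy to see which processes are single-equipment processes. In the example of FIG. 37, the only single-equipment process is process 5.
[0104]
The data analysis method described above can be realized by running a program prepared in advance on a computer such as a personal computer or workstation. The program is recorded on any of various recording media and is executed by being read from the medium by the computer. The program may also be a transmission medium that can be distributed over a network such as the Internet.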
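[0105]
(Note 1) A data analysis method comprising:
a data extraction step of extracting, from original data, the data needed for a desired data analysis;
a data cleansing step of cleansing abnormal values in the data extracted by the data extraction step;
a characterization step of obtaining feature information of the data cleansed by the data cleansing step; and
a data analysis step of analyzing the data using the feature information obtained by the characterization step.
[0106]
(Note 2) A data analysis method for analyzing process data that includes an objective variable indicating variation in the quality of a production process and explanatory variables that explain the variation of the objective variable, the method comprising:
a data extraction step of extracting, from the process data, the data needed for a desired data analysis;
a data cleansing step of cleansing abnormal values of the explanatory variables in the data extracted by the data extraction step;
a characterization step of obtaining feature information representing the variation of the objective variable in the data cleansed by the data cleansing step; and
a data analysis step of performing data analysis to search for the factors of variation of the objective variable using the feature information obtained by the characterization step.
[0107]
(Note 3) The data analysis method according to Note 2, wherein the data extraction step appends, to the item names of the explanatory variables of the data, an additional character string for identifying the category, and
the data analysis step performs data analysis in which the categories of the explanatory variables are identified from the additional character string.
[0108]
(Note 4) The data analysis method according to Note 3, wherein the data extraction step appends, to the manufacturing process name, an additional character string denoting the equipment category for the item name of an explanatory variable indicating production equipment of the production process, and appends, to the manufacturing process name, an additional character string denoting the time category for the item name of an explanatory variable indicating the processing time at which the production equipment processed the product.
[0109]
(Note 5) The data analysis method according to Note 3 or 4, further comprising a setting step of setting in advance, in a setting file, the instruction for appending the additional character string in the data extraction step and the instruction for category-aware data analysis in the data analysis step,
wherein the data extraction step and the data analysis step each read the setting file when executed and perform processing based on the instructions set in the setting file.
[0110]
(Note 6) The data analysis method according to any one of Notes 2 to 5, wherein the characterization step obtains a feature of the variation of the objective variable over time and appends, to the equipment name of the production equipment used as an explanatory variable of the data, a predetermined symbol corresponding to the obtained feature.
[0111]
(Note 7) The data analysis method according to any one of Notes 2 to 6, wherein, when only one production equipment unit is provided at a given production process,
the data cleansing step removes the explanatory variable for the equipment name of that unit from the analysis and appends, to the item name of the explanatory variable for the processing time of that production process, an additional character string indicating that the process consists of only that one unit, and
the data analysis step performs, for a process so marked by the data cleansing step, data analysis using the processing-time explanatory variable.
[0112]
(Note 8) The data analysis method according to any one of Notes 2 to 7, wherein the data cleansing step deletes, on the basis of explanatory-variable item names, data in the process data other than data related to production equipment.
[0113]
(Note 9) The data analysis method according to any one of Notes 2 to 8, wherein the data cleansing step deletes items and records of the process data whose ratio of abnormal values exceeds a predetermined ratio, and records whose objective-variable value is missing or abnormal.
[0114]
(Note 10) The data analysis method according to any one of Notes 2 to 9, wherein the data cleansing step allows selection between a setting that treats data in which an explanatory variable of the production equipment indicates a processing time as processing-time data and a setting that treats it as period-name data when time is divided into periods using a predetermined cycle of the production process.
[0115]
(Note 11) The data analysis method according to any one of Notes 2 to 10, wherein, when all the production processes process product lots sequentially first-in, first-out,
the data analysis step uses as explanatory variables the equipment items of all production processes and the processing times that are the top N candidates among all production processes.
[0116]
(Note 12) The data analysis method according to any one of Notes 2 to 10, wherein, when the production processes process product lots independently rather than first-in, first-out,
the data analysis step comprises an independent-time determination step of determining whether the processing times of the production processes are mutually independent, and
a representative-time item creation step of creating a single representative-time item by merging the processing times of the production processes determined not to be independent by the independent-time determination step,
and uses as explanatory variables the production-equipment items of all production processes and the representative-time items created by the representative-time item creation step.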
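[0117]
(Note 13) The data analysis method according to any one of Notes 2 to 12, wherein the data analysis step extracts, by a data mining technique, rules expressing the features and regularities of the data to be analyzed.
[0118]
(Note 14) The data analysis method according to any one of Notes 2 to 13, further comprising an evaluation step of obtaining a predetermined overall evaluation value from a plurality of analysis results obtained by the data analysis step while varying the lots or processing times of the data.
[0119]
(Note 15) The data analysis method according to Note 14, wherein the evaluation step obtains, as information expressing the reliability of the rules, a set-split evaluation value expressing how clearly the set of data analyzed by the data analysis step is divided in two.
[0120]
(Note 16) The data analysis method according to Note 15, wherein the evaluation step uses, as the set-split evaluation value, the value of t given by the following equation.
[Equation 3]

[0121]
(Note 17) The data analysis method according to Note 16, further comprising a report output step of outputting, based on the evaluation result obtained by the evaluation step, a report that narrows down any of the production process, the production equipment, or the processing time that is a problem in the production facility.
[0122]
(Note 18) The data analysis method according to any one of Notes 2 to 17, wherein, when all the production processes process product lots sequentially first-in, first-out and only one production equipment unit is provided at a given production process,
the characterization step creates a list of the manufacturing processes that consist of a single production equipment unit.
[0123]
(Note 19) A data analysis apparatus for analyzing process data that includes an objective variable indicating variation in the quality of a production process and explanatory variables that explain the variation of the objective variable, the apparatus comprising:
data extraction means for extracting, from the process data, the data needed for a desired data analysis;
data cleansing means for cleansing abnormal values of the explanatory variables in the data extracted by the data extraction means;
characterization means for obtaining feature information representing the variation of the objective variable in the data cleansed by the data cleansing means; and
data analysis means for performing data analysis to search for the factors of variation of the objective variable using the feature information obtained by the characterization means.
[0124]
(Note 20) The data analysis apparatus according to Note 19, further comprising explanatory variable conversion means for appending, to the explanatory variables of the data cleansed by the data cleansing means, an additional character string indicating the category of the data item,
wherein the data analysis means recognizes the category of each data item from the additional character string and executes analysis processing by category.
[0125]
(Note 21) The data analysis apparatus according to Note 20, wherein the explanatory variable conversion means extracts a list of the explanatory and objective variables contained in the data and allows the explanatory and objective variables used for data analysis to be selected manually.
[0126]
(Note 22) A data analysis program for analyzing process data that includes an objective variable indicating variation in the quality of a production process and explanatory variables that explain the variation of the objective variable, the program causing a computer to:
extract, from the process data, the data needed for a desired data analysis;
cleanse abnormal values of the explanatory variables in the extracted data;
obtain feature information representing the variation of the objective variable in the cleansed data;
perform data analysis to search for the factors of variation of the objective variable using the feature information; and
perform a predetermined evaluation of the analysis result obtained by the data analysis.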
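[0127]
[Effects of the Invention]
According to the present invention, the data to be analyzed is extracted appropriately, cleansed, and characterized before analysis. In particular, appending to the explanatory variables an additional character string that corresponds to the data category makes the type of each data item and the relationships between data categories explicit, so the desired analysis can be executed automatically and carried out efficiently with few resources. In addition, even when the analysis conditions are set by hand, operation errors can be prevented and the desired analysis and results can be obtained.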
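[Brief Description of the Drawings]
[FIG. 1] A diagram showing the hardware configuration of the computer system used in the data analysis apparatus according to the embodiments of the present invention.
[FIG. 2] A diagram illustrating the flow of process data.
[FIG. 3] A functional block diagram of the data analysis apparatus realized by the system configuration shown in FIG. 1.
[FIG. 4] A flowchart outlining the data processing procedure in the data analysis apparatus of the present invention.
[FIG. 5] A table showing part of analysis data 1 (DATA1).
[FIG. 6] A table showing part of analysis data 2 (DATA2) after data cleansing.
[FIG. 7] A table showing the trend marks given to equipment names.
[FIG. 8] A table explaining the evaluation based on analysis result data (DATA3).
[FIG. 9] A diagram showing an example of the report.
[FIG. 10] A flowchart showing the data analysis procedure.
[FIG. 11] A table showing part of the data cleansing setting file RF3.
[FIG. 12] A table showing part of the item setting file RF2.
[FIG. 13] A table showing part of the data cleansing setting file RF3.
[FIG. 14] A table showing part of the analysis setting file RF5 and the search setting file RF6.
[FIG. 15] A diagram showing the analysis selection screen in Embodiment 2 of the present invention.
[FIG. 16] A table explaining the input data.
[FIG. 17] A table showing an example of analysis data 2 (DATA2).
[FIG. 18] A diagram showing a trend graph as an example of the analysis result.
[FIG. 19] A diagram showing the item-setting screen during manual analysis.
[FIG. 20] A diagram showing the screen after the items are set for manual analysis.
[FIG. 21] A diagram showing the regression tree and list of evaluation information obtained when the equipment used and processing times of all processes are specified.
[FIG. 22] A diagram showing the item-setting screen for equipment history data analysis.
[FIG. 23] A diagram showing the regression tree and list of evaluation information obtained when the history of all processes is specified.
[FIG. 24] A diagram showing the item-setting screen for equipment history data analysis.
[FIG. 25] A diagram showing the regression tree and list of evaluation information obtained when the history and time data of all processes are specified.
[FIG. 26] A diagram showing the regression tree and list of evaluation information obtained when the analysis is run again with the processes limited.
[FIG. 27] A diagram showing the item-setting screen for equipment history + first-candidate time data analysis.
[FIG. 28] A diagram showing the regression tree and list of evaluation information obtained when the analysis is run again with the processes limited.
[FIG. 29] A diagram showing a defect-rate trend graph by lot.
[FIG. 30] A diagram showing a decision tree for specified explanatory and objective variables.
[FIG. 31] A diagram showing a decision tree with the explanatory and objective variables specified further.
[FIG. 32] A diagram showing the result of FIG. 31 as a trend graph by processing time.
[FIG. 33] A flowchart showing the data analysis procedure according to Embodiment 3 of the present invention.
[FIG. 34] A table showing an example of the extracted single-equipment process list REP2.
[FIG. 35] A table showing an example of the analysis data 2 (DATA2) used in Embodiment 3.
[FIG. 36] A diagram showing the yield by lot number for FIG. 35.
[FIG. 37] A diagram showing the regression tree and list of evaluation information obtained with the equipment names and processing times of the processes in FIG. 35 as explanatory variables and the yield as the objective variable.
[FIG. 38] A flowchart showing the procedure of a general data analysis process.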
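[Explanation of Reference Signs]
1 Input device
2 Central processing unit
3 Output device
4 Storage device
10a to 10n Process equipment
11 Management server
DB1 Manufacturing information database
30 Data analysis apparatus
31 Data extraction means
32 Data cleansing/characterization means
33 Data analysis means
34 Analysis result evaluation means
35 Report output means
40 Display item
A1 Overall judgment
A2 Statistical information
A3 Regression tree diagram
A4 Box-and-whisker plot
A5 Correlation diagram
DATA1 Analysis data 1
DATA2 Analysis data 2
K1 Numeric items serving as objective variables
K2 Character items serving as explanatory variables
L1 Item list
RF Setting file
RF1 Extraction condition setting file
RF2 Item setting file
RF3 Data cleansing setting file
RF4 Characterization setting file
RF5 Analysis setting file
RF6 Search setting file
RF7 Report condition setting file
REP1 Report
REP2 Single-equipment process list
S1 Objective variable list
S2 Explanatory variable list
[0001]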
[Technical Field of the Invention]
The present invention relates to a data analysis method for grasping the relationships among data handled widely across industry and extracting significant results that lead to industrially advantageous outcomes.
[0002]
[Prior art]
For example, to improve yield in a semiconductor manufacturing process, the factors that lower the yield are sought as quickly as possible on the basis of the history of the equipment used at the manufacturing stage, test results, design information, various measurement data, and the like. For this purpose, performing statistical analysis on data collected in advance is more economical than actually performing physical analysis, and it is important to perform this statistical analysis efficiently.
[0003]
The present inventors previously filed Japanese Patent Application No. 2000-284578 (Japanese Patent Application Laid-Open No. 2001-306999) for an apparatus and method for statistically analyzing such data. When significant differences are extracted by statistical data analysis, which data are analyzed by which method is determined by the experience, skill, and so on of the analyst. A decision is rarely made from a single analysis result; generally, each result is interpreted, the analysis conditions to apply next (data, analysis method, etc.) are examined and decided, and the analysis is then performed.
[0004]
FIG. 38 is a flowchart showing the procedure of a general data analysis process. The decided analysis conditions are set (step S50), the data are analyzed under those conditions (step S51), the analysis result is interpreted (step S52), and a decision is attempted (step S53). If a decision is reached (step S53: Yes), the statistical analysis ends. If no decision can be reached from the current result (step S53: No), the analysis conditions are changed (step S54) and the data are analyzed under the changed conditions.
[0005]
[Problems to be solved by the invention]
The analysis conditions changed in step S54 are input to the execution program after their explanatory variables, objective variables, processing end conditions, and the like are set by the analyst. Therefore, at the time of this input, an operation error or an execution result wait occurs, so that the analysis efficiency is reduced. Many of the analysis conditions specified here can be patterned to some extent, but in many cases, the final analysis conditions cannot be specified until after the analysis result is obtained. These are one of the factors that hinder analysis automation. In particular, it takes time for one analysis, and appears remarkably in processing such as data mining that handles many parameters.
[0006]
In order to perform data analysis efficiently, it is necessary for the analyst to always be aware of what procedure and what to analyze, and what the explanatory variables and objective variables should be. Generally, each analyst recognizes the category of the explanatory variable for each analysis case, but conventionally, processing results are output without discriminating the category of the input data. Was.
[0007]
In particular, for process data related to semiconductor manufacturing with a large number of explanatory variables despite a small number of records, the explanatory variables are intricately entangled, and the analysis procedure, the processing procedure suited to the analysis purpose, and the selection of the explanatory variables are appropriate. Otherwise, efficient data analysis could not be performed. In particular, time data plays an important role in process data analysis and is acquired in large quantities. However, the number of explanatory variables becomes too large, and confounding becomes easy (they are not independent), which makes it difficult to extract a statistically significant difference. Correspondingly, a lot of computer resources such as calculation time were required.
[0008]
The present invention has been made in view of the above problems. An object of the present invention is to provide a data analysis method that can easily extract only the data necessary for a desired data analysis from data such as production apparatus names and processing times in a production process, and that can efficiently obtain an analysis result effective for improving the yield. A further object is to provide a data analysis method capable of automatically executing data analysis of data having a small number of records and a large number of explanatory variables.
[0009]
[Means for Solving the Problems]
In order to achieve the above object, the present invention extracts the data necessary for data analysis from data such as production equipment names and processing times, and performs data analysis that narrows down the production equipment names, processing times, and the like that are reducing the workmanship, the yield, and the like. When the data analysis is executed, an additional character string for identifying the category of each data item is added to the explanatory variables of the data, so that the category can be recognized during the analysis processing and the analysis procedure corresponding to the category can be executed automatically. At this time, a desired analysis result is obtained by selecting and deleting objective variables and explanatory variables of the data. Further, data unnecessary for the desired data analysis is deleted according to the item names of the explanatory variables, and records and items having abnormal values are deleted, which enables data analysis using only the necessary data.
[0010]
According to the present invention, the complicated procedure of the related art, in which the analysis conditions are specified again after each single analysis result is obtained, is unnecessary. The additional character string added to the explanatory variables of the data to be analyzed allows the category of each data item to be recognized, so that the necessary data analysis processing can be executed automatically in sequence and the reliability of the analysis result can be improved. Even when the explanatory variables are intricately entangled, as in process data relating to semiconductor manufacturing with a small number of records and a large number of explanatory variables, explanatory variables and a processing procedure suited to the analysis object and the analysis purpose can be selected, and data analysis can be performed efficiently.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
(Embodiment 1: Automatic analysis of data)
Hereinafter, preferred embodiments of a data analysis device and a data analysis method according to the present invention will be described in detail with reference to the accompanying drawings. The original data to be analyzed in the embodiments of the present invention is, for example, process data relating to semiconductor manufacturing, and it is assumed that the process data varies over time. To analyze the process data efficiently, items related to time, which indicate this variation, are particularly important. In each embodiment described below, the production devices (process devices) arranged in each manufacturing process and their processing times are used for analysis of yield factors and the like, the factors of low yield are extracted by a data mining technique (regression tree analysis, decision tree analysis), and the analysis efficiency is further improved.
[0012]
FIG. 1 is a diagram illustrating a hardware configuration of a computer system used in a data analysis device according to an embodiment of the present invention. The data analysis device includes an input device 1, which comprises operation devices such as a keyboard and inputs data via a network or the like; a central processing device 2 including a CPU, which executes the analysis process described later on the input data; an output device 3 including display means such as a CRT or an LCD and printing means such as a printer; and a storage device 4 such as an HDD for storing and holding data.
[0013]
FIG. 2 is a diagram for explaining the flow of process data. A plurality (N) of processing apparatuses 10a to 10n are arranged in a manufacturing process of a manufacturing target such as a semiconductor. Each of the process apparatuses 10a to 10n sends process data in the respective manufacturing process to the management server 11. The process data includes the processing time at which the manufacturing target was manufactured in each process, the names of the devices used in the manufacturing, the yield, and the like. The management server 11 creates the manufacturing information database DB1 based on the input process data. This manufacturing information database DB1 is stored in the storage device 4 shown in FIG. 1 via a network or the like.
[0014]
FIG. 3 is a functional block diagram of the data analysis device realized by the system configuration shown in FIG. 1. The process data of each process device stored in the manufacturing information database DB1 shown in FIG. 2 is input to the data analysis device 30. The data analysis device 30 includes a data extraction means 31, a data cleansing / characterizing means 32, a data analysis means 33, an analysis result evaluation means 34, and a report output means 35. Each means executes its processing in accordance with the setting information described in the setting files RF (RF1 to RF7). The automatic execution of the series of processes in these means is referred to as automatic analysis.
[0015]
The data analysis device 30 starts an analysis processing program such as regression tree analysis. After the necessary input / output file names, objective variables, and explanatory variables have been specified, it automatically performs, in order and according to the analysis flow setting file group, (1) data extraction from the database, (2) data cleansing and characterization, (3) regression tree analysis, and (4) analysis result evaluation. It then (5) extracts the problem process, device, and time from the process history and outputs the report REP1.
[0016]
At this time, data in which objective variable items are added to the lot number (the number of lots corresponds to the number of data, that is, the number of records), the device name of each process, and the processing date and time is extracted and processed, and data analysis by a data mining technique narrows down the processes, devices, and times that deserve attention.
[0017]
FIG. 4 is a flowchart showing an outline of a data processing procedure in the data analysis device of the present invention. The analysis data DATA to be analyzed extracted from the manufacturing information database DB1 is subjected to data analysis by executing the analysis program in accordance with the settings of the program initialization file INI (including each of the setting files RF1 to RF7 described later). Here, based on the setting of the program initialization file INI, the analysis program sets an additional character string _xx for identifying the category of the data item in the analysis data DATA. This additional character string _xx is recognized by the analysis program.
[0018]
When a processing mode is designated in the analysis program (step S1), processing corresponding to the category type indicated by xx is performed: variables such as objective variables and explanatory variables are selected automatically (step S2) or manually (step S3), and analysis processing by regression tree analysis or the like is then performed (step S4). Based on the result of the first analysis processing, whether to re-execute the analysis is determined in accordance with the analysis target and analysis procedure set in advance (step S5). At re-execution, the execution result is extracted (step S6), variables such as objective variables and explanatory variables are automatically selected or deleted (step S7), and the process returns to step S4 to execute the next analysis processing. The final analysis result is thereby obtained.
[0019]
(About the data extraction means 31)
The data extraction means 31 performs the following:
(1) Extraction condition setting and processing for data extraction and conversion
(2) Data mining condition setting and processing
(3) Device name and time setting and processing
[0020]
(1) Extraction condition setting and processing for data extraction and conversion
Data extraction and conversion are performed according to the extraction condition setting file RF1 and the data extraction program. At a predetermined time or periodically, data in which objective variable items are added to the lot number, process name, device name, and processing date and time is extracted, under the set conditions (target type, period, items), from the process data stored in the manufacturing information database DB1.
[0021]
(2) Data mining condition setting and processing
According to the item setting file RF2 and the mining condition setting program, a record identification name, explanatory variable names, objective variable names, and explanatory variable value names are selected, and analysis data 1 (DATA1) is created and output. Specifically, the following contents are set using the item setting file RF2.
Record name: Lot number
Explanatory variable item: Manufacturing process (large process name + small process name)
Explanatory variable value name: device, time
Objective variable name: YIELD, characteristic value (yield), etc.
[0022]
(3) Device name and time setting and processing
A device name and a time are set as the explanatory variable values required for analyzing the manufacturing process history with this analysis device. In the analysis data 1 (DATA1), the record name is arranged in the first column, and the explanatory variables and objective variables are arranged in the second and subsequent columns. FIG. 5 is a chart showing a part of the content of the analysis data 1 (DATA1). As shown in the figure, the explanatory variable value name is appended to the explanatory variable item name so that the content of each item can be identified. In this example, “_device” or “_time” is appended to the manufacturing process name, so that both the process and the kind of content can be identified from the item name. The data cleansing / characterizing means 32 and the data analysis means 33 can thus identify the manufacturing process, and whether an item holds a time or a device, from the explanatory variable item and the explanatory variable value name set in the item setting. Note that a plurality of devices (for example, 6nw2 and 6nw4 in the illustrated process 01_device) are arranged in each step of the production line and operate in parallel.
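The naming convention above lends itself to a simple lookup by suffix. The following is a minimal sketch, assuming the suffixes are spelled `_device` and `_time` as in the item name process 01_device; the function name `split_item_name` and the category labels are hypothetical and do not come from the described program.

```python
# Suffix spellings are assumptions based on the "process 01_device" example.
CATEGORY_SUFFIXES = {"_device": "device", "_time": "time"}


def split_item_name(item_name):
    """Return (process_name, category) for an explanatory variable item name."""
    for suffix, category in CATEGORY_SUFFIXES.items():
        if item_name.endswith(suffix):
            return item_name[: -len(suffix)], category
    # Items without a known suffix (record name, objective variable, ...)
    return item_name, "other"


columns = ["lot", "process 01_device", "process 01_time", "YIELD"]
categories = {name: split_item_name(name) for name in columns}
```

Recognizing the category from the item name alone keeps the analysis data self-describing, so later steps need no separate schema file.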
[0023]
(About the data cleansing / characterizing means 32)
The data cleansing / characterizing means 32 executes the following data cleansing / characterizing processing in accordance with the data cleansing setting file RF3, the characterization setting file RF4, and the data cleansing / characterizing program.
(1) Abnormal value processing condition setting and processing
(2) Change setting and processing of device name based on aging
(3) Item name change in single-unit process
(4) Unnecessary item deletion setting and processing
(5) Item deletion and record deletion setting and processing by abnormal value ratio
(6) Time data analysis condition setting and processing
[0024]
Each of the processes (1) to (6) will be described in detail.
(1) Abnormal value processing condition setting and processing
When an explanatory variable item value of the analysis data 1 (DATA1) is missing, the data cleansing / characterizing means 32 replaces it with a specific value based on the setting content of the data cleansing setting file RF3. The processing contents of this data cleansing are as follows. FIG. 6 is a chart showing a part of the content of the analysis data 2 (DATA2) after the data cleansing. The parts where the explanatory variable item value of the analysis data 1 (DATA1) shown in FIG. 5 is Null (a missing value) are replaced with specific values as shown in FIG. 6 (in the example shown, a missing numeric value is replaced with 99999 and a missing character-string value with nop).
[0025]
The data cleansing / characterizing means 32 then sets, based on the setting contents of the data cleansing setting file RF3, whether this specific value is analyzed as one of the ordinary values or as a missing value. For abnormal values, the definition of an abnormal value and its replacement value are set, and abnormal values are processed according to these settings.
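The replacement of missing values can be sketched as follows. This is an illustration only: the placeholder values 99999 and nop are taken from the example above, while the record layout (a dict per lot), the function name, and the choice of which items count as numeric are assumptions.

```python
MISSING_NUMBER = 99999  # placeholder for a missing numeric value (from the example)
MISSING_STRING = "nop"  # placeholder for a missing character-string value


def cleanse_record(record, numeric_items):
    """Return a copy of the record with None replaced by the placeholders."""
    cleaned = {}
    for item, value in record.items():
        if value is None:
            # Numeric items get the numeric placeholder, all others the string one.
            cleaned[item] = MISSING_NUMBER if item in numeric_items else MISSING_STRING
        else:
            cleaned[item] = value
    return cleaned
```

Whether a placeholder is later treated as an ordinary value or as a missing value would be decided by a separate setting, as the text describes.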
[0026]
(2) Change setting and processing of device name based on aging
Even the same device may suddenly become abnormal from a certain time onward due to some trouble. The data cleansing / characterizing means 32 can narrow down the problem further by capturing the change in the target variable value caused by such a sudden change and giving a different name to the device after it has changed to the abnormal state.
[0027]
The data cleansing / characterizing means 32 examines, for all devices, the characteristics of the transition of the target variable over processing time by feature extraction using the wavelet transform or the like (for example, noise removal by filtering). Devices whose characteristics are strong relative to the set reference are characterized by the time and duration of the rise or fall. For example, based on the setting of the program initialization file INI, the reference for an abrupt fluctuation is 0.8 times or more the standard deviation of the whole. The device name is then converted into a device name to which the transition feature information is added. In this embodiment, the transition feature information is added to the device name as follows.
[0028]
(1) A delimiter character (@) is added.
(2) The period is divided into at most three according to the times at which the objective variable value corresponding to the device sharply rises or falls, and a symbol representing the period is attached (F: first half, M: middle, L: latter half, no symbol: whole period).
(3) A trend mark is attached as a symbol representing the transition shape of the device.
(4) A one-digit number (0: weak to 9: strong) is attached as a symbol representing the strength of the transition feature.
[0029]
FIG. 7 is a chart showing the trend marks added to the device name. As shown in the drawing, the whole of the analysis data 2 (DATA2) shown in FIG. 6 is divided into three periods, a first half, a middle part, and a second half, and the trend mark indicates the state of the transition of the target variable value in each period. As shown in the figure, the states obtained by feature extraction such as the wavelet transform are numbered from 1 (-＾: the first half is low and the second half is high) to 5 (-: no feature).
[0030]
An example of a device name to which the transition feature information has been added, for the process 01_device item shown in the figure, is
6nw2 @ F- $ 7
This example shows that the device name is 6nw2, that the transition of the target variable value in the first-half period (F) is low at first and high afterward, and that the strength of the transition feature is 7. When there is no transition feature, the name becomes 6nw2 ＠ −0. In this way, the data cleansing / characterizing means 32 creates, for each process device, a device name to which transition feature information is added. Since the analysis data 2 (DATA2) shown in FIG. 6 is only a part of the entire data, the transition feature information actually added is that obtained by feature extraction over the entire data.
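The abrupt-change criterion (a shift of 0.8 times or more the overall standard deviation) can be illustrated with a much simpler detector than the wavelet-based feature extraction the text describes. The sketch below swaps in a plain mean-shift search: it looks for the split point, in processing-time order, at which the mean of the objective variable changes most. The function names, the `min_size` parameter, and the use of a mean shift are all assumptions.

```python
from statistics import mean


def largest_shift(values, min_size=2):
    """Return (index, shift) of the split with the largest change in mean.

    values must be ordered by processing time; shift = mean(after) - mean(before).
    Returns (None, 0.0) when the series is too short to split.
    """
    best_index, best_shift = None, 0.0
    for i in range(min_size, len(values) - min_size + 1):
        shift = mean(values[i:]) - mean(values[:i])
        if abs(shift) > abs(best_shift):
            best_index, best_shift = i, shift
    return best_index, best_shift


def has_abrupt_change(values, overall_std, factor=0.8):
    """True if the largest mean shift is at least factor * overall_std."""
    index, shift = largest_shift(values)
    return index is not None and abs(shift) >= factor * overall_std
```

A device flagged in this way would then be renamed with the delimiter, period symbol, trend mark, and strength digit described above.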
[0031]
(3) Item name change in single-unit process
As described above, in each process of various production lines including semiconductors, a plurality of devices are often arranged and operated in parallel. However, there is a case where only one device is arranged and operated in one certain process. A process having such a configuration is defined as a “single device process”. In the single-apparatus process, it is not possible to confirm a difference between a plurality of apparatuses. Therefore, a factor that causes a problem is obtained based on a change over time in the arranged single apparatus.
[0032]
If a process serving as an explanatory variable has only one type of device name, no difference between devices can be obtained for that process, so its device item is excluded from the explanatory variables. Instead, “_single device” is appended to the name of the data item indicating the processing time of that process. In the example shown in FIG. 6, the processing time item of process 04 is renamed “process 04_single device”. If there is no corresponding processing time data item, or if the corresponding processing time data item also has only one type of value, no change is made. At the time of analysis, the appended characters identify the item as the time data item of a single-device process.
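A minimal sketch of this renaming rule follows, assuming records are dicts keyed by item names with the `_device` and `_time` suffixes. The function name is hypothetical, and the sketch reads "no change is made" as leaving the record untouched for that process when the time item is absent or constant.

```python
def mark_single_device_processes(records, processes):
    """Rename time items of single-device processes and drop their device items."""
    renamed = [dict(r) for r in records]  # work on copies
    for process in processes:
        device_item = f"{process}_device"
        time_item = f"{process}_time"
        devices = {r[device_item] for r in records if device_item in r}
        times = {r[time_item] for r in records if time_item in r}
        # Only act when exactly one device is used and the time item varies.
        if len(devices) != 1 or len(times) < 2:
            continue
        for r in renamed:
            r.pop(device_item, None)
            if time_item in r:
                r[f"{process}_single device"] = r.pop(time_item)
    return renamed
```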
[0033]
(4) Unnecessary item deletion setting and processing
At the stage of acquiring data, explanatory variable items that are unnecessary for the analysis may be mixed in. In this case, a setting is made to delete the unnecessary items before the analysis. In this example, a plurality of character strings contained in explanatory variable item names, such as inspection process names, are set as initial settings in order to exclude inspection processes, which do not directly process products. Unnecessary items are deleted based on these settings.
[0034]
(5) Item deletion and record deletion setting and processing based on abnormal value ratio
Items and records in which the ratio of missing values and defined abnormal values is equal to or greater than the set value are deleted. For example, an item in which the ratio of missing values and defined abnormal values is 60% or more is deleted, and a record in which that ratio is 70% or more is deleted. Records whose objective variable values are missing or abnormal are also deleted. Among the explanatory variables, nominal-scale items that have only one distinct value, or 100 or more distinct values, are not analyzed.
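The ratio-based deletion can be sketched as below, using the 60% and 70% thresholds from the example. The set of abnormal markers, the function name, and the choice to compute each record's ratio over the original item list (rather than only the surviving items) are assumptions.

```python
ABNORMAL = {None, 99999, "nop"}  # missing value and the assumed placeholder values


def drop_by_abnormal_ratio(records, items, item_limit=0.6, record_limit=0.7):
    """Return (kept_items, kept_records) after ratio-based deletion."""
    kept_items = [
        item for item in items
        if sum(r[item] in ABNORMAL for r in records) / len(records) < item_limit
    ]
    kept_records = [
        r for r in records
        if sum(r[item] in ABNORMAL for item in items) / len(items) < record_limit
    ]
    return kept_items, kept_records
```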
[0035]
(6) Time data analysis condition setting and processing
By default, the apparatus treats the time data in the explanatory variables as an ordinal scale (processed in units of time). As a second setting, the period can be divided using a cycle that exists in the manufacturing process, and the time data can be converted into names representing the periods and treated as a nominal scale (processed in units of names). One such cycle is, for example, the shift change cycle of manufacturing workers.
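As an illustration of the second setting, a processing time can be mapped to a period name. The sketch below assumes three 8-hour shifts per day starting at midnight; the shift length, the label format, and the function name are hypothetical, since the text states only that a worker change cycle exists.

```python
from datetime import datetime

SHIFT_HOURS = 8  # assumed shift length


def shift_label(timestamp):
    """Map a processing time to a nominal period name such as '2002-06-21 shift 2'."""
    shift_number = timestamp.hour // SHIFT_HOURS + 1
    return f"{timestamp:%Y-%m-%d} shift {shift_number}"
```

Lots processed in the same shift then share one nominal value, so the regression tree can split on groups of shifts rather than on a time threshold.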
[0036]
(About the analysis processing of the data analysis means 33)
The data analysis means 33 executes the following analysis processing by regression tree analysis according to the contents of the analysis setting file RF5 and the analysis processing program.
[0037]
(1) Setting of device history + time data analysis
Different analysis processes are executed as follows according to the processing mode of the manufacturing process.
(1) When the whole manufacturing process processes lots on a first-in first-out basis
Since the processing order of lots is basically the same in every process, the processing time of a single process suffices as an explanatory variable, and the analysis is performed using all the device items and the time of the first candidate as explanatory variables. If the lot processing order changes to some extent, the number of time data items is increased and “device history + top N candidate time data analysis” is executed. An appropriate range for N is 1 to 20.
[0038]
(2) When the whole manufacturing process does not process lots on a first-in first-out basis
Using a method of testing the independence of time data, time data of processes that are not independent of each other are collectively combined into one representative time item, and narrowed down to a representative time item group including only independent time data (time data narrowing). After that, “apparatus history + time data analysis” is executed using all the apparatus name items and the narrowed representative time item group as explanatory variables.
[0039]
(2) Analysis end condition setting
The regression tree analysis end condition is set, for example, when the standard deviation of the divided set becomes 0.5 times or less of the whole.
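This end condition can be written as a short check. The sketch assumes the population standard deviation is meant and that a subset with fewer than two values is never split further; both points, and the function name, are assumptions.

```python
from statistics import pstdev


def should_stop_splitting(subset_values, overall_std, factor=0.5):
    """True when the subset's standard deviation is at most factor * overall_std."""
    if len(subset_values) < 2:
        return True  # nothing left to split
    return pstdev(subset_values) <= factor * overall_std
```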
[0040]
(3) Execution of analysis
The data analysis unit 33 executes the analysis according to the contents of the analysis setting file RF5 and the analysis processing program. When a plurality of target variables are set, a plurality of set items are sequentially selected and analyzed.
[0041]
(1) Processing details of “device history + top N candidate time data analysis”
1. A regression tree analysis is performed on the specified target variable using only the device name items and the processing time items of single-device processes as explanatory variables.
2. Next, the time data of the top N processes that are candidates in the analysis result are added to the explanatory variables, and the regression tree analysis is performed again. When the item listed as the k-th candidate (1 ≦ k ≦ N) has no time data, the (k + 1)-th and subsequent candidates are searched for an item that has time data, and that time data is added. Whether or not such a candidate was found is stated explicitly in the analysis result.
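The selection of time items in step 2 can be sketched as follows. The sketch assumes the candidates are given as process names ordered from the strongest candidate and that a process's time item is named with the `_time` suffix; the function name is hypothetical.

```python
def select_time_items(ranked_processes, available_time_items, n):
    """Pick the time items of the top-n candidates, skipping those without time data."""
    selected = []
    for process in ranked_processes:  # ordered best candidate first
        time_item = f"{process}_time"
        if time_item in available_time_items:
            selected.append(time_item)
        if len(selected) == n:
            break
    return selected
```

When a candidate has no time data, the loop simply moves on to the next candidate, which corresponds to searching the (k + 1)-th and subsequent candidates.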
[0042]
(2) Processing details of "device history + time data analysis"
A regression tree analysis is performed using all the device name items and the narrowed representative time item group as explanatory variables.
[0043]
(Extraction and evaluation of analysis results by analysis result evaluation means 34)
The automatic analysis by the data analysis means 33 described above does not end in one execution. By repeating the analysis while changing the analysis target period or the target lot and changing various setting values of the data cleansing / characterizing means 32 and the data analysis means 33 from the initial values, analysis result data (DATA3) is obtained. Then, a plurality of obtained results indicated by the analysis result data (DATA3) are evaluated to obtain more reliable analysis results.
[0044]
Here, an outline of the regression tree analysis and the t-test will be described. The regression tree analysis targets a set of records including explanatory variables indicating a plurality of attributes and objective variables affected by the attributes, and determines attributes and attribute values that most influence the objective variables. The analysis result evaluation means 34 outputs rules indicating characteristics and regularities of the data.
[0045]
The regression tree analysis is realized by repeatedly dividing a set in two based on the parameter value (attribute value) of each explanatory variable (attribute). Let S0 be the sum of squares of the objective variable before a division, and let S1 and S2 be the sums of squares of the objective variable in the two sets after the division. The explanatory variable and the parameter value at which the records are divided are chosen so that ΔS, given by the following equation (1), is maximized.
[0046]
ΔS = S0− (S1 + S2) (1)
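A single division step for a device name item can be sketched as below. The sketch assumes "sum of squares" means the sum of squared deviations from the mean, and it orders the devices by their mean objective value before trying splits, a standard shortcut for squared-error splits on nominal variables. The function names and the sample yield values are illustrative only.

```python
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the mean (S in equation (1))."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_device_split(devices, yields):
    """Return (delta_s, low_group) maximizing delta_s = S0 - (S1 + S2)."""
    by_device = {}
    for d, y in zip(devices, yields):
        by_device.setdefault(d, []).append(y)
    # Order devices by mean objective value, then try each cut in that order.
    ordered = sorted(by_device, key=lambda d: mean(by_device[d]))
    s0 = sum_sq(yields)
    best = (0.0, None)
    for i in range(1, len(ordered)):
        low = [y for d in ordered[:i] for y in by_device[d]]
        high = [y for d in ordered[i:] for y in by_device[d]]
        delta_s = s0 - (sum_sq(low) + sum_sq(high))
        if delta_s > best[0]:
            best = (delta_s, set(ordered[:i]))
    return best


# Illustrative data: two devices of one process with different yield levels.
delta_s, low_group = best_device_split(
    ["6nw2", "6nw2", "6nw4", "6nw4"], [70, 72, 90, 92]
)
```

In a full regression tree the same search would be run over every explanatory variable, and the variable with the largest ΔS would become the branch point.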
[0047]
The explanatory variables and their parameter values obtained here correspond to the branch points in the regression tree. Thereafter, the same processing is repeated for the divided sets, and the influence of the explanatory variables on the objective variables is examined. The above is the generally well-known method of regression tree analysis. To grasp the clarity of each set division in more detail, the following parameters (A) to (D) are also used for quantitative evaluation of the regression tree analysis result.
[0048]
(A) S ratio:
This is the reduction rate of the sum of squares by set division, that is, a parameter indicating how much the sum of squares is reduced by the division. The smaller this value, the greater the effect of the division; the set is divided clearly, and the significant difference is large.
[0049]
S ratio = ((S1 + S2) / 2) / S0 (2)
[0050]
(B) t value:
This is a value for testing the difference between the averages (/X1, /X2) of the two sets produced when the regression tree analysis divides a set in two. Here, “/” indicates an overline. The statistical t-test is a criterion indicating a significant difference between the average values of the objective variable in the divided sets. If the degree of freedom, that is, the number of data, is the same, then the larger the t value, the more clearly the set is divided and the greater the significant difference.
[0051]
If there is no significant difference between the variances of the divided sets, the t value is obtained by the following equation (3); if there is a significant difference between the variances, it is obtained by the following equation (4). Here, N1 and N2 are the numbers of elements of the divided sets 1 and 2, respectively, /X1 and /X2 are the averages of each set after division, and S1 and S2 are the sums of squares of the objective variable of each set after division.
[0052]
(Equation 1)

[0053]
(Equation 2)
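For reference, the textbook two-sample t statistics can be written with the symbols defined above (N1, N2, /X1, /X2, S1, S2). They are assumed here, not confirmed, to correspond to equations (3) and (4): the first pools the variance of the two sets, and the second (Welch's form) does not. S1 and S2 are taken to be sums of squared deviations from each set's mean.

```latex
% Assumed reconstruction from the surrounding definitions, not the original drawings.
% Equal variances (pooled), assumed to correspond to equation (3):
\[
t = \frac{\lvert \bar{X}_1 - \bar{X}_2 \rvert}
         {\sqrt{\dfrac{S_1 + S_2}{N_1 + N_2 - 2}
                \left( \dfrac{1}{N_1} + \dfrac{1}{N_2} \right)}}
\]
% Unequal variances (Welch), assumed to correspond to equation (4):
\[
t = \frac{\lvert \bar{X}_1 - \bar{X}_2 \rvert}
         {\sqrt{\dfrac{S_1}{N_1 (N_1 - 1)} + \dfrac{S_2}{N_2 (N_2 - 1)}}}
\]
```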

[0054]
(C) Difference between the mean values of the objective variables of the divided sets:
The larger this value is, the larger the significant difference is.
[0055]
(D) Number of data of each divided set:
The smaller the difference between the two, the smaller the effect of abnormal values (noise).
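The four parameters (A) to (D) can be computed together for one division. The sketch below uses equation (2) for the S ratio as given; for the t value it assumes the textbook pooled-variance form, which may differ from the exact equation in the original drawings. The function names and dictionary keys are hypothetical.

```python
from math import sqrt
from statistics import mean


def sum_sq(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def evaluate_split(group1, group2):
    """Return the evaluation parameters (A) to (D) for one set division."""
    s0 = sum_sq(group1 + group2)
    s1, s2 = sum_sq(group1), sum_sq(group2)
    n1, n2 = len(group1), len(group2)
    mean_diff = abs(mean(group1) - mean(group2))
    # Assumed pooled-variance t statistic (equal variances).
    pooled_var = (s1 + s2) / (n1 + n2 - 2)
    t_value = mean_diff / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return {
        "s_ratio": ((s1 + s2) / 2) / s0,  # (A), equation (2)
        "t_value": t_value,               # (B)
        "mean_diff": mean_diff,           # (C)
        "counts": (n1, n2),               # (D)
    }
```

The sketch does not guard against a zero pooled variance (identical values within both groups), which a production implementation would need to handle.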
[0056]
(1) Evaluation of analysis results
The evaluation is performed based on the search setting file RF6 and the analysis result evaluation program. This evaluation information is calculated for each analysis result, but can be compared between the analysis results.
(2) Search for highly reliable analysis results
The analysis result evaluation means 34 searches the analysis result data (DATA3) for a highly reliable analysis result based on the setting of the search setting file RF6. For example, a comparative evaluation value is used under the condition “the comparison uses the t-test value of the first branch of the regression tree, provided that the number of data in each of the two divided groups is equal to or larger than the set standard”. With this comparative evaluation value, a more reliable analysis result can be searched for. The search is terminated by limiting the range over which each set value of the setting files RF1 to RF5 is changed or by limiting the automatic analysis time. The plurality of analysis results obtained, their comprehensive evaluation values, and their ranking are then determined, and the most reliable analysis result among the analyses performed is extracted. FIG. 14B shows part of the settings for the range and method of changing the set values. According to these settings, diversified analysis is executed using the two-part confounding degree and the abnormal value ratio of each item, and a clearer analysis result is searched for.
[0057]
FIG. 8 is a chart for explaining the contents of the evaluation process based on the analysis result data (DATA3). As shown in the figure, the evaluation process calculates, for each process device item, the evaluation values (the t-test value described above, the device names and numbers of data in the low and high groups, the average values, and the like), and assigns a rank No. in descending order of the t-test value, on the assumption that a larger value indicates a greater problem.
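The ranking rule can be sketched as a filter followed by a sort: results whose divided groups are too small are excluded, and the rest are ranked in descending order of the t value. The dictionary keys, the function name, and the sample values in the usage are hypothetical.

```python
def rank_results(results, min_count):
    """Rank results by t value, keeping only splits with large enough groups."""
    eligible = [r for r in results if min(r["counts"]) >= min_count]
    ordered = sorted(eligible, key=lambda r: r["t_value"], reverse=True)
    # Rank 1 is the item assumed to indicate the greatest problem.
    return [dict(r, rank=i) for i, r in enumerate(ordered, start=1)]
```

For example, with a minimum group size of 5, a result with group counts (20, 2) is dropped even if its t value is the largest, because a two-record group is easily dominated by noise.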
[0058]
(About the report output means 35)
The report output means 35 creates and outputs a report on the most reliable analysis result.
(1) Report creation
As the analysis results, the regression tree rule information, the statistical values for evaluating a predetermined number of first-branch candidates (for example, the top 20 candidates), and the two-part confounding degree between the candidates are output to files, and an HTML file is output as a report information file. The HTML file contains the regression tree diagram and simple sentences describing the significant difference of the main two-way branches and of the top two candidates, and it associates box-whisker diagrams or correlation diagrams with the data.
[0059]
(2) Specific reporting method
The report of the analysis result is processed based on the setting of the report condition setting file RF7 and the report processing program. For example, the contents of the report are displayed on the screen, an alarm is sent to a preset e-mail address, and the report and its Web address are communicated.
[0060]
FIG. 9 is a diagram illustrating an example of the content of a report. Based on the analysis results, the report REP1 displays (1) comprehensive judgment contents A1 (simple sentences explaining the significant differences of the main two-way branches and of the top two candidates), (2) statistical information A2, (3) a regression tree diagram A3, and (4) a box-whisker diagram A4 or a correlation diagram A5 corresponding to the regression tree diagram A3. The regression tree diagram A3 is displayed, for example, in a description format such as HTML, and clicking on a desired process device selectively displays the linked box-whisker diagram A4 or correlation diagram A5.
[0061]
FIG. 10 is a flowchart illustrating a processing procedure of the data analysis processing. As shown in the figure, the data extracting means 31 extracts and converts data from the manufacturing information database DB1 to obtain analysis data (DATA1) (step S11). Next, the data cleansing / characterizing means 32 obtains analysis data (DATA2) obtained by cleansing and characterizing the data of the analysis data (DATA1) (step S12).
[0062]
Next, the data analysis means 33 performs data mining by the regression tree analysis method on the analysis data (DATA2) after cleansing and characterization, and obtains analysis result data (DATA3) (step S13). The analysis result evaluation means 34 then evaluates the analysis result using the analysis result data (DATA3) (step S14). In this evaluation, a highly reliable result is searched for. For example, it is determined whether the set analysis end condition is satisfied (step S15). If it is not satisfied (step S15: No), the analysis is repeated while the analysis target period or the target lots are changed and the various setting values of the data cleansing / characterizing means 32 and the data analysis means 33 are changed from their initial values, so that more reliable analysis result data (DATA3) is obtained.
[0063]
When an analysis result satisfying the analysis end condition of the data analysis means 33 is obtained (step S15: Yes), the report REP1 is output by the report output means 35 (step S16), and the data analysis process ends. The above data analysis processing is executed automatically on a weekly or monthly basis. The report REP1, obtained by analyzing the history of the equipment used in a manufacturing stage such as the semiconductor manufacturing process together with test results, design information, various measurement data, and the like, makes it easy to find the factors that reduce the yield and thus to improve the yield.
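The flow of steps S11 to S16 can be summarized as a loop over candidate settings. In the sketch below every stage is passed in as a callable, and all names are hypothetical. The sketch assumes that each retry returns to the cleansing step with changed settings and that at least one settings candidate is supplied; it does not model re-extraction when the target period changes.

```python
def run_automatic_analysis(extract, cleanse, analyze, evaluate, report,
                           settings_candidates, is_done):
    """Run extraction, then retry cleansing/analysis until the end condition holds."""
    data1 = extract()                          # step S11
    best = None
    for settings in settings_candidates:       # each retry changes the setting values
        data2 = cleanse(data1, settings)       # step S12
        data3 = analyze(data2, settings)       # step S13
        score = evaluate(data3)                # step S14
        if best is None or score > best[0]:
            best = (score, data3)              # remember the most reliable result
        if is_done(score):                     # step S15
            break
    return report(best[1])                     # step S16
```

Bounding the loop by a finite list of settings mirrors the text's way of ending the search by limiting the range of setting changes.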
[0064]
FIGS. 11 to 14 are diagrams showing examples of the setting contents of the setting files for the analysis processing. FIG. 11 is a chart showing a part of the data cleansing setting file RF3. It sets whether a missing explanatory variable item value of the analysis data 1 (DATA1) is replaced with a specific value, the abnormal value ratios for deleting items and records, and the like. FIG. 12 is a chart showing a part of the item setting file RF2. Examples of the identification character strings for date-type items and device-type items are set. FIG. 13 is a chart showing a part of the data cleansing setting file RF3. A search character string is set for deleting item names that contain a specific character string.
[0065]
FIG. 14 is a chart showing a part of the analysis setting file RF5 and a part of the search setting file RF6. FIG. 14A shows a part of the analysis setting file RF5, which contains the settings for the time data analysis and the analysis conditions. In this example, "independent time data analysis" is set, the objective variable and the explanatory variables are specified, and a condition for terminating the analysis processing is set. FIG. 14B shows a part of the search setting file RF6, which contains the settings for performing diversified analysis, such as the basic method of using the two-part confounding degree and a selection method based on the abnormal value ratio of items.
[0066]
When there is a condition specific to the object to which the analysis apparatus is applied, it can easily be accommodated by adding or changing the necessary settings in the analysis flow setting file group (setting files RF1 to RF7).
[0067]
(Embodiment 2: About the mode specification selection process of data analysis)
The second embodiment of the present invention is configured to automatically perform the analysis process and the subsequent processes when cleansing target data has been obtained. The configuration of the data analysis device 30 is the same as that of the first embodiment, and the description is omitted. A specific data analysis process in the data analysis unit 33 described in the first embodiment will be supplementarily described in the second embodiment.
[0068]
FIG. 15 is a diagram showing a selection screen of an analysis process according to Embodiment 2 of the present invention. When the file names of the input and output data are selected, a display item 40 is displayed as shown in the figure. The display item 40 presents four analysis processing modes, any one of which can be selected: 1 "device history data analysis", 2 "device history + top (n) candidate time data analysis", 3 "device history + first candidate time data analysis", and 4 "Manual (manual analysis)".
[0069]
FIG. 16 is a chart for explaining the contents of the input data. This figure corresponds to FIG. 5 (manufacturing information database DB1) described above. As shown in FIG. 16, the input data is a CSV format file including, for each lot number, a device name and a processing time of each process as an explanatory variable, a yield value as an objective variable, and the like.
[0070]
The data extracting means 31 of the data analysis device 30 takes in the above-mentioned manufacturing information database DB1, and the data is analyzed automatically to narrow down the device names and processing times of the steps that deserve attention. In the item setting file RF2 described with reference to FIG. 12, the additional character string "_e" is appended to the process name for the device name item of the explanatory variables, and the additional character string "_t" is appended to the process name for the processing time item. That is, the used device name and the processing time in the process A are "A_e" and "A_t", respectively.
[0071]
In addition, "_time" and "_DAY_TIME" are defined in addition to "_t" as an additional character string indicating a date type item, and the latter two are converted to a representative identification character string "_t" and handled. The same applies to the identification character string indicating the device item. As a result, data collected from different data sources can be handled as the same category data type.
[0072]
With the above setting, at the time of analysis processing, data with the additional character strings "_t", "_time", or "_DAY_TIME" are regarded as time data, while data with the additional character strings "_e", "_device", or "_EQUIP" are regarded as device data, and the analysis processing corresponding to each processing mode is executed automatically. The automatic execution is performed when a mode other than the above-mentioned 4 "Manual (manual analysis)" is selected.
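The category recognition described in the last two paragraphs can be sketched as follows. The suffix strings are those named in the description; the function name `normalize_item`, the category labels, and the handling of unmatched items as "other" are assumptions of this sketch, not the format of the item setting file RF2.

```python
from typing import Tuple

# Suffixes named in the description; the first entry of each tuple is the
# representative identification character string.
TIME_SUFFIXES = ("_t", "_time", "_DAY_TIME")
DEVICE_SUFFIXES = ("_e", "_device", "_EQUIP")


def normalize_item(name: str) -> Tuple[str, str]:
    """Return (item name with representative suffix, category).

    Items without a known suffix are returned unchanged as "other"
    (hypothetical label).
    """
    for suffixes, category in ((TIME_SUFFIXES, "time"),
                               (DEVICE_SUFFIXES, "device")):
        representative = suffixes[0]
        # Try longer suffixes first so that a long suffix is never
        # shadowed by a shorter one.
        for suffix in sorted(suffixes, key=len, reverse=True):
            if name.endswith(suffix):
                return name[: -len(suffix)] + representative, category
    return name, "other"
```

With this normalization, an item named "A_DAY_TIME" from one data source and an item named "A_t" from another are both handled as the time item "A_t".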
[0073]
In addition, based on the setting contents of the extraction condition setting file RF1 and the item setting file RF2, which constitute a part of the program initialization file INI, it is determined which categories of data are included as explanatory variables. Here, based on the setting of the item setting file RF2 illustrated in FIG. 12, it is determined that the devices used in the processes and the processing times are included. When the data contains no identification character string that matches the setting contents of the program initialization file INI, all of the items are subjected to manual analysis (this determination is made at the processing mode designation step S1 in FIG. 4).
[0074]
FIG. 17 is a chart showing an example of the analysis data 2 (DATA2). The data analysis unit 33 performs a regression tree analysis on the analysis data. As a result, an analysis result is obtained that indicates which of the devices used in steps 1 to 4 and which processing times affect the yield of all eight lots. FIG. 18 is a diagram illustrating a trend graph as an example of the analysis processing result. The horizontal axis is the lot number, and the vertical axis is the yield. For the devices V201 and V202 that constitute step 2_e, the analysis result shows that the yield of the four lots (Lot2 to Lot5) processed in a certain period is high and that the yield of the lots processed by the device V202 in step 2 is low.
[0075]
Next, the analysis processing for each processing mode will be described.
(1) Manual (manual analysis)
When "Manual (manual analysis)" is selected on the analysis processing selection screen shown in FIG. 15, the necessary items are displayed in a variable selection item list L1. FIG. 19 is a diagram showing a screen when setting items during manual analysis. The item list L1 lists numerical items K1 as objective variables and character items K2 as explanatory variables. The necessary numerical items K1 and character items K2 are selected, the conditions are set manually, and the analysis processing is then executed.
[0076]
FIG. 20 is a diagram showing a screen after setting items during manual analysis. In the illustrated example, the yield selected as the target variable is displayed in the target variable list S1. As the explanatory variables, a state in which the used devices of all the processes and the processing times (steps 1_e to 4_e, steps 1_t to 4_t) are selected is displayed in the explanatory variable list S2. The data analysis means 33 executes an analysis processing program based on this setting. FIG. 21 shows the result of the data analysis. FIG. 21 is a diagram illustrating a regression tree and a list of result evaluation information as a result of performing data analysis by designating a device to be used in all processes and a processing time.
[0077]
(2) Device history data analysis
FIG. 22 is a diagram showing an item setting screen when analyzing apparatus history data. When "device history data analysis" is selected on the selection screen of the analysis process shown in FIG. 15, only the items whose additional character string is "_e", which indicates the device used in each process, are automatically listed as explanatory variables in the character item K2 of the item list L1. Thereafter, the objective variables and the explanatory variables listed in the numerical item K1 and the character item K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
[0078]
Based on this setting, the data analysis unit 33 executes the regression tree analysis using only the device history items as explanatory variables. FIG. 23 shows the result of the data analysis. FIG. 23 is a diagram showing a regression tree and a result evaluation information list as a result of performing data analysis by specifying the histories of all processes.
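As a rough illustration of how device history items can be ranked as candidates for the first set division, the sketch below scores each "_e" item by the reduction in the sum of squared errors of the yield when the lots are divided into two sets. This is a simplification and not the regression tree implementation of the data analysis unit 33: it only tries one-device-versus-the-rest divisions, and the function names and the item naming "step2_e" are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple


def sse(values: List[float]) -> float:
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def rank_device_items(records: List[dict], target: str = "yield",
                      device_suffix: str = "_e") -> List[Tuple[float, str, str]]:
    """Rank device items by the best two-set division of the target.

    Returns (gain, item, device) tuples, largest gain first.
    """
    items = [k for k in records[0] if k.endswith(device_suffix)]
    total = sse([r[target] for r in records])
    best: Dict[str, Tuple[float, str]] = {}
    for item in items:
        for device in sorted({r[item] for r in records}):
            inside = [r[target] for r in records if r[item] == device]
            outside = [r[target] for r in records if r[item] != device]
            if not outside:
                continue  # single-device item: no division is possible
            gain = total - sse(inside) - sse(outside)
            if item not in best or gain > best[item][0]:
                best[item] = (gain, device)
    ranked = [(g, item, d) for item, (g, d) in best.items()]
    return sorted(ranked, key=lambda c: c[0], reverse=True)
```

The item with the largest gain plays the role of the first candidate; items whose device never varies are skipped because they cannot divide the lots.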
[0079]
According to FIG. 23, it can be seen that the difference between the processing apparatuses in step 2 is the most significant (step 2 is the first candidate). However, this result alone does not show whether the difference is significant when viewed in chronological order. It is also possible that the device V202, which has many low-yield lots, happened to process many lots during a bad period (for example, a period affected by factors other than the device used, such as a temporary change in process conditions).
[0080]
(3) Device history + top candidate time data analysis
FIG. 24 is a diagram showing an item setting screen when analyzing apparatus history data. When "device history + upper candidate time data analysis" is selected on the analysis processing selection screen shown in FIG. 15, all items other than the time items (the numerical item K1 as the objective variable and the character item K2 as the explanatory variables) are automatically listed in the item list L1. Thereafter, the objective variable and the explanatory variables listed in the numerical item K1 and the character item K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
[0081]
The data analysis unit 33 executes a regression tree analysis based on the set objective variables and explanatory variables. The above processing executes the same processing as "(2) apparatus history data analysis", and the result in this state is the same as FIG.
[0082]
Next, the data analysis means 33 automatically extracts the obtained data analysis result. It adds, as new explanatory variables, the processing times of the processes that appear in the regression tree diagram and of the processes listed in the Evaluation Data, which gives the candidates for set division at the top hierarchy of the regression tree diagram, and then automatically executes the regression tree analysis again.
[0083]
At this time, the processing times of all four processes are added as explanatory variables. The processing time item that corresponds to the used device name item of a process is identified by the fact that the two item names are identical apart from the additional character string. More specifically, in the second regression tree analysis, the time items step 1_t, 2_t, 3_t, and 4_t are added to the explanatory variables specified in the first iteration. FIG. 25 shows the result of the data analysis. FIG. 25 is a diagram showing a regression tree and a list of result evaluation information as a result of performing data analysis by designating the history and time data of all processes. The result shown in FIG. 25 is the same as that of FIG. 21 described above. The processing result of the second data analysis is newly stored in a folder named "EQ_Time".
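The matching of a device item to its time item by the part of the name that precedes the additional character string can be sketched as below. The helper name `time_items_for` and the item names are hypothetical; the sketch only shows the name matching, not the regression tree analysis itself.

```python
from typing import Iterable, List


def time_items_for(candidates: Iterable[str], all_items: Iterable[str],
                   device_suffix: str = "_e",
                   time_suffix: str = "_t") -> List[str]:
    """Return the time items whose names match the candidate device items.

    Two items correspond when their names are identical apart from the
    additional character string, e.g. "step2_e" and "step2_t".
    """
    available = set(all_items)
    added: List[str] = []
    for item in candidates:
        if not item.endswith(device_suffix):
            continue  # not a device item
        time_item = item[: -len(device_suffix)] + time_suffix
        if time_item in available and time_item not in added:
            added.append(time_item)
    return added
```

If the candidate list passed in contains only some of the device items, only the matching time items are returned, so a process that is not among the candidates contributes no time item to the second analysis.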
[0084]
According to the above results, it can be confirmed that the difference due to the time of step 1 is the most significant. However, when the differences of the top three candidates (processing times of steps 2 and 4) in the Evaluation Data are examined, they are exactly the same, so these items are considered to be confounded. The order of the times at which the lots were actually processed in each of these steps is exactly the same, and almost the same transition (trend) is also obtained in the following step 3.
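One way to see why such time items give exactly the same difference is that a division on a time item is a threshold on time, so two time items that put the lots in the same order can only produce the same divisions. The sketch below groups time items by the lot order they induce; the function names and the numeric time values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def lot_order(records: List[dict], item: str) -> Tuple[int, ...]:
    """Indices of the lots sorted by the processing time in `item`."""
    return tuple(sorted(range(len(records)), key=lambda i: records[i][item]))


def confounded_groups(records: List[dict],
                      time_items: List[str]) -> List[List[str]]:
    """Group time items that order the lots identically."""
    groups: Dict[Tuple[int, ...], List[str]] = {}
    for item in time_items:
        groups.setdefault(lot_order(records, item), []).append(item)
    # Only groups with two or more items are confounded with each other.
    return [g for g in groups.values() if len(g) > 1]
```

Any single member of a returned group can stand in for the whole group as an explanatory variable, which is the basis for keeping only one representative time item.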
[0085]
When the Evaluation Data output in the first regression tree analysis is limited to the top three items, step 1_e is not among the items listed in the Evaluation Data shown in FIG. 23 and does not appear in the regression tree diagram. Therefore, in the second regression tree analysis, step 1_t is not added as an explanatory variable. FIG. 26 is a diagram showing a regression tree and a result evaluation information list of the result of performing the data analysis again with the process limited.
[0086]
(4) Device history + first candidate time data analysis
FIG. 27 is a diagram showing an item setting screen at the time of analyzing the device history + first candidate time data. When "device history + first candidate time data analysis" is selected on the analysis processing selection screen shown in FIG. 15, the item list L1 automatically lists all items except the time items, with the numerical item K1 as the objective variable and the character item K2 as the explanatory variables. Thereafter, the objective variables and the explanatory variables listed in the numerical item K1 and the character item K2 are manually selected and displayed in the objective variable list S1 and the explanatory variable list S2.
[0087]
Based on this setting, the data analysis unit 33 executes the regression tree analysis using only the device history items as explanatory variables. This is the same analysis processing as "(2) apparatus history data analysis". The result of the analysis processing is then extracted automatically, only the time data item of the process listed as the first candidate in the Evaluation Data is added to the explanatory variables, and the regression tree analysis is automatically performed again. In the present embodiment, the processing time of step 2 is added as an explanatory variable. FIG. 28 is a diagram showing a regression tree and a list of result evaluation information as a result of re-analyzing the data by limiting the steps.
[0088]
In the analysis performed in the processing mode "(3) Apparatus history + upper candidate time data analysis" described above, the time data items of the confounded processes all rank together at the top of the Evaluation Data. If the time items of an actual flow of several hundred processes are set as explanatory variables, the output consists only of the time data items of the processes, including those in the regression tree diagram. In many cases the time data items of these processes are confounded, and a single representative time data item is sufficient. Therefore, a regression tree analysis on the yield is first performed with the device data as explanatory variables, the time data of the process that became the first candidate is adopted as the representative time variable, and the regression tree analysis on the yield is performed again.
[0089]
The processing result of the second analysis is newly stored in a folder named "EQ_TimeXX" (XX: the order in which the time data was listed first). FIG. 29 is a diagram showing a defect rate trend graph for a certain lot. In the figure, the horizontal axis represents each lot, and the vertical axis represents the defect rate of each lot. In the example shown in the drawing, it is shown that a defect that occurs very rarely occurs at a high rate in specific eight lots (LOTs 3, 4, 6, 7, 8, K, L, and M).
[0090]
FIG. 30 is a diagram illustrating a decision tree in which an explanatory variable and an objective variable are specified. It shows the result of a decision tree analysis in which the names of the devices used in the processes and the processing times are the explanatory variables, and the objective variable is "H" for the 8 lots in which the defect occurred and "L" for the other 15 lots. All set branches in the decision tree diagram are time items. In the semiconductor manufacturing process, lots are processed almost in the order of their lot numbers, so all of the time data are almost confounded and can be regarded as almost equivalent explanatory variables. Since no time data item differs significantly from the others, little is lost by narrowing the time data down to the most significant item, and the analysis result actually becomes easier to interpret.
[0091]
FIG. 31 is a diagram illustrating a decision tree in which explanatory variables and objective variables are further specified. In the decision tree diagram of FIG. 30, the processing time A_t of the process A is the most significant for the objective variable, and FIG. 31 shows the result of a decision tree analysis in which all time data other than A_t are removed from the explanatory variables. The set branching at the uppermost hierarchy shown in FIG. 29 depends on the processing time A_t of the process A, as in FIG. 30. However, because the time data that are almost equivalent to A_t have been removed, the differences due to the devices used, which were hidden behind them, now appear.
[0092]
According to this, it is shown that when the DM2 machine is used in the process D and the EM4 machine is used in the process E, a high defect rate is obtained. FIG. 32 is a diagram illustrating a state in which the result of FIG. 31 is represented by a trend graph for each processing time. The figure clearly shows that the time fluctuation of the EM4 in the process E affected the time fluctuation of the high defect rate.
[0093]
According to the second embodiment described above, data analysis that the automated data analysis of the first embodiment cannot cover can be performed appropriately, in particular analysis in which an operator specifies the data to focus on, for example in response to the replacement of an installed production device.
[0094]
(Embodiment 3: Another processing example related to the single-device process)
According to the analysis process "(1) setting of device history + time data analysis" of the data analysis means 33 described in the first embodiment, the following three points hold.
(1) When the whole manufacturing process processes lots on a first-in first-out basis, the lot processing order is basically the same in every process, so one explanatory variable is sufficient for the processing times of the processes.
(2) Because a difference between devices cannot be confirmed in a single-device process, the problematic factor is determined from the change over time of that single device.
(3) As the number of devices increases, the regression tree analysis tends to produce a significant difference between the two sets.
[0095]
Applying these three items (1) to (3), the present embodiment represents the processing times of all processes, including the single-device processes, by one explanatory variable. With respect to variation factors due to the processing time, the single-device processes are judged to be as suspicious as the multiple-device processes, and the suspicious items are automatically narrowed down as problem factors.
[0096]
In the third embodiment, when the entire manufacturing process processes lots on a first-in first-out basis, the data cleansing / characterizing unit 32 handles the single-device processes without changing the explanatory variable names or the explanatory variable value names, unlike the processes "(2) Change setting and processing of device name based on temporal change" and "(3) Change of item name in single-device process" described in the first embodiment. The processing is thus simpler than in the first embodiment, and the adverse effect of outliers of the objective variable when the number of records (the number of lots) is small is suppressed.
[0097]
FIG. 33 is a flowchart showing a data analysis procedure according to the third embodiment. The following processing is applied to the analysis data 1 (DATA1) obtained after executing "(1) abnormal value processing condition setting and processing" described in the first embodiment.
[0098]
(1) The single-device processes are extracted (list creation).
(2) As a representative item, the process located in the middle of the process order among all the single-device processes is selected. Its value is a time (interval scale), and its item name is "single device process time".
(3) "Single device process time" is added to the explanatory variables of the analysis data, and all other process time items are excluded from the explanatory variables, to obtain the analysis data 2 (DATA2) (step S20). FIG. 34 is a chart showing an example of the extracted single-device process list REP2.
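Steps (1) to (3) above can be sketched as follows. The function name, the process names, and the record layout are hypothetical, and taking the element at index `len(single) // 2` as the "middle" process is an assumption of this sketch for the case where the number of single-device processes is even.

```python
from typing import List, Tuple


def build_single_device_data(records: List[dict],
                             process_order: List[str]) -> Tuple[List[dict], List[str]]:
    """Return (analysis data with one representative time item,
    list of single-device processes)."""
    # (1) A process is a single-device process when its device item
    #     takes exactly one value over all lots.
    single = [p for p in process_order
              if len({r[p + "_e"] for r in records}) == 1]
    if not single:
        return records, single
    # (2) Representative: the single-device process in the middle of the
    #     process order (index choice for even counts is an assumption).
    representative = single[len(single) // 2]
    # (3) Add "single device process time" and exclude every process
    #     time item from the explanatory variables.
    data2 = []
    for r in records:
        row = {k: v for k, v in r.items() if not k.endswith("_t")}
        row["single device process time"] = r[representative + "_t"]
        data2.append(row)
    return data2, single
```

The second return value corresponds to the information that a single-device process list such as REP2 would carry; the device items themselves are left in place here.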
[0099]
Thereafter, the analysis data 2 (DATA2) is subjected to "(4) unnecessary item deletion setting and processing", "(5) item deletion and record deletion setting and processing based on abnormal value ratio", and "(6) setting and processing of time data analysis conditions" described in the first embodiment. The data analysis unit 33 then performs the data analysis processing (regression tree analysis), and the analysis result evaluation unit 34 evaluates the analysis result (step S21). If the conditions are satisfied (step S22: Yes), a report including the single-device process list is created, and the report REP1 and the single-device process list REP2 are output (step S23).
[0100]
Next, an analysis process for the analysis data 2 (DATA2) will be described. FIG. 35 is a chart showing an example of the analysis data 2 (DATA2) used in the third embodiment. As shown in the figure, all of the devices used in step 5 are V501, and this step 5 is a single-device step. FIG. 36 is a diagram showing the yield for each lot number shown in FIG.
[0101]
FIG. 37 is a diagram showing a regression tree and a list of result evaluation information when the used device name and the processing time of each step shown in FIG. 35 are used as explanatory variables, and the yield value is used as an objective variable. As shown in FIG. 37, the top three items of the Evaluation Data, which indicate the set branch candidates at the top hierarchy, are confounded, and one of them, step 5_t, is the processing time of step 5. Since the device item step 5_e of step 5 has only one device type, it is deleted from the explanatory variables when the regression tree analysis is performed.
[0102]
From the items listed in the Evaluation Data, the processing times of the steps whose time variation is effective for the objective variable (here, step 1, step 2, step 3, step 4, and step 5) are extracted, and among the items indicating the processing device of each of these steps, those whose number of device types is 1 are extracted.
[0103]
As described above, according to the third embodiment, it is possible to detect that the significant difference due to the temporal change represented by the single device process time is large. In addition, the description corresponding to the single device process list REP2 makes it easy to understand which processes are single-device processes. The single device process in the example shown in FIG. 37 is "process 5, which is one process".
[0104]
The method relating to the data analysis processing described above can be realized by executing a prepared program on a computer such as a personal computer or a workstation. This program is recorded on various recording media, and is executed by being read from the recording medium by a computer. The program may also be distributed via a transmission medium, that is, a network such as the Internet.
[0105]
(Supplementary Note 1) a data extraction step of extracting data necessary for desired data analysis from the original data;
A data cleansing step of data cleansing an abnormal value of the data extracted in the data extraction step,
A characterization step of determining feature information of the data that has been data cleansed by the data cleansing step;
A data analysis step of analyzing data using the feature information obtained in the characterization step,
A data analysis method comprising:
[0106]
(Supplementary Note 2) In a data analysis method for performing data analysis of process data including an objective variable indicating a change in the quality of a production process and an explanatory variable explaining the change in the objective variable,
A data extraction step of extracting data necessary for desired data analysis from the process data,
A data cleansing step of data cleansing an abnormal value of an explanatory variable of the data extracted in the data extracting step,
A characterization step of determining feature information representing a change in an objective variable of the data that has been data cleansed by the data cleansing step;
A data analysis step of performing a data analysis for searching for a variation factor of the objective variable using the feature information obtained by the characterization step,
A data analysis method comprising:
[0107]
(Supplementary Note 3) In the data extracting step, an additional character string for identifying a category is added to an item name of an explanatory variable of the data;
The data analysis method according to Supplementary Note 2, wherein the data analysis step performs data analysis in which a category of an explanatory variable is identified based on the additional character string.
[0108]
(Supplementary Note 4) The data analysis method according to Supplementary Note 3, wherein in the data extraction step, for the item name of an explanatory variable indicating a production device included in the production process, an additional character string indicating the device category is added to the name of the manufacturing process, and for the item name of an explanatory variable indicating the processing time at which the target was produced, an additional character string indicating the time category is added to the name of the manufacturing process.
[0109]
(Supplementary Note 5) A setting step of setting an instruction to add an additional character string in the data extraction step and an instruction of data analysis identifying a category in the data analysis step in a setting file in advance,
The data analysis method according to Supplementary Note 3, wherein the data extraction step and the data analysis step read the setting file when executing their respective processing, and execute the processing based on the instructions set in the setting file.
[0110]
(Supplementary Note 6) The data analysis method according to any one of Supplementary Notes 2 to 5, wherein in the characterization step, a characteristic relating to a change in the objective variable with the passage of time is obtained, and a predetermined symbol corresponding to the obtained characteristic is added to the device name of the production device used as an explanatory variable of the data.
[0111]
(Supplementary Note 7) The data analysis method according to any one of Supplementary Notes 2 to 6, wherein, when only one production device is provided in one production process,
the data cleansing step excludes the explanatory variable of the device name corresponding to the one production device from the data analysis targets, and adds, to the item name of the explanatory variable corresponding to the processing time of the production process, an additional character string indicating that the process is composed of only the one production device, and
the data analysis step performs data analysis using the explanatory variable of the processing time for the process composed of only the one production device, to which the additional character string was added in the data cleansing step.
[0112]
(Supplementary Note 8) The data analysis method according to any one of Supplementary Notes 2 to 7, wherein the data cleansing step deletes, based on the item names of the explanatory variables, data in the process data other than the data related to the production devices.
[0113]
(Supplementary Note 9) The data analysis method according to any one of Supplementary Notes 2 to 8, wherein the data cleansing step deletes, from the process data, items and records that exceed a predetermined abnormal value ratio set for items and records, and records whose objective variable values are missing or abnormal.
[0114]
(Supplementary Note 10) The data analysis method according to any one of Supplementary Notes 2 to 9, wherein in the data cleansing step, it is possible to select between a setting in which the data indicating the processing time, among the explanatory variables of the production device, is handled as processing time data, and a setting in which it is handled as data of the period names obtained when the period is divided using a predetermined cycle in the production process.
[0115]
(Supplementary Note 11) When all the production processes sequentially process lots to be produced by first-in first-out,
The data analysis step may use, as an explanatory variable of an analysis target, an apparatus item of every production step and a processing time that is a candidate of the top number N in every production step. Data analysis method described in (1).
[0116]
(Supplementary Note 12) When the production process independently processes lots to be produced without first-in first-out,
the data analysis step includes an independent time determination step of determining whether the processing time of each production process is an independent processing time, and
a representative time item creation step of creating one representative time item by summarizing the processing times of the production processes determined not to be independent in the independent time determination step,
the data analysis method according to any one of Supplementary Notes 2 to 10, wherein the items of the production devices in all the production processes and the representative time item created in the representative time item creation step are used as explanatory variables for data analysis.
[0117]
(Supplementary Note 13) The data analysis method according to any one of Supplementary Notes 2 to 12, wherein the data analysis step extracts a rule representing a characteristic or regularity of data to be analyzed by a data mining technique.
[0118]
(Supplementary Note 14) The data analysis method according to any one of Supplementary Notes 2 to 13, further comprising an evaluation step of obtaining a predetermined comprehensive evaluation value using a plurality of analysis results obtained by changing the lots or the processing time of the data in the data analysis step.
[0119]
(Supplementary Note 15) The data analysis method according to Supplementary Note 14, wherein in the evaluation step, a set division evaluation value, which indicates how clearly the set of data obtained in the data analysis step is divided when it is divided into two, is obtained as information indicating the reliability of the rule.
[0120]
(Supplementary note 16) The data analysis method according to supplementary note 15, wherein the evaluation step uses a value of t represented by the following equation as the set division evaluation value.
[Equation 3]

[0121]
(Supplementary Note 17) The data analysis method according to Supplementary Note 16, further including a report output step of outputting a report that narrows down the production process, the production device, or the processing time that is a problem in the production facility, based on the evaluation result obtained in the evaluation step.
[0122]
(Supplementary Note 18) When all the production processes sequentially process lots to be produced in a first-in first-out manner, and only one production device is provided in one production process,
The data analysis method according to any one of supplementary notes 2 to 17, wherein in the characterization step, a list of manufacturing steps including the one production device is created.
[0123]
(Supplementary Note 19) In a data analysis device for performing data analysis of process data including an objective variable indicating a change in the quality of a production process and an explanatory variable explaining the change in the objective variable,
Data extraction means for extracting data necessary for desired data analysis from the process data,
Data cleansing means for data cleansing an abnormal value of the explanatory variable of the data extracted by the data extracting means,
Characterization means for obtaining characteristic information representing a change in an objective variable of data that has been data cleansed by the data cleansing means,
A data analysis unit that performs data analysis for searching for a variation factor of the objective variable using the feature information obtained by the characterization unit,
A data analysis device comprising:
[0124]
(Supplementary Note 20) The data analysis device according to Supplementary note 19, further comprising an explanatory variable conversion unit that adds an additional character string indicating a category of a data item to an explanatory variable of the data that has been data cleansed by the data cleansing unit,
wherein the data analysis unit recognizes the category of the data item based on the additional character string and executes an analysis process for each category.
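
The category tagging described in Supplementary note 20 can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the suffix strings `_EQP` and `_TIME`, the item names, and the function names `add_category_suffix` and `group_by_category` are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical category suffixes; the source does not specify the actual strings.
CATEGORY_SUFFIX = {"device": "_EQP", "time": "_TIME"}


def add_category_suffix(item_categories):
    """Append the suffix of each item's category to its item name."""
    return [name + CATEGORY_SUFFIX[cat] for name, cat in item_categories]


def group_by_category(tagged_names):
    """Recover each item's category from its suffix and group the names."""
    groups = defaultdict(list)
    for name in tagged_names:
        for cat, suffix in CATEGORY_SUFFIX.items():
            if name.endswith(suffix):
                groups[cat].append(name)
                break
        else:
            # No known suffix: the category cannot be recognized.
            groups["unknown"].append(name)
    return dict(groups)


tagged = add_category_suffix(
    [("STEP1", "device"), ("STEP1", "time"), ("STEP2", "device")]
)
groups = group_by_category(tagged)
```

With the items grouped this way, an analysis routine could apply a different condition setting to each category without the operator having to list the items by hand.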
[0125]
(Supplementary note 21) The data analysis device according to Supplementary note 20, wherein the explanatory variable conversion unit extracts a list of the explanatory variables and objective variables included in the data and allows the explanatory variables and objective variables used for data analysis to be selected manually.
[0126]
(Supplementary Note 22) A data analysis program for performing data analysis of process data including an objective variable indicating a variation in the quality of a production process and an explanatory variable explaining the variation of the objective variable, the program causing the following to be executed:
extracting the data necessary for the desired data analysis from the process data;
data cleansing the abnormal values of the explanatory variables of the extracted data;
obtaining feature information representing the variation of the objective variable of the data-cleansed data;
performing, using the feature information, a data analysis to search for a variation factor of the objective variable; and
performing a predetermined evaluation on an analysis result obtained by the data analysis.
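
The sequence of steps in Supplementary note 22 can be sketched as a small Python pipeline. This is only an illustration under stated assumptions: the record layout, the treatment of `""` and `"N/A"` as abnormal values, the use of mean and range as feature information, and the scoring of each explanatory variable by the spread of its group means are all choices made for this example. The source describes regression trees and other data mining techniques, which this sketch does not reproduce.

```python
from statistics import mean


def extract(records, items):
    """Keep only the items needed for the desired analysis."""
    return [{k: r.get(k) for k in items} for r in records]


def cleanse(records, explanatory, abnormal=("", "N/A")):
    """Replace abnormal explanatory values with None (one possible policy)."""
    return [
        {k: (None if k in explanatory and v in abnormal else v) for k, v in r.items()}
        for r in records
    ]


def characterize(records, objective):
    """Feature information: overall mean and range of the objective variable."""
    values = [r[objective] for r in records]
    return {"mean": mean(values), "range": max(values) - min(values)}


def analyze(records, explanatory, objective):
    """Score each explanatory variable by the spread of its group means."""
    scores = {}
    for var in explanatory:
        groups = {}
        for r in records:
            if r[var] is not None:
                groups.setdefault(r[var], []).append(r[objective])
        means = [mean(v) for v in groups.values()]
        scores[var] = max(means) - min(means) if means else 0.0
    return scores


def evaluate(scores, features):
    """Return the top-scoring variable and its spread relative to the overall range."""
    best = max(scores, key=scores.get)
    ratio = scores[best] / features["range"] if features["range"] else 0.0
    return best, ratio


# Hypothetical lot records: device used in two steps, and the yield per lot.
lots = [
    {"lot": "A", "step1_dev": "M1", "step2_dev": "X1", "yield": 90.0},
    {"lot": "B", "step1_dev": "M1", "step2_dev": "X2", "yield": 92.0},
    {"lot": "C", "step1_dev": "M2", "step2_dev": "X1", "yield": 70.0},
    {"lot": "D", "step1_dev": "M2", "step2_dev": "N/A", "yield": 72.0},
]
explanatory = ["step1_dev", "step2_dev"]
data = cleanse(extract(lots, explanatory + ["yield"]), explanatory)
features = characterize(data, "yield")
scores = analyze(data, explanatory, "yield")
best, ratio = evaluate(scores, features)
```

In this toy data the yield differs most between the two devices of the first step, so `step1_dev` is reported as the leading candidate factor.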
[0127]
[Effect of the Invention]
According to the present invention, the data to be analyzed is appropriately extracted, and data analysis is performed after data cleansing and characterization. In particular, an additional character string corresponding to a data category is added to each explanatory variable of the data. This clarifies the relationship between data types and data categories and allows the desired data analysis processing to be executed automatically, so that data analysis can be performed efficiently. In addition, even when the conditions for data analysis are set manually, operation errors can be prevented, and the desired analysis process and analysis result can be obtained.
[Brief description of the drawings]
FIG. 1 is a diagram showing a hardware configuration of a computer system used in a data analysis device according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining a flow of process data.
FIG. 3 is a functional block diagram of a data analysis device realized by the system configuration shown in FIG. 1.
FIG. 4 is a flowchart showing an outline of a data processing procedure in the data analysis device of the present invention.
FIG. 5 is a chart showing a part of the content of analysis data 1 (DATA1).
FIG. 6 is a table showing a part of contents of analysis data 2 (DATA2) after data cleansing.
FIG. 7 is a table showing a trend mark added to a device name.
FIG. 8 is a chart for explaining the contents of an evaluation process based on analysis result data (DATA3).
FIG. 9 is a diagram showing an example of the contents of a report.
FIG. 10 is a flowchart illustrating a processing procedure of a data analysis process.
FIG. 11 is a table showing a part of a data cleansing setting file RF3.
FIG. 12 is a chart showing a part of an item setting file RF2.
FIG. 13 is a table showing a part of a data cleansing setting file RF3.
FIG. 14 is a table showing a part of an analysis setting file RF5 and a part of a search setting file RF6.
FIG. 15 is a diagram showing a selection screen of an analysis process according to the second embodiment of the present invention.
FIG. 16 is a chart for explaining contents of input data.
FIG. 17 is a chart showing an example of analysis data 2 (DATA2).
FIG. 18 is a diagram showing a trend graph as an example of the analysis processing result.
FIG. 19 is a diagram showing a screen when setting items during manual analysis.
FIG. 20 is a diagram showing a screen after setting items during manual analysis.
FIG. 21 is a diagram illustrating a regression tree and a list of result evaluation information as a result of performing data analysis by designating a device to be used in all processes and a processing time.
FIG. 22 is a diagram showing an item setting screen when analyzing apparatus history data.
FIG. 23 is a diagram showing a regression tree and a result evaluation information list as a result of performing data analysis by designating histories of all processes.
FIG. 24 is a diagram showing an item setting screen when analyzing apparatus history data.
FIG. 25 is a diagram showing a regression tree and a result evaluation information list as a result of performing data analysis by designating history and time data of all processes.
FIG. 26 is a diagram showing a regression tree and a list of result evaluation information as a result of performing data analysis again with a limited number of steps.
FIG. 27 is a diagram showing an item setting screen when analyzing device history + first candidate time data.
FIG. 28 is a diagram showing a regression tree and a result evaluation information list as a result of re-analyzing the data by limiting the steps.
FIG. 29 is a diagram showing a defect rate trend graph for a certain lot.
FIG. 30 is a diagram showing a decision tree in which an explanatory variable and an objective variable are designated.
FIG. 31 is a diagram showing a decision tree further specifying an explanatory variable and an objective variable.
FIG. 32 is a diagram showing a state in which the result of FIG. 31 is represented by a trend graph for each processing time.
FIG. 33 is a flowchart showing a data analysis procedure according to the third embodiment of the present invention.
FIG. 34 is a table showing an example of the extracted single-device process list REP2.
FIG. 35 is a table showing an example of analysis data 2 (DATA2) used in the third embodiment.
FIG. 36 is a diagram showing the yield for each lot number shown in FIG. 35.
FIG. 37 is a diagram showing a regression tree and a list of result evaluation information when the used device name and the processing time of each step shown in FIG. 35 are used as explanatory variables and the yield value is used as an objective variable.
FIG. 38 is a flowchart showing a procedure of a general data analysis process.
[Explanation of symbols]
1 Input device
2 Central processing unit
3 Output device
4 Storage device
10a-10n process equipment
11 Management server
DB1 Manufacturing information database
30 Data analyzer
31 Data extraction means
32 Data cleansing / characterizing means
33 Data analysis means
34 Analysis result evaluation means
35 Report output means
40 display items
A1 Comprehensive judgment contents
A2 Statistical information
A3 regression tree diagram
A4 Box-and-whisker plot
A5 correlation diagram
DATA1 Analysis data 1
DATA2 Analysis data 2
K1 Numerical items to be used as objective variables
K2 Text item used as an explanatory variable
L1 item list
RF configuration file
RF1 extraction condition setting file
RF2 item setting file
RF3 data cleansing setting file
RF4 characterization setting file
RF5 analysis setting file
RF6 search setting file
RF7 report condition setting file
REP1 Report
REP2 Single device process list
S1 Objective variable list
S2 Explanatory variable list

Claims

1. A data analysis method comprising:
a data extraction step of extracting data necessary for a desired data analysis from original data;
a data cleansing step of data cleansing abnormal values of the data extracted in the data extraction step;
a characterization step of obtaining feature information of the data that has been data cleansed in the data cleansing step; and
a data analysis step of analyzing the data using the feature information obtained in the characterization step.

2. A data analysis method for performing data analysis of process data including an objective variable indicating a variation in the quality of a production process and an explanatory variable explaining the variation of the objective variable, the method comprising:
a data extraction step of extracting data necessary for a desired data analysis from the process data;
a data cleansing step of data cleansing abnormal values of the explanatory variables of the data extracted in the data extraction step;
a characterization step of obtaining feature information representing the variation of the objective variable of the data that has been data cleansed in the data cleansing step; and
a data analysis step of performing, using the feature information obtained in the characterization step, a data analysis for searching for a variation factor of the objective variable.

3. The data analysis method according to claim 2, wherein the data extraction step adds an additional character string for identifying a category to the item name of each explanatory variable of the data, and
the data analysis step performs data analysis in which the category of each explanatory variable is identified based on the additional character string.

4. The data analysis method according to claim 2 or 3, wherein the characterization step obtains a characteristic relating to the variation of the objective variable over time and adds a predetermined symbol corresponding to the obtained characteristic to the device name of the production device used as an explanatory variable of the data.

5. The data analysis method according to any one of claims 2 to 4, wherein, when only one production device is provided in a given production process, the data cleansing step excludes the explanatory variable for the device name corresponding to that one production device from the data analysis targets and adds, to the item name of the explanatory variable corresponding to the processing time of that production process, an additional character string indicating that the process consists of only the one production device, and
the data analysis step performs, for a process consisting of only one production device to which the additional character string has been added in the data cleansing step, data analysis using the explanatory variable for the processing time.
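
The handling of single-device processes in this claim can be sketched in Python as follows. This is an illustration, not the claimed implementation: the suffix `_SINGLE`, the column names, and the rule that a process counts as single-device when only one distinct device name appears in the data are assumptions made for this example.

```python
def handle_single_device_steps(records, device_items, time_items, suffix="_SINGLE"):
    """Drop device-name items of single-device steps and tag their time items.

    device_items and time_items map a step name to its column name.
    suffix is a hypothetical marker; the source does not give the actual string.
    """
    # Assumption: a step is single-device if its device column has one distinct value.
    single_steps = [
        step
        for step, col in device_items.items()
        if len({r[col] for r in records}) == 1
    ]
    renamed = {time_items[s]: time_items[s] + suffix for s in single_steps}
    dropped = {device_items[s] for s in single_steps}
    cleaned = [
        {renamed.get(k, k): v for k, v in r.items() if k not in dropped}
        for r in records
    ]
    return cleaned, single_steps


records = [
    {"s1_dev": "M1", "s1_time": "06-01 09:00",
     "s2_dev": "X1", "s2_time": "06-02 10:00", "yield": 90.0},
    {"s1_dev": "M1", "s1_time": "06-03 09:00",
     "s2_dev": "X2", "s2_time": "06-04 10:00", "yield": 72.0},
]
cleaned, single = handle_single_device_steps(
    records,
    device_items={"s1": "s1_dev", "s2": "s2_dev"},
    time_items={"s1": "s1_time", "s2": "s2_time"},
)
```

Here step `s1` always uses device `M1`, so its device name carries no information for the analysis; only its tagged processing time remains as an explanatory variable.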

6. The data analysis method according to any one of claims 2 to 5, wherein the data cleansing step deletes, based on the item names of the explanatory variables, the data in the process data other than the data related to the production devices.

7. The data analysis method according to any one of claims 2 to 6, wherein the data cleansing step deletes, from the process data, the items and records in which the proportion of abnormal values exceeds a predetermined ratio, and the records in which the value of the objective variable is missing or abnormal.
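
The deletion rule of this claim can be sketched in Python as follows. The sketch rests on assumptions that the source does not state: `None` and empty strings are treated as abnormal, a single threshold `max_ratio` is applied to both items and records, and records with a bad objective value are removed before the item and record ratios are computed.

```python
def is_abnormal(value):
    """Treat None and empty strings as abnormal (an assumption for this sketch)."""
    return value is None or value == ""


def cleanse_by_ratio(records, objective, max_ratio=0.5):
    # 1. Delete records whose objective value is missing or abnormal.
    rows = [r for r in records if not is_abnormal(r.get(objective))]
    if not rows:
        return []
    # 2. Delete items (columns) whose abnormal ratio exceeds the threshold.
    items = [k for k in rows[0] if k != objective]
    kept = [
        k
        for k in items
        if sum(is_abnormal(r.get(k)) for r in rows) / len(rows) <= max_ratio
    ]
    # 3. Delete records whose abnormal ratio over the kept items exceeds the threshold.
    result = []
    for r in rows:
        bad = sum(is_abnormal(r.get(k)) for k in kept)
        if not kept or bad / len(kept) <= max_ratio:
            result.append({k: r.get(k) for k in kept + [objective]})
    return result


records = [
    {"s1_dev": "M1", "s2_dev": "", "s3_dev": "Z1", "yield": 90.0},
    {"s1_dev": "M2", "s2_dev": "", "s3_dev": "Z2", "yield": 85.0},
    {"s1_dev": "", "s2_dev": "", "s3_dev": "", "yield": 70.0},
    {"s1_dev": "M1", "s2_dev": "X1", "s3_dev": "Z1", "yield": None},
]
cleaned = cleanse_by_ratio(records, objective="yield")
```

In this example the last record is dropped for its missing yield, the item `s2_dev` is dropped because it is empty in every remaining record, and the third record is dropped because all of its remaining items are empty.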

8. The data analysis method according to any one of claims 2 to 7, wherein, when all of the production processes sequentially process the lots to be produced in a first-in first-out manner,
the data analysis step uses, as the explanatory variables to be analyzed, the device items of all the production processes and the processing times that are the top N candidates among all the production processes.

9. The data analysis method according to any one of claims 2 to 8, wherein, when the production processes process the lots to be produced independently rather than in a first-in first-out manner,
the data analysis step includes an independent time determination step of determining, for the processing time of each production process, whether or not it is an independent processing time, and
a representative time item creation step of creating a single representative time item by combining the processing times of the production processes determined not to be independent in the independent time determination step, and
uses, as the explanatory variables for data analysis, the production device items of all the production processes and the representative time item created in the representative time item creation step.

10. The data analysis method according to any one of claims 2 to 9, further comprising an evaluation step of obtaining a predetermined comprehensive evaluation value using a plurality of analysis results obtained in the data analysis step by changing the lot or the processing time of the data.