CN117194963B

CN117194963B - Industrial FDC quality root cause analysis method, device and storage medium

Info

Publication number: CN117194963B
Application number: CN202311443057.4A
Authority: CN
Inventors: 谢箭; 龚雁鹏; 王涛; 郑捷
Original assignee: Hefei Zheta Technology Co ltd
Current assignee: Hefei Zheta Technology Co ltd
Priority date: 2023-11-02
Filing date: 2023-11-02
Publication date: 2024-02-09
Anticipated expiration: 2043-11-02
Also published as: CN117194963A

Abstract

The industrial FDC quality root cause analysis method, device and storage medium provide a complete data preprocessing and feature extraction flow. The data are deduplicated, denoised and aligned, and their missing values are handled; a Z test then screens out the sensor-steps that have little influence on the yield result; and the time-domain feature values that best represent the information of each sensor-step are extracted, namely the maximum, minimum and mean values. The extracted features and labels are modeled with the PLS algorithm, which is suited to high-dimensional industrial data, is fault-tolerant and interpretable, and handles complex multi-dimensional industrial data well. Combined with the VIP selection algorithm, the method locates the root cause and finds the sensor-step that causes product failure, so that the production environment state can be adjusted to improve the quality of semiconductor products. The method automatically selects the variables most strongly correlated with the target variable, avoids the influence of irrelevant variables on modeling, and improves the accuracy of the model.

Description

Industrial FDC quality root cause analysis method, device and storage medium

Technical Field

The invention relates to the technical field of intelligent manufacturing and artificial intelligence, in particular to an industrial FDC quality root cause analysis method, equipment and a storage medium.

Background

The manufacturing of electronic products such as semiconductors, integrated circuits and panels is complex and involves many process steps (step) and many sensors (sensor). The sensors monitor detailed parameters such as temperature, humidity, air pressure and flow during production and processing, and the parameter and step information is collected for later analysis. Mining the causal relation between this information and product quality, finding the steps that affect the yield, and adjusting the production environment state accordingly improves the yield of good products, which is of great significance for reducing cost and gaining market share in high-technology industries such as semiconductors.

Existing data analysis methods are mostly based on observation of curve characteristics; another approach draws curve envelopes to find differences in the parameter values of defective products. These methods require labor and expert knowledge and suffer from low efficiency, inaccurate positioning and difficulty of large-scale application. The problems become more pronounced when the data dimension is high and there are many sensors and steps. An automated, intelligent method that can be applied on a large scale is therefore urgently needed.

Disclosure of Invention

To overcome the defects of the prior art, the invention provides an industrial FDC data quality root cause analysis method that addresses the difficulty of tracing the root cause of faults in industrial production. To deal with the high dimension and heavy noise of industrial data, a complete automatic data preprocessing and feature extraction flow is implemented, which cleans and integrates the data and selects the sensitive features that best represent the data information. To deal with the difficulty of locating the sensor-step at the root of a fault, the invention builds a model with the PLS algorithm, calculates the importance of the feature variables with the VIP algorithm, and finds the sensor-step that causes the product fault.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

the invention relates to an industrial FDC data quality root cause analysis method, which specifically comprises the following steps:

step 1, preprocessing the semiconductor FDC data, including deduplication, denoising and missing-value processing;

step 2, deleting irrelevant sensor-steps by using a Z test method, where a sensor-step denotes the value of a given sensor parameter within one step;

step 3, constructing the feature engineering, and extracting three features for each sensor parameter of each step: the mean value, the maximum value and the minimum value;

step 4, modeling and analyzing the extracted features and the corresponding product labels by using a PLS (partial least squares) algorithm;

step 5, analyzing the model training result of the step 4 by using a VIP algorithm to obtain a ranking of root cause importance, and finding the sensor-step that causes product faults;

in the step 1, the preprocessing of the data is completed. First, missing values are processed: sensors whose proportion of missing values exceeds 20% are deleted, and the remaining missing values are filled with the mean value; where some processing steps are missing for a product, the intersection of the steps is taken; in practical industrial diagnosis and control scenarios, a jump of a sensor parameter at certain points is a normal phenomenon and is not judged to be an abnormal point, so the 3sigma principle is applied within each step to delete outliers; assume that the random variable $X$ obeys a Gaussian distribution, i.e. $X \sim N(\mu, \sigma^2)$, with probability density function $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\sigma$ is the standard deviation and $\mu$ is the mean value;

3sigma principle: the probability that a value falls within $(\mu - 3\sigma, \mu + 3\sigma)$ is 0.9974, so the interval $(\mu - 3\sigma, \mu + 3\sigma)$ can essentially be regarded as the range of values that the random variable $X$ actually takes, and points outside it can be regarded as outliers.
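The 3sigma rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `remove_outliers_3sigma` and the sample readings are hypothetical, and the mean and standard deviation are estimated from the readings of one sensor inside one step, which is one reading of "deleting outliers in step".

```python
from statistics import mean, pstdev

def remove_outliers_3sigma(values):
    """Drop readings outside (mu - 3*sigma, mu + 3*sigma) for one sensor in one step."""
    if len(values) < 2:
        return list(values)
    mu = mean(values)
    sigma = pstdev(values)  # spread of the readings inside this step only
    if sigma == 0:
        return list(values)  # constant signal: nothing to remove
    lower, upper = mu - 3 * sigma, mu + 3 * sigma
    return [v for v in values if lower < v < upper]

# Hypothetical readings of one sensor within one step, with one spike appended.
readings = [20.0 + 0.1 * (i % 5) for i in range(30)] + [35.0]
cleaned = remove_outliers_3sigma(readings)
```

Because the limits are computed per step, a level change between two steps is not flagged; only points far from the typical level of their own step are removed.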

In the step 2, the Z test method is used to delete irrelevant sensor-steps. The purpose is to remove the sensor-steps whose variation amplitude is insignificant, because such sensor-steps are insufficient to demonstrate an influence on the yield result;

the Z test checks whether the mean of a sample differs significantly from the overall mean, and the detailed steps are as follows:

(1) State the hypotheses: let the overall mean of a given sensor-step be $\mu_0$, the mean of each sample be $\bar{x}$, the sample standard deviation be $s$, and the sample size be $n$. The hypotheses can then be expressed as:

Null hypothesis $H_0: \mu = \mu_0$ (the overall mean is equal to the given value)

Alternative hypothesis $H_1: \mu \neq \mu_0$ (the overall mean is not equal to the given value)

(2) Set the significance level $\alpha$;

(3) Calculate the statistic: the Z value is calculated from the sample data as

$$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

(4) Determine the critical value: according to the significance level $\alpha$ and the degrees of freedom of the sample data, look up the Z distribution table to obtain the critical value of Z;

(5) Determine the rejection region and the acceptance region according to the requirements of a two-sided test;

(6) Decide whether to reject the null hypothesis: if the calculated Z value falls in the rejection region, the null hypothesis is rejected and the sample mean is considered significantly different from the overall mean; otherwise the null hypothesis is accepted and the sample mean is considered not significantly different from the overall mean. If 95% of the sample means are not significantly different from the overall mean, the sensor-step can be eliminated.
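The screening rule of step 2 can be illustrated with the following minimal Python sketch. Several points are assumptions rather than statements of the source: each wafer's trace for the sensor-step is treated as one sample, the overall mean is taken as the pooled mean over all wafers, the significance level defaults to 0.05, and the names `z_statistic` and `is_irrelevant` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def z_statistic(sample, mu0):
    """Z = (sample mean - mu0) / (s / sqrt(n)) for one sample."""
    s = stdev(sample)
    diff = mean(sample) - mu0
    if s == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / (s / sqrt(len(sample)))

def is_irrelevant(wafer_traces, alpha=0.05, ratio=0.95):
    """True if at least `ratio` of the wafers show no significant difference."""
    pooled = [v for trace in wafer_traces for v in trace]
    mu0 = mean(pooled)                            # assumed overall mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    accepted = sum(1 for trace in wafer_traces
                   if abs(z_statistic(trace, mu0)) <= z_crit)
    return accepted / len(wafer_traces) >= ratio

# Hypothetical traces: one sensor-step that never changes, one that shifts.
base = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
flat = [base[:] for _ in range(20)]
shifted = [base[:] for _ in range(10)] + [[v + 1.0 for v in base] for _ in range(10)]
flat_is_irrelevant = is_irrelevant(flat)
shifted_is_irrelevant = is_irrelevant(shifted)
```

In this toy data the unchanging sensor-step is flagged for removal, while the one whose level differs between wafers is kept for modeling.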

In the step 3, the feature engineering is constructed: for each semiconductor wafer, the feature values of every sensor-step are extracted, including the mean value, the maximum value and the minimum value; each combination of a sensor-step and a feature value forms one dimension of the input features, and the three features of all sensor-steps of all parameters are calculated to form the input feature matrix $X$;
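As an illustration of step 3, the sketch below builds the feature matrix from per-wafer traces. The nested dictionary layout, the `sensor|step|statistic` naming of columns and the function name `build_feature_matrix` are assumptions made for this example only.

```python
from statistics import mean

def build_feature_matrix(wafers):
    """wafers: {wafer_id: {(sensor, step): [readings, ...]}}.

    Returns (wafer_ids, feature_names, X), where X has one row per wafer and
    three columns (mean, max, min) per sensor-step shared by all wafers.
    """
    wafer_ids = sorted(wafers)
    keys = sorted(set.intersection(*(set(w) for w in wafers.values())))
    stats = (("mean", mean), ("max", max), ("min", min))
    feature_names = [f"{sensor}|{step}|{name}"
                     for sensor, step in keys for name, _ in stats]
    X = [[fn(wafers[w][key]) for key in keys for _, fn in stats]
         for w in wafer_ids]
    return wafer_ids, feature_names, X

# Hypothetical traces for two wafers.
wafers = {
    "W01": {("temp", "step1"): [20.0, 21.0, 22.0], ("pressure", "step2"): [1.0, 1.2]},
    "W02": {("temp", "step1"): [25.0, 27.0], ("pressure", "step2"): [0.9, 1.1, 1.0]},
}
wafer_ids, feature_names, X = build_feature_matrix(wafers)
```

Each row of `X` then corresponds to one wafer and each column to one sensor-step-feature.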

In the step 4, the PLS algorithm is used to model and analyze the feature matrix $X$ and the corresponding product label $Y$; the algorithm flow is as follows:

(1) First, $X$ and $Y$ are standardized to eliminate the influence of differences in dimension and weight among features, so that the model is more stable and accurate; the standardized matrices are $E_0$ and $F_0$;

(2) Solve the first principal components $t_1$, $u_1$ of the independent variable $E_0$ and the dependent variable $F_0$; establish the relation between $E_0$, $F_0$ and the first principal components $t_1$, $u_1$, and calculate the residual matrices $E_1$, $F_1$;

(3) Replace $E_0$, $F_0$ with $E_1$, $F_1$ to form new independent and dependent variables, and solve the first principal components $t_2$, $u_2$ of $E_1$, $F_1$, which are the second principal components of the original independent and dependent variables $E_0$, $F_0$;

(4) Establish the relation between the new independent variable $E_1$, the dependent variable $F_1$ and the second principal components $t_2$, $u_2$, and calculate the residual matrices $E_2$, $F_2$;

(5) Repeat the steps (3) and (4) until the principal components meeting the condition are obtained;

(6) Perform cross-validation to determine the number of principal components meeting the condition;

(7) Establish the regression equation and calculate the regression coefficients.
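The flow above can be sketched for the single-label case with the classical NIPALS form of PLS1. This is an assumption about the variant used, not a statement of the source; the function names and toy data are hypothetical, and the cross-validation of step (6) is left out, so the number of components is passed in directly.

```python
from math import sqrt
from statistics import mean, stdev

def standardize_columns(X):
    """Center each column and scale it to unit sample standard deviation."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu, sd = mean(col), stdev(col)
        cols.append([(v - mu) / sd if sd > 0 else 0.0 for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls1_nipals(X, y, n_components):
    """Return per-component weights W, scores T, x-loadings P and y-loadings q."""
    E = standardize_columns(X)                             # E_0
    f = [r[0] for r in standardize_columns([[v] for v in y])]  # F_0 (one column)
    n, p = len(E), len(E[0])
    W, T, P, q = [], [], [], []
    for _ in range(n_components):
        w = [dot([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = sqrt(dot(w, w))
        if norm < 1e-12:
            break                                          # nothing left to explain
        w = [v / norm for v in w]
        t = [dot(E[i], w) for i in range(n)]               # component score
        tt = dot(t, t)
        p_load = [dot([E[i][j] for i in range(n)], t) / tt for j in range(p)]
        q_a = dot(f, t) / tt
        # Deflate: the residual matrices become the new E and f.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
        W.append(w); T.append(t); P.append(p_load); q.append(q_a)
    return W, T, P, q

# Hypothetical data: the label depends on the first feature only.
X_demo = [[1, 5], [2, 3], [3, 8], [4, 1], [5, 4], [6, 7]]
y_demo = [2, 4, 6, 8, 10, 12]
W, T, P, q = pls1_nipals(X_demo, y_demo, n_components=2)
```

The fitted standardized label is the sum of `q[a] * T[a]` over the extracted components; each pass of the loop corresponds to one round of steps (2) to (4).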

Let the number of independent variables be $p$, here the number of sensor-step-features; the number of dependent variables is 1, namely the product label; and the number of samples is $n$, the number of semiconductor wafers of the product. The idea of partial least squares is to solve for the principal components while ensuring that the correlation between the independent and dependent variables is maximized, that is, to solve for the principal component $t$ of $X$ and the principal component $u$ of $Y$, where $t$ and $u$ need to meet the following requirements:

(1) $t$ and $u$ carry the information of $X$ and $Y$, respectively;

(2) the correlation between $t$ and $u$ reaches its maximum;

Meeting the two requirements simultaneously yields the objective function of the model.

Optimizing the objective function gives the principal components that meet the requirements and the regression coefficients of the model; the number of principal components to be extracted can be specified in the model.
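The objective function itself is not reproduced in this text. As a hedged illustration only, the textbook PLS criterion that matches the two requirements maximizes the covariance of the first pair of components. The symbols are assumptions made here: $E_0$ and $F_0$ denote the standardized $X$ and $Y$, and $w_1$, $c_1$ are unit weight vectors.

```latex
\max_{w_1,\,c_1}\ \operatorname{Cov}(t_1, u_1)
  = \sqrt{\operatorname{Var}(t_1)\,\operatorname{Var}(u_1)}\; r(t_1, u_1),
\qquad t_1 = E_0 w_1,\quad u_1 = F_0 c_1,\quad \lVert w_1 \rVert = \lVert c_1 \rVert = 1
```

Large variances of $t_1$ and $u_1$ correspond to requirement (1), and a large correlation coefficient $r(t_1, u_1)$ corresponds to requirement (2).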

In the step 5, important variables are selected by using the VIP (Variable Importance in Projection) algorithm. The quantities used in the calculation formula of the VIP algorithm are explained as follows:

$VIP_j$: the VIP value of the $j$-th sensor-step-feature;

$p$: the total number of predictor variables;

$h$: the total number of PLS principal components;

$W$ matrix: each of the $h$ PLS components corresponds to a set of coefficients $w_k$ that converts $X$ into component scores; the coefficient matrix is written $W$ and has size $p \times h$;

$T$ matrix: for the $n$ samples, each sample has a score on each of the $h$ components; the score matrix is written $T$ and has size $n \times h$, and $t_k$ denotes the column of scores of the $n$ samples on the $k$-th component; $q_k$ is the dot product of the transpose of $t_k$ and $y$, where $y$ is the label.
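Since the VIP formula is not reproduced in this text, the sketch below uses the form commonly found in the PLS literature, VIP_j = sqrt(p * sum_k(SS_k * (w_jk / ||w_k||)^2) / sum_k(SS_k)) with SS_k = q_k^2 * (t_k . t_k). Treat it as an assumption rather than the exact expression of the source: here q_k is taken as the regression coefficient of the label on t_k, and the function name `vip_scores` and the toy numbers are hypothetical.

```python
from math import sqrt

def vip_scores(W, T, q):
    """VIP value of every predictor.

    W: list of weight vectors, one per component (each of length p)
    T: list of score vectors, one per component (each of length n)
    q: list of y-loadings, one per component
    """
    p = len(W[0])
    # Label variance explained by each component.
    ss = [q_k ** 2 * sum(t * t for t in t_k) for t_k, q_k in zip(T, q)]
    total = sum(ss)
    vips = []
    for j in range(p):
        acc = 0.0
        for w_k, ss_k in zip(W, ss):
            norm_sq = sum(w * w for w in w_k)
            acc += ss_k * (w_k[j] ** 2) / norm_sq
        vips.append(sqrt(p * acc / total))
    return vips

# Hypothetical two-predictor, two-component model.
W_demo = [[0.6, 0.8], [0.8, -0.6]]
T_demo = [[1.0, -1.0, 2.0, -2.0], [0.5, 0.5, -0.5, -0.5]]
q_demo = [0.9, 0.1]
vips = vip_scores(W_demo, T_demo, q_demo)
```

With this form the squared VIP values sum to $p$, so a value above 1 marks a predictor that contributes more than average.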

In yet another aspect, the invention also discloses a computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.

In yet another aspect, the invention also discloses a computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method as above.

According to the above technical scheme, the invention provides an industrial FDC data quality root cause analysis method. First, a complete data preprocessing and feature extraction flow is designed: the data are deduplicated, denoised and aligned, and their missing values are handled; a Z test then screens out the sensor-steps that have little influence on the yield result; and the time-domain feature values that best represent the information of each sensor-step are extracted, namely the maximum, minimum and mean values. The extracted features and labels are modeled with the PLS algorithm, which is suited to high-dimensional industrial data, is fault-tolerant and interpretable, and handles complex multi-dimensional industrial data well. Combined with the VIP selection algorithm, the method locates the root cause and finds the sensor-step that causes product failure, so that the production environment state can be adjusted to improve the quality of semiconductor products.

The invention relates to an industrial FDC (fault detection control) quality root cause analysis method, which traces industrial production data of semiconductors, integrated circuits and the like and mines the sensor parameters and corresponding production steps (hereinafter referred to as steps) that cause product faults, so as to improve product quality. First, the data are preprocessed: outliers within each step are deleted; because data histories are incomplete, different wafers contain different steps, so the intersection of the steps of all wafers is taken as the set of steps to be analyzed; and the Z test is used to screen out the steps that do not need to be analyzed. Then the feature engineering is constructed, and the mean value, maximum value and minimum value of each sensor in each step are calculated as features. The extracted features are modeled and analyzed with the PLS (partial least squares) algorithm. Finally, the VIP (Variable Importance in Projection) algorithm is used to select the sensors and steps that have the greatest impact on the product quality result.

In particular, the invention has the advantages compared with the prior art that:

the early data preprocessing completes the cleaning and integration of the data, making the complicated and chaotic industrial data neat and clear. The data are then further screened with the Z test, a common hypothesis testing method that is widely used and easy to understand and apply. When the sample size is large, the Z test is highly reliable and can accurately test whether the difference between two samples is significant. Screening out a considerable part of the sensor-steps with the Z test allows the subsequent PLS-VIP algorithm to locate abnormal parameters and the corresponding steps more accurately. The representative, sensitive time-domain features of each sensor-step, namely the mean value, maximum value and minimum value, are extracted, so that the model is computed more efficiently and the root cause is traced more accurately.

In addition, the PLS algorithm is suitable for processing high-dimensional data and can avoid overfitting even when the number of features exceeds the number of samples; it is fault-tolerant and can model and predict effectively even when the data contain missing values. The PLS model is highly interpretable and can be visualized and interpreted through the coefficient matrix, contribution plots and similar means, helping the user understand the inherent structure of the data and the prediction process of the model. The method automatically selects the variables most strongly correlated with the target variable, avoids the influence of irrelevant variables on modeling, and improves the accuracy of the model. The VIP selection algorithm can select several variables at the same time, needs no additional cross-validation process, which greatly simplifies model building, and needs no predetermined threshold, so it can be applied flexibly to different data sets and models. The VIP selection algorithm can be applied to data analysis in different fields such as multiple regression, classification and clustering, and therefore has wide application value.

Drawings

FIG. 1 is an overall flow chart of a method implementation of the present invention;

FIG. 2 is a flow chart of preprocessing and feature extraction in the present invention;

FIG. 3 is a flow chart of the modeling process of the PLS algorithm in the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention.

In this example, the method of the present invention is used for semiconductor production quality root cause analysis.

Root cause analysis is performed on semiconductor FDC data, the root cause of the semiconductor poor yield is mined, specific production environment states (sensor parameter values) and processing steps (step) are traced back, and the main implementation flow is shown in FIG. 1. The detailed implementation steps are as follows:

in the step 1, the preprocessing of the data is completed. First, missing values are processed: sensors whose proportion of missing values exceeds 20% are deleted, and the remaining missing values are filled with the mean value; where some processing steps are missing for a product, the intersection of the steps is taken; in practical industrial diagnosis and control scenarios, a jump of a sensor parameter at certain points is a normal phenomenon and is not judged to be an abnormal point, so the 3sigma principle is applied within each step to delete outliers; assume that the random variable $X$ obeys a Gaussian distribution, i.e. $X \sim N(\mu, \sigma^2)$, with probability density function $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\sigma$ is the standard deviation and $\mu$ is the mean value;

3sigma principle: the probability that a value falls within $(\mu - 3\sigma, \mu + 3\sigma)$ is 0.9974, so the interval $(\mu - 3\sigma, \mu + 3\sigma)$ can essentially be regarded as the range of values that the random variable $X$ actually takes, and points outside it can be regarded as outliers.
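The missing-value rule and the step intersection of step 1 can be sketched as follows. The data layout (one list per sensor with `None` marking a missing reading, one step list per wafer) and the function names are assumptions for illustration; only the 20% threshold and the mean filling come from the description.

```python
from statistics import mean

def clean_missing(sensor_columns, max_missing_ratio=0.20):
    """sensor_columns: {sensor_name: [value or None, ...]}.

    Drops sensors whose share of missing values exceeds the threshold and
    fills the remaining gaps with the mean of the observed values.
    """
    cleaned = {}
    for name, values in sensor_columns.items():
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if not present or missing_ratio > max_missing_ratio:
            continue  # too sparse: delete this sensor
        fill = mean(present)
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned

def common_steps(steps_per_wafer):
    """Intersection of the step sets recorded for each wafer."""
    return sorted(set.intersection(*(set(s) for s in steps_per_wafer.values())))

# Hypothetical raw data.
raw = {
    "temp": [1.0, None, 3.0, 2.0, 2.0, 2.0],       # 1 of 6 missing: kept
    "pressure": [None, None, 1.0, 1.0, 1.0, 1.0],  # 2 of 6 missing: dropped
}
cleaned_sensors = clean_missing(raw)
steps = common_steps({"W01": ["s1", "s2", "s3"],
                      "W02": ["s2", "s3"],
                      "W03": ["s1", "s2", "s3", "s4"]})
```

Only the steps recorded for every wafer survive the intersection, which keeps the later feature matrix free of structurally missing columns.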

Null hypothesis $H_0: \mu = \mu_0$ (the overall mean is equal to a given value)

Alternative hypothesis $H_1: \mu \neq \mu_0$ (the overall mean is not equal to a given value)

(2) Set the significance level $\alpha$;

As shown in FIG. 2, in the step 3 the feature engineering is constructed: for each semiconductor wafer, the feature values of every sensor-step are extracted, including the mean value, the maximum value and the minimum value; each combination of a sensor-step and a feature value forms one dimension of the input features, and the three features of all sensor-steps of all parameters are calculated to form the input feature matrix $X$;

As shown in FIG. 3, in the step 4 the PLS algorithm is used to model and analyze the feature matrix $X$ and the corresponding product label $Y$, following the algorithm flow described above.

Let the number of independent variables be $p$, here the number of sensor-step-features; the number of dependent variables is 1, namely the product label; and the number of samples is $n$, the number of wafers of the product. The idea of partial least squares is to solve for the principal components while ensuring that the correlation between the independent and dependent variables is maximized, that is, to solve for the principal component $t$ of $X$ and the principal component $u$ of $Y$, where $t$ and $u$ need to meet the following requirements:

(1) $t$ and $u$ carry the information of $X$ and $Y$, respectively;

(2) the correlation between $t$ and $u$ reaches its maximum;

The quantities used in the VIP formula are explained as follows:

$VIP_j$: the VIP value of the $j$-th sensor-step-feature;

$p$: the total number of predictor variables;

$h$: the total number of PLS principal components.
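Once a VIP value is available for every sensor-step-feature, the root cause ranking of step 5 reduces to a sort. The sketch below is illustrative only: the `sensor|step|statistic` naming of features and the choice to score a sensor-step by the largest VIP among its three features are assumptions, not rules stated in the description.

```python
def rank_sensor_steps(feature_names, vip_values, top_k=3):
    """Rank sensor-steps by the largest VIP value among their features."""
    best = {}
    for name, vip in zip(feature_names, vip_values):
        sensor, step, _stat = name.split("|")
        key = (sensor, step)
        best[key] = max(best.get(key, 0.0), vip)
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]

# Hypothetical feature names and VIP values.
names = ["temp|step1|mean", "temp|step1|max", "pressure|step2|mean", "flow|step3|min"]
values = [0.4, 1.8, 1.1, 0.2]
ranking = rank_sensor_steps(names, values, top_k=2)
```

The first entries of the ranking are the sensor-steps to inspect first when adjusting the production environment.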

In summary, the embodiment of the invention designs a complete data preprocessing and feature extraction flow: the data are deduplicated, denoised and aligned, and their missing values are handled; a Z test then screens out the sensor-steps that have little influence on the yield result; and the time-domain feature values that best represent the information of each sensor-step are extracted, namely the maximum, minimum and mean values. The extracted features and labels are modeled with the PLS algorithm, which is suited to high-dimensional industrial data, is fault-tolerant and interpretable, and handles complex multi-dimensional industrial data well. Combined with the VIP selection algorithm, the method locates the root cause and finds the sensor-step that causes product failure, so that the production environment state can be adjusted to improve the quality of semiconductor products.

In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the industrial FDC quality root cause analysis method of any of the above embodiments.

It may be understood that the system provided by the embodiment of the present invention corresponds to the method provided by the embodiment of the present invention, and explanation, examples and beneficial effects of the related content may refer to corresponding parts in the above method.

The embodiment of the application also provides an electronic device, which comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus,

a memory for storing a computer program;

and the processor is used for realizing the industrial FDC quality root cause analysis method when executing the program stored in the memory.

The communication bus mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be classified as an address bus, a data bus, a control bus, or the like.

The communication interface is used for communication between the electronic device and other devices.

The memory may include a Random Access Memory (RAM) or a Non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to each other, and each embodiment mainly describes its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.

The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. An industrial FDC quality root cause analysis method is characterized by comprising the following steps,

step 1, acquiring semiconductor FDC data and preprocessing the data;

step 2, deleting irrelevant sensor-step based on the processed data by using a Z test method, wherein the meaning of the sensor-step is a certain sensor parameter value in one step;

step 3, constructing a feature engineering based on the step 2, and extracting three features of the mean value, the maximum value and the minimum value of each sensor parameter of each step;

step 4, modeling and analyzing the extracted features and the corresponding product labels by using a partial least square algorithm;

wherein the step 4 specifically includes using the partial least squares algorithm to model and analyze the feature matrix $X$ and the corresponding product label $Y$, the algorithm flow being as follows:

S41, first, the feature matrix $X$ and the corresponding product label $Y$ are standardized, the standardized matrices being $E_0$ and $F_0$;

S42, solving the first principal components $t_1$, $u_1$ of the independent variable $E_0$ and the dependent variable $F_0$; establishing the relation between the independent variable $E_0$, the dependent variable $F_0$ and the first principal components $t_1$, $u_1$, and calculating the residual matrices $E_1$, $F_1$;

S43, replacing $E_0$, $F_0$ with $E_1$, $F_1$ to form new independent and dependent variables, and solving the first principal components $t_2$, $u_2$ of $E_1$, $F_1$, which are the second principal components of the original independent and dependent variables $E_0$, $F_0$;

S44, establishing the relation between the new independent variable $E_1$, the dependent variable $F_1$ and the second principal components $t_2$, $u_2$, and calculating the residual matrices $E_2$, $F_2$;

S45, repeating the steps S43 and S44 until the principal components meeting the condition are obtained;

S46, performing cross-validation to determine the number of principal components meeting the condition;

S47, establishing a regression equation, and calculating the regression coefficients as follows:

let the number of independent variables be $p$, here the number of sensor-step-features; the number of dependent variables is 1, namely the product label; and the number of samples is $n$, the number of wafers of the product; solving for the principal component $t$ of $X$ and the principal component $u$ of $Y$, where $t$ and $u$ need to meet the following requirements:

(1) $t$ and $u$ carry the information of $X$ and $Y$, respectively;

(2) the correlation between $t$ and $u$ reaches its maximum;

the objective function is obtained by meeting the two requirements simultaneously;

optimizing the objective function to obtain the principal components meeting the requirements and the regression coefficients in the model, the number of principal components to be extracted being specified in the model;

in the step 5, important variables are selected by using the VIP algorithm, and the quantities used in the calculation formula of the VIP algorithm are explained as follows:

$VIP_j$: the VIP value of the $j$-th sensor-step-feature;

$p$: the total number of predictor variables;

$h$: the total number of PLS principal components;

$T$ matrix: for the $n$ samples, each sample has a score on each of the $h$ components; the score matrix is written $T$ and has size $n \times h$, and $t_k$ denotes the column of scores of the $n$ samples on the $k$-th component; $q_k$ is the dot product of the transpose of $t_k$ and $y$, where $y$ is the label.

2. The industrial FDC quality root cause analysis method according to claim 1, characterized in that: the preprocessing step in the step 1 comprises the steps of firstly processing missing values, deleting sensors with the missing values exceeding 20%, and filling the rest missing values by adopting an average value;

for the problem of partial missing of a processing step related to a product, taking the intersection of the steps;

deleting outliers within each step by using the 3sigma principle; assuming that the random variable $X$ obeys a Gaussian distribution, i.e. $X \sim N(\mu, \sigma^2)$, with probability density function $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, where $\sigma$ is the standard deviation and $\mu$ is the mean value;

3sigma principle: the probability that a value falls within the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is 0.9974; the interval $(\mu - 3\sigma, \mu + 3\sigma)$ is regarded as the range of values that the random variable $X$ actually takes, and points outside it are regarded as outliers.

3. The industrial FDC quality root cause analysis method according to claim 2, characterized in that: in the step 2, the Z test method is used for deleting irrelevant sensor-step, and the specific steps are as follows:

S21, stating the hypotheses: let the overall mean of a given sensor-step be $\mu_0$, the mean of each sample be $\bar{x}$, the sample standard deviation be $s$, and the sample size be $n$;

the hypotheses are then expressed as:

null hypothesis $H_0: \mu = \mu_0$, indicating that the overall mean is equal to a given value;

alternative hypothesis $H_1: \mu \neq \mu_0$, indicating that the overall mean is not equal to a given value;

S22, setting the significance level $\alpha$;

S23, calculating the statistic: the Z value is calculated from the sample data as

$$Z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$$

S24, determining the critical value: according to the significance level $\alpha$ and the degrees of freedom of the sample data, looking up the Z distribution table to obtain the critical value of Z;

S25, determining the rejection region and the acceptance region according to the requirements of a two-sided test;

S26, deciding whether to reject the null hypothesis: if the calculated Z value falls in the rejection region, the null hypothesis is rejected and the sample mean is considered significantly different from the overall mean; otherwise the null hypothesis is accepted and the sample mean is considered not significantly different from the overall mean; if 95% of the sample means are not significantly different from the overall mean, the sensor-step is eliminated.

4. The industrial FDC quality root cause analysis method according to claim 3, characterized in that: the step 3 specifically includes constructing the feature engineering and extracting, for each semiconductor wafer, the feature values of every sensor-step, including the mean value, the maximum value and the minimum value, wherein each combination of a sensor-step and a feature value forms one dimension of the input features, and the three features of all sensor-steps of all parameters are calculated to form the input feature matrix $X$.

5. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 4.

6. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 4.