CN110647117A - Chemical process fault identification method and system - Google Patents
- Publication number
- CN110647117A (CN201910844132.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- principal component
- variable
- chemical process
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B19/00—Programme-control systems
- G05B19/02—Programme-control systems electric
- G05B19/418—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
- G05B19/41885—Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B2219/00—Program-control systems
- G05B2219/30—Nc systems
- G05B2219/32—Operator till task planning
- G05B2219/32339—Object oriented modeling, design, analysis, implementation, simulation language
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/02—Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]
Landscapes
- Engineering & Computer Science (AREA)
- Manufacturing & Machinery (AREA)
- General Engineering & Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Automation & Control Theory (AREA)
- Testing And Monitoring For Control Systems (AREA)
- General Factory Administration (AREA)
Abstract
The method and system are applied to chemical fault identification where labels are expensive. A dynamic active safety semi-supervised support vector machine model combined with principal component analysis (PCA-DAS4VM for short) is used to identify the operating state of the chemical process. Combining principal component analysis with the dynamic active safety semi-supervised support vector machine compensates for the large amount of labeled data required by traditional supervised learning and improves the identification accuracy of semi-supervised learning. Principal component analysis is used to eliminate noise and redundant data from the chemical process; abnormal-condition faults are identified by combining historical information and future information; unlabeled data with high entropy are effectively selected and labeled; and the unlabeled data are fully used to improve the performance of the identification model. Fault identification for the chemical process is thereby carried out efficiently and completely, with higher identification accuracy and higher identification speed, which promotes the development of chemical safety.
Description
Technical Field
The present disclosure relates to the technical field of chemical process fault identification, and in particular to a chemical process fault identification method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
According to statistical analyses of accidents in chemical enterprises, any major accident is inevitably preceded by a number of minor anomalies. Research on fault identification for chemical processes, which detects potential abnormal conditions in time, therefore has important theoretical and practical significance for keeping chemical plants operating safely and stably.
The inventors found that existing process fault identification methods fall mainly into three groups: qualitative models, quantitative models, and data-driven methods. Among data-driven fault identification methods, supervised learning gives good results for chemical process fault identification, with identification accuracy above 92%. However, the amount of labeled data in an actual chemical process often does not meet the requirements of supervised learning. Labels are generally added to unlabeled data manually on the basis of experience, and labeling the large amount of easily collected unlabeled chemical data is expensive.
Semi-supervised learning has been applied in a number of fields, such as digit recognition, sentiment classification, and medical image classification. Some studies note that traditional supervised learning requires a large amount of labeled data, while existing semi-supervised learning methods perform worse than supervised learning given the same amount of labeled data. As a result, the application of semi-supervised learning to chemical process fault identification has received little research attention.
Disclosure of Invention
To solve the above problems, the present disclosure provides a chemical process fault identification method and system for chemical fault identification where labels are expensive. The method combines principal component analysis with a dynamic active safety semi-supervised support vector machine and uses the resulting model (PCA-DAS4VM model for short) to identify the operating state of the chemical process, which compensates for the large amount of labeled data required by traditional supervised learning and improves the identification accuracy of semi-supervised learning.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
One or more embodiments provide a chemical process fault identification method, including the following steps:
acquiring operation data of a chemical production process in real time;
preprocessing the acquired operation data;
selecting key feature data from the operation data by principal component analysis;
establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting the key feature data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the operating state of the chemical process.
Further, the key feature data include labeled data and unlabeled data, and the processing of the key feature data by the dynamic active safety semi-supervised support vector machine model includes adding labels to the unlabeled data by an active learning method.
Further, selecting key feature data from the operation data by principal component analysis includes the following steps:
calculating the feature covariance matrix of the preprocessed data matrix, and the eigenvalues and eigenvectors of the covariance matrix; sorting the components by variance contribution rate from largest to smallest, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold;
establishing principal component linear expressions from the principal component variables, and calculating the coefficients of the variables in each principal component linear expression from the eigenvalues;
obtaining a comprehensive scoring model from the coefficients in the principal component linear expressions, and calculating the variable coefficients in the comprehensive scoring model from the variances of the principal component variables;
normalizing the variable coefficients in the comprehensive scoring model to re-determine the variable weights;
sorting the re-determined variable weights by value, wherein the operation data corresponding to the variables whose cumulative weight exceeds a set threshold are the key feature data.
Further, the coefficients of the variables in each principal component linear expression are calculated from the eigenvalues, the calculation formula being as follows:
where coe is the coefficient of variable q in the d-th principal component linear expression; v is the value of the d-th principal component for variable q; and e is the eigenvalue of the d-th principal component.
Or
the variable coefficients in the comprehensive scoring model are calculated from the coefficients in the principal component linear expressions, the calculation formula being as follows:
where w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; and s is the variance of the d-th principal component.
Further, the training process of the dynamic active safety semi-supervised support vector machine model includes:
acquiring historical data of the chemical production process, the historical data including fault data and non-fault data;
preprocessing the acquired historical data;
selecting key feature data from the historical operating data by principal component analysis, the key feature data including labeled data and unlabeled data;
adding labels to the unlabeled data by an active learning method, inputting the originally labeled data and the newly labeled data into the dynamic active safety semi-supervised support vector machine model for training, with the fault type or normal operation as the output, to obtain the parameters of the dynamic active safety semi-supervised support vector machine model.
Further, adding labels to the unlabeled data by the active learning method includes the following steps:
optimizing the confidence of the pseudo labels of the identification model by combining historical information and future information of the chemical process data;
calculating the entropy of the key feature data from the pseudo-label confidence, selecting the key feature data with high entropy by active learning, and adding data labels to them based on a knowledge ontology.
Further, optimizing the confidence of the pseudo labels of the identification model by combining historical information and future information of the chemical process data specifically includes:
classifying the historical data by fault to obtain k classes corresponding to k faults;
calculating the confidence that each data item belongs to each class k, and calculating the pseudo-label confidence of each key feature data item from the calculated confidences by an averaging method.
A chemical process fault identification system, comprising:
a data acquisition module, used for acquiring operation data of a chemical production process in real time;
a preprocessing module, used for preprocessing the acquired operation data;
a key feature data extraction module, used for selecting key feature data from the operation data by principal component analysis;
an identification module, used for inputting the key feature data into the trained dynamic active safety semi-supervised support vector machine model and outputting the operating state of the chemical process.
An electronic device comprising a memory and a processor and computer instructions stored on the memory and executed on the processor, the computer instructions, when executed by the processor, performing the steps of the above method.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above method.
Compared with the prior art, the beneficial effects of the present disclosure are as follows:
The method combines principal component analysis with the dynamic active safety semi-supervised support vector machine, which compensates for the large amount of labeled data required by traditional supervised learning and improves the identification accuracy of semi-supervised learning. The method eliminates noise and redundant data from the chemical process, identifies abnormal-condition faults by combining historical information and future information, effectively selects and labels unlabeled data with high entropy, establishes a graphical scenario object model based on a knowledge ontology, and determines the labels of the unlabeled data from the established graphical scenario object model and expert knowledge. By making full use of the unlabeled data to improve the performance of the identification model, chemical process fault identification is carried out efficiently and completely, with higher identification accuracy and higher identification speed, which helps promote the development of chemical safety.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
FIG. 1 is a flow diagram of a chemical process fault identification method in accordance with one or more embodiments;
FIG. 2 shows the principal component variance percentages for TE process fault 4 during training in Embodiment 1 of the present disclosure;
FIG. 3 shows the principal component eigenvalues for TE process fault 4 during training in Embodiment 1 of the present disclosure;
FIG. 4 shows the key measured variable weights for TE process fault 4 during training in Embodiment 1 of the present disclosure;
FIG. 5 shows the key measured variables determined for the 20 TE faults during training in Embodiment 1 of the present disclosure;
FIG. 6 is a comparison of PCA-DAS4VM accuracy at different unlabeled data volumes;
FIG. 7 is a TE process graphical scenario object model based on ontology;
FIG. 8 is a graphical scenario object model of TE process fault 4;
FIG. 9 is a comparison of F1 scores for the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 10 is a comparison of the FPRs of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 11 is an FDR comparison of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 12 is a G-mean comparison of PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 13 is a comparison of the accuracy of the DSSAE, ALSemiFDA and PCA-DAS4VM models.
Detailed Description
The present disclosure is further described below with reference to the drawings and embodiments.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should further be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of the stated features, steps, operations, devices, components, and/or combinations thereof. It should be noted that, where there is no conflict, the embodiments in the present disclosure and the features in the embodiments may be combined with each other. The embodiments are described in detail below with reference to the accompanying drawings.
Example 1
In the technical solution disclosed in one or more embodiments, as shown in FIG. 1, a chemical process fault identification method includes the following steps:
Step 1, acquiring operation data of the chemical production process in real time;
Step 2, preprocessing the acquired operation data;
Step 3, selecting key feature data from the operation data by principal component analysis;
Step 4, establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting the key feature data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the operating state of the chemical process.
The operation data acquired in step 1 include the flow rate of each material in the chemical production process, the control parameter data of each device, and the like. Table 1 below lists the operation data parameters of one production example in this embodiment.
TABLE 1
The preprocessing in step 2 includes Z-score normalization of the acquired operation data, that is, subtracting the mean and dividing by the standard deviation, as follows:
$$z_{qg} = \frac{x_{qg} - \mu}{\sigma} \tag{1}$$
where $x_{qg}$ is the process data; $\mu$ is the mean of the data matrix; $\sigma$ is the standard deviation of the data matrix; and $z_{qg}$ is the normalized value of the data.
This preprocessing converts the multiple groups of data into unitless Z-score values, which puts the data on a common scale and improves comparability, although it weakens the interpretability of the data.
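As a minimal sketch of the Z-score step, assuming the operation data are arranged with one row per sample and one column per measured variable, each column can be standardized as shown below. The function name `z_score`, the per-variable mean and population standard deviation, and the sample readings are assumptions made for illustration and are not taken from the disclosure.

```python
import numpy as np

def z_score(data):
    """Standardize each column (measured variable) to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    mu = data.mean(axis=0)      # per-variable mean (assumption: one column per variable)
    sigma = data.std(axis=0)    # per-variable population standard deviation (assumption)
    sigma[sigma == 0] = 1.0     # constant columns map to zero instead of dividing by zero
    return (data - mu) / sigma

# Hypothetical readings: columns are a flow rate and a temperature.
samples = [[10.0, 120.0], [12.0, 121.0], [14.0, 125.0]]
z = z_score(samples)
```

After this step every column is unitless, so variables recorded in different units can be compared directly.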
Selecting key feature data from the operation data by principal component analysis in step 3 includes the following steps:
(31) Calculating the feature covariance matrix of the preprocessed data matrix, and the eigenvalues and eigenvectors of the covariance matrix; sorting the components by variance contribution rate from largest to smallest, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold. In this embodiment the proportion threshold may be set to 80% or more.
The feature covariance matrix and the eigenvalues may be calculated as follows:
$$|C - \lambda E| = 0 \tag{3}$$
where $C$ is the calculated feature covariance matrix; $m$ is the dimension of the data matrix; $\lambda$ is an eigenvalue of the covariance matrix; $E$ is the identity matrix; and $Z$ is the data matrix.
(32) Establishing principal component linear expressions from the principal component variables, and calculating the coefficients of the variables in each principal component linear expression from the eigenvalues by the following formula:
where coe is the coefficient of variable q in the d-th principal component linear expression; v is the value of the d-th principal component for variable q; and e is the eigenvalue of the d-th principal component.
(33) Obtaining a comprehensive scoring model from the coefficients in the principal component linear expressions, the comprehensive scoring model including variables and variable coefficients; the variable coefficients in the comprehensive scoring model are calculated as follows:
where w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; and s is the variance of the d-th principal component.
(34) Normalizing the variable coefficients in the comprehensive scoring model to re-determine the variable weights.
(35) Sorting the weight values of all fault variables, wherein the operation data corresponding to the variables whose cumulative weight exceeds the set threshold are the key feature data. The threshold may be set according to the specific situation, for example 80% or more.
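Steps (31) to (35) can be sketched as follows. This is an illustration under stated assumptions, not the formulas of the disclosure, which did not survive in the text: the coefficient is assumed to be the loading divided by the square root of the eigenvalue, the comprehensive coefficient is assumed to be the variance-weighted mean of the absolute coefficients, and the function name `select_key_variables` and the random test data are hypothetical.

```python
import numpy as np

def select_key_variables(z, var_threshold=0.80, weight_threshold=0.80):
    """Rank measured variables by PCA-derived weight and keep the top ones.

    z: standardized data, one row per sample, one column per variable.
    Returns (indices of key variables, normalized weight of every variable).
    """
    z = np.asarray(z, dtype=float)
    cov = np.cov(z, rowvar=False)                # feature covariance matrix C
    eigval, eigvec = np.linalg.eigh(cov)         # eigenpairs of the symmetric matrix C
    order = np.argsort(eigval)[::-1]             # largest eigenvalue first
    eigval = np.clip(eigval[order], 0.0, None)   # guard against tiny negative round-off
    eigvec = eigvec[:, order]

    contrib = eigval / eigval.sum()              # variance contribution rates
    cum = np.cumsum(contrib)
    n_pc = min(int(np.searchsorted(cum, var_threshold)) + 1, len(eigval))

    e = eigval[:n_pc]
    loadings = eigvec[:, :n_pc] * np.sqrt(e)     # v: loading of variable q on component d
    coe = loadings / np.sqrt(e)                  # ASSUMED form: coe = v / sqrt(e)
    s = contrib[:n_pc]
    w = (np.abs(coe) * s).sum(axis=1) / s.sum()  # ASSUMED: variance-weighted mean of |coe|
    weights = w / w.sum()                        # normalized variable weights

    ranked = np.argsort(weights)[::-1]           # heaviest variable first
    cum_w = np.cumsum(weights[ranked])
    n_key = min(int(np.searchsorted(cum_w, weight_threshold)) + 1, len(ranked))
    return ranked[:n_key], weights

# Hypothetical data: 200 samples of 5 variables, two of them strongly correlated.
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 5))
raw[:, 1] = raw[:, 0] + 0.1 * rng.normal(size=200)
z = (raw - raw.mean(axis=0)) / raw.std(axis=0)
key_idx, weights = select_key_variables(z)
```

Absolute values are taken because the sign of an eigenvector is arbitrary; without them the weights would depend on the solver's sign convention. Both thresholds default to the 80% figure mentioned in this embodiment.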
The key feature data include labeled data and unlabeled data. Using the key feature data as the input of the dynamic active safety semi-supervised support vector machine model, it can be determined whether a fault or a potential fault exists in the current chemical process.
In step 4, the training process of the dynamic active safety semi-supervised support vector machine model includes the following steps:
Step 4-1, acquiring historical data of the chemical production process, the historical data including fault data and non-fault data;
Step 4-2, preprocessing the acquired historical data;
Step 4-3, selecting key feature data from the historical operating data by principal component analysis, the key feature data including labeled data and unlabeled data;
Step 4-4, adding labels to the unlabeled data by an active learning method, inputting the originally labeled data and the newly labeled data into the dynamic active safety semi-supervised support vector machine model for training, with the fault type or normal operation as the output, to obtain the parameters of the dynamic active safety semi-supervised support vector machine model.
In step 4-4, adding labels to the unlabeled data by the active learning method includes the following steps:
Step 441, optimizing the confidence of the pseudo labels of the identification model by an averaging method, combining historical information and future information of the chemical process data.
Here the value at the current time is $P_{j,k}$, the historical information is the value at the previous time ($P_{j-1,k}$), and the future information is the value at the next time ($P_{j+1,k}$).
The optimization method in this embodiment is averaging, which is used to filter outliers. For example, if the normal value is 5, the value at the previous time is 4.8, the value at the next time is 5.1, and the value at the current time is 8, then the average of the three values, 5.97, is taken as the value at the current time.
Step 442, calculating the entropy of the key feature data from the pseudo-label confidence, selecting the key feature data with high entropy by active learning, and adding data labels to them based on the knowledge ontology.
Adding labels to the unlabeled data by knowledge reasoning based on the knowledge ontology reduces the cost of labeling and the manual effort required.
A knowledge ontology is an explicit description of the concepts in a domain and the relationships between them. For example, in this embodiment a scenario object model of the TE process (shown in FIG. 7) is established by applying a knowledge ontology to express the implicit information in the process; that is, information obtained from brainstorming is converted into logical relationships expressed in graphical form. The ontology thus converts technical experience into graphical relationships.
Each step of the training process is described in detail below.
In the training process of this embodiment, the data set of the Tennessee-Eastman process (TE process for short) is used. For other chemical production processes, other data sets may be selected, or historical data of the production process may be acquired directly.
Taking fault 4 of the TE process as an example, Z-score normalization is first performed on the fault 4 data to unify the variable dimensions; the preprocessing method may be the same as in step 2. The operating parameters set for the model are shown in Table 2:
TABLE 2
Step 4-3, selecting key feature data from the historical operating data by principal component analysis; the method may be the same as in step 3.
(4-31) Calculating the feature covariance matrix of the preprocessed data matrix, and the eigenvalues and eigenvectors of the covariance matrix; sorting the components by variance contribution rate from largest to smallest, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold. In this embodiment the proportion threshold may be set to 80% or more. For the fault 4 data of the TE process, the calculation shows that the cumulative variance contribution rate of the first 12 principal components is 83.15%, so the first 12 principal components can represent the information of all variables. FIG. 2 shows the variance percentages of the first 12 principal components of TE process fault 4, and FIG. 3 shows their eigenvalues. Each principal component corresponds to several measured variables, and the following steps determine the weights of the key variables corresponding to the principal components.
(4-32) Determining the coefficients of the variables in each principal component linear expression; the method may be the same as in step (32).
(4-33) Determining the variable coefficients in the comprehensive scoring model; the method may be the same as in step (33).
(4-34) Normalizing the variable coefficients in the comprehensive scoring model to re-determine the variable weights; the method may be the same as in step (34). FIG. 4 shows the key measured variable weights for fault 4 of the TE process.
(4-35) Sorting the weight values of all fault variables, wherein the operation data corresponding to the variables whose cumulative weight exceeds the set threshold are the key feature data. The threshold may be set according to the specific situation, for example 80% or more. In this embodiment, the variable weight values for the 20 faults of the Tennessee-Eastman process are ranked, and the variables whose cumulative weight exceeds 80% are used as input data. FIG. 4 shows that the cumulative weight of the first 13 variables for fault 4 of the TE process is 80.06%, so the first 13 variables are used to represent all variables. Similarly, with a cumulative weight above 80%, the key variables for the other TE process faults can be determined; the results are shown in FIG. 5.
In step 441, optimizing the confidence of the pseudo labels of the identification model by combining historical information and future information of the chemical process data specifically includes:
441-1, classifying the historical data by fault to obtain k classes corresponding to k faults; the specific classes are shown, for example, in Table 3.
TABLE 3
441-2, calculating the confidence that each data item belongs to each class k, and calculating the pseudo-label confidence of each key feature data item from the calculated confidences by an averaging method, as follows:
$$\bar{P}_{j,k} = \frac{P_{j-1,k} + P_{j,k} + P_{j+1,k}}{3}$$
where $P_{j,k}$ is the confidence that the j-th data item belongs to class k, $P_{j-1,k}$ and $P_{j+1,k}$ are the corresponding values at the previous and next times, and $\bar{P}_{j,k}$ denotes the averaged confidence.
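The averaging of a value with its previous and next neighbours can be sketched as below, assuming a three-point window applied to the confidence sequence of a single class. The function name `smooth_confidence` and the choice to leave the first and last values unchanged are assumptions for illustration; the numbers reuse the worked example given for step 441.

```python
def smooth_confidence(conf):
    """Replace each interior value with the mean of itself and its two neighbours."""
    smoothed = list(conf)
    for j in range(1, len(conf) - 1):
        # historical information (j - 1), current value (j), future information (j + 1)
        smoothed[j] = (conf[j - 1] + conf[j] + conf[j + 1]) / 3.0
    return smoothed  # endpoints are left unchanged (assumption)

# Worked example from the description: previous 4.8, current 8, next 5.1.
values = [4.8, 8.0, 5.1]
print(round(smooth_confidence(values)[1], 2))  # 5.97
```

The outlier 8 is pulled back toward the normal level of about 5, which is the filtering effect the averaging is meant to provide.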
in step 442, entropy of the key feature data is calculated according to the confidence of the pseudo tag, active learning is adopted to select the key feature data with a high entropy, and a method for adding a data tag to the key feature data based on a knowledge ontology specifically includes:
1) and establishing a plot object model based on the knowledge ontology based on the corresponding relation between the process mechanism and the events. Any one of the circles in fig. 7 represents an event.
2) And selecting the key characteristic data with high entropy value by adopting active learning. The value range of the entropy value is [0,1], the method for determining the high-entropy value can be realized by setting a threshold value, the threshold value can be set to be 0.8, the amount of information carried by the key characteristic data of the high-entropy value is large, and the fault reason is obtained through reverse reasoning according to a scenario object model such as the scenario object model established based on the knowledge body in fig. 7, so that the label is determined.
The formula for calculating the entropy of the key feature data is as follows:
wherein $P_{j,k}$ is the confidence that the j-th data sample belongs to class k; $Ent_j$ is the entropy of the j-th data sample; K is the number of classes (k = 1, 2, …, K); and a is the stopping criterion for active learning.
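The entropy formula is not reproduced in this text. As a hedged sketch, the code below assumes Shannon entropy normalized by log K, which is one standard way to obtain values in the stated range [0, 1]. The function names and the example confidences are hypothetical; the 0.8 threshold is the example value given above.

```python
import math


def normalized_entropy(p):
    """Shannon entropy of a confidence vector, divided by log(K).

    Assumed form: Ent_j = -sum_k P_{j,k} * log(P_{j,k}) / log(K),
    which is 0 for a certain prediction and 1 for a uniform one.
    """
    k = len(p)
    if k < 2:
        return 0.0
    h = -sum(x * math.log(x) for x in p if x > 0)
    return h / math.log(k)


def select_for_labeling(conf, threshold=0.8):
    """Indices of samples whose entropy exceeds the threshold."""
    return [j for j, p in enumerate(conf) if normalized_entropy(p) > threshold]


# Hypothetical pseudo-label confidences for three samples, two classes.
confidences = [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]]
print(select_for_labeling(confidences))
```

Samples returned by `select_for_labeling` are the uncertain ones, which are the candidates for ontology-based labeling in step 2).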
The ontology-based scenario object model is built from production experience. Taking fault 4 as an example, the TE process is divided into five units: reactor, condenser, product separator, recycle compressor and stripper. Reactor temperature is the first alarm variable for fault 4. Because the reaction is exothermic, three direct causes may affect the reactor temperature: abnormal recycle flow, abnormal reactor feed flow rate, and abnormal reactor cooling water temperature. The fault has four main consequences: high reactor cooling water outlet temperature, high condenser cooling water outlet temperature, high product separator temperature, and high reactor pressure. Analysis shows no deviation in the recycle flow or the reactor feed flow rate, so an abnormal reactor cooling water temperature is the most likely cause of the abnormal reactor temperature. Fig. 7 shows the ontology-based graphical scenario object model of the TE process, and Fig. 8 shows the graphical scenario object model for fault 4 of the TE process.
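The backward reasoning for fault 4 can be illustrated with a very small sketch. The dictionary encoding, the event strings and the function name `infer_causes` are all hypothetical; the actual scenario object model in Figs. 7 and 8 is a graphical model whose structure is not reproduced here.

```python
# Hypothetical encoding of the fault 4 scenario: an alarm event mapped
# to its possible direct causes, as listed in the description.
CAUSES = {
    "reactor temperature abnormal": [
        "recycle flow abnormal",
        "reactor feed flow rate abnormal",
        "reactor cooling water temperature abnormal",
    ],
}


def infer_causes(alarm_event, ruled_out):
    """Return the direct causes of an alarm that are not ruled out.

    `ruled_out` holds causes whose variables show no deviation.
    """
    return [c for c in CAUSES.get(alarm_event, []) if c not in ruled_out]


# The recycle flow and the reactor feed flow rate show no deviation.
remaining = infer_causes(
    "reactor temperature abnormal",
    {"recycle flow abnormal", "reactor feed flow rate abnormal"},
)
print(remaining)
```

The single remaining cause corresponds to the conclusion in the text, and it would determine the label assigned to the high-entropy sample.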
A simulation experiment illustrates how well the dynamic active safety semi-supervised support vector machine model established in this embodiment (abbreviated as the PCA-DAS4VM model) identifies the operating state of the chemical process. The models compared are as follows.
The PCA-DAS4VM model is: the dynamic active safety semi-supervised support vector machine model of the present embodiment.
The DAS4VM model is: the dynamic active safety semi-supervised support vector machine model applied directly to the measured variables for fault identification, without selecting key variables by principal component analysis (PCA).
The PCA-S4VM model is: a safe semi-supervised support vector machine model applied after key variables have been selected by PCA.
The DSSAE model is: the model described in "Jiang L, Ge Z, Song Z. Semi-supervised fault classification based on dynamic Sparse Stacked auto-encoders model [J]. Chemometrics and Intelligent Laboratory Systems, 2017: S0169743917302496".
The ALSemiFDA model is: the model described in "Yin L, Wang H, et al. Incorporate active learning to semi-supervised fault classification [J]. Journal of Process Control, 78 (2019) 88-97".
Transverse comparison: Fig. 6 shows that the larger the volume of unlabeled data, the better the fault identification performance of the PCA-DAS4VM model proposed in this embodiment.
Longitudinal comparison: the model of this embodiment is compared with other models. The F1 score, false positive rate (FPR), fault diagnosis rate (FDR), G-mean and accuracy of the compared models are calculated from their identification results, and the comparison curves are shown in Figs. 9-12. The higher the F1 score, FDR, G-mean and accuracy, and the lower the FPR, the better the model. These are common machine learning metrics. The F1 score takes both the precision of the classification model and the fault diagnosis rate into account, and can be regarded as the harmonic mean of precision and recall. Precision is the proportion of samples identified as positive that are truly positive. G-mean is another performance criterion that balances precision and recall, defined as their geometric mean. Accuracy is the proportion of all predictions, for both the positive and the negative class, that are correct.
Fig. 13 compares the accuracy of the DSSAE, ALSemiFDA and PCA-DAS4VM models and shows that the model of this embodiment achieves higher accuracy.
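The metrics used in the longitudinal comparison can be computed from confusion-matrix counts as in the sketch below. It assumes a one-vs-rest view in which one fault class is "positive", treats FDR as recall, and uses the G-mean definition given in this description (geometric mean of precision and recall); other literature often defines G-mean from sensitivity and specificity instead. The function name and the example counts are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute F1, FPR, FDR, G-mean and accuracy from confusion counts.

    tp/fp/tn/fn are true/false positives and negatives for one class.
    Ratios with an empty denominator are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # taken here as FDR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # G-mean as defined in this description: sqrt(precision * recall).
    g_mean = math.sqrt(precision * recall)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "fdr": recall,
        "fpr": fpr,
        "f1": f1,
        "g_mean": g_mean,
        "accuracy": accuracy,
    }


# Hypothetical counts, not results from the experiment.
print(binary_metrics(tp=80, fp=20, tn=80, fn=20))
```

For a multi-class problem such as the 20 TE faults, these values would be computed per class and then averaged; how the patent aggregates them is not stated in this text.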
Example 2
This embodiment provides a chemical process fault identification system, including:
a data acquisition module: used for acquiring operation data in the chemical production process in real time;
a preprocessing module: used for preprocessing the acquired operation data;
a key characteristic data extraction module: used for selecting key characteristic data from the operation data by a principal component analysis method;
an identification module: used for establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting the key characteristic data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the operating state of the chemical process.
Example 3
The present embodiment provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and executed on the processor, wherein the computer instructions, when executed by the processor, implement the steps of the method in embodiment 1.
Example 4
The present embodiment provides a computer readable storage medium for storing computer instructions which, when executed by a processor, perform the steps of the method of embodiment 1.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.
Claims (10)
1. A chemical process fault identification method is characterized by comprising the following steps:
acquiring operation data in a chemical production process in real time;
preprocessing the acquired operation data;
selecting key characteristic data in the operating data by adopting a principal component analysis method;
and establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting key characteristic data into the trained dynamic active safety semi-supervised support vector machine model, and outputting the running state of the chemical process.
2. A chemical process fault identification method as claimed in claim 1, wherein: the key characteristic data comprises labeled data and unlabeled data, and the processing of the key characteristic data by the dynamic active safety semi-supervised support vector machine model comprises adding labels to the unlabeled data by an active learning method.
3. A chemical process fault identification method as claimed in claim 1, wherein: the method for selecting the key characteristic data in the operating data by adopting the principal component analysis method comprises the following steps:
calculating the covariance matrix of the preprocessed data matrix, and the eigenvalues and eigenvectors of the covariance matrix; sorting by variance contribution rate from large to small, and taking the variables whose cumulative variance contribution rate exceeds a set proportion threshold as the principal component variables;
establishing principal component linear expressions according to the principal component variables, and calculating the coefficients of the principal component variables in the principal component linear expressions according to the eigenvalues;
obtaining a comprehensive scoring model according to the coefficients of the principal component variables in the principal component linear expressions, and calculating the variable coefficients in the comprehensive scoring model from the variances of the principal component variables;
normalizing the variable coefficients in the obtained comprehensive scoring model, and re-determining the variable weights;
and sorting the re-determined variable weights by weight value, wherein the operation data corresponding to the variables whose cumulative weight exceeds the set threshold are the key characteristic data.
4. A chemical process fault identification method as claimed in claim 3, wherein:
calculating the coefficients of the principal component variables in each principal component linear expression according to the eigenvalues, wherein the calculation formula is as follows:
wherein coe is the coefficient of variable q in the d-th principal component linear expression; v is the element of the d-th principal component corresponding to variable q; e is the eigenvalue of the d-th principal component;
or
calculating the variable coefficients in the comprehensive scoring model according to the coefficients of the principal component variables in the principal component linear expressions, wherein the calculation formula is as follows:
wherein w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; s is the variance of the d-th principal component.
5. A chemical process fault identification method as claimed in claim 1, wherein: the training process of the dynamic active safety semi-supervised support vector machine model comprises the following steps:
acquiring historical data of a chemical production process, wherein the historical data comprises fault data and non-fault data;
preprocessing the acquired historical data;
selecting key characteristic data from the historical data by a principal component analysis method, wherein the key characteristic data comprises labeled data and unlabeled data;
adding labels to the unlabeled data by an active learning method, taking the originally labeled data and the newly labeled data as input to train the dynamic active safety semi-supervised support vector machine model, with the fault type or normal operation as output, and obtaining the parameters of the dynamic active safety semi-supervised support vector machine model.
6. A chemical process fault identification method as claimed in claim 2 or 5, wherein: the method for adding labels to the unlabeled data by the active learning method comprises the following steps:
optimizing the pseudo-label confidence of the identification model by combining historical information and future information of the chemical process data;
and calculating the entropy of the key characteristic data according to the pseudo-label confidence, selecting the key characteristic data with high entropy by active learning, and adding data labels to the key characteristic data based on the knowledge ontology.
7. A chemical process fault identification method as claimed in claim 6, wherein: the method for optimizing the pseudo-label confidence of the identification model by combining the historical information and the future information of the chemical process data specifically comprises the following steps:
classifying the historical data by fault to obtain k classes corresponding to the k faults;
and calculating the confidence that each data sample belongs to each class k, and calculating the pseudo-label confidence of each key characteristic data sample from the calculated confidences by an averaging method.
8. A chemical process fault identification system is characterized by comprising:
a data acquisition module: used for acquiring operation data in the chemical production process in real time;
a preprocessing module: used for preprocessing the acquired operation data;
a key characteristic data extraction module: used for selecting key characteristic data from the operation data by a principal component analysis method;
an identification module: used for inputting the key characteristic data into the trained dynamic active safety semi-supervised support vector machine model and outputting the operating state of the chemical process.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910844132.5A CN110647117B (en) | 2019-09-06 | 2019-09-06 | Chemical process fault identification method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110647117A (en) | 2020-01-03 |
CN110647117B (en) | 2020-12-18 |
Family
ID=69010277
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910844132.5A Active CN110647117B (en) | 2019-09-06 | 2019-09-06 | Chemical process fault identification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110647117B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||