CN110647117A - Chemical process fault identification method and system - Google Patents


Info

Publication number
CN110647117A
Authority
CN
China
Prior art keywords
data
principal component
variable
chemical process
model
Legal status
Granted
Application number
CN201910844132.5A
Other languages
Chinese (zh)
Other versions
CN110647117B (en)
Inventor
田文德
贾旭清
刘子健
张士发
Current Assignee
Qingdao University of Science and Technology
Original Assignee
Qingdao University of Science and Technology
Application filed by Qingdao University of Science and Technology
Priority to CN201910844132.5A
Publication of CN110647117A
Application granted
Publication of CN110647117B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G05B19/41885 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM] characterised by modeling, simulation of the manufacturing system
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/30 Nc systems
    • G05B2219/32 Operator till task planning
    • G05B2219/32339 Object oriented modeling, design, analysis, implementation, simulation language
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Manufacturing & Machinery (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Testing And Monitoring For Control Systems (AREA)
  • General Factory Administration (AREA)

Abstract

The method and system are applied to chemical fault identification, where labels are expensive to obtain. A dynamic active safe semi-supervised support vector machine model combined with principal component analysis (PCA-DAS4VM for short) is used to identify the operating state of the chemical process. Combining principal component analysis with the dynamic active safe semi-supervised support vector machine reduces the amount of labeled data required by traditional supervised learning and improves the identification accuracy of semi-supervised learning. Principal component analysis is used to eliminate noise and redundant data from the chemical process; abnormal-condition faults are identified by combining historical and future information; high-entropy unlabeled data are effectively selected and labeled; and the unlabeled data are fully exploited to improve the identification model. The result is efficient and complete fault identification for the chemical process, with higher identification accuracy and speed, which promotes chemical safety.

Description

Chemical process fault identification method and system
Technical Field
The disclosure relates to the technical field of chemical process fault identification, and in particular to a chemical process fault identification method and system.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Statistical analyses of accidents in chemical enterprises show that numerous minor anomalies inevitably occur before any major accident. Research on fault identification in chemical processes, which aims to detect potential abnormal conditions in time, is therefore of considerable theoretical and practical significance for keeping chemical plants operating safely and stably.
The inventors found that existing process fault identification methods fall mainly into three categories: qualitative models, quantitative models, and data-driven methods. Among data-driven fault identification methods, supervised learning has shown good results for chemical process fault identification, with identification accuracy above 92%. However, the amount of labeled data available in actual chemical processes often falls short of what supervised learning requires. Labels are usually added to unlabeled data manually on the basis of experience, and labeling large amounts of easily collected unlabeled chemical data is expensive.
Semi-supervised learning has been applied in many fields, such as digit recognition, sentiment classification, and medical image classification. Some studies, however, have found that existing semi-supervised learning methods perform worse than supervised learning given the same amount of labeled data, so traditional supervised learning still demands a large number of labels. Consequently, applying semi-supervised learning to chemical process fault identification has received little research attention.
Disclosure of Invention
To solve these problems, the present disclosure provides a chemical process fault identification method and system for chemical fault identification, where labels are expensive. The method combines principal component analysis with a dynamic active safe semi-supervised support vector machine and uses the resulting model (PCA-DAS4VM for short) to identify the operating state of the chemical process, thereby reducing the amount of labeled data required by traditional supervised learning and improving the identification accuracy of semi-supervised learning.
In order to achieve the purpose, the following technical scheme is adopted in the disclosure:
one or more embodiments provide a chemical process fault identification method, including the steps of:
acquiring operating data of a chemical production process in real time;
preprocessing the acquired operating data;
selecting key feature data from the operating data by principal component analysis;
and establishing a dynamic active safe semi-supervised support vector machine model based on semi-supervised learning, inputting the key feature data into the trained model, and outputting the operating state of the chemical process.
Further, the key feature data comprise labeled data and unlabeled data, and the processing of the key feature data by the dynamic active safe semi-supervised support vector machine model includes labeling the unlabeled data by active learning.
Further, selecting key feature data from the operating data by principal component analysis includes the following steps:
calculating the covariance matrix of the preprocessed data matrix, together with its eigenvalues and eigenvectors; sorting the components by variance contribution rate in descending order, and taking the leading components whose cumulative variance contribution rate exceeds a set proportion threshold as the principal component variables;
establishing a linear expression for each principal component from the principal component variables, and calculating the coefficient of each variable in each principal component linear expression from the eigenvalues;
obtaining a comprehensive score model from these coefficients, and calculating the coefficient of each variable in the comprehensive score model from the variances of the principal components;
normalizing the variable coefficients of the comprehensive score model to re-determine the variable weights;
and sorting the re-determined variable weights in descending order, wherein the operating data corresponding to the top-ranked variables whose cumulative weight exceeds a set threshold are the key feature data.
Further, the coefficient of each variable in each principal component linear expression is calculated from the eigenvalues as follows:
[Formula image not reproduced]
where coe is the coefficient of variable q in the d-th principal component linear expression; v is the value (loading) of variable q on the d-th principal component; and e is the eigenvalue (characteristic root) of the d-th principal component.
Alternatively or additionally, the coefficient of each variable in the comprehensive score model is calculated from the coefficients of the variables in the principal component linear expressions as follows:
[Formula image not reproduced]
where w is the coefficient of variable q in the comprehensive score model; o is the number of principal components; and s is the variance of the d-th principal component.
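Because the two formulas above survive only as images, the block below gives, for orientation, a form commonly used in PCA-based composite scoring that is consistent with the symbol definitions (coe, v, e, w, o, s). It is an assumption and may differ from the published formulas; the absolute values in the normalization step (step of re-determining the variable weights) are a further assumption that keeps the weights non-negative.

```latex
\mathrm{coe}_{qd} = \frac{v_{qd}}{\sqrt{e_d}}, \qquad
w_q = \frac{\sum_{d=1}^{o} \mathrm{coe}_{qd}\, s_d}{\sum_{d=1}^{o} s_d}, \qquad
W_q = \frac{\lvert w_q \rvert}{\sum_{q'} \lvert w_{q'} \rvert}
```

If v_qd is the loading (the eigenvector entry multiplied by the square root of e_d), then coe_qd reduces to the eigenvector entry itself, and w_q is a variance-weighted average of those entries.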
Further, the training process of the dynamic active safe semi-supervised support vector machine model includes the following steps:
acquiring historical data of a chemical production process, the historical data including fault data and non-fault data;
preprocessing the acquired historical data;
selecting key feature data from the historical operating data by principal component analysis, the key feature data including labeled data and unlabeled data;
and labeling the unlabeled data by active learning, feeding the originally labeled data and the newly labeled data into the dynamic active safe semi-supervised support vector machine model for training, with the fault type or normal operation as the output, to obtain the model parameters.
Further, labeling the unlabeled data by active learning includes the following steps:
optimizing the pseudo-label confidence of the identification model by combining historical and future information from the chemical process data;
and calculating the entropy of the key feature data from the pseudo-label confidences, selecting key feature data with high entropy by active learning, and labeling the selected data on the basis of a knowledge ontology.
Further, optimizing the pseudo-label confidence of the identification model by combining historical and future information from the chemical process data specifically includes:
classifying the historical data by fault to obtain K classes corresponding to K faults;
and calculating the confidence that each sample belongs to each class k, and then calculating the pseudo-label confidence of each key feature sample from these confidences by averaging.
A chemical process fault identification system, including:
a data acquisition module, configured to acquire operating data of a chemical production process in real time;
a preprocessing module, configured to preprocess the acquired operating data;
a key feature data extraction module, configured to select key feature data from the operating data by principal component analysis;
and an identification module, configured to input the key feature data into the trained dynamic active safe semi-supervised support vector machine model and output the operating state of the chemical process.
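The four modules can be pictured as a pipeline of pluggable stages. The following is a hypothetical Python sketch: the class name, field names, and the `Matrix` alias are illustrative and not taken from the disclosure. In practice the stages would be backed by a plant data interface, Z-score normalization, PCA key-variable selection, and the trained model.

```python
from dataclasses import dataclass
from typing import Callable, List

Matrix = List[List[float]]  # rows = samples, columns = variables


@dataclass
class FaultIdentificationSystem:
    """Hypothetical wiring of the four modules described above."""
    acquire: Callable[[], Matrix]                     # data acquisition module
    preprocess: Callable[[Matrix], Matrix]            # preprocessing module (e.g. Z-score)
    extract_key_features: Callable[[Matrix], Matrix]  # PCA key-feature extraction module
    identify: Callable[[Matrix], List[str]]           # identification module (trained model)

    def run(self) -> List[str]:
        """Pass one batch of operating data through the four modules in order."""
        raw = self.acquire()
        clean = self.preprocess(raw)
        features = self.extract_key_features(clean)
        return self.identify(features)


# Usage with trivial stand-in stages:
system = FaultIdentificationSystem(
    acquire=lambda: [[1.0, 2.0], [3.0, 4.0]],
    preprocess=lambda X: X,
    extract_key_features=lambda X: [row[:1] for row in X],
    identify=lambda F: ["normal" if row[0] < 2 else "fault" for row in F],
)
states = system.run()
```

Keeping each module as a separate callable mirrors the module split in the system claim and lets each stage be tested or replaced independently.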
An electronic device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, wherein the computer instructions, when executed by the processor, perform the steps of the above method.
A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the above method.
Compared with the prior art, the present disclosure has the following beneficial effects:
The method combines principal component analysis with a dynamic active safe semi-supervised support vector machine, reducing the amount of labeled data required by traditional supervised learning and improving the identification accuracy of semi-supervised learning. It eliminates noise and redundant data from the chemical process, identifies abnormal-condition faults by combining historical and future information, and effectively selects and labels high-entropy unlabeled data: a graphical scenario object model is established on the basis of a knowledge ontology, and the labels of the unlabeled data are determined from this model together with expert knowledge. By making full use of unlabeled data to improve the identification model, the method achieves efficient and complete fault identification for chemical processes with higher identification accuracy and faster identification, which helps advance chemical industry safety.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure and not to limit the disclosure.
FIG. 1 is a flow diagram of a chemical process fault identification method according to one or more embodiments;
FIG. 2 shows the variance percentages of the principal components for TE process fault 4 during training in Example 1 of the present disclosure;
FIG. 3 shows the eigenvalues of the principal components for TE process fault 4 during training in Example 1 of the present disclosure;
FIG. 4 shows the key measured variable weights for TE process fault 4 during training in Example 1 of the present disclosure;
FIG. 5 shows the key measured variables determined for the 20 TE faults during training in Example 1 of the present disclosure;
FIG. 6 compares PCA-DAS4VM accuracy for different amounts of unlabeled data;
FIG. 7 is an ontology-based graphical scenario object model of the TE process;
FIG. 8 is the graphical scenario object model of TE process fault 4;
FIG. 9 compares the F1 scores of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 10 compares the FPRs of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 11 compares the FDRs of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 12 compares the G-means of the PCA-S4VM, DAS4VM, and PCA-DAS4VM models;
FIG. 13 compares the accuracy of the DSSAE, ALSemiFDA, and PCA-DAS4VM models.
Detailed Description of the Embodiments
The present disclosure is further described below with reference to the accompanying drawings and examples.
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise. It should be noted that, in the case of no conflict, the embodiments and features in the embodiments in the present disclosure may be combined with each other. The embodiments will be described in detail below with reference to the accompanying drawings.
Example 1
In the technical solution disclosed in one or more embodiments, as shown in fig. 1, a method for identifying a fault in a chemical process includes the following steps:
Step 1: acquiring operating data of the chemical production process in real time;
Step 2: preprocessing the acquired operating data;
Step 3: selecting key feature data from the operating data by principal component analysis;
Step 4: establishing a dynamic active safe semi-supervised support vector machine model based on semi-supervised learning, inputting the key feature data into the trained model, and outputting the operating state of the chemical process.
The operating data acquired in step 1 include the flow rate of each material in the chemical production process, the control parameters of each device, and so on. Table 1 lists the operating data parameters for one production example of this embodiment.
TABLE 1
[Table image not reproduced]
The preprocessing in step 2 applies Z-score normalization to the acquired operating data, i.e. subtracting the mean and dividing by the standard deviation:
$$z_{qg} = \frac{x_{qg} - \mu}{\sigma}$$
where x_qg is the process data; μ is the mean of the data matrix; σ is the standard deviation of the data matrix; and z_qg is the normalized value.
This preprocessing converts multiple groups of data into dimensionless Z-scores, putting the data on a common scale and improving their comparability, at the cost of the direct physical interpretability of the raw values.
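A minimal Python sketch of the Z-score step, assuming NumPy, samples as rows and measured variables as columns, and a per-variable (column-wise) mean and sample standard deviation. The text speaks of "the mean of the data matrix", so the exact statistics used in the original are an assumption here, as is the guard for constant columns. The function name `zscore` is hypothetical.

```python
import numpy as np


def zscore(X: np.ndarray) -> np.ndarray:
    """Column-wise Z-score: subtract each variable's mean, divide by its std."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # sample standard deviation per variable
    sigma[sigma == 0] = 1.0         # assumption: leave constant columns centred, not divided by 0
    return (X - mu) / sigma


Z = zscore(np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
```

Both columns map to the same standardized values (-1, 0, 1), which is exactly the unit-free comparability the paragraph describes.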
Selecting key feature data from the operating data by principal component analysis in step 3 includes the following steps:
(31) Calculate the covariance matrix of the preprocessed data matrix, together with its eigenvalues and eigenvectors; sort the components by variance contribution rate in descending order, and take the leading components whose cumulative variance contribution rate exceeds a set proportion threshold as the principal component variables. In this embodiment, the proportion threshold can be set to 80% or more.
The covariance matrix and its eigenvalues can be calculated as follows:
[Formula image not reproduced]
$$|C - \lambda E| = 0 \tag{3}$$
where C is the calculated covariance matrix; m is the dimension of the data matrix; λ is an eigenvalue of C; E is the identity matrix; and Z is the normalized data matrix.
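Step (31) can be sketched with NumPy as below. This assumes Z has samples as rows and variables as columns, takes the covariance formula (an image in the original) to be the usual sample covariance C = ZᵀZ/(m − 1) with m the number of samples, and solves |C − λE| = 0 with NumPy's symmetric eigensolver. The normalization in the published formula may differ, and `principal_components` is a hypothetical helper name.

```python
import numpy as np


def principal_components(Z: np.ndarray, threshold: float = 0.80):
    """Keep the leading components whose cumulative variance contribution reaches `threshold`.

    Returns (eigenvalues, eigenvectors as columns, variance contribution ratios).
    """
    m = Z.shape[0]
    C = Z.T @ Z / (m - 1)                  # assumed covariance form; Z is already centred
    eigvals, eigvecs = np.linalg.eigh(C)   # C is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # re-sort in descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = eigvals / eigvals.sum()        # variance contribution rate of each component
    n_keep = int(np.searchsorted(np.cumsum(ratio), threshold) + 1)
    return eigvals[:n_keep], eigvecs[:, :n_keep], ratio[:n_keep]


# Example: two nearly identical variables plus a weak noise variable.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
Z = np.column_stack([t, t + 0.01 * rng.normal(size=200), 0.1 * rng.normal(size=200)])
Z -= Z.mean(axis=0)
vals, vecs, ratio = principal_components(Z)
```

In this toy example almost all variance lies along one direction, so a single component already exceeds the 80% threshold.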
(32) Establish a linear expression for each principal component from the principal component variables, and calculate the coefficient of each variable in each principal component linear expression from the eigenvalues:
[Formula image not reproduced]
where coe is the coefficient of variable q in the d-th principal component linear expression; v is the value (loading) of variable q on the d-th principal component; and e is the eigenvalue (characteristic root) of the d-th principal component.
(33) Obtain a comprehensive score model, consisting of the variables and their coefficients, from the coefficients of the variables in the principal component linear expressions. The variable coefficients in the comprehensive score model are calculated as follows:
where w is the coefficient of variable q in the comprehensive score model; o is the number of principal components; and s is the variance of the d-th principal component.
(34) Normalize the variable coefficients of the comprehensive score model to re-determine the variable weights.
(35) Sort the variables by weight in descending order; the operating data corresponding to the top-ranked variables whose cumulative weight exceeds a set threshold are the key feature data. The threshold can be set according to specific conditions, for example 80% or more.
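Steps (32) to (35) can be sketched as follows. Because the original formulas are images, this assumes the commonly used forms coe = v/√e and a variance-weighted average for w, takes v as the loading (eigenvector × √eigenvalue), and ranks variables by the absolute value of their normalized coefficient. All of these, and the helper name `key_variables`, are assumptions rather than the disclosed implementation.

```python
import numpy as np


def key_variables(eigvals, eigvecs, threshold=0.80):
    """Return (indices of key variables, normalised weights of all variables).

    eigvals: retained eigenvalues e_d, shape (o,)
    eigvecs: matching unit eigenvectors as columns, shape (n_vars, o)
    """
    e = np.asarray(eigvals, dtype=float)
    v = np.asarray(eigvecs, dtype=float) * np.sqrt(e)  # loadings v_qd (assumed meaning of v)
    coe = v / np.sqrt(e)                               # step (32): assumed coe = v / sqrt(e)
    w = coe @ e / e.sum()                              # step (33): variance-weighted average
    weights = np.abs(w) / np.abs(w).sum()              # step (34): normalise (abs is an assumption)
    order = np.argsort(weights)[::-1]                  # step (35): descending weight
    n_keep = int(np.searchsorted(np.cumsum(weights[order]), threshold) + 1)
    return order[:n_keep], weights


idx, weights = key_variables([3.0, 1.0], [[0.8, 0.0], [0.6, 0.0], [0.0, 1.0]])
```

In this small example the weights are about 0.46, 0.35 and 0.19, so the first two variables together pass the 80% cumulative-weight rule and are kept as key variables.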
The key feature data include labeled and unlabeled data. Using the key feature data as input to the dynamic active safe semi-supervised support vector machine model reveals whether the current chemical process has a fault or a latent fault hazard.
In step 4, the training process of the dynamic active safe semi-supervised support vector machine model includes the following steps:
Step 4-1: acquire historical data of the chemical production process, including fault data and non-fault data;
Step 4-2: preprocess the acquired historical data;
Step 4-3: select key feature data from the historical operating data by principal component analysis, the key feature data including labeled and unlabeled data;
Step 4-4: label the unlabeled data by active learning, feed the originally labeled data and the newly labeled data into the dynamic active safe semi-supervised support vector machine model for training, with the fault type or normal operation as the output, and obtain the model parameters.
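Step 4-4 alternates between training and querying labels. The sketch below shows one pool-based active-learning round in plain Python. It uses a nearest-centroid classifier with softmax confidences purely as a stand-in, because the DAS4VM itself is not specified in this text; the `oracle` argument represents the ontology-based reasoning or expert labeling of step 442. All names are hypothetical, and at least two classes must already be labeled.

```python
import math


def fit_centroids(labelled):
    """Mean feature vector per class from (features, label) pairs."""
    sums, counts = {}, {}
    for x, y in labelled:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}


def confidences(x, centroids):
    """Stand-in confidences (NOT the DAS4VM): softmax over negative squared distances."""
    scores = [-sum((a - b) ** 2 for a, b in zip(x, c)) for c in centroids.values()]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # shift by the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def active_learning_round(labelled, unlabelled, oracle, budget=1):
    """Fit, query the `budget` most uncertain samples, label them with `oracle`.

    Returns (new labelled set, remaining unlabelled pool).
    """
    centroids = fit_centroids(labelled)
    K = len(centroids)  # requires K >= 2 so that log(K) > 0

    def entropy(x):
        p = confidences(x, centroids)
        return -sum(q * math.log(q) for q in p if q > 0) / math.log(K)

    ranked = sorted(unlabelled, key=entropy, reverse=True)
    queried = ranked[:budget]
    return labelled + [(x, oracle(x)) for x in queried], ranked[budget:]


labelled = [([0.0], "normal"), ([10.0], "fault 4")]
pool = [[1.0], [5.0], [9.0]]
labelled, pool = active_learning_round(
    labelled, pool, oracle=lambda x: "fault 4" if x[0] > 5 else "normal"
)
```

The sample at 5.0 lies midway between the two class centroids, so its confidences are 0.5/0.5, its entropy is maximal, and it is the one sent to the oracle; repeated rounds would continue until a stopping criterion is met.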
In step 4-4, labeling the unlabeled data by active learning includes the following steps:
Step 441: optimize the pseudo-label confidence of the identification model by averaging, combining historical and future information from the chemical process data:
$$\bar{P}_{j,k} = \frac{P_{j-1,k} + P_{j,k} + P_{j+1,k}}{3}$$
where P_{j,k} is the value at the current time, the historical information is the value at the previous time, P_{j-1,k}, and the future information is the value at the next time, P_{j+1,k}.
The optimization in this embodiment is averaging, which serves to filter outliers. For example, if the normal value is 5, the value at the previous time is 4.8, the value at the next time is 5.1, and the value at the current time is 8, then the average of the three values, (4.8 + 8 + 5.1)/3 ≈ 5.97, is taken as the value at the current time.
Step 442: calculate the entropy of the key feature data from the pseudo-label confidences, select key feature data with high entropy by active learning, and label the selected data on the basis of a knowledge ontology.
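The three-point averaging of step 441 can be sketched in plain Python. The text does not say how the first and last samples, which lack a previous or next neighbour, are handled; here they simply average over the neighbours that exist, which is an assumption. `smooth_confidence` is a hypothetical name.

```python
def smooth_confidence(P):
    """Average each sample's per-class values with its previous (history) and next (future) samples.

    P is a list of rows, one row per time step, one column per class k.
    Boundary rows average over the neighbours that exist (assumption).
    """
    n = len(P)
    out = []
    for j in range(n):
        window = P[max(j - 1, 0): min(j + 2, n)]  # rows j-1, j, j+1 where available
        out.append([sum(row[k] for row in window) / len(window) for k in range(len(P[j]))])
    return out


smoothed = smooth_confidence([[4.8], [8.0], [5.1]])
```

For the example in the text, `smoothed[1][0]` is about 5.97: the outlier 8 is pulled back toward its neighbours 4.8 and 5.1.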
Knowledge reasoning based on a knowledge ontology is used to label the unlabeled data, which reduces the cost of labeling and saves manual effort.
An ontology is an explicit description of the concepts in a domain and the relationships among them. In this embodiment, for example, a knowledge ontology is applied to build a scenario object model of the TE process (shown in Fig. 7) that expresses implicit process information: knowledge obtained from brainstorming is converted into logical relationships expressed in graphical form. In this way, the ontology turns technical experience into graphical relationships.
Each step of the training process is described in detail below.
In the training process of this embodiment, the Tennessee Eastman process (TE process for short) data set is used; for other chemical production processes, other data sets or historical data from the production process can be used instead.
Taking fault 4 of the TE process as an example, Z-score normalization is first applied to the fault 4 data to unify the variable scales; the preprocessing method can be the same as in step 2. The operating parameters of the model are shown in Table 2:
TABLE 2
[Table image not reproduced]
Step 4-3: select key feature data from the historical operating data by principal component analysis; the method can be the same as in step 3.
(4-31) Calculate the covariance matrix of the preprocessed data matrix, together with its eigenvalues and eigenvectors; sort the components by variance contribution rate in descending order, and take the leading components whose cumulative variance contribution rate exceeds a set proportion threshold as the principal component variables. In this embodiment, the threshold can be set to 80% or more. For the fault 4 data of the TE process, the cumulative variance contribution rate of the first 12 principal components is 83.15%, so these 12 components can represent the information of all the variables. Fig. 2 shows the variance percentages and Fig. 3 the eigenvalues of the first 12 principal components of TE fault 4. Each principal component corresponds to several measured variables, and the following steps determine the weights of the key variables corresponding to the principal components.
(4-32) Determine the coefficient of each variable in each principal component linear expression; the method can be the same as in step (32).
(4-33) Determine the variable coefficients in the comprehensive score model; the method can be the same as in step (33).
(4-34) Normalize the variable coefficients of the comprehensive score model to re-determine the variable weights; the method can be the same as in step (34). Fig. 4 shows the key measured variable weights for fault 4 of the TE process.
(4-35) Sort the variables by weight in descending order; the operating data corresponding to the top-ranked variables whose cumulative weight exceeds a set threshold are the key feature data. The threshold can be set according to specific conditions, for example 80% or more. In this embodiment, the variable weights for each of the 20 faults of the Tennessee Eastman process are ranked, and the variables whose cumulative weight exceeds 80% are used as input data. Fig. 4 shows that the cumulative weight of the first 13 variables for TE fault 4 is 80.06%, so these 13 variables are used to represent all the variables. Key variables for the other TE faults are determined in the same way with the 80% cumulative-weight threshold; the results are shown in Fig. 5.
In step 441, optimizing the pseudo-label confidence of the identification model by combining historical and future information from the chemical process data specifically includes:
441-1: classify the historical data by fault to obtain K classes corresponding to K faults; the specific classes are shown, for example, in Table 3.
TABLE 3
[Table image not reproduced]
441-2: calculate the confidence that each sample belongs to each class k, and then calculate the pseudo-label confidence of each key feature sample from these confidences by averaging, using the following formula:
[Formula image not reproduced]
where P_{j,k} is the confidence that the j-th sample belongs to class k.
In step 442, calculating the entropy of the key feature data from the pseudo-label confidences, selecting key feature data with high entropy by active learning, and labeling the selected data on the basis of a knowledge ontology specifically includes:
1) Establish a scenario object model based on the knowledge ontology, using the correspondence between process mechanisms and events. Each circle in Fig. 7 represents an event.
2) Select key feature data with high entropy by active learning. The entropy lies in the range [0, 1], and high-entropy data can be identified with a threshold, for example 0.8. Key feature data with high entropy carry a large amount of information; the cause of the fault is obtained by reverse reasoning over a scenario object model, such as the ontology-based model in Fig. 7, and the label is determined accordingly.
The entropy of the key feature data is calculated as follows:
[Formula images not reproduced]
where P_{j,k} is the confidence that the j-th sample belongs to class k; Ent_j is the entropy of the j-th sample; k is the class index (k = 1, 2, …, K) and K is the number of classes; and a is the stopping criterion for active learning.
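Because the entropy formulas survive only as images, the following is a sketch that assumes Shannon entropy of each sample's class-confidence vector, divided by ln K so that it falls in the stated range [0, 1], together with the example threshold of 0.8 from step 2). The stopping criterion a is not modeled. The function names are hypothetical.

```python
import math


def normalized_entropy(p):
    """Shannon entropy of one sample's confidences P_{j,k}, scaled by ln K into [0, 1] (assumption)."""
    K = len(p)
    h = -sum(pk * math.log(pk) for pk in p if pk > 0)
    return h / math.log(K) if K > 1 else 0.0


def select_for_labelling(P, threshold=0.8):
    """Indices j of samples whose normalised entropy Ent_j exceeds `threshold`."""
    return [j for j, p in enumerate(P) if normalized_entropy(p) > threshold]


chosen = select_for_labelling([[0.5, 0.5], [0.95, 0.05], [0.4, 0.6]])
```

A sample the model is confident about, such as [0.95, 0.05], has low entropy and is skipped, while near-even confidences are sent on for ontology-based labeling.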
The ontology-based scenario object model is built from production experience. Taking fault 4 as an example, the TE process is divided into five parts: the reactor, condenser, product separator, recycle compressor, and stripper. Reactor temperature is the first warning variable for fault 4. Because the reaction is exothermic, three direct causes may affect the reactor temperature: abnormal recycle flow, abnormal reactor feed flow rate, and abnormal reactor cooling water temperature. The fault has four main consequences: high reactor cooling water outlet temperature, high condenser cooling water outlet temperature, high product separator temperature, and high reactor pressure. Analysis shows that neither the recycle flow nor the reactor feed flow rate deviates, so the abnormal reactor temperature is most likely caused by an abnormal reactor cooling water temperature. Fig. 7 shows the ontology-based graphical scenario object model of the TE process, and Fig. 8 shows the graphical scenario object model of TE fault 4.
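The reverse reasoning just described can be illustrated with a toy encoding of the fault 4 scenario as a cause map. The event names, the `CAUSES` and `LABELS` dictionaries, and `infer_root_cause` are hypothetical; the sketch does not attempt to reproduce a full ontology or reasoner.

```python
# Hypothetical encoding of the fault 4 scenario: symptom -> direct candidate causes.
CAUSES = {
    "reactor temperature abnormal": [
        "recycle flow abnormal",
        "reactor feed flow rate abnormal",
        "reactor cooling water temperature abnormal",
    ],
}
# Hypothetical mapping from an identified root cause to a class label.
LABELS = {"reactor cooling water temperature abnormal": "fault 4"}


def infer_root_cause(symptom, observations):
    """Return the candidate causes of `symptom` that are not ruled out.

    `observations` maps an event to True (deviation seen) or False (no deviation);
    causes that were not observed remain candidates.
    """
    return [c for c in CAUSES.get(symptom, []) if observations.get(c, True)]


causes = infer_root_cause(
    "reactor temperature abnormal",
    {"recycle flow abnormal": False, "reactor feed flow rate abnormal": False},
)
label = LABELS.get(causes[0]) if len(causes) == 1 else None  # label only when unambiguous
```

Ruling out the two flow-related causes leaves the cooling water temperature as the only candidate, which is then mapped to a label; if several causes remained, the sample would need further evidence or expert review.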
A simulation experiment was carried out to evaluate how well the dynamic active safety semi-supervised support vector machine model established in this embodiment (abbreviated as the PCA-DAS4VM model) identifies the operating state of the chemical process. The compared models are as follows.
PCA-DAS4VM model: the dynamic active safety semi-supervised support vector machine model of this embodiment, applied to the key variables selected by principal component analysis (PCA).
DAS4VM model: the dynamic active safety semi-supervised support vector machine model applied directly to all measured variables, without PCA key-variable selection.
PCA-S4VM model: a safety semi-supervised support vector machine (S4VM) model applied to the PCA-selected key variables.
DSSAE model: the model described in Jiang L, Ge Z, Song Z. Semi-supervised fault classification based on dynamic sparse stacked auto-encoders model [J]. Chemometrics and Intelligent Laboratory Systems, 2017: S0169743917302496.
ALSemiFDA model: the model described in Yin L, Wang H, et al. Incorporate active learning to semi-supervised fault classification [J]. Journal of Process Control, 78 (2019) 88-97.
Horizontal comparison: Fig. 6 shows that the PCA-DAS4VM model of this embodiment identifies faults better as the amount of unlabeled data increases.
Longitudinal comparison: the model of this embodiment is compared with the other models. From the identification results, the F1 score, false positive rate (FPR), fault diagnosis rate (FDR), G-mean and accuracy of each model are calculated and plotted as comparison curves, as shown in Figs. 9-12. The F1 score takes both the precision and the recall (fault diagnosis rate) of the classification model into account and can be regarded as their harmonic mean. G-mean is another overall performance criterion that balances precision and recall, defined here as their geometric mean. Precision is the proportion of samples identified as the positive class that are truly positive; accuracy is the proportion of all predictions, for both the positive and the negative class, that are correct. These are standard machine learning metrics: a higher F1 score, FDR, G-mean and accuracy, and a lower FPR, indicate better performance.
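The metrics can be computed from binary confusion-matrix counts, with "fault" as the positive class. This is a hedged sketch: it assumes FDR equals recall (the fraction of fault samples correctly identified), and it follows the text in defining G-mean as the geometric mean of precision and recall; many references instead define G-mean from sensitivity and specificity.

```python
import math

def fault_metrics(tp, fp, tn, fn):
    """Evaluation metrics from confusion-matrix counts (fault = positive class).

    Counts are assumed large enough that no denominator is zero.
    """
    precision = tp / (tp + fp)          # predicted faults that are true faults
    recall = tp / (tp + fn)             # assumed to equal the FDR here
    return {
        "precision": precision,
        "FDR": recall,
        "FPR": fp / (fp + tn),          # normal samples wrongly flagged as faults
        "F1": 2 * precision * recall / (precision + recall),
        "G-mean": math.sqrt(precision * recall),   # definition used in the text
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }

m = fault_metrics(tp=80, fp=10, tn=90, fn=20)
print({k: round(v, 3) for k, v in m.items()})
```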
Fig. 13 compares the accuracy of the DSSAE, ALSemiFDA and PCA-DAS4VM models and shows that the model of this embodiment achieves higher accuracy.
Example 2
This embodiment provides a chemical process fault identification system, comprising:
a data acquisition module, configured to acquire operating data of the chemical production process in real time;
a preprocessing module, configured to preprocess the acquired operating data;
a key characteristic data extraction module, configured to select key characteristic data from the operating data by principal component analysis;
an identification module, configured to establish a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, input the key characteristic data into the trained model, and output the operating state of the chemical process.
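The four modules can be wired together as a simple pipeline. In this hedged sketch the acquisition source, the preprocessing statistics, the key-variable indices and the classifier are all injected; the class and field names are hypothetical, and the threshold rule in the usage example is a trivial stand-in, not the DAS4VM model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

@dataclass
class FaultIdentificationSystem:
    acquire: Callable[[], List[float]]          # data acquisition module
    means: Sequence[float]                      # training statistics used by preprocessing
    stds: Sequence[float]
    key_indices: Sequence[int]                  # result of PCA key-feature selection
    classify: Callable[[List[float]], str]      # trained identification model

    def preprocess(self, x):
        # Preprocessing module: z-score each variable with training statistics.
        return [(v - m) / s for v, m, s in zip(x, self.means, self.stds)]

    def extract_key_features(self, z):
        # Key characteristic data extraction module: keep the selected variables.
        return [z[i] for i in self.key_indices]

    def step(self):
        # Identification module: one acquisition in, one operating state out.
        x = self.acquire()
        return self.classify(self.extract_key_features(self.preprocess(x)))

# Usage with a toy threshold rule standing in for the trained model.
system = FaultIdentificationSystem(
    acquire=lambda: [120.0, 2.0, 45.0],
    means=[100.0, 2.0, 40.0],
    stds=[5.0, 0.5, 2.0],
    key_indices=[0, 2],
    classify=lambda f: "fault" if max(abs(v) for v in f) > 3 else "normal",
)
print(system.step())   # -> fault
```

Injecting the classifier as a callable keeps the pipeline independent of how the identification model is trained.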
Example 3
This embodiment provides an electronic device, comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the computer instructions implement the steps of the method of Example 1.
Example 4
This embodiment provides a computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of Example 1.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.
Although the present disclosure has been described with reference to specific embodiments, it should be understood that the scope of the present disclosure is not limited thereto, and those skilled in the art will appreciate that various modifications and changes can be made without departing from the spirit and scope of the present disclosure.

Claims (10)

1. A chemical process fault identification method, comprising the following steps:
acquiring operating data of a chemical production process in real time;
preprocessing the acquired operating data;
selecting key characteristic data from the operating data by principal component analysis;
and establishing a dynamic active safety semi-supervised support vector machine model based on a semi-supervised learning method, inputting the key characteristic data into the trained model, and outputting the operating state of the chemical process.
2. The chemical process fault identification method as claimed in claim 1, wherein the key characteristic data comprise labeled data and unlabeled data, and the processing of the key characteristic data by the dynamic active safety semi-supervised support vector machine model comprises adding labels to the unlabeled data by an active learning method.
3. The chemical process fault identification method as claimed in claim 1, wherein selecting the key characteristic data from the operating data by principal component analysis comprises:
calculating the covariance matrix of the preprocessed data matrix and the eigenvalues and eigenvectors of the covariance matrix; sorting the principal components by variance contribution rate in descending order, and taking as principal component variables those whose cumulative variance contribution rate exceeds a set proportion threshold;
establishing the principal component linear expressions from the principal component variables, and calculating the coefficient of each variable in the principal component linear expressions from the eigenvalues;
obtaining a comprehensive scoring model from the coefficients in the principal component linear expressions, and calculating the variable coefficients in the comprehensive scoring model using the variances of the principal components;
normalizing the variable coefficients of the comprehensive scoring model to re-determine the variable weights;
and sorting the variables by re-determined weight, wherein the operating data corresponding to the variables whose cumulative weight exceeds a set threshold are the key characteristic data.
4. The chemical process fault identification method as claimed in claim 3, wherein:
the coefficient of each principal component variable in each principal component linear expression is calculated from the eigenvalues by the following formula:
[Formula image not reproduced in this text.]
where coe is the coefficient of variable q in the linear expression of the d-th principal component; v is the element of the d-th principal component corresponding to variable q; and e is the eigenvalue (characteristic root) of the d-th principal component;
or
the variable coefficient in the comprehensive scoring model is calculated from the coefficients of the principal component variables in the principal component linear expressions by the following formula:
[Formula image not reproduced in this text.]
where w is the coefficient of variable q in the comprehensive scoring model; o is the number of principal components; and s is the variance of the d-th principal component.
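Because the formula images in claim 4 are not reproduced, the following is only a sketch of the claim 3 procedure under loudly stated assumptions: v is taken as the loading of variable q on component d (eigenvector element times the square root of e), coe = v / sqrt(e), s is taken as the eigenvalue (the variance of the component), w is the variance-weighted average of coe over the o retained components, and the absolute values of w are normalized. None of these formulas is confirmed as the patent's own; the function and parameter names are hypothetical.

```python
import numpy as np

def pca_key_variables(X, pc_threshold=0.85, weight_threshold=0.8):
    """Return (key-variable indices, normalized weights, number of components).

    ASSUMED formulas (the original formula images are not reproduced):
      coe[q, d] = v[q, d] / sqrt(e[d]), with v the loading of variable q on component d
      w[q]      = sum_d coe[q, d] * s[d] / sum_d s[d], with s[d] = e[d]
    """
    X = np.asarray(X, dtype=float)
    # Preprocessing: z-score each column.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Covariance matrix and its eigenvalues/eigenvectors, sorted by variance (descending).
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    p = len(eigvals)
    # Keep the first o components whose cumulative variance contribution reaches the threshold.
    contrib = eigvals / eigvals.sum()
    o = min(int(np.searchsorted(np.cumsum(contrib), pc_threshold)) + 1, p)
    e = eigvals[:o]
    loadings = eigvecs[:, :o] * np.sqrt(e)   # assumed meaning of v
    coe = loadings / np.sqrt(e)              # assumed formula; reduces to the eigenvector
    w = coe @ e / e.sum()                    # assumed comprehensive-score coefficient
    # Normalize; taking absolute values is an assumption (eigenvector signs are arbitrary).
    weights = np.abs(w) / np.abs(w).sum()
    # Rank variables and keep them until the cumulative weight reaches the threshold.
    ranked = np.argsort(weights)[::-1]
    k = min(int(np.searchsorted(np.cumsum(weights[ranked]), weight_threshold)) + 1, p)
    return ranked[:k].tolist(), weights, o

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)   # two strongly correlated variables
keys, weights, o = pca_key_variables(X)
print("components kept:", o)
print("key variables:", keys)
```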
5. The chemical process fault identification method as claimed in claim 1, wherein the training process of the dynamic active safety semi-supervised support vector machine model comprises:
acquiring historical data of the chemical production process, the historical data comprising fault data and non-fault data;
preprocessing the acquired historical data;
selecting key characteristic data from the operating data by principal component analysis, the key characteristic data comprising labeled data and unlabeled data;
adding labels to the unlabeled data by an active learning method, inputting the originally labeled data and the newly labeled data into the dynamic active safety semi-supervised support vector machine model for training with the fault type or normal operation as the output, and obtaining the parameters of the model.
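A minimal sketch of the active-learning part of this training process, under loud assumptions: a purely supervised nearest-centroid classifier with softmax confidences stands in for the DAS4VM model, `oracle` stands in for ontology-based labeling, and training stops when no unlabeled sample's normalized entropy exceeds the stopping criterion `a` (one possible reading, since the criterion's formula is not reproduced). All names are hypothetical.

```python
import numpy as np

def fit_centroids(X, y, K):
    # Stand-in model: one centroid per class (every class needs a labeled sample).
    return np.array([X[y == k].mean(axis=0) for k in range(K)])

def confidences(C, X):
    # Softmax over negative squared distances to the class centroids.
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    z = np.exp(-(d2 - d2.min(axis=1, keepdims=True)))
    return z / z.sum(axis=1, keepdims=True)

def entropy01(P):
    # Shannon entropy scaled by log(K) so that it lies in [0, 1].
    return -(P * np.log(np.clip(P, 1e-12, 1.0))).sum(axis=1) / np.log(P.shape[1])

def active_train(XL, yL, XU, oracle, K, a=0.8, batch=1, max_rounds=50):
    """Query labels for the most uncertain unlabeled samples until none exceeds a."""
    XL, yL, XU = XL.copy(), yL.copy(), XU.copy()
    for _ in range(max_rounds):
        if len(XU) == 0:
            break
        C = fit_centroids(XL, yL, K)
        ent = entropy01(confidences(C, XU))
        if ent.max() <= a:                       # assumed stopping rule
            break
        pick = np.argsort(ent)[::-1][:batch]     # highest-entropy samples first
        new_labels = [oracle(x) for x in XU[pick]]
        XL = np.vstack([XL, XU[pick]])           # original plus newly labeled data
        yL = np.concatenate([yL, new_labels])
        XU = np.delete(XU, pick, axis=0)
    return fit_centroids(XL, yL, K), len(yL)

rng = np.random.default_rng(0)
XL = np.array([[0.0, 0.0], [4.0, 4.0]])
yL = np.array([0, 1])
XU = rng.normal(loc=2.0, scale=1.5, size=(30, 2))
oracle = lambda x: int(x.sum() > 4.0)   # hypothetical stand-in for ontology-based labeling
centroids, n_labeled = active_train(XL, yL, XU, oracle, K=2)
print(centroids.shape, n_labeled)
```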
6. The chemical process fault identification method as claimed in claim 2 or 5, wherein adding labels to the unlabeled data by the active learning method comprises:
optimizing the pseudo-label confidence of the identification model by combining historical information and future information of the chemical process data;
and calculating the entropy of the key characteristic data from the pseudo-label confidence, selecting the key characteristic data with high entropy by active learning, and adding data labels to them based on the knowledge ontology.
7. The chemical process fault identification method as claimed in claim 6, wherein optimizing the pseudo-label confidence of the identification model by combining historical information and future information of the chemical process data specifically comprises:
classifying the historical data by fault to obtain K classes corresponding to K faults;
and calculating the confidence that each sample belongs to each class k, and calculating the pseudo-label confidence of each item of key characteristic data from these confidences by averaging.
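The averaging of pseudo-label confidences over historical and future information could look like the following sketch. It assumes a centered moving window of `half_width` samples on each side over a time-ordered confidence matrix; the window shape and size are assumptions, not values stated in the text.

```python
import numpy as np

def smooth_confidences(P, half_width=2):
    """Average each sample's class confidences with its past and future neighbours.

    P: (n_samples, K) confidences in time order. Windows are truncated at the
    start and end of the sequence, so every row still sums to 1.
    """
    P = np.asarray(P, dtype=float)
    n = len(P)
    out = np.empty_like(P)
    for j in range(n):
        lo, hi = max(0, j - half_width), min(n, j + half_width + 1)
        out[j] = P[lo:hi].mean(axis=0)
    return out

# An isolated class-1 spike at sample 1 is smoothed away by its neighbours.
P = np.array([[0.9, 0.1], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
smoothed = smooth_confidences(P, half_width=1)
labels = smoothed.argmax(axis=1)   # pseudo-labels after smoothing
print(labels)   # -> [0 0 0 0]
```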
8. A chemical process fault identification system, comprising:
a data acquisition module, configured to acquire operating data of a chemical production process in real time;
a preprocessing module, configured to preprocess the acquired operating data;
a key characteristic data extraction module, configured to select key characteristic data from the operating data by principal component analysis;
an identification module, configured to input the key characteristic data into the trained dynamic active safety semi-supervised support vector machine model and output the operating state of the chemical process.
9. An electronic device comprising a memory and a processor and computer instructions stored on the memory and executable on the processor, the computer instructions when executed by the processor performing the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any one of claims 1 to 7.
CN201910844132.5A 2019-09-06 2019-09-06 Chemical process fault identification method and system Active CN110647117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910844132.5A CN110647117B (en) 2019-09-06 2019-09-06 Chemical process fault identification method and system

Publications (2)

Publication Number Publication Date
CN110647117A true CN110647117A (en) 2020-01-03
CN110647117B CN110647117B (en) 2020-12-18

Family

ID=69010277

Country Status (1)

Country Link
CN (1) CN110647117B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113723650A (en) * 2020-05-25 2021-11-30 China Petroleum & Chemical Corporation Chemical process abnormality monitoring system based on semi-supervised model and model optimization device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4729383A (en) * 1984-12-07 1988-03-08 Susi Roger E Method and apparatus for automatically determining blood pressure measurements
CN102830624A (en) * 2012-09-10 2012-12-19 Zhejiang University Semi-supervised monitoring method for the polypropylene production process based on self-learning statistical analysis
US20150209452A1 (en) * 2014-01-27 2015-07-30 Washington University Metal-binding bifunctional compounds as diagnostic agents for alzheimer's disease
CN106448096A (en) * 2016-11-24 2017-02-22 Qingdao University of Science and Technology Alarm threshold optimization method based on dimension compression and normal transformation
CN106843195A (en) * 2017-01-25 2017-06-13 Zhejiang University Fault classification method based on adaptive ensemble semi-supervised Fisher discriminant analysis
CN107423156A (en) * 2017-07-29 2017-12-01 合肥千奴信息科技有限公司 Fault early-warning algorithm based on classification and clustering
CN109522973A (en) * 2019-01-17 2019-03-26 Yunnan University Medical big-data classification method and system based on generative adversarial network and semi-supervised learning
CN109800799A (en) * 2018-12-31 2019-05-24 South China University of Technology Online active learning method suitable for unlabeled imbalanced data streams

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JONG-MIN KIM: "《A Study of Face Recognition using the PCA and Error Back-Propagation》", 《2010 SECOND INTERNATIONAL CONFERENCE ON INTELLIGENT HUMAN-MACHINE SYSTEMS AND CYBERNETICS》 *
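朱美琳 (Zhu Meilin): "《Multi-class learning algorithm for semi-supervised support vector machines》", 《Journal of Zhengzhou University (Natural Science Edition)》 *
李锋 (Li Feng): "《Laplacian twin least squares support vector machine based …》", 《2016 13TH INTERNATIONAL CONFERENCE ON UBIQUITOUS ROBOTS AND AMBIENT INTELLIGENCE (URAI)》 *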

Also Published As

Publication number Publication date
CN110647117B (en) 2020-12-18

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant