CN108960329A - A kind of chemical process fault detection method comprising missing data - Google Patents

A kind of chemical process fault detection method comprising missing data

Info

Publication number
CN108960329A
Authority
CN
China
Prior art keywords
data
model
missing
missing values
chemical process
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810734994.8A
Other languages
Chinese (zh)
Other versions
CN108960329B (en
Inventor
周乐
余家鑫
介婧
张淼
郑慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lover Health Science and Technology Development Co Ltd
Original Assignee
Zhejiang Lover Health Science and Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lover Health Science and Technology Development Co Ltd filed Critical Zhejiang Lover Health Science and Technology Development Co Ltd
Priority to CN201810734994.8A priority Critical patent/CN108960329B/en
Publication of CN108960329A publication Critical patent/CN108960329A/en
Application granted granted Critical
Publication of CN108960329B publication Critical patent/CN108960329B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2134Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a chemical process fault detection method for data containing missing values. The invention uses an iterative learning method based on independent component analysis and an autoregressive dynamic latent variable model to establish an effective fault detection model for dynamic chemical processes. It overcomes the problem that measurement data in real industrial processes contain missing values, and it improves the efficiency and performance of on-line monitoring, so that industrial chemical processes are more stable and product quality control is more reliable.

Description

A chemical process fault detection method for data containing missing values
Technical field
The invention belongs to the field of chemical production process control, and in particular relates to a chemical process fault detection method for data containing missing values.
Background art
In modern industry, machine learning and big data are attracting growing interest, and these theories have made some progress in industrial control. With the widespread use of distributed control systems, multivariate statistical process monitoring (MSPM) has become an indispensable part of control systems. In a modern chemical process, measured variables such as temperature, pressure, and flow are sampled at time intervals and therefore show strong autocorrelation, so dynamic process modeling is required.
To extract the autocorrelation of the measured variables, a traditional dynamic model is dynamic principal component analysis (DPCA), which uses the augmented matrix method. After this pioneering work, the augmented matrix became a common device in many dynamic models, such as dynamic ICA, dynamic PLS, and dynamic FA.
These traditional methods based on the augmented matrix can extract the autocorrelation of the variables, but they still have a defect: they do not model the cross-correlation that reflects the data structure. To solve this problem, the autoregressive dynamic latent variable (ARDLV) model was proposed, in which an autoregressive model and a latent variable model extract the dynamic and static characteristics of the data respectively.
The existing iterative autoregressive dynamic latent variable model, which is based on a generative probabilistic model, extracts only the Gaussian information of the sampled data set and does not model the non-Gaussian information in the data. It also does not consider missing data, so its detection efficiency is low and its stability is poor.
Summary of the invention
In view of the above deficiencies in the prior art, the object of the present invention is to provide a dynamic process fault detection method for data containing missing values.
To address the problems of the prior art, the invention proposes a model based on independent component analysis and an iterative autoregressive dynamic latent variable model. First, the data set with missing values is processed to obtain initial estimates of the missing values and a complete training data set. Then the training set is modeled using independent component analysis (ICA) to extract the non-Gaussian information. Finally, the residual after ICA processing is modeled with the iterative autoregressive dynamic latent variable model to extract the Gaussian information in the data.
The object of the present invention is achieved through the following technical solution:
A chemical process fault detection method for data containing missing values, comprising the following steps:
(1) Collect data from the chemical process to be monitored during normal operation, obtaining the raw data set $Y_o$ with missing values, $Y_o \in \mathbb{R}^{K \times N}$, where $K$ is the total number of samples and $N$ is the total number of variables;
(2) Perform whitening preprocessing on the raw data set $Y_o$ to obtain the whitened data set;
(3) Estimate and fill in the missing values in the whitened data set to obtain the complete data set $Y_{en}$, $Y_{en} \in \mathbb{R}^{K \times N}$, while recording the positions of the missing values;
(4) Based on the data set $Y_{en}$, construct an independent component analysis model, obtaining the independent component matrix $S_0$, the mixing matrix $J_0$, and the unmixing matrix $W_0$, where $R_1$ is the number of independent components;
(5) From the constructed independent component analysis model, obtain the residual data set, and construct the control limit of the corresponding $I^2$ statistic;
(6) Based on the residual data set, construct the iterative autoregressive dynamic latent variable model;
(7) From the established iterative autoregressive dynamic latent variable model, estimate the expected value $t_{normal}$ of the latent variables of the training samples, the inverse variance $\mathrm{var}^{-1}(t_{normal} \mid x_{normal})$ of the latent variables, and the model prediction error $E$, and construct the control limits of the corresponding $T^2$ and SPE statistics;
(8) Collect sample data from a new chemical production process as test samples and normalize them;
(9) Using the independent component analysis model constructed in step (4) and the iterative autoregressive dynamic latent variable model obtained in step (6), compute the $I^2$, $T^2$, and SPE statistics of the normalized test samples, judge whether each exceeds its corresponding control limit, and obtain the on-line monitoring result of the current chemical production process.
In step (1), the data collected by the distributed control system during normal operation of the production process are used as the training sample set for modeling. This training set usually contains a certain number of missing values, giving the raw data set $Y_o$ with missing values:
where $y_m$ denotes a missing value, the total number of samples is $K$, and the total number of variables is $N$;
For a given chemical process and a set time span, the total number of samples generally depends on the sampling frequency; the total number of variables generally depends on the main factors that influence the properties of the chemical process. Common variables include, but are not limited to, one or more of: temperature, pressure, substrate concentration, product concentration, weight, pH value, etc.
The missing values addressed by the present invention occur randomly and accidentally; a row or a column of the observed data may contain one or more of them.
In the present invention, when the raw data set $Y_o$ is whitened in step (2), the processing is done column by column. For a given column, if the column contains missing values, all missing values are removed (that is, ignored), and the remaining data are whitened.
In step (2), the "removal" of missing values applies only during the computation. The data set after processing still contains all the missing values; only the normal data other than these missing values are whitened. For example, suppose the observations of a certain column are $y_v = [y_1 \; \dots \; y_{j-1} \; y_m \; y_{j+1} \; \dots \; y_K]^T$. The missing value $y_m$ contained in the column is removed, leaving the observed entries $[y_1 \; \dots \; y_{j-1} \; y_{j+1} \; \dots \; y_K]^T$, which are then whitened to eliminate differences in level and dimension. The remaining columns are processed in the same way.
In the present invention, the whitening in step (2) may also be called normalization: the mean is subtracted and the result is divided by the standard deviation, so that each variable of the data set has zero mean and unit variance.
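The column-wise normalization that skips missing entries can be illustrated with a minimal sketch. It assumes missing values are encoded as NaN and uses the sample standard deviation; the function name `whiten_columns` and the toy matrix are illustrative only and do not come from the patent.

```python
import numpy as np

def whiten_columns(Y):
    """Column-wise zero-mean, unit-variance scaling that skips NaN entries."""
    Y = np.asarray(Y, dtype=float)
    mean = np.nanmean(Y, axis=0)          # per-variable mean over observed values
    std = np.nanstd(Y, axis=0, ddof=1)    # per-variable sample std over observed values
    return (Y - mean) / std, mean, std    # NaN entries remain NaN

# Toy raw data set: K = 4 samples, N = 2 variables, two missing entries
Y_o = np.array([[1.0, 10.0],
                [2.0, np.nan],
                [3.0, 30.0],
                [np.nan, 20.0]])
Y_w, mean, std = whiten_columns(Y_o)
missing_mask = np.isnan(Y_w)              # recorded positions of the missing values
```

The returned `mean` and `std` would be reused to scale new test samples, and `missing_mask` keeps the positions that later steps need to revisit.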
In the present invention, the missing values are estimated in step (3) using the independent component analysis algorithm.
Preferably, the estimation of the missing values comprises the following steps:
(3-1) From the whitened data set, remove every row whose sampling instant contains a missing value, obtaining the regular data set $Y_r$, $Y_r \in \mathbb{R}^{M \times N}$, where $M$ is the number of rows without missing data;
(3-2) Based on the data set $Y_r$, establish an independent component analysis model and obtain the corresponding mixing matrix $J \in \mathbb{R}^{N \times R_2}$, the independent component matrix $S \in \mathbb{R}^{M \times R_2}$, and the unmixing matrix $W$, where $R_2$ is the number of independent components;
$Y_r = S J^T$ (3)
where the estimate of the independent component matrix is obtained through the unmixing matrix $W$;
$R_2$ in step (3) and $R_1$ in step (4) may be the same or different; for ease of calculation, preferably $R_1 = R_2$.
(3-3) Using the regular data set $Y_r$ and the obtained mixing matrix $J$, compute the score $s^T$ of the missing values;
Suppose the observations of a certain row are $y = [y_1 \; \dots \; y_{j-1} \; y_j \; y_{j+1} \; \dots \; y_N]$, and define $y_j$ as the missing value (a row may of course contain several missing values; one is used here as an example). To estimate the missing values of the sampled data, the score of the missing data is estimated using the regular data set, with the following formula:
(3-4) Using the score $s^T$ and the mixing matrix $J$, obtain the estimate of the missing values;
Here the observation vector is taken with the missing values removed, and the new matrix is obtained from the mixing matrix $J$ by removing the rows that correspond to the missing values.
The estimate of the missing sample can be calculated with the following formula:
where $p_j$ is the $j$-th row of $J$;
Steps (3-1) to (3-4) give the corresponding missing values, which are filled into the whitened data set. Every row containing missing values is estimated in the same way, giving the complete data set $Y_{en}$.
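The formulas for the score and the filled-in value did not survive in the text above, so the following is only a sketch of one common way to carry out steps (3-3) and (3-4). It assumes the score is the least-squares solution over the observed variables and that the missing entry is reconstructed as $p_j s$; the function `estimate_missing` and the random mixing matrix are hypothetical.

```python
import numpy as np

def estimate_missing(y, J):
    """Fill NaN entries of one row y (length N) given a mixing matrix J (N x R)."""
    y = np.asarray(y, dtype=float)
    miss = np.isnan(y)
    J_obs = J[~miss]                       # J with the rows of missing variables removed
    # Assumed score estimate: least squares over the observed entries only
    s, *_ = np.linalg.lstsq(J_obs, y[~miss], rcond=None)
    y_filled = y.copy()
    y_filled[miss] = J[miss] @ s           # y_j = p_j s, with p_j the j-th row of J
    return y_filled, s

rng = np.random.default_rng(0)
J = rng.normal(size=(5, 2))                # hypothetical mixing matrix, N = 5, R = 2
s_true = np.array([0.7, -1.2])
y_full = J @ s_true                        # noise-free row, so recovery is exact
y_obs = y_full.copy()
y_obs[3] = np.nan                          # pretend variable 4 was not recorded
y_hat, s_hat = estimate_missing(y_obs, J)
```

In practice `J` would come from an ICA fit on the regular data set $Y_r$, and the same call would be repeated for every row that contains missing values.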
In step (5), the non-Gaussian information of the complete training data set $Y_{en}$ is extracted using the constructed independent component analysis model, and the residual data set of the training data set $Y_{en}$ is computed. The residual data set is obtained by the following formula:
Meanwhile, from the independent component matrix $S_0$ computed after the established independent component analysis model converges, the independent components $i_{normal}$ of the training samples are obtained, and the $I^2$ statistic is constructed as follows:
$I^2 = i_{normal}^T \, i_{normal}$ (9)
The $I^2$ values are sorted in ascending order, and the control limit of the $I^2$ statistic is taken as the value at position $0.95 \times K$.
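The $I^2$ statistic and its empirical 95% control limit can be sketched as follows. The independent component matrix here is random stand-in data, and the helper names are illustrative.

```python
import numpy as np

def i2_statistic(S):
    """I^2 for each sample: squared norm of its independent-component vector."""
    return np.sum(np.asarray(S) ** 2, axis=1)

def empirical_limit(stat, level=0.95):
    """Value at (1-based) position round(level * K) of the ascending ordering."""
    stat = np.sort(np.asarray(stat))
    pos = min(len(stat), max(1, int(round(level * len(stat)))))
    return stat[pos - 1]

rng = np.random.default_rng(1)
S0 = rng.normal(size=(200, 3))     # stand-in for the K x R1 independent components
I2 = i2_statistic(S0)
I2_lim = empirical_limit(I2)       # 95% of the training samples fall at or below it
```

An empirical quantile is used because the independent components are non-Gaussian, so no closed-form distribution is assumed for $I^2$.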
Preferably, in step (6), the residual data set of the above independent component analysis model is preprocessed and normalized so that each process variable has zero mean and unit variance;
Preferably, in step (6), constructing the iterative autoregressive dynamic latent variable (R-ARDLV) model comprises the following steps:
(6-1) Initialize the model parameters $(A, C, Q, \mathbf{R})$;
(6-2) Based on the current residual data set, establish the dynamic latent variable model;
$x_k = A z_{k-1} + w_k$ (10)
$y_k = C x_k + v_k$ (11)
where: $y_k \in \mathbb{R}^B$ is the observation at time $k$, $B$ is the dimension of $y_k$, and $y_k$ is taken from the residual data set; $x_k \in \mathbb{R}^D$ is the dynamic latent variable, and $D$ is the dimension of the low-dimensional subspace; $C \in \mathbb{R}^{B \times D}$ is the corresponding loading matrix; $w_k$ is the dynamic noise, assumed to be white noise with variance $Q$; relating the dynamic latent variable to its past values gives the dynamic factor vector $z_{k-1} = [x_{k-1}^T \; x_{k-2}^T \; \dots \; x_{k-L}^T]^T \in \mathbb{R}^{DL}$, where $L$ is the dynamic order; $A \in \mathbb{R}^{D \times DL}$ is the state-transition matrix; $v_k$ is the measurement noise, which follows a Gaussian distribution with variance $\mathbf{R}$;
To simplify the subsequent calculation, the above dynamic latent variable model is rewritten in an equivalent form:
which uses a new state-transition matrix, a new loading matrix, and a new noise term.
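Equations (10) and (11) can be illustrated by a short simulation. The equivalent form itself is not legible above, so the `companion` helper below shows the standard first-order stacking, which is an assumption about what that form looks like; all numeric values are arbitrary.

```python
import numpy as np

def companion(A, C, D, L):
    """Stack x_k = A z_{k-1} + w_k into a first-order model on z_k (assumed form)."""
    A_t = np.zeros((D * L, D * L))
    A_t[:D, :] = A                           # newest latent block follows Eq. (10)
    A_t[D:, :-D] = np.eye(D * (L - 1))       # older blocks are shifted down by one lag
    C_t = np.hstack([C, np.zeros((C.shape[0], D * (L - 1)))])  # y_k reads only x_k
    return A_t, C_t

rng = np.random.default_rng(2)
D, L, B, K = 2, 2, 4, 50                     # latent dim, dynamic order, obs dim, samples
A = np.hstack([0.5 * np.eye(D), 0.2 * np.eye(D)])   # D x DL state-transition matrix
C = rng.normal(size=(B, D))                  # loading matrix
A_t, C_t = companion(A, C, D, L)

z = np.zeros(D * L)                          # dynamic factor z_{k-1}
Y = np.empty((K, B))
for k in range(K):
    x = A @ z + 0.1 * rng.normal(size=D)     # Eq. (10)
    Y[k] = C @ x + 0.05 * rng.normal(size=B) # Eq. (11)
    z = np.concatenate([x, z[:-D]])          # z_k = [x_k, x_{k-1}, ...]
```

Working with `A_t` and `C_t` lets a standard Kalman filter be applied directly to the stacked state.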
(6-3) Estimate the latent variables of the model using the expectation-maximization algorithm and compute the likelihood function. Compute the difference between the updated likelihood value $Log_{new}$ and the previous likelihood value $Log_{old}$, and judge whether the likelihood function has converged, $\|Log_{new} - Log_{old}\|^2 < \varepsilon_1$: if so, go to step (6-4); if not, update the model parameters and return to step (6-2). Here $\varepsilon_1$ is the convergence threshold of the likelihood function;
(6-4) According to the positions of the missing values recorded in step (3), predict the missing values again with the constructed iterative autoregressive dynamic latent variable model and update the residual data set;
(6-5) Based on the updated residual data set, compute the model parameters and judge whether they have converged: if so, proceed to step (7); if not, update the model parameters and return to step (6-2).
In step (6-3), the E-step uses Kalman filtering to estimate the posterior probability of the latent variables under the current model parameters. The M-step then takes the first-order partial derivative of the likelihood function with respect to each parameter and sets it to zero, giving the updated values of the model parameters. The E-step and M-step are iterated until the convergence condition is reached.
In the E-step, the following quantities are calculated by Kalman filtering:
the predicted variance at time $k$ based on time $k-1$; the variance at time $k-1$; the Kalman gain $K_k$; the expectation and variance of $z_k$ at the corresponding instants; the state estimate at time $k-1$; the second moment of the dynamic factor at time $k$ obtained from the complete observation set $Y_{en}$; the second moment of the dynamic latent variable at time $k$ obtained from the complete observation set; the update of the dynamic latent variable at time $k$; and the prediction variance of $x_k$, which can be obtained from the prediction variance of the dynamic factor.
$K$ is the total number of samples;
The current likelihood function of the model is then computed:
where $\log P(X, Y)$ is the joint log-likelihood function of $X$ and $Y$; $\mu_L$ is the initial value of the mean; $V_L$ is the initial value of the variance; $z_L$ is the initial value of the dynamic factor; and Constant denotes a constant term;
In the M-step, the model parameters are updated by maximizing the likelihood function, with the following formulas:
The quantities involved are: the estimates of the initial values of the model dynamic factor, that is, the estimates of $\mu_L$ and $V_L$; the updated value of the dynamic factor initial value based on all $K$ observations; the second moment of the dynamic factor initial value obtained from the residual data set; the second moment of the dynamic factor at time $k-1$ obtained from the residual data set; the second moment of the dynamic latent variable at time $k$ obtained from the residual data set; the update of the current dynamic latent variable obtained from all observations; and the corresponding expectations given the observations. The current model parameters are recorded, and the iteration is repeated until the likelihood function converges.
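The E-step recursions are not legible above, so the following is a generic sketch of a forward Kalman pass that also accumulates the log-likelihood used in the convergence test. It assumes the model has already been written in first-order form (state $z_k$, matrices `A`, `C`, `Q`, `R`), and it omits the backward smoothing pass that a full EM implementation would normally add.

```python
import numpy as np

def kalman_filter(Y, A, C, Q, R, mu0, V0):
    """Forward Kalman pass for z_k = A z_{k-1} + w_k, y_k = C z_k + v_k."""
    n = len(mu0)
    mu, V, loglik = mu0, V0, 0.0
    means = []
    for y in Y:
        mu_p = A @ mu                        # predicted state mean
        P = A @ V @ A.T + Q                  # predicted state covariance
        S = C @ P @ C.T + R                  # innovation covariance
        K = P @ C.T @ np.linalg.inv(S)       # Kalman gain
        e = y - C @ mu_p                     # innovation
        mu = mu_p + K @ e                    # filtered mean
        V = (np.eye(n) - K @ C) @ P          # filtered covariance
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (logdet + e @ np.linalg.solve(S, e)
                          + len(y) * np.log(2 * np.pi))
        means.append(mu)
    return np.array(means), V, loglik

# Scalar toy model with almost noise-free measurements
A = np.array([[0.9]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1e-8]])
Y = np.array([[1.0], [2.0], [3.0]])
means, V_last, loglik = kalman_filter(Y, A, C, Q, R, np.zeros(1), np.eye(1))
```

An EM loop would call this once per iteration and stop when the change in `loglik` falls below the threshold $\varepsilon_1$.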
In step (6-4), according to the positions of the missing values recorded in step (3), the corresponding missing values are estimated and updated using Kalman filtering. The difference between the updated model parameters $\theta_{new}$ and the previous model parameters $\theta_{old}$ is computed to judge whether the model parameters have converged: $\|\theta_{new} - \theta_{old}\|^2 < \varepsilon_2$. If so, model training is finished and on-line monitoring follows; if not, return to step (6-2). Here $\varepsilon_2$ is the convergence threshold of the model parameters. The specific operations are as follows:
Under the current model parameters, the missing values are predicted by the ARDLV model. Kalman filtering serves as an accurate one-step predictor, and the missing values are calculated by the following equation:
where the dynamic factor at time $k$ is given by its one-step prediction from time $k-1$, and the remaining quantities, including $R_c$, are the current model parameters. The missing values calculated by the above formula replace the values at the recorded positions, and this is repeated until the model parameters converge.
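Since the prediction equation is not legible above, the sketch below only illustrates the general idea under an assumption: each recorded missing entry is overwritten by the one-step-ahead prediction $C A \mu_{k-1}$ of a Kalman filter written in first-order form. The function `refill_missing` and all parameter values are hypothetical.

```python
import numpy as np

def refill_missing(Y, mask, A, C, Q, R, mu0, V0):
    """Replace masked entries of Y by the one-step prediction C A mu_{k-1}."""
    Y = Y.copy()
    n = len(mu0)
    mu, V = mu0, V0
    for k in range(len(Y)):
        mu_p = A @ mu                                 # predicted state
        P = A @ V @ A.T + Q                           # predicted covariance
        y_pred = C @ mu_p                             # one-step-ahead prediction of y_k
        Y[k, mask[k]] = y_pred[mask[k]]               # overwrite recorded positions only
        K = P @ C.T @ np.linalg.inv(C @ P @ C.T + R)  # Kalman gain
        mu = mu_p + K @ (Y[k] - y_pred)               # filtered state
        V = (np.eye(n) - K @ C) @ P                   # filtered covariance
    return Y

rng = np.random.default_rng(3)
A = np.array([[0.8]]); C = np.array([[1.0], [0.5]])
Q = np.array([[0.1]]); R = 0.01 * np.eye(2)
Y = rng.normal(size=(20, 2))
mask = np.zeros_like(Y, dtype=bool)
mask[5, 0] = True
mask[12, 1] = True
Y[mask] = np.nan                                      # recorded missing positions
Y_new = refill_missing(Y, mask, A, C, Q, R, np.zeros(1), np.eye(1))
```

In the iterative scheme this refill would alternate with re-estimation of the parameters until $\|\theta_{new} - \theta_{old}\|^2$ drops below $\varepsilon_2$.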
In step (7), the specific steps for constructing the control limits of the corresponding $T^2$ and SPE monitoring statistics are as follows:
From the model parameters computed after the model converges in step (6), the expected value $t_{normal}$ of the latent variables of the training samples is obtained, with the calculation formula:
Using the expected value $t_{normal}$ of the latent variables, the $T^2$ statistic can be constructed as follows:
$T^2 = t_{normal}^T \, \mathrm{var}^{-1}(t_{normal} \mid x_{normal}) \, t_{normal}$ (33)
The control limit of the $T^2$ statistic is estimated from the $\chi^2$ distribution as follows:
where $G$ is the number of latent variables.
Meanwhile, based on the prediction error of the model, the SPE statistic can also be constructed to reflect the variation in the residual space of the model:
Further derivation gives:
$SPE = E^T \, \mathrm{var}^{-1}(E \mid x_{normal}) \, E$ (36)
The control limit of the SPE statistic is estimated by approximating the distribution of SPE with a scaled chi-square distribution $g\chi^2_h$, where
$g h = \mathrm{mean}(SPE)$ (37)
$2 g^2 h = \mathrm{var}(SPE)$ (38)
Here $g$ and $h$ are the chi-square distribution parameters, $\mathrm{mean}(\cdot)$ is the mean operator, and $\mathrm{var}(\cdot)$ is the variance operator.
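Equations (37) and (38) determine $g$ and $h$ by moment matching, which can be sketched as follows. The 95% level and the Wilson-Hilferty approximation of the chi-square quantile are assumptions made to keep the example dependency-free; the SPE values are synthetic.

```python
import numpy as np
from statistics import NormalDist

def spe_limit(spe, alpha=0.95):
    """Moment-matched g * chi2_h limit with a Wilson-Hilferty quantile approximation."""
    m, v = np.mean(spe), np.var(spe, ddof=1)
    g = v / (2.0 * m)                    # solves g*h = mean and 2*g^2*h = var
    h = 2.0 * m ** 2 / v
    z = NormalDist().inv_cdf(alpha)      # standard normal quantile
    chi2_q = h * (1 - 2 / (9 * h) + z * np.sqrt(2 / (9 * h))) ** 3
    return g * chi2_q, g, h

rng = np.random.default_rng(4)
spe = 2.0 * rng.chisquare(5, size=5000)  # synthetic SPE values from normal operation
spe_lim, g, h = spe_limit(spe)
```

With a statistics library available, the approximate quantile could be replaced by an exact chi-square quantile with $h$ degrees of freedom.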
In step (8), specifically: the chemical process under test is monitored on-line using the independent component analysis and iterative autoregressive dynamic latent variable model. The $I^2$, $T^2$, and SPE statistics of the test samples are calculated, each is checked against its control limit, and the on-line monitoring result of the current chemical production process is obtained. The detailed procedure is as follows:
After the test data set $Y_{test}$ has been collected, it is input to the independent component analysis model:
The $I^2$ statistic is constructed as follows:
The residual after processing by the independent component analysis model is input to the iterative autoregressive dynamic latent variable model, and the expected value $t_{test}$ of the latent variables of the test set is calculated by Kalman filtering:
where $x_{k\_test}$ is the dynamic latent variable of the test set at the current time, together with its update; $z_{k-1\_test}$ is the dynamic factor of the test set at the previous time;
Using the expected value $t_{test}$ of the latent variables, the $T_{test}^2$ statistic is constructed as follows:
$T_{test}^2 = t_{test}^T \, \mathrm{var}^{-1}(t_{test} \mid x_{test}) \, t_{test}$ (42)
where $\mathrm{var}^{-1}(t_{test} \mid x_{test})$ is the inverse variance of the latent variables.
Meanwhile, based on the Gaussian prediction error of the model, the $SPE_{test}$ statistic can also be constructed to reflect the information in the Gaussian residual of the model:
where $y_{k\_test}$ is the observation of the test set at the current time, together with its update;
Further derivation gives:
$SPE_{test} = E_{test}^T \, \mathrm{var}^{-1}(E_{test} \mid x_{test}) \, E_{test}$ (44)
Each statistic is checked against its control limit to obtain the on-line monitoring result of the chemical production process.
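The final comparison can be sketched as below. The text only states that each statistic is checked against its limit; flagging a sample as faulty when any one of the three limits is exceeded is an assumption of this sketch, and the numbers are made up.

```python
import numpy as np

def t2_statistic(T, cov):
    """T^2 for each row t of T, computed as t^T cov^{-1} t."""
    T = np.asarray(T, dtype=float)
    return np.einsum("ij,ij->i", T @ np.linalg.inv(cov), T)

def detect_faults(I2, T2, SPE, I2_lim, T2_lim, SPE_lim):
    """Flag a sample when any of the three statistics exceeds its control limit."""
    I2, T2, SPE = map(np.asarray, (I2, T2, SPE))
    return (I2 > I2_lim) | (T2 > T2_lim) | (SPE > SPE_lim)

# Toy latent expectations with covariance 2*I
t_test = np.array([[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
T2 = t2_statistic(t_test, 2.0 * np.eye(2))        # -> [2.0, 0.5, 1.0]
alarms = detect_faults(I2=[1.0, 5.0, 1.0], T2=T2, SPE=[1.0, 1.0, 9.0],
                       I2_lim=3.0, T2_lim=3.0, SPE_lim=3.0)
```

Keeping the three statistics separate also indicates which part of the model (non-Gaussian components, Gaussian latent space, or residual) signals the deviation.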
In real industrial processes, the measured variables nearly always contain some missing values. When handling dynamic processes, the present invention exploits the advantages of ARDLV and proposes a novel recursive method under a probabilistic framework to estimate both the missing values and the model parameters. Therefore, even if the collected data set contains missing values, the dynamic process can still be modeled accurately.
The present invention describes a chemical process fault detection method for data containing missing values. It uses a data set with missing values as the modeling samples, extracts the autocorrelation and cross-correlation information among the samples, and establishes a fault detection method on the basis of this model, thereby realizing process monitoring of the chemical production process.
The beneficial effects of the present invention are as follows. For a data matrix with missing values sampled from a chemical production process, initial values are first filled in for the missing values, and the non-Gaussian information of the data set is then extracted using independent component analysis (ICA). The residual matrix remaining after the non-Gaussian information has been extracted is augmented along the time axis into a new augmented data matrix in order to capture the dynamic autocorrelation. An iterative autoregressive dynamic latent variable model is then established, and the model parameters and missing values are accurately estimated using the expectation-maximization algorithm and Kalman filtering, giving the Gaussian information in the data. Compared with other current chemical process monitoring methods, the present invention can handle sampled data sets with missing values and can extract both the non-Gaussian and the Gaussian information in the data. This greatly improves fault detection in chemical production processes, reduces the false alarm rate and the missed alarm rate, improves the predictive ability of the model to a large extent, and improves the soundness and validity of fault detection methods based on the invention.
Brief description of the drawings
Fig. 1 is a flow chart of the chemical process fault detection method for data containing missing values according to the present invention.
Detailed description of the embodiments
With reference to Fig. 1, the method is a chemical process fault detection method for data containing missing values. To address the fault detection problem of chemical processes, the method first uses the distributed control system to collect data with missing values under normal operating conditions and establishes a model combining independent component analysis with an iterative autoregressive dynamic latent variable model, that is, the independent component analysis-recursive autoregressive dynamic latent variable model (or independent component analysis-iterative autoregressive dynamic latent variable model). The model structure is estimated by the expectation-maximization algorithm. On this basis, three monitoring statistics $I^2$, $T^2$, and SPE and their corresponding control limits (including $SPE_{limit}$) are constructed from the model. Newly sampled process data are then monitored: the corresponding feature variables of the test samples are estimated with the existing model structure, their statistics are calculated, and the final fault detection result is obtained.
The chemical process fault detection method for data containing missing values according to the present invention comprises the following steps. Step 1: using the distributed control system, collect data with missing values from normal operation of the chemical production process to form the training sample set for modeling. (Missing data have many causes, mostly accidental factors or equipment faults. Therefore, in the present invention, the data acquired at one instant (one row) may contain one or more missing values, and the data acquired for one variable (one column) may likewise contain one or more missing values.) Suppose normal samples with a certain amount of missing data have been collected; they are defined as the raw data set $Y_o$:
where $y_m$ denotes a missing value, the total number of samples is $K$, and the total number of variables is $N$;
Step 2: perform whitening preprocessing on the above $Y_o$:
The data set $Y_o$ is processed column by column (that is, variable by variable). Suppose the observations of a certain column are $y_v = [y_1 \; \dots \; y_{j-1} \; y_m \; y_{j+1} \; \dots \; y_K]^T$. The missing value $y_m$ contained in the column is removed, leaving the observed entries $[y_1 \; \dots \; y_{j-1} \; y_{j+1} \; \dots \; y_K]^T$, which are then whitened to eliminate differences in level and dimension. The remaining columns are processed in the same way, finally giving the whitened data set.
Step 3: estimate and fill in the missing values present in the whitened data set obtained in step 2:
Remove the data acquired at every instant that contains a missing value (that is, remove all rows with missing values) to obtain the regular data set $Y_r \in \mathbb{R}^{M \times N}$:
where $M$ is the number of rows without missing data.
Based on the data set $Y_r$, establish the independent component analysis model:
$Y_r = S J^T$ (3)
where $J$ is the mixing matrix, $S$ is the independent component matrix, $W$ is the unmixing matrix from which the estimate of the independent component matrix is obtained, and $R_2$ is the number of independent components.
Solve for the model parameters $J$, $S$, and $W$.
For the observations of a certain row, $y = [y_1 \; \dots \; y_{j-1} \; y_j \; y_{j+1} \; \dots \; y_N]$, define $y_j$ as the missing value, that is, $y_j = y_m$. To estimate the missing values of the sampled data, the score $s^T$ of the missing data is estimated using the regular data set, with the following formula:
Here the observation vector is taken with the missing values removed, and the new matrix is obtained from the mixing matrix $J$ by removing the rows that correspond to the missing values.
The estimate of the missing sample can be calculated with the following formula:
where $p_j$ is the $j$-th row of $J$;
The value $y_j$ calculated with the above formula is filled into the corresponding missing position of $Y_o$. Every row containing missing values is estimated in the same way, giving the complete training data set $Y_{en}$, where $Y_{en} \in \mathbb{R}^{K \times N}$, while the positions of the missing values are recorded;
Step 4: extract the non-Gaussian information of the complete training data set $Y_{en}$ using the independent component analysis (ICA) algorithm.
Center the complete training data set $Y_{en}$ so that its mean is 0, obtaining $Y_0 \in \mathbb{R}^{K \times N}$, and select the number $R_1$ of independent components to be estimated.
Then establish the independent component analysis model to obtain the independent component matrix $S_0$, the mixing matrix $J_0$, and the unmixing matrix $W_0$ of the model.
Step 5: from the independent component matrix $S_0$ computed after the independent component analysis model established in step 4 converges, obtain the independent components $i_{normal}$ of the training samples and construct the $I^2$ statistic as follows:
$I^2 = i_{normal}^T \, i_{normal}$ (8)
The $I^2$ values are sorted in ascending order, and the control limit of the $I^2$ statistic is taken as the value at position $0.95 \times K$.
Meanwhile, compute the prediction error of the model (that is, the residual data set):
Step 6: preprocess and normalize the residual data set of the above independent component analysis model so that each process variable has zero mean and unit variance;
Step 7: model the residual data set of the above independent component analysis model with the iterative autoregressive dynamic latent variable (R-ARDLV) model:
(1) Initialize the model parameters;
(2) Based on the current residual data set, establish the dynamic latent variable model;
$x_k = A z_{k-1} + w_k$ (10)
$y_k = C x_k + v_k$ (11)
where: $y_k \in \mathbb{R}^B$ is the observation at time $k$, $B$ is the dimension of $y_k$, and $y_k$ is taken from the residual data set; $x_k \in \mathbb{R}^D$ is the dynamic latent variable, and $D$ is the dimension of the low-dimensional subspace; $C \in \mathbb{R}^{B \times D}$ is the corresponding loading matrix; $w_k$ is the dynamic noise, assumed to be white noise with variance $Q$; relating the dynamic latent variable to its past values gives the dynamic factor vector $z_{k-1} = [x_{k-1}^T \; x_{k-2}^T \; \dots \; x_{k-L}^T]^T \in \mathbb{R}^{DL}$, where $L$ is the dynamic order; $A \in \mathbb{R}^{D \times DL}$ is the state-transition matrix; $v_k$ is the observation noise, which follows a Gaussian distribution with variance $\mathbf{R}$;
To simplify the subsequent calculation, the above dynamic latent variable model is rewritten in an equivalent form:
which uses a new state-transition matrix, a new loading matrix, and a new noise term.
Step 8: being estimated dynamic hidden variable and being calculated likelihood function using expectation maximization (EM) algorithm, calculate The updated value Log of likelihood functionnewWith its former likelihood function value LogoldDifference, judge whether likelihood function restrains | | Lognew- Logold||21, if so, going to the 9th step;Model parameter and the 8th step is jumped to if it is not, updating, wherein ε1For likelihood function receipts The threshold value held back:
Wherein, expectation maximization (EM) algorithm updates model parameter, is joined using Kalman filtering by "current" model in E step The posterior probability of number estimation hidden variable;Later parameters are directed to by likelihood function in M step respectively and seek single order local derviation, enables local derviation It is 0, obtains the updated value of model parameter;Finally, iterate E step and M step are until reach the condition of convergence.
It is calculated in E step by Kalman filtering:
The quantities appearing in these equations are, in order: the predicted variance at time $k$ based on time $k-1$; the variance at time $k-1$; the Kalman gain $K_k$; the expectation and variance of $z_k$ at the corresponding time; the state estimate at time $k-1$; the second moment of the dynamic factor at time $k$, obtained from the complete observation set; the second moment of the dynamic latent variable at time $k$, obtained from the complete observation set; the updated dynamic latent variable at time $k$; and the prediction variance of $x_k$, which can be extracted from the corresponding variance of the dynamic factor;
$K$ is the total number of samples;
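The forward part of such an E-step can be sketched as a textbook Kalman filter. The code below is a minimal illustration, not the patent's exact recursion: the function name, the argument names (`A_bar`, `C_bar`, `Q_bar`, `mu0`, `V0`) and the first-order state-space form are assumptions, and the moments conditioned on all observations would additionally require a backward smoothing pass that is not shown.

```python
import numpy as np

def kalman_filter(Y, A_bar, C_bar, Q_bar, R, mu0, V0):
    """Forward Kalman pass over K observations (rows of Y).

    Returns filtered state means, filtered covariances and the
    log-likelihood of the observations under the current parameters.
    """
    K, B = Y.shape
    n = A_bar.shape[0]
    means = np.zeros((K, n))
    covs = np.zeros((K, n, n))
    loglik = 0.0
    m, P = mu0, V0
    for k in range(K):
        # Prediction of time k from time k-1
        m_pred = A_bar @ m
        P_pred = A_bar @ P @ A_bar.T + Q_bar
        # Innovation and its covariance
        e = Y[k] - C_bar @ m_pred
        S = C_bar @ P_pred @ C_bar.T + R
        gain = P_pred @ C_bar.T @ np.linalg.inv(S)   # Kalman gain
        # Measurement update
        m = m_pred + gain @ e
        P = P_pred - gain @ C_bar @ P_pred
        means[k], covs[k] = m, P
        _, logdet = np.linalg.slogdet(S)
        loglik += -0.5 * (B * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
    return means, covs, loglik
```

The returned log-likelihood is the quantity whose change between EM iterations would be compared against the threshold $\varepsilon_1$.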
The current likelihood function of the model is then computed:
where $\log P(X, Y)$ is the joint log-likelihood function of $X$ and $Y$; $\mu_L$ is the initial value of the mean; $V_L$ is the initial value of the variance; $z_L$ is the initial value of the dynamic factor; and "Constant" denotes the constant term;
In the M-step, the model parameters are updated by maximizing the likelihood function, using the following formulas:
where the first two quantities are the estimates of the initial values of the model's dynamic factor, namely the estimates of $\mu_L$ and $V_L$. The remaining quantities are, in order: the updated initial value of the dynamic factor based on all $K$ observations; the second moment of the initial value of the dynamic factor, obtained from the residual data set; the second moment of the dynamic factor at time $k-1$, obtained from the residual data set; the second moment of the dynamic latent variable at time $k$, obtained from the residual data set; the updated current dynamic latent variable based on all observations; and the joint expectation of two latent quantities given the observations. The current model parameters are collected into one parameter set, and the E-step and M-step are iterated until the likelihood function converges.
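As an illustration of what a closed-form M-step update looks like, the sketch below applies the standard EM updates for a linear-Gaussian state-space model to the loading matrix and the observation-noise covariance. It is not a reproduction of the patent's formulas: the function name and the argument names (`Ex`, `Exx`) are hypothetical, and the posterior moments are assumed to come from the E-step.

```python
import numpy as np

def m_step_observation(Y, Ex, Exx):
    """Update the loading matrix C and observation-noise covariance R.

    Y   : (K, B) observations
    Ex  : (K, D) posterior means E[x_k | Y]
    Exx : (K, D, D) posterior second moments E[x_k x_k^T | Y]
    """
    K = Y.shape[0]
    Syx = Y.T @ Ex             # sum over k of y_k E[x_k]^T
    Sxx = Exx.sum(axis=0)      # sum over k of E[x_k x_k^T]
    # Setting the derivative of the expected log-likelihood w.r.t. C to zero
    # gives C = Syx @ inv(Sxx); solve() avoids forming the inverse explicitly.
    C = np.linalg.solve(Sxx.T, Syx.T).T
    R = (Y.T @ Y - C @ Syx.T) / K
    return C, R
```

The updates for the state-transition matrix and the process-noise covariance follow the same pattern, using the second moments of the dynamic factor instead.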
Step 9: According to the positions of the missing values recorded in Step 2, estimate and update the corresponding missing values using Kalman filtering. Compute the difference between the updated model parameters $\theta_{new}$ and the previous model parameters $\theta_{old}$, and test whether the model parameters have converged: $\|\theta_{new} - \theta_{old}\|^2 < \varepsilon_2$, where $\varepsilon_2$ is the convergence threshold of the model parameters. If the test is met, model training is complete and on-line monitoring follows; if not, return to Step 7. The specific operations are as follows:
Under the current model parameters, the missing values are predicted by the ARDLV model. Taking Kalman filtering as an accurate one-step predictor, the missing values are calculated by the following equation:
where the predicted quantity is the updated dynamic factor at time $k$ obtained by one-step prediction from time $k-1$, and the remaining symbols (among them Rc) are the current model parameters. The missing values at the recorded positions are replaced by the values calculated from this equation, and the procedure is repeated until the model parameters converge.
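The refill of the recorded missing positions can be sketched as follows. This is a simplified illustration that assumes the first-order form with hypothetical matrices `A_bar` and `C_bar`, and a boolean mask marking the originally missing entries; the patent's exact prediction formula is not reproduced.

```python
import numpy as np

def refill_missing(Y, missing_mask, z_filtered, A_bar, C_bar):
    """Overwrite originally missing entries with one-step-ahead predictions.

    Y            : (K, B) data holding the current fill-in values
    missing_mask : (K, B) boolean, True where the value was originally missing
    z_filtered   : (K, n) filtered dynamic-factor estimates, row k for time k
    """
    Y_new = Y.copy()
    # Row 0 has no predecessor, so its fill-in values are left unchanged.
    for k in range(1, Y.shape[0]):
        y_pred = C_bar @ (A_bar @ z_filtered[k - 1])   # predict time k from k-1
        Y_new[k, missing_mask[k]] = y_pred[missing_mask[k]]
    return Y_new
```

Only the masked entries are touched, so the measured values keep their original readings across iterations.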
Step 10: According to the established iterative autoregressive dynamic latent variable model, estimate the expected value $t$ of the latent variables of the training samples, the inverse variance $\mathrm{var}^{-1}(t \mid x)$ of the latent variables, and the model prediction error $E$, and construct the control limits of the corresponding $T^2$ and SPE monitoring statistics:
From the quantities calculated after the model converges in Step 8, the expected value $t_{normal}$ of the latent variables of the training samples is obtained by the following formula:
Using the expected value $t_{normal}$ of the latent variables, the $T^2$ statistic can be constructed as follows:
$T^2 = t_{normal}^T \, \mathrm{var}^{-1}(t_{normal} \mid x_{normal}) \, t_{normal}$ (33)
The control limit of the $T^2$ statistic is estimated from the $\chi^2$ distribution as follows:
where $G$ is the number of latent variables.
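A minimal sketch of the $T^2$ statistic and a chi-square control limit is given below, using `scipy.stats.chi2`. The confidence level of 0.99 and the function names are assumptions made for illustration; the patent's own limit formula is not reproduced here.

```python
import numpy as np
from scipy.stats import chi2

def t2_statistic(t, var_t):
    """T^2 = t^T var^{-1} t for one latent-variable estimate t."""
    return float(t @ np.linalg.solve(var_t, t))

def t2_limit(G, confidence=0.99):
    """Chi-square quantile with G degrees of freedom (G latent variables)."""
    return float(chi2.ppf(confidence, df=G))
```

A sample would be flagged on this statistic when `t2_statistic(t, var_t)` exceeds `t2_limit(G)`.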
Meanwhile the prediction error based on model, SPE statistic can be also constructed with the variation in reaction model residual error space:
Further deriving can obtain
SPE=ETvar-1(E|xnormal)E (36)
The control of SPE statistic limits estimation method are as follows:Wherein,
Gh=mean (SPE) (37)
2g2H=var (SPE) (38)
Wherein g, h are chi square distribution parameter, and mean () is mean operation symbol, and var () is variance oeprator.
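Solving equations (37) and (38) for the two parameters gives $g = \mathrm{var}(SPE) / (2\,\mathrm{mean}(SPE))$ and $h = 2\,\mathrm{mean}(SPE)^2 / \mathrm{var}(SPE)$. The sketch below applies this moment matching to training SPE values; the scaled chi-square form of the limit, the 0.99 confidence level and the function name are assumptions for illustration.

```python
import numpy as np
from scipy.stats import chi2

def spe_limit(spe_train, confidence=0.99):
    """Moment-matched control limit g * chi2_h for the SPE statistic."""
    m = np.mean(spe_train)
    v = np.var(spe_train)
    g = v / (2.0 * m)          # from g*h = mean and 2*g^2*h = var
    h = 2.0 * m ** 2 / v
    limit = g * chi2.ppf(confidence, df=h)
    return float(limit), float(g), float(h)
```

Note that `h` is generally not an integer; `chi2.ppf` accepts real-valued degrees of freedom.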
Step 11: Collect sample data from a new chemical production process as the test data set, and preprocess and normalize it;
Step 12: Using the independent component analysis model and the iterative autoregressive dynamic latent variable model, monitor the test chemical process on line: calculate the $I^2$, $T^2$ and SPE statistics of the test samples and judge whether they exceed the statistical limits, which gives the on-line monitoring result of the current chemical production process. The detailed procedure is as follows:
After the test data set $Y_{test}$ has been collected, it is input to the independent component analysis model:
The $I^2$ statistic is constructed as follows:
The residual obtained after processing by the independent component analysis model is input to the iterative autoregressive dynamic latent variable model, and the expected value $t_{test}$ of the test-set latent variables is calculated by Kalman filtering:
where $x_{k\_test}$ is the dynamic latent variable of the test set at the current time, given together with its predicted value; $z_{k-1\_test}$ is the dynamic factor of the test set at the previous time;
Using the expected value $t_{test}$ of the latent variables, the $T_{test}^2$ statistic is constructed as follows:
$T_{test}^2 = t_{test}^T \, \mathrm{var}^{-1}(t_{test} \mid x_{test}) \, t_{test}$ (42)
where $\mathrm{var}^{-1}(t_{test} \mid x_{test})$ is the inverse of the variance of the latent variables.
Meanwhile the Gauss based on model predicts error, can also construct SPEtestStatistic is to reflect model
The information of Gauss residual error:
Wherein, yk_testFor the observation for current time test set;For the predicted value of current time test set;
Further deriving can obtain:
SPEtest=Etest Tvar-1(Etest|xtest)Etest (44)
Judge whether it is more than statistics limit, obtains the on-line monitoring result of chemical production process.
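The final comparison can be sketched in a few lines. The rule used here, flagging a fault as soon as any one of the three statistics exceeds its limit, is an assumption for illustration, as are the function name and the dictionary keys.

```python
def detect_fault(stats, limits):
    """Compare each monitoring statistic with its control limit.

    stats, limits : dicts keyed by statistic name, e.g. "I2", "T2", "SPE".
    Returns (fault_flag, names_of_exceeded_statistics).
    """
    exceeded = [name for name in stats if stats[name] > limits[name]]
    return len(exceeded) > 0, exceeded
```

For example, `detect_fault({"I2": 1.0, "T2": 12.0, "SPE": 0.3}, {"I2": 5.0, "T2": 9.2, "SPE": 1.1})` reports a fault attributed to the $T^2$ statistic.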

Claims (8)

1. A chemical process fault detection method containing missing data, characterized by comprising the following steps:
(1) Collect data from the chemical process to be detected during normal operation, obtaining a raw data set $Y_o \in \mathbb{R}^{K \times N}$ with missing values, where $K$ is the total number of samples and $N$ is the total number of variables;
(2) Apply whitening pretreatment to the raw data set $Y_o$, obtaining the pretreated data set;
(3) Estimate and fill in the missing values in the pretreated data set, obtaining the complete data set $Y_{en} \in \mathbb{R}^{K \times N}$, and record the positions of the missing values;
(4) Based on the data set $Y_{en}$, construct an independent component analysis model, obtaining the independent component matrix $S_0$, the mixing matrix $J_0$ and the demixing matrix $W_0$, where $R_1$ is the number of independent components;
(5) From the constructed independent component analysis model, obtain the residual data set and construct the control limit of the corresponding $I^2$ statistic;
(6) Based on the residual data set, construct the iterative autoregressive dynamic latent variable model;
(7) From the established iterative autoregressive dynamic latent variable model, obtain the control limits of the corresponding $T^2$ and SPE statistics;
(8) Collect sample data from a new chemical production process as test samples, and normalize them;
(9) Using the independent component analysis model constructed in step (4) and the iterative autoregressive dynamic latent variable model obtained in step (6), calculate the $I^2$, $T^2$ and SPE statistics of the normalized test samples and judge whether they exceed the corresponding control limits, obtaining the on-line monitoring result of the current chemical production process.
2. The chemical process fault detection method containing missing data according to claim 1, characterized in that, in step (2), the whitening pretreatment of the raw data set $Y_o$ is performed column by column: if a column contains missing values, all missing values are removed and the whitening pretreatment is applied to the remaining data.
3. The chemical process fault detection method containing missing data according to claim 1, characterized in that, in step (3), the missing values are estimated using an independent component analysis algorithm.
4. The chemical process fault detection method containing missing data according to claim 3, characterized in that estimating the missing values specifically comprises:
(3-1) Based on the pretreated data set, remove the data rows of all times at which missing values occur, obtaining the regular data set $Y_r \in \mathbb{R}^{M \times N}$, where $M$ is the number of rows without missing data;
(3-2) Based on the data set $Y_r$, establish an independent component analysis model and obtain its mixing matrix $J$, independent component matrix $S$ and demixing matrix $W$, where $R_2$ is the number of independent components;
(3-3) Using the regular data set $Y_r$ and the mixing matrix $J$, obtain the score $s_T$ of the missing values;
(3-4) Using the score $s_T$ and the mixing matrix $J$, obtain the estimates of the missing values;
Steps (3-1) to (3-4) yield the estimates of the corresponding missing values in the pretreated data set, giving the complete data set $Y_{en}$.
5. The chemical process fault detection method containing missing data according to claim 1, characterized in that, in step (5), the residual data set is obtained by the following formula:
6. The chemical process fault detection method containing missing data according to claim 1, characterized in that, in step (6), constructing the iterative autoregressive dynamic latent variable model comprises:
(6-1) Initialize the model parameters;
(6-2) Based on the current residual data set, establish a dynamic latent variable model;
(6-3) Estimate the latent variables of the model with the expectation-maximization algorithm and calculate the likelihood function; judge whether the likelihood function has converged: if so, go to step (6-4); if not, update the model parameters and return to step (6-2);
(6-4) According to the positions of the missing values recorded in step (3), predict the missing values again with the constructed iterative autoregressive dynamic latent variable model and update the residual data set;
(6-5) Based on the updated residual data set, calculate the model parameters and judge whether they have converged: if so, proceed to step (7); if not, update the model parameters and return to step (6-2).
7. The chemical process fault detection method containing missing data according to claim 6, characterized in that, in step (6-3), the E-step uses Kalman filtering to estimate the posterior probability of the latent variables under the current model parameters, and the M-step takes the first-order partial derivative of the likelihood function with respect to each parameter and sets it to 0 to obtain the updated values of the model parameters.
8. The chemical process fault detection method containing missing data according to claim 6, characterized in that, in step (6-4), the corresponding missing values are estimated and updated using Kalman filtering.
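The missing-value estimation of steps (3-3) and (3-4) in claim 4 can be illustrated with a least-squares sketch. It assumes that a sample satisfies $y \approx J s$ for a mixing matrix $J$ fitted on the complete rows, that the score is estimated from the observed entries of the incomplete sample, and that missing entries are marked with NaN; the function name and these conventions are hypothetical and not taken from the claims.

```python
import numpy as np

def impute_row(y, J):
    """Fill the NaN entries of one sample using an ICA mixing matrix.

    y : (N,) sample with np.nan at missing positions
    J : (N, R) mixing matrix, assumed to satisfy y ~ J @ s
    """
    observed = ~np.isnan(y)
    # Score estimated from the observed variables only (least squares)
    s, *_ = np.linalg.lstsq(J[observed], y[observed], rcond=None)
    y_filled = y.copy()
    # Missing variables reconstructed from the score and the mixing matrix
    y_filled[~observed] = J[~observed] @ s
    return y_filled, s
```

The estimate is only well determined when the number of observed variables in the sample is at least the number of independent components.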
CN201810734994.8A 2018-07-06 2018-07-06 Chemical process fault detection method containing missing data Active CN108960329B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810734994.8A CN108960329B (en) 2018-07-06 2018-07-06 Chemical process fault detection method containing missing data

Publications (2)

Publication Number Publication Date
CN108960329A true CN108960329A (en) 2018-12-07
CN108960329B CN108960329B (en) 2020-11-06

Family

ID=64484244

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810734994.8A Active CN108960329B (en) 2018-07-06 2018-07-06 Chemical process fault detection method containing missing data

Country Status (1)

Country Link
CN (1) CN108960329B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6678569B2 (en) * 1999-03-19 2004-01-13 International Business Machines Corporation User configurable multivariate time series reduction tool control method
CN105404280A (en) * 2015-12-11 2016-03-16 浙江科技学院 Industrial process fault detection method based on autoregression dynamic hidden variable model
CN107153409A (en) * 2017-06-02 2017-09-12 宁波大学 A kind of nongausian process monitoring method based on missing variable modeling thinking
CN107272655A (en) * 2017-07-21 2017-10-20 江南大学 Batch process fault monitoring method based on multistage ICA SVDD
CN108181893A (en) * 2017-12-15 2018-06-19 宁波大学 A kind of fault detection method based on PCA-KDR

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JINLIN ZHU ET AL: "Non-Gaussian Industrial Process Monitoring With Probabilistic Independent Component Analysis", 《IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING》 *
LE ZHOU ET AL: "Autoregressive Dynamic Latent Variable Models for Process Monitoring", 《IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY》 *
Sun Huaiyu: "Application of EM-PCA to the imputation of randomly missing data in chemical processes", 《Computers and Applied Chemistry》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209145A (en) * 2019-05-16 2019-09-06 浙江大学 One kind being based on the approximate carbon dioxide absorption tower method for diagnosing faults of nuclear matrix
CN110209145B (en) * 2019-05-16 2020-09-11 浙江大学 Carbon dioxide absorption tower fault diagnosis method based on nuclear matrix approximation
CN110738259A (en) * 2019-10-16 2020-01-31 电子科技大学 fault detection method based on Deep DPCA-SVM
CN110738259B (en) * 2019-10-16 2022-03-25 电子科技大学 Fault detection method based on Deep DPCA-SVM
CN111142501A (en) * 2019-12-27 2020-05-12 浙江科技学院 Fault detection method based on semi-supervised autoregressive dynamic hidden variable model
CN111142501B (en) * 2019-12-27 2021-10-22 浙江科技学院 Fault detection method based on semi-supervised autoregressive dynamic hidden variable model
CN111220565A (en) * 2020-01-16 2020-06-02 东北大学秦皇岛分校 CPLS-based infrared spectrum measuring instrument calibration migration method
CN111220565B (en) * 2020-01-16 2022-07-29 东北大学秦皇岛分校 CPLS-based infrared spectrum measuring instrument calibration migration method
CN112542848A (en) * 2020-11-02 2021-03-23 中国南方电网有限责任公司超高压输电公司广州局 State estimation method and device for extra-high voltage flexible direct current transmission system
CN113743489A (en) * 2021-08-26 2021-12-03 上海应用技术大学 Process industrial process fault detection method based on data loss
CN113743489B (en) * 2021-08-26 2023-09-29 上海应用技术大学 Data loss-based fault detection method for process industrial process

Also Published As

Publication number Publication date
CN108960329B (en) 2020-11-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant