CN112650063A - Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression - Google Patents

Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression

Info

Publication number
CN112650063A
CN112650063A CN202011614387.1A CN202011614387A
Authority
CN
China
Prior art keywords
semi
supervised
regression model
gaussian mixture
component
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011614387.1A
Other languages
Chinese (zh)
Other versions
CN112650063B (en)
Inventor
宋执环
李德阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011614387.1A priority Critical patent/CN112650063B/en
Publication of CN112650063A publication Critical patent/CN112650063A/en
Application granted granted Critical
Publication of CN112650063B publication Critical patent/CN112650063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G05B13/042 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators in which a parameter or coefficient is automatically adjusted to optimise the performance

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses an adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression. An incremental Gaussian mixture regression model is first used to predict, in real time, quality variables that are difficult to measure in a time-varying industrial process, taking as model inputs a group of easily measured process variables that are strongly correlated with the key quality variables. To counter the effect on prediction accuracy of the scarcity of labeled samples that is widespread in industrial processes, the incremental Gaussian mixture regression model is extended to a semi-supervised incremental Gaussian mixture regression model. The method can effectively handle the nonlinear, non-Gaussian, and time-varying characteristics of actual industrial processes, effectively solves the inaccurate model parameter learning caused by scarce labeled samples, alleviates model overfitting to a certain extent, improves model updating efficiency, and achieves adaptive soft measurement of the key variables.

Description

Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression
Technical Field
The invention belongs to the field of prediction and control of industrial processes, and particularly relates to an adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression.
Background
In actual industrial production processes, there are almost always some key process variables that cannot be measured online. To solve this problem, easily measured process variables are collected and, according to some optimality criterion, a mathematical model is built that takes these variables as inputs and the key process variables as outputs, so that the key process variables can be estimated online. This is the soft measurement modeling commonly used in industrial processes.
The development of statistical soft measurement modeling methods relies heavily on large-scale industrial data. Among such methods, the Gaussian mixture regression model can handle the nonlinear and non-Gaussian characteristics of industrial processes well and has been widely applied to the prediction of industrial quality variables. However, several problems remain in soft measurement modeling today. In most industrial processes, factors such as changes in the process environment, aging of plant instruments and equipment, changes in raw material feed, and degradation of catalyst activity continuously alter the physical and chemical characteristics of the process, so the operating conditions keep changing. To track the process state correctly, the soft measurement model needs to be adaptively updated and corrected in time. Meanwhile, data-driven soft measurement modeling methods need a large amount of industrial data, and the modeling process usually assumes that the collected input and output samples correspond one to one. In actual industrial production, however, some key product quality variables cannot be measured directly on site because of limitations of the production environment and of the available instrumentation. As a result, only a small amount of the collected data carries labels, and most samples are unlabeled samples containing only auxiliary variables. Traditional soft measurement modeling methods can only use this small portion of labeled samples for modeling and discard the large number of unlabeled samples. Training a model with only a few labeled samples leads to inaccurately trained model parameters and poor generalization, so the prediction performance is hard to guarantee, while the large amount of useful information contained in the unlabeled samples is wasted.
The present invention therefore aims to remedy the shortcomings of the soft measurement models analyzed above, namely to handle the time-varying characteristics of industrial processes and to make full use of the massive amount of unlabeled data information generated during production.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an adaptive soft measurement method based on a semi-supervised incremental Gaussian mixture regression model. The incremental Gaussian mixture regression model is extended to a semi-supervised incremental Gaussian mixture regression model, so that the model can continuously learn new knowledge from new sample data containing both labeled and unlabeled samples while retaining previously learned knowledge. When the semi-supervised incremental Gaussian mixture regression model is updated, statistically equivalent components are fused into one component, which keeps the model compact and avoids overfitting. Assuming a linear relationship between the process variables and the quality variable, the probability density function, regression coefficients, and mixing coefficient of each component are learned with the expectation-maximization (EM) algorithm, and model selection is performed with the Bayesian information criterion (BIC), which effectively solves the inaccurate model parameter learning caused by scarce labeled samples in industrial processes.
The purpose of the invention is realized by the following technical scheme: a self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression specifically comprises the following steps:
(1) A labeled data set {(x_i^l, y_i)}, i = 1, ..., n_l, of historical operating conditions of the industrial process and an unlabeled data set {x_j^u}, j = 1, ..., n_u, form the initial training data set, where the superscript l marks labeled data, i is the index over the labeled data set, x_i denotes the i-th sample in the labeled data set, y_i denotes its label, n_l denotes the number of samples of the labeled data set, j is the index over the unlabeled data set, the superscript u marks unlabeled data, x_j denotes the j-th sample in the unlabeled data set, and n_u denotes the number of samples of the unlabeled data set;
(2) Standardize the initial training data set collected in step (1) to mean 0 and variance 1 to obtain the standardized data set;
(3) Learn the semi-supervised Gaussian mixture regression model parameters Θ iteratively with the EM algorithm, where Θ comprises: the prior probability α_k of the k-th component of the semi-supervised Gaussian mixture regression model, the mean vector μ_k of the k-th component, the covariance matrix Σ_k of the k-th component, the regression coefficient ω_k of the k-th component, the regression intercept b_k of the k-th component, and the measurement noise variance σ_k^2 of the k-th component.
The method specifically comprises the following substeps:
(3.1) Using the linear Gaussian relations, obtain the hidden-variable posterior probability distribution R_ik of the labeled samples and the hidden-variable posterior probability distribution R_jk of the unlabeled samples:

R_ik = p(z_i = k | x_i^l, y_i) = α_k N([x_i^l; y_i] | μ_k^l, Σ_k^l) / Σ_{k'=1}^{K} α_{k'} N([x_i^l; y_i] | μ_{k'}^l, Σ_{k'}^l)

R_jk = p(z_j = k | x_j^u) = α_k N(x_j^u | μ_k^u, Σ_k^u) / Σ_{k'=1}^{K} α_{k'} N(x_j^u | μ_{k'}^u, Σ_{k'}^u)

where z_i is the hidden variable of the i-th labeled sample, z_j is the hidden variable of the j-th unlabeled sample, R_ik denotes the posterior probability that the k-th component of the semi-supervised Gaussian mixture regression model generated the i-th labeled sample, R_jk denotes the posterior probability that the k-th component generated the j-th unlabeled sample, p denotes a probability of the k-th component taking values between 0 and 1, μ_k^l = [μ_k; ω_k^T μ_k + b_k] denotes the mean vector of the labeled samples in the k-th component of the semi-supervised Gaussian mixture regression model, Σ_k^l = [Σ_k, Σ_k ω_k; ω_k^T Σ_k, ω_k^T Σ_k ω_k + σ_k^2] denotes the covariance matrix of the labeled samples in the k-th component, μ_k^u = μ_k denotes the unlabeled mean vector of the k-th component, Σ_k^u = Σ_k denotes the unlabeled covariance matrix of the k-th component, and N(· | μ, Σ) denotes the Gaussian distribution with mean vector μ and covariance matrix Σ;
(3.2) Use the hidden-variable posterior probability distributions R_ik and R_jk obtained in step (3.1) to calculate the corresponding log-likelihood function Q(Θ):

Q(Θ) = Σ_{i=1}^{n_l} Σ_{k=1}^{K} R_ik [ln α_k + ln N(x_i^l | μ_k, Σ_k) + ln N(y_i | ω_k^T x_i^l + b_k, σ_k^2)] + Σ_{j=1}^{n_u} Σ_{k=1}^{K} R_jk [ln α_k + ln N(x_j^u | μ_k, Σ_k)]

where z denotes the hidden component-indicator variable of a sample.

Maximize the log-likelihood function with respect to each semi-supervised Gaussian mixture regression model parameter subject to Σ_{k=1}^{K} α_k = 1, that is, maximize Q(Θ) + β(Σ_{k=1}^{K} α_k - 1), where β is a Lagrange multiplier;
Estimate the updated values of the semi-supervised Gaussian mixture regression model parameters:

α_k = (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk) / (n_l + n_u)

μ_k = (Σ_{i=1}^{n_l} R_ik x_i^l + Σ_{j=1}^{n_u} R_jk x_j^u) / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Σ_k = [Σ_{i=1}^{n_l} R_ik (x_i^l - μ_k)(x_i^l - μ_k)^T + Σ_{j=1}^{n_u} R_jk (x_j^u - μ_k)(x_j^u - μ_k)^T] / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Ω_k = (H^T R_k H)^{-1} H^T R_k Y

σ_k^2 = Σ_{i=1}^{n_l} R_ik (y_i - ω_k^T x_i^l - b_k)^2 / Σ_{i=1}^{n_l} R_ik

where Ω_k = [ω_k^T, b_k]^T is the set of regression coefficients, R_k = diag(R_1k, ..., R_{n_l k}) is the set of hidden-variable posterior probability distributions of the labeled samples, H = [X^l, 1_{n_l}] is the labeled data set matrix X^l augmented with 1_{n_l}, a vector whose entries are all 1 and whose dimension is n_l, and Y = [y_1, ..., y_{n_l}]^T is the vector of labels;
(3.3) Based on the updated parameter values of the semi-supervised Gaussian mixture regression model estimated in step (3.2), calculate the log-likelihood function of the standardized data set; repeat steps (3.1)-(3.2) until the log-likelihood function converges, at which point the parameters of the semi-supervised Gaussian mixture regression model are the final semi-supervised Gaussian mixture regression model parameters;
(4) Predict the quality variable with the final semi-supervised Gaussian mixture regression model parameters:

ŷ = E[y | x] = Σ_{k=1}^{K} R_xk (ω_k^T x + b_k)

where ŷ is the predicted expected value of y for the given data x to be predicted, y is the quality variable, R_xk = α_k N(x | μ_k, Σ_k) / Σ_{k'=1}^{K} α_{k'} N(x | μ_{k'}, Σ_{k'}) denotes the probability that the data x to be predicted belongs to each component of the semi-supervised Gaussian mixture regression model, and ω_k^T x + b_k denotes the mean of y predicted by the corresponding component for the data x to be predicted;
(5) Collect new mixed labeled and unlabeled data in the same proportion as the training data as new training data, train them into a semi-supervised incremental Gaussian mixture regression model following steps (2)-(3), and store the parameters of the semi-supervised incremental Gaussian mixture regression model and the number of training data in a historical database;
(6) For each pair formed by a component of the semi-supervised Gaussian mixture regression model of step (3) and a component of the semi-supervised incremental Gaussian mixture regression model of step (5), calculate the symmetric Kullback-Leibler divergence SKLD and judge whether the SKLD value exceeds 10; when the SKLD value exceeds 10, keep the original mean vector and covariance unchanged and update only the mixing weights of the components; when the SKLD value is below 10, fuse the components. The SKLD value is calculated as:

SKLD(φ_1, φ_2) = (1/2) [KLD(φ_1 || φ_2) + KLD(φ_2 || φ_1)]

where φ_1 is the parameter set of a component of the semi-supervised Gaussian mixture regression model, φ_2 is the parameter set of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, μ_1 and Σ_1 are the mean vector and covariance matrix of the component of the semi-supervised Gaussian mixture regression model, μ_2 and Σ_2 are the mean vector and covariance matrix of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, and KLD is the relative entropy;
(7) As data to be measured continue to flow in, repeat steps (5)-(6) continuously to realize adaptive quality prediction of the industrial process.
Further, the fusion process of step (6) is:

μ = (N π_j μ_j + M_k μ_k) / (N π_j + M_k)

Σ = [N π_j (Σ_j + (μ_j - μ)(μ_j - μ)^T) + M_k (Σ_k + (μ_k - μ)(μ_k - μ)^T)] / (N π_j + M_k)

π' = (N π_j + M_k) / (N + M)

where μ is the mean vector of the fused component of the semi-supervised incremental Gaussian mixture regression model, Σ is the covariance of the fused component, π' is the mixing weight of the fused component, N denotes the total number of original sample data, M denotes the total number of new sample data, μ_j denotes the mean vector of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data, π_j denotes the mixing weight of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, Σ_j denotes the covariance of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, M_k denotes the number of samples of the k-th component of the semi-supervised incremental Gaussian mixture regression model trained on the new samples, and μ_k and Σ_k denote the mean vector and covariance matrix of the k-th component of the corresponding semi-supervised incremental Gaussian mixture regression model, respectively.
Further, in step (6) the mixing weights of the components are updated as follows:

when a remaining component belongs to the new samples:

π'_k = M_k / (N + M)

when a remaining component belongs to the initial training data samples:

π'_j = N π_j / (N + M)
Compared with the prior art, the invention has the following beneficial effects. To make full use of the large amount of unlabeled data information in industrial production, the invention proposes semi-supervised incremental Gaussian mixture regression on the basis of incremental Gaussian mixture regression. In semi-supervised incremental Gaussian mixture regression, the process variables and the quality variable are assumed to have a linear relationship within each component, and both labeled and unlabeled data participate in model learning. The probability density function, regression coefficients, and mixing coefficient of each component are learned with the expectation-maximization (EM) algorithm, and model selection is performed with the Bayesian information criterion, which effectively alleviates model overfitting and the inaccurate parameter learning caused by scarce labeled data, and improves the prediction accuracy of the model. Building on the incremental Gaussian mixture regression model, the problem of inaccurate model parameter learning caused by scarce labeled samples in the industrial process is solved. Compared with other traditional adaptive soft measurement models, the method alleviates overfitting, reduces prediction error, and improves model updating efficiency.
Drawings
FIG. 1 is a diagram of an adaptive soft-sensing method of a semi-supervised incremental Gaussian mixture regression model of the present invention;
FIG. 2 is a process block diagram of a primary reformer;
FIG. 3 is a graph of the prediction performance of the method of the present invention on a segment of the primary reformer industrial process.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
Fig. 1 is a diagram of an adaptive soft measurement method of a semi-supervised incremental gaussian mixture regression model according to the present invention, and the adaptive soft measurement method specifically includes the following steps:
(1) A labeled data set {(x_i^l, y_i)}, i = 1, ..., n_l, of historical operating conditions of the industrial process and an unlabeled data set {x_j^u}, j = 1, ..., n_u, form the initial training data set, where the superscript l marks labeled data, i is the index over the labeled data set, x_i denotes the i-th sample in the labeled data set, y_i denotes its label, n_l denotes the number of samples of the labeled data set, j is the index over the unlabeled data set, the superscript u marks unlabeled data, x_j denotes the j-th sample in the unlabeled data set, and n_u denotes the number of samples of the unlabeled data set;
(2) Standardize the initial training data set collected in step (1) to mean 0 and variance 1 to obtain the standardized data set;
(3) Suppose there are K Gaussian components in the semi-supervised Gaussian mixture regression model. The probability density function (PDF) of the k-th Gaussian component for x and the functional dependence of y on x are defined as:

P_k(x) = N(x | μ_k, Σ_k)

y = ω_k^T x + b_k + e_k,  e_k ~ N(0, σ_k^2)

where P_k(x) is the probability density function (PDF) of the auxiliary variable x, N(x | μ_k, Σ_k) is the Gaussian distribution with mean vector μ_k and covariance matrix Σ_k, ω_k and b_k are the regression coefficients between the auxiliary variable x and the quality variable y, and σ_k^2 is the variance of the measurement noise of the quality variable y in the k-th Gaussian component.
The semi-supervised Gaussian mixture regression model parameters Θ are learned iteratively with the EM algorithm, where Θ comprises: the prior probability α_k of the k-th component, the mean vector μ_k, the covariance matrix Σ_k, the regression coefficient ω_k, the regression intercept b_k, and the measurement noise variance σ_k^2.
The method specifically comprises the following substeps:
(3.1) Using the linear Gaussian relations, obtain the hidden-variable posterior probability distribution R_ik of the labeled samples and the hidden-variable posterior probability distribution R_jk of the unlabeled samples:

R_ik = p(z_i = k | x_i^l, y_i) = α_k N([x_i^l; y_i] | μ_k^l, Σ_k^l) / Σ_{k'=1}^{K} α_{k'} N([x_i^l; y_i] | μ_{k'}^l, Σ_{k'}^l)

R_jk = p(z_j = k | x_j^u) = α_k N(x_j^u | μ_k^u, Σ_k^u) / Σ_{k'=1}^{K} α_{k'} N(x_j^u | μ_{k'}^u, Σ_{k'}^u)

where z_i is the hidden variable of the i-th labeled sample, z_j is the hidden variable of the j-th unlabeled sample, R_ik denotes the posterior probability that the k-th component of the semi-supervised Gaussian mixture regression model generated the i-th labeled sample, R_jk denotes the posterior probability that the k-th component generated the j-th unlabeled sample, p denotes a probability of the k-th component taking values between 0 and 1, μ_k^l = [μ_k; ω_k^T μ_k + b_k] denotes the mean vector of the labeled samples in the k-th component of the semi-supervised Gaussian mixture regression model, Σ_k^l = [Σ_k, Σ_k ω_k; ω_k^T Σ_k, ω_k^T Σ_k ω_k + σ_k^2] denotes the covariance matrix of the labeled samples in the k-th component, μ_k^u = μ_k denotes the unlabeled mean vector of the k-th component, Σ_k^u = Σ_k denotes the unlabeled covariance matrix of the k-th component, and N(· | μ, Σ) denotes the Gaussian distribution with mean vector μ and covariance matrix Σ.
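As a concrete illustration of this E-step, the following Python sketch computes the responsibilities R_ik and R_jk with NumPy/SciPy. It assumes the reconstruction given above (a joint Gaussian over (x, y) for labeled samples induced by the local linear model, and the marginal Gaussian over x for unlabeled samples); the function name e_step and all variable names are illustrative assumptions, not taken from the original filing.

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(Xl, y, Xu, alpha, mu, Sigma, w, b, sigma2):
    """Responsibilities of K components for labeled (Xl, y) and unlabeled Xu samples.

    Xl: (nl, d) labeled inputs, y: (nl,) labels, Xu: (nu, d) unlabeled inputs.
    alpha: (K,) priors, mu: (K, d) means, Sigma: (K, d, d) covariances,
    w: (K, d) regression slopes, b: (K,) intercepts, sigma2: (K,) noise variances.
    """
    K = len(alpha)
    nl, nu = Xl.shape[0], Xu.shape[0]
    Rl = np.zeros((nl, K))            # R_ik for labeled samples
    Ru = np.zeros((nu, K))            # R_jk for unlabeled samples
    Zl = np.hstack([Xl, y[:, None]])  # joint labeled sample [x; y]
    for k in range(K):
        # Joint Gaussian of (x, y) in component k implied by the local linear model
        mu_l = np.append(mu[k], w[k] @ mu[k] + b[k])
        cross = Sigma[k] @ w[k]
        Sig_l = np.block([[Sigma[k], cross[:, None]],
                          [cross[None, :], np.array([[w[k] @ cross + sigma2[k]]])]])
        Rl[:, k] = alpha[k] * multivariate_normal.pdf(Zl, mean=mu_l, cov=Sig_l)
        Ru[:, k] = alpha[k] * multivariate_normal.pdf(Xu, mean=mu[k], cov=Sigma[k])
    Rl /= Rl.sum(axis=1, keepdims=True)  # normalize over components
    Ru /= Ru.sum(axis=1, keepdims=True)
    return Rl, Ru
```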
(3.2) Use the hidden-variable posterior probability distributions R_ik and R_jk obtained in step (3.1) to calculate the corresponding log-likelihood function Q(Θ):

Q(Θ) = Σ_{i=1}^{n_l} Σ_{k=1}^{K} R_ik [ln α_k + ln N(x_i^l | μ_k, Σ_k) + ln N(y_i | ω_k^T x_i^l + b_k, σ_k^2)] + Σ_{j=1}^{n_u} Σ_{k=1}^{K} R_jk [ln α_k + ln N(x_j^u | μ_k, Σ_k)]

where z denotes the hidden component-indicator variable of a sample.

Maximize the log-likelihood function with respect to each semi-supervised Gaussian mixture regression model parameter subject to Σ_{k=1}^{K} α_k = 1, that is, maximize Q(Θ) + β(Σ_{k=1}^{K} α_k - 1), where β is a Lagrange multiplier;
Estimate the updated values of the semi-supervised Gaussian mixture regression model parameters:

α_k = (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk) / (n_l + n_u)

μ_k = (Σ_{i=1}^{n_l} R_ik x_i^l + Σ_{j=1}^{n_u} R_jk x_j^u) / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Σ_k = [Σ_{i=1}^{n_l} R_ik (x_i^l - μ_k)(x_i^l - μ_k)^T + Σ_{j=1}^{n_u} R_jk (x_j^u - μ_k)(x_j^u - μ_k)^T] / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Ω_k = (H^T R_k H)^{-1} H^T R_k Y

σ_k^2 = Σ_{i=1}^{n_l} R_ik (y_i - ω_k^T x_i^l - b_k)^2 / Σ_{i=1}^{n_l} R_ik

where Ω_k = [ω_k^T, b_k]^T is the set of regression coefficients, R_k = diag(R_1k, ..., R_{n_l k}) is the set of hidden-variable posterior probability distributions of the labeled samples, H = [X^l, 1_{n_l}] is the labeled data set matrix X^l augmented with 1_{n_l}, a vector whose entries are all 1 and whose dimension is n_l, and Y = [y_1, ..., y_{n_l}]^T is the vector of labels.
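The corresponding M-step can be sketched in the same spirit, consuming the responsibilities produced by the e_step sketch above. The sketch below assumes the standard semi-supervised Gaussian mixture regression updates reconstructed above, with the regression coefficients obtained by responsibility-weighted least squares on the labeled data only; the function and variable names are illustrative.

```python
import numpy as np

def m_step(Xl, y, Xu, Rl, Ru):
    """Update (alpha, mu, Sigma, w, b, sigma2) from responsibilities Rl, Ru."""
    nl, d = Xl.shape
    K = Rl.shape[1]
    nu = Xu.shape[0]
    Nk = Rl.sum(axis=0) + Ru.sum(axis=0)             # effective counts per component
    alpha = Nk / (nl + nu)                           # updated mixing priors
    mu = (Rl.T @ Xl + Ru.T @ Xu) / Nk[:, None]       # means use labeled and unlabeled x
    Sigma = np.zeros((K, d, d))
    w = np.zeros((K, d))
    b = np.zeros(K)
    sigma2 = np.zeros(K)
    H = np.hstack([Xl, np.ones((nl, 1))])            # labeled data matrix with a ones column
    for k in range(K):
        dl = Xl - mu[k]
        du = Xu - mu[k]
        Sigma[k] = (dl.T @ (Rl[:, [k]] * dl) + du.T @ (Ru[:, [k]] * du)) / Nk[k]
        # Responsibility-weighted least squares on labeled data for [w_k; b_k]
        W = Rl[:, k]
        A = H.T @ (W[:, None] * H)
        coef = np.linalg.solve(A, H.T @ (W * y))
        w[k], b[k] = coef[:-1], coef[-1]
        resid = y - Xl @ w[k] - b[k]
        sigma2[k] = (W * resid ** 2).sum() / W.sum() # noise variance from labeled residuals
    return alpha, mu, Sigma, w, b, sigma2
```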
(3.3) Based on the updated parameter values of the semi-supervised Gaussian mixture regression model estimated in step (3.2), calculate the log-likelihood function of the standardized data set; repeat steps (3.1)-(3.2) until the log-likelihood function converges, at which point the parameters of the semi-supervised Gaussian mixture regression model are the final semi-supervised Gaussian mixture regression model parameters;
(4) To realize prediction of the quality variable, the final semi-supervised Gaussian mixture regression model parameters Θ are used to calculate the joint probability density function of the auxiliary variable x and the quality variable y:

p(x, y) = Σ_{k=1}^{K} α_k N([x; y] | μ_k^l, Σ_k^l)

where μ_k^l and Σ_k^l are the joint mean vector and covariance matrix of the k-th component defined in step (3.1). In each Gaussian component, the conditional distribution of y given x is:

p_k(y | x) = N(y | ŷ_k, σ_k^2)

with mean ŷ_k = ω_k^T x + b_k and variance σ_k^2. The final conditional probability distribution of y can be expressed as:

p(y | x) = Σ_{k=1}^{K} R_xk N(y | ŷ_k, σ_k^2)

where R_xk = α_k N(x | μ_k, Σ_k) / Σ_{k'=1}^{K} α_{k'} N(x | μ_{k'}, Σ_{k'}). Thus, given the auxiliary variable x, the quality variable y is predicted as:

ŷ = E[y | x] = Σ_{k=1}^{K} R_xk ŷ_k

where ŷ is the predicted expected value of y for the given data x to be predicted, y is the quality variable, R_xk denotes the probability that the data x to be predicted belongs to each component of the semi-supervised Gaussian mixture regression model, and ŷ_k denotes the mean of y predicted by the corresponding component;
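A minimal sketch of this prediction rule, assuming the reconstruction above: the query point is assigned responsibilities over the learned components, and the prediction is the responsibility-weighted mixture of the per-component linear predictions. The function name predict is illustrative.

```python
import numpy as np
from scipy.stats import multivariate_normal

def predict(x, alpha, mu, Sigma, w, b):
    """Soft-sensor prediction y_hat = sum_k R_xk * (w_k^T x + b_k) for one query x."""
    K = len(alpha)
    Rx = np.array([alpha[k] * multivariate_normal.pdf(x, mean=mu[k], cov=Sigma[k])
                   for k in range(K)])
    Rx /= Rx.sum()                     # posterior probability of each component given x
    y_local = np.array([w[k] @ x + b[k] for k in range(K)])  # per-component prediction
    return Rx @ y_local
```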
(5) After the collected real-time data are standardized, the quality of the new sample is predicted with steps (3)-(4) and the predicted output ŷ is obtained. After the true output Y of the quality variable y is obtained, the sample data are collected into the new data set Z, and the prediction performance is quantitatively evaluated with the root mean square error (RMSE):

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (Y_i - Ŷ_i)^2 )

where i = 1, 2, ..., N, N denotes the total length of the test set, and Y_i and Ŷ_i denote the true value and the predicted value of the output quality variable, respectively. The newly collected mixed labeled and unlabeled data, in the same proportion as the training data, are then used as training data to train a semi-supervised incremental Gaussian mixture regression model following steps (2)-(3), and the parameters of the semi-supervised incremental Gaussian mixture regression model and the number of training data are stored in a historical database;
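For completeness, the RMSE criterion above corresponds to the following short computation (a sketch; the function name is illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error used to quantify prediction performance."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))
```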
(6) For each pair formed by a component of the semi-supervised Gaussian mixture regression model of step (3) and a component of the semi-supervised incremental Gaussian mixture regression model of step (5), calculate the symmetric Kullback-Leibler divergence SKLD and judge whether the SKLD value exceeds 10; when the SKLD value exceeds 10, keep the original mean vector and covariance unchanged and update only the mixing weights of the components; when the SKLD value is below 10, fuse the components. The SKLD value is calculated as:

SKLD(φ_1, φ_2) = (1/2) [KLD(φ_1 || φ_2) + KLD(φ_2 || φ_1)]

KLD(φ_1 || φ_2) = (1/2) [tr(Σ_2^{-1} Σ_1) + (μ_2 - μ_1)^T Σ_2^{-1} (μ_2 - μ_1) - d + ln(det Σ_2 / det Σ_1)]

where φ_1 is the parameter set of a component of the semi-supervised Gaussian mixture regression model, φ_2 is the parameter set of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, μ_1 and Σ_1 are the mean vector and covariance matrix of the component of the semi-supervised Gaussian mixture regression model, μ_2 and Σ_2 are the mean vector and covariance matrix of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, d is the dimension of the data, and KLD is the relative entropy;
When the calculated SKLD value is below 10, the original GMR component j and the new GMR component k are judged to be statistically equivalent and can be fused. The parameters of the fused component are updated as follows:

μ = (N π_j μ_j + M_k μ_k) / (N π_j + M_k)

Σ = [N π_j (Σ_j + (μ_j - μ)(μ_j - μ)^T) + M_k (Σ_k + (μ_k - μ)(μ_k - μ)^T)] / (N π_j + M_k)

π' = (N π_j + M_k) / (N + M)

where μ is the mean vector of the fused component of the semi-supervised incremental Gaussian mixture regression model, Σ is the covariance of the fused component, π' is the mixing weight of the fused component, N denotes the total number of original sample data, M denotes the total number of new sample data, μ_j denotes the mean vector of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data, π_j denotes the mixing weight of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, Σ_j denotes the covariance of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, M_k denotes the number of samples of the k-th component of the semi-supervised incremental Gaussian mixture regression model trained on the new samples, and μ_k and Σ_k denote the mean vector and covariance matrix of the k-th component of the corresponding semi-supervised incremental Gaussian mixture regression model, respectively.
When the calculated SKLD value is greater than 10, the two components cannot be fused and become remaining components; their original mean vectors μ and covariance matrices Σ are kept unchanged and only their mixing weights π' are updated, as follows:

when a remaining component belongs to the new samples:

π'_k = M_k / (N + M)

when a remaining component belongs to the initial training data samples:

π'_j = N π_j / (N + M)
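The maintenance logic of step (6), namely the symmetric Kullback-Leibler divergence, the fusion of statistically equivalent components, and the re-weighting of remaining components, can be sketched as below. The closed-form Gaussian KLD, the moment-matching fusion, and the threshold of 10 follow the reconstruction above; all function and variable names are illustrative assumptions.

```python
import numpy as np

def kld_gauss(mu1, S1, mu2, S2):
    """Relative entropy KLD(N(mu1, S1) || N(mu2, S2)) between two Gaussians."""
    d = len(mu1)
    S2_inv = np.linalg.inv(S2)
    diff = mu2 - mu1
    return 0.5 * (np.trace(S2_inv @ S1) + diff @ S2_inv @ diff - d
                  + np.log(np.linalg.det(S2) / np.linalg.det(S1)))

def skld(mu1, S1, mu2, S2):
    """Symmetric Kullback-Leibler divergence between two components."""
    return 0.5 * (kld_gauss(mu1, S1, mu2, S2) + kld_gauss(mu2, S2, mu1, S1))

def update_component(N, pi_j, mu_j, S_j, M, M_k, mu_k, S_k, threshold=10.0):
    """Fuse an old component (weight pi_j over N samples) with a new one (M_k of M samples)
    when their SKLD is below the threshold; otherwise keep both and only re-weight."""
    if skld(mu_j, S_j, mu_k, S_k) < threshold:
        n_j, n_k = N * pi_j, M_k
        mu = (n_j * mu_j + n_k * mu_k) / (n_j + n_k)
        S = (n_j * (S_j + np.outer(mu_j - mu, mu_j - mu))
             + n_k * (S_k + np.outer(mu_k - mu, mu_k - mu))) / (n_j + n_k)
        pi = (n_j + n_k) / (N + M)
        return [(pi, mu, S)]                       # single fused component
    # no fusion: keep both components, renormalize weights over all N + M samples
    return [(N * pi_j / (N + M), mu_j, S_j), (M_k / (N + M), mu_k, S_k)]
```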
(7) As data to be measured continue to flow in, steps (5) and (6) are repeated continuously: on the one hand, the quality variable of the new data is predicted with the semi-supervised Gaussian mixture regression model; on the other hand, new mixed labeled and unlabeled data are collected for modeling and the semi-supervised Gaussian mixture regression model is updated incrementally, realizing adaptive quality prediction of the industrial process. In short, historical data are discarded during updating and only their number and the model parameters need to be stored, which greatly reduces the occupied storage space; the model parameters built from historical data are reused for updating, which significantly shortens subsequent training time; and parameter learning uses both labeled and unlabeled samples, which improves the accuracy of model parameter learning.
Examples
The performance of the semi-supervised incremental Gaussian mixture regression model is described below with a specific example of the primary reformer in the hydrogen production unit of an ammonia synthesis process. The NH3 produced by the ammonia synthesis process is usually a main raw material of the urea synthesis process, and according to the process flow design the primary reformer is the main vessel in which the reforming reaction takes place; its process flow diagram is shown in FIG. 2. According to the reaction mechanism, the reaction temperature is a key factor in guaranteeing the hydrogen yield of the primary reformer. To stabilize the temperature at a given level, the combustion state must be monitored in real time, and controlling the oxygen content at the top of the reformer within a set range is one of the effective means of doing so. In an actual industrial process, measuring the oxygen concentration with a mass spectrometer is very expensive. Therefore, to improve the control quality of the primary reformer and reduce the measurement cost, an adaptive soft measurement model needs to be established for the oxygen content in the primary reformer. Table 1 gives a detailed description of the 13 auxiliary variables and the 1 quality variable.
Table 1: sample variable description
Label  Name
U1 Fuel natural gas flow
U2 Fuel exhaust gas flow
U3 E3 outlet fuel natural gas pressure
U4 PR outlet hearth flue gas pressure
U5 E3 outlet fuel tail gas temperature
U6 PH outlet fuel natural gas temperature
U7 PR inlet process gas temperature
U8 Flue gas temperature of PR top left side hearth
U9 Flue gas temperature of PR top right hearth
U10 Flue gas temperature of PR top mixed hearth
U11 PR outlet transition air temperature
U12 PR right side outlet switching air temperature
U13 PR outlet transition air temperature
Y Top oxygen content in the furnace
First, a total of 7000 samples of industrial process data were collected. Ordered in time, the first 1500 samples were used as the original samples to train the model, and prediction on new data and model updating were then carried out with different update step sizes. The root mean square error (RMSE) is used as the index to measure the prediction accuracy of the proposed S2IGMR soft measurement model and of the IGMR soft measurement model.
Table 2: Comparison of the prediction results of the method of the invention and of the incremental Gaussian mixture regression model (RMSE under different update step sizes and labeled-sample proportions)
FIG. 3 shows the prediction performance on a segment of the primary reformer industrial process; it can be seen that the method captures the dynamic trajectory of the top oxygen content in the furnace well. From Table 2, when the update step size for new data is increased from 60 to 200, the overall prediction error RMSE first decreases and then increases. The reason is that when the update step size is too small, the model trained on the new samples has too little data to fit well, so the prediction accuracy drops; when the step size reaches a suitable value, the accuracy is best; and as the step size keeps growing, the model is no longer updated in time, so when the operating condition changes it has difficulty adapting to the new data and the prediction accuracy deteriorates. As the proportion of labeled samples in the data set decreases, the prediction performance of the incremental Gaussian mixture regression model starts to deteriorate; in particular, when the proportion of labeled samples drops to 10%, the proposed method degrades relatively slowly by comparison and still achieves good prediction performance. The method is therefore considered an effective way to predict the oxygen content online, which helps to better control the reaction temperature and thus ensures continuous and stable hydrogen production.

Claims (3)

1. A self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression is characterized by comprising the following steps:
(1) A labeled data set {(x_i^l, y_i)}, i = 1, ..., n_l, of historical operating conditions of the industrial process and an unlabeled data set {x_j^u}, j = 1, ..., n_u, form the initial training data set, where the superscript l marks labeled data, i is the index over the labeled data set, x_i denotes the i-th sample in the labeled data set, y_i denotes its label, n_l denotes the number of samples of the labeled data set, j is the index over the unlabeled data set, the superscript u marks unlabeled data, x_j denotes the j-th sample in the unlabeled data set, and n_u denotes the number of samples of the unlabeled data set;
(2) Standardize the initial training data set collected in step (1) to mean 0 and variance 1 to obtain the standardized data set;
(3) Learn the semi-supervised Gaussian mixture regression model parameters Θ iteratively with the EM algorithm, where Θ comprises: the prior probability α_k of the k-th component of the semi-supervised Gaussian mixture regression model, the mean vector μ_k of the k-th component, the covariance matrix Σ_k of the k-th component, the regression coefficient ω_k of the k-th component, the regression intercept b_k of the k-th component, and the measurement noise variance σ_k^2 of the k-th component.
The method specifically comprises the following substeps:
(3.1) Using the linear Gaussian relations, obtain the hidden-variable posterior probability distribution R_ik of the labeled samples and the hidden-variable posterior probability distribution R_jk of the unlabeled samples:

R_ik = p(z_i = k | x_i^l, y_i) = α_k N([x_i^l; y_i] | μ_k^l, Σ_k^l) / Σ_{k'=1}^{K} α_{k'} N([x_i^l; y_i] | μ_{k'}^l, Σ_{k'}^l)

R_jk = p(z_j = k | x_j^u) = α_k N(x_j^u | μ_k^u, Σ_k^u) / Σ_{k'=1}^{K} α_{k'} N(x_j^u | μ_{k'}^u, Σ_{k'}^u)

where z_i is the hidden variable of the i-th labeled sample, z_j is the hidden variable of the j-th unlabeled sample, R_ik denotes the posterior probability that the k-th component of the semi-supervised Gaussian mixture regression model generated the i-th labeled sample, R_jk denotes the posterior probability that the k-th component generated the j-th unlabeled sample, p denotes a probability of the k-th component taking values between 0 and 1, μ_k^l = [μ_k; ω_k^T μ_k + b_k] denotes the mean vector of the labeled samples in the k-th component of the semi-supervised Gaussian mixture regression model, Σ_k^l = [Σ_k, Σ_k ω_k; ω_k^T Σ_k, ω_k^T Σ_k ω_k + σ_k^2] denotes the covariance matrix of the labeled samples in the k-th component, μ_k^u = μ_k denotes the unlabeled mean vector of the k-th component, Σ_k^u = Σ_k denotes the unlabeled covariance matrix of the k-th component, and N(· | μ, Σ) denotes the Gaussian distribution with mean vector μ and covariance matrix Σ;
(3.2) Use the hidden-variable posterior probability distributions R_ik and R_jk obtained in step (3.1) to calculate the corresponding log-likelihood function Q(Θ):

Q(Θ) = Σ_{i=1}^{n_l} Σ_{k=1}^{K} R_ik [ln α_k + ln N(x_i^l | μ_k, Σ_k) + ln N(y_i | ω_k^T x_i^l + b_k, σ_k^2)] + Σ_{j=1}^{n_u} Σ_{k=1}^{K} R_jk [ln α_k + ln N(x_j^u | μ_k, Σ_k)]

where z denotes the hidden component-indicator variable of a sample.

Maximize the log-likelihood function with respect to each semi-supervised Gaussian mixture regression model parameter subject to Σ_{k=1}^{K} α_k = 1, that is, maximize Q(Θ) + β(Σ_{k=1}^{K} α_k - 1), where β is a Lagrange multiplier;
Estimate the updated values of the semi-supervised Gaussian mixture regression model parameters:

α_k = (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk) / (n_l + n_u)

μ_k = (Σ_{i=1}^{n_l} R_ik x_i^l + Σ_{j=1}^{n_u} R_jk x_j^u) / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Σ_k = [Σ_{i=1}^{n_l} R_ik (x_i^l - μ_k)(x_i^l - μ_k)^T + Σ_{j=1}^{n_u} R_jk (x_j^u - μ_k)(x_j^u - μ_k)^T] / (Σ_{i=1}^{n_l} R_ik + Σ_{j=1}^{n_u} R_jk)

Ω_k = (H^T R_k H)^{-1} H^T R_k Y

σ_k^2 = Σ_{i=1}^{n_l} R_ik (y_i - ω_k^T x_i^l - b_k)^2 / Σ_{i=1}^{n_l} R_ik

where Ω_k = [ω_k^T, b_k]^T is the set of regression coefficients, R_k = diag(R_1k, ..., R_{n_l k}) is the set of hidden-variable posterior probability distributions of the labeled samples, H = [X^l, 1_{n_l}] is the labeled data set matrix X^l augmented with 1_{n_l}, a vector whose entries are all 1 and whose dimension is n_l, and Y = [y_1, ..., y_{n_l}]^T is the vector of labels;
(3.3) Based on the updated parameter values of the semi-supervised Gaussian mixture regression model estimated in step (3.2), calculate the log-likelihood function of the standardized data set; repeat steps (3.1)-(3.2) until the log-likelihood function converges, at which point the parameters of the semi-supervised Gaussian mixture regression model are the final semi-supervised Gaussian mixture regression model parameters;
(4) Predict the quality variable with the final semi-supervised Gaussian mixture regression model parameters:

ŷ = E[y | x] = Σ_{k=1}^{K} R_xk (ω_k^T x + b_k)

where ŷ is the predicted expected value of y for the given data x to be predicted, y is the quality variable, R_xk = α_k N(x | μ_k, Σ_k) / Σ_{k'=1}^{K} α_{k'} N(x | μ_{k'}, Σ_{k'}) denotes the probability that the data x to be predicted belongs to each component of the semi-supervised Gaussian mixture regression model, and ω_k^T x + b_k denotes the mean of y predicted by the corresponding component for the data x to be predicted;
(5) Collect new mixed labeled and unlabeled data in the same proportion as the training data as new training data, train them into a semi-supervised incremental Gaussian mixture regression model following steps (2)-(3), and store the parameters of the semi-supervised incremental Gaussian mixture regression model and the number of training data in a historical database;
(6) For each pair formed by a component of the semi-supervised Gaussian mixture regression model of step (3) and a component of the semi-supervised incremental Gaussian mixture regression model of step (5), calculate the symmetric Kullback-Leibler divergence SKLD and judge whether the SKLD value exceeds 10; when the SKLD value exceeds 10, keep the original mean vector and covariance unchanged and update only the mixing weights of the components; when the SKLD value is below 10, fuse the components. The SKLD value is calculated as:

SKLD(φ_1, φ_2) = (1/2) [KLD(φ_1 || φ_2) + KLD(φ_2 || φ_1)]

where φ_1 is the parameter set of a component of the semi-supervised Gaussian mixture regression model, φ_2 is the parameter set of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, μ_1 and Σ_1 are the mean vector and covariance matrix of the component of the semi-supervised Gaussian mixture regression model, μ_2 and Σ_2 are the mean vector and covariance matrix of the corresponding component of the semi-supervised incremental Gaussian mixture regression model, and KLD is the relative entropy;
(7) As data to be measured continue to flow in, repeat steps (5)-(6) continuously to realize adaptive quality prediction of the industrial process.
2. The adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression as claimed in claim 1, wherein the fusion process of step (6) is:

μ = (N π_j μ_j + M_k μ_k) / (N π_j + M_k)

Σ = [N π_j (Σ_j + (μ_j - μ)(μ_j - μ)^T) + M_k (Σ_k + (μ_k - μ)(μ_k - μ)^T)] / (N π_j + M_k)

π' = (N π_j + M_k) / (N + M)

where μ is the mean vector of the fused component of the semi-supervised incremental Gaussian mixture regression model, Σ is the covariance of the fused component, π' is the mixing weight of the fused component, N denotes the total number of original sample data, M denotes the total number of new sample data, μ_j denotes the mean vector of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data, π_j denotes the mixing weight of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, Σ_j denotes the covariance of the j-th component of the semi-supervised Gaussian mixture regression model of the initial training data samples, M_k denotes the number of samples of the k-th component of the semi-supervised incremental Gaussian mixture regression model trained on the new samples, and μ_k and Σ_k denote the mean vector and covariance matrix of the k-th component of the corresponding semi-supervised incremental Gaussian mixture regression model, respectively.
3. The adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression as recited in claim 1, wherein in step (6) the mixing weights of the components are updated as follows:

when a remaining component belongs to the new samples:

π'_k = M_k / (N + M)

when a remaining component belongs to the initial training data samples:

π'_j = N π_j / (N + M)
CN202011614387.1A 2020-12-30 2020-12-30 Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression Active CN112650063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011614387.1A CN112650063B (en) 2020-12-30 2020-12-30 Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011614387.1A CN112650063B (en) 2020-12-30 2020-12-30 Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression

Publications (2)

Publication Number Publication Date
CN112650063A true CN112650063A (en) 2021-04-13
CN112650063B CN112650063B (en) 2022-04-29

Family

ID=75364465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011614387.1A Active CN112650063B (en) 2020-12-30 2020-12-30 Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression

Country Status (1)

Country Link
CN (1) CN112650063B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102542126A (en) * 2011-10-10 2012-07-04 上海交通大学 Soft measurement method based on half supervision learning
CN102693452A (en) * 2012-05-11 2012-09-26 上海交通大学 Multiple-model soft-measuring method based on semi-supervised regression learning
CN102708294A (en) * 2012-05-11 2012-10-03 上海交通大学 Self-adaptive parameter soft measuring method on basis of semi-supervised local linear regression
CN103927412A (en) * 2014-04-01 2014-07-16 浙江大学 Real-time learning debutanizer soft measurement modeling method on basis of Gaussian mixture models
CN104462850A (en) * 2014-12-25 2015-03-25 江南大学 Multi-stage batch process soft measurement method based on fuzzy gauss hybrid model
CN107451101A (en) * 2017-07-21 2017-12-08 江南大学 It is a kind of to be layered integrated Gaussian process recurrence soft-measuring modeling method
CN108171002A (en) * 2017-11-30 2018-06-15 浙江大学 A kind of polypropylene melt index Forecasting Methodology based on semi-supervised mixed model
CN108764295A (en) * 2018-04-28 2018-11-06 江南大学 A kind of soft-measuring modeling method based on semi-supervised integrated study
US10678196B1 (en) * 2020-01-27 2020-06-09 King Abdulaziz University Soft sensing of a nonlinear and multimode processes based on semi-supervised weighted Gaussian regression

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JINGBO WANG et al.: "Bayesian Regularized Gaussian Mixture Regression with Application to Soft Sensor Modeling for Multi-Mode Industrial Processes", 2018 IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS) *
WEIMING SHAO et al.: "Soft-Sensor Development for Processes With Multiple Operating Modes Based on Semisupervised Gaussian Mixture Regression", IEEE Transactions on Control Systems Technology *
邵伟明 et al.: "Semi-supervised dynamic soft sensor modeling method based on recurrent neural networks", Journal of Electronic Measurement and Instrumentation (电子测量与仪器学报) *
邵伟明 et al.: "Soft sensor modeling method for multi-product chemical processes based on ensemble learning", CIESC Journal (化工学报) *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158473A (en) * 2021-04-27 2021-07-23 昆明理工大学 Semi-supervised integrated instant learning industrial rubber compound Mooney viscosity soft measurement method
CN113707240A (en) * 2021-07-30 2021-11-26 浙江大学 Component parameter robust soft measurement method based on semi-supervised nonlinear variational Bayes mixed model
CN113707240B (en) * 2021-07-30 2023-11-07 浙江大学 Component parameter robust soft measurement method based on semi-supervised nonlinear variation Bayesian hybrid model
CN114239400A (en) * 2021-12-16 2022-03-25 浙江大学 Multi-working-condition process self-adaptive soft measurement modeling method based on local double-weighted probability hidden variable regression model
CN114662620A (en) * 2022-05-24 2022-06-24 岚图汽车科技有限公司 Automobile endurance load data processing method and device for market users

Also Published As

Publication number Publication date
CN112650063B (en) 2022-04-29

Similar Documents

Publication Publication Date Title
CN112650063B (en) Self-adaptive soft measurement method based on semi-supervised incremental Gaussian mixture regression
CN106897775B (en) Soft-measuring modeling method based on Bayes's integrated study
CN104778298A (en) Gaussian process regression soft measurement modeling method based on EGMM (Error Gaussian Mixture Model)
CN109508818B (en) Online NOx prediction method based on LSSVM
CN114358213B (en) Error ablation processing method, system and medium for nonlinear time series data prediction
CN109670625A (en) NOx emission concentration prediction method based on Unscented kalman filtering least square method supporting vector machine
CN113012766B (en) Self-adaptive soft measurement modeling method based on online selective integration
CN110046377B (en) Selective integration instant learning soft measurement modeling method based on heterogeneous similarity
CN110189800B (en) Furnace oxygen content soft measurement modeling method based on multi-granularity cascade cyclic neural network
CN113095550A (en) Air quality prediction method based on variational recursive network and self-attention mechanism
CN114239400A (en) Multi-working-condition process self-adaptive soft measurement modeling method based on local double-weighted probability hidden variable regression model
CN113159456A (en) Water quality prediction method, device, electronic device, and storage medium
CN114022311A (en) Comprehensive energy system data compensation method for generating countermeasure network based on time sequence condition
CN110083065B (en) Self-adaptive soft measurement method based on flow type variational Bayesian supervised factor analysis
CN110880044B (en) Markov chain-based load prediction method
CN115759415A (en) Power consumption demand prediction method based on LSTM-SVR
CN111898673A (en) Dissolved oxygen content prediction method based on EMD and LSTM
Alam et al. Forecasting co 2 emissions in Saudi Arabia using artificial neural network, holt-winters exponential smoothing, and autoregressive integrated moving average models
CN114239397A (en) Soft measurement modeling method based on dynamic feature extraction and local weighted deep learning
CN114169459A (en) Robust soft measurement method based on semi-supervised Bayesian regularization hybrid Student's t model
CN109033524A (en) A kind of chemical process concentration variable On-line Estimation method based on robust mixed model
CN113707240B (en) Component parameter robust soft measurement method based on semi-supervised nonlinear variation Bayesian hybrid model
CN115035962A (en) Variational self-encoder and generation countermeasure network-based virtual sample generation and soft measurement modeling method
CN114861759A (en) Distributed training method of linear dynamic system model
CN110879873B (en) Soft measurement method and system for vine copula correlation description based on Hamilton Monte Carlo sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant