CN116738866B

CN116738866B - Instant learning soft measurement modeling method based on time sequence feature extraction

Info

Publication number: CN116738866B
Application number: CN202311007772.3A
Authority: CN
Inventors: 王平; 尹贻超; 李雪静
Original assignee: China University of Petroleum East China
Current assignee: China University of Petroleum East China
Priority date: 2023-08-11
Filing date: 2023-08-11
Publication date: 2023-10-27
Anticipated expiration: 2043-08-11
Also published as: CN116738866A

Abstract

The invention discloses an instant learning soft measurement modeling method based on time sequence feature extraction, which belongs to the technical field of industrial process soft measurement and operates on modeling data and acquired query data. The invention can not only use a large number of unlabeled samples to assist in establishing the soft measurement model, but can also effectively mine the dynamic relations among process variables and markedly strengthen the feature representation capability of instant learning on complex dynamic data, thereby effectively handling the nonlinear and dynamic characteristics of the industrial process and improving the prediction accuracy of the soft measurement model.

Description

Instant learning soft measurement modeling method based on time sequence feature extraction

Technical Field

The invention belongs to the technical field of industrial process soft measurement, and particularly relates to a soft measurement modeling method based on instant learning of time sequence feature extraction.

Background

Owing to limitations of the site environment, measurement technology, economic cost and other factors, industrial processes commonly contain key variables or indexes that are closely related to product quality but difficult to measure directly, such as product composition and concentration, catalyst activity and the polypropylene melt index. At present, these key variables can be obtained only through manual sampling followed by offline laboratory analysis, or through expensive composition analyzers. Both routes suffer from large measurement delay, low precision and high cost, and cannot provide timely feedback for closed-loop control and operation optimization of the process. Soft measurement is an effective means of solving these problems: it uses easily measurable process variables to estimate, in real time, important quality indexes that are difficult to measure directly. Compared with traditional measuring instruments, soft measurement is economical, flexible and easy to maintain.

Modern industrial production systems are large in scale, complex in mechanism and diverse in production mode, and many factors affect product quality. The process often exhibits complex characteristics such as nonlinearity, dynamics and time variation, so a global soft measurement model performs poorly in practice and its prediction accuracy deteriorates steadily over time in use. For this reason, local modeling methods such as multi-model methods, ensemble learning and instant learning are increasingly favored by researchers. Instant learning, also called just-in-time learning (JITL), is a classical local modeling method that builds a local model online from samples similar to the query data; combined with a strategy for updating the modeling data set, it can effectively counter the loss of model accuracy caused by time-varying processes.

Actual production processes usually show pronounced dynamic changes and feedback control effects, so the collected process data are strongly correlated in time: the quality variable at the current moment can be influenced by the auxiliary variables at several previous moments, and the data behave as a multi-dimensional time series. Although JITL can handle the time-varying characteristics of a process efficiently by building different local models, its ability to handle process dynamics is limited. In addition, as industrial processes grow in scale and complexity, the relation between auxiliary variables and quality variables is no longer simply linear but strongly nonlinear. Consequently, models built with conventional instant learning algorithms suffer from poor prediction accuracy.

Disclosure of Invention

In order to solve the problems, the invention provides a soft measurement modeling method based on instant learning of time sequence feature extraction, which can effectively extract dynamic and nonlinear information in process data on the premise of ensuring algorithm calculation efficiency, and remarkably strengthen feature representation capability of complex dynamic data, thereby ensuring soft measurement model prediction precision.

The technical scheme of the invention is as follows:

a soft measurement modeling method based on instant learning of time sequence feature extraction comprises the following steps:

step 1, selecting auxiliary variables to collect samples to obtain a sample set, and analyzing to obtain a corresponding real quality variable set;

step 2, constructing an initial training data set based on the sample set and the real quality variable set, and performing standardization processing to obtain a standard data set;

step 3, extracting a high-dimensional dynamic latent variable set from the standardized sample set by using a time sequence feature mapping method based on a recurrent neural network; then simplifying hidden layer feature dimensions and eliminating redundant variables by adopting a dimension reduction algorithm to obtain a dimension reduction dynamic latent variable set; combining the dimension-reducing dynamic latent variable set and the real quality variable set to obtain modeling data;

step 4, collecting query data online and standardizing it, extracting the high-dimensional latent variables of the query data by the time sequence feature mapping method based on a recurrent neural network, and then reducing the dimension to obtain a query data low-dimensional latent variable set;

step 5, constructing a spatial neighbor sample set, an original time neighbor sample set and an extended time neighbor sample set of query data, and combining to obtain a local neighbor sample set of a query data low-dimensional latent variable set;

step 6, taking a label sample in the original time neighbor sample set as training data, and establishing a local supervision model of the query data low-dimensional latent variable set by an instant learning unified optimization modeling method;

step 7, based on the local neighbor sample set and the local supervision model of the query data low-dimensional latent variable set, calculating the quality variable predicted value corresponding to the query data low-dimensional latent variable set by the instant learning unified optimization modeling method based on local label propagation;

step 8, updating the training data set by using the latest acquired process data.

Further, the specific process of step 1 is as follows: first, according to knowledge of the process mechanism, analyze the correlation between the process data and the quality variable, sort the process data in descending order of correlation, and select the top $m$ process data as auxiliary variables; then record the values of the $m$ auxiliary variables at different times, obtaining the sample set $X_0 = [x_1, x_2, \ldots, x_N]$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^{\mathrm{T}}$, $x_i \in \mathbb{R}^{m}$, where $N$ is the total number of samples, $m$ is the dimension of a sample, and $x_i$ is the $i$-th sample; T is the matrix transposition operator; then obtain, through offline assay analysis, the real quality variable set $Y_0 = [y_1, y_2, \ldots, y_N]$ corresponding to the samples, where $y_i$ is the real quality variable corresponding to the $i$-th sample.

Further, the specific process of step 2 is as follows: based on $X_0$ and $Y_0$, construct the initial training data set $D_0 = \{X_0, Y_0\}$, and standardize $D_0$ according to formula (1) to obtain the standard data set $D = \{X, Y\}$, where $X$ is the standardized sample set and $Y$ is the standardized real quality variable set; formula (1) is expressed as:

$$D = \frac{D_0 - \operatorname{mean}(D_0)}{\operatorname{std}(D_0)} \quad (1);$$

where the function $\operatorname{mean}(\cdot)$ computes the mean of each row of the matrix, and the function $\operatorname{std}(\cdot)$ computes the standard deviation of each row of the matrix.
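As a minimal sketch of the standardization in formula (1), the following Python snippet (NumPy assumed) z-scores each row of a matrix, following the text's statement that the mean and standard deviation are taken per row, so each row is taken to hold one variable. The function name `standardize`, the zero-variance guard, and the reuse of the training statistics for a later query sample are illustrative assumptions rather than details given in the patent.

```python
import numpy as np

def standardize(data, mean=None, std=None):
    """Z-score each row of `data`; reuse the given statistics when provided."""
    data = np.asarray(data, dtype=float)
    if mean is None:
        mean = data.mean(axis=1, keepdims=True)   # per-row mean
    if std is None:
        std = data.std(axis=1, keepdims=True)     # per-row standard deviation
    std = np.where(std == 0, 1.0, std)            # assumed guard for constant rows
    return (data - mean) / std, mean, std

# Two variables (rows) observed at three sampling times (columns)
train = np.array([[1.0, 2.0, 3.0],
                  [10.0, 20.0, 30.0]])
scaled, mu, sigma = standardize(train)

# A query sample scaled with the training statistics (an assumption of this sketch)
query = np.array([[2.0], [20.0]])
query_scaled, _, _ = standardize(query, mu, sigma)
```

Returning the statistics alongside the scaled data keeps the same scaling available when query data arrive online.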

Further, the specific process of the step 3 is as follows:

step 3.1, map each sample in $X$ in turn, in order of sampling time, to a state vector, generating the high-dimensional dynamic latent variable set $S = [s_1, s_2, \ldots, s_N]$, where $s_i$ denotes the state vector of the $i$-th sample, $s_i \in \mathbb{R}^{n}$, and $n$ is the number of high-dimensional features; the state vector is calculated as:

$$s_i = f\left(W_{\mathrm{in}} x_i + W_{\mathrm{res}} s_{i-1}\right) \quad (2);$$

where $s_i$ is the state vector of the $i$-th sample at time $t$; $f(\cdot)$ is the nonlinear activation function; $W_{\mathrm{in}}$ is the input weight matrix; $x_i$ is the $i$-th sample at time $t$; $W_{\mathrm{res}}$ is the state weight matrix of the reservoir; $s_{i-1}$ is the state vector of the $(i-1)$-th sample at time $t-1$; the nonlinear activation function is given by formula (3):

$$f(u) = \tanh(u) = \frac{e^{u} - e^{-u}}{e^{u} + e^{-u}} \quad (3);$$

where $\tanh(\cdot)$ is the hyperbolic tangent activation function and $u$ is the input vector of the function;

step 3.2, calculating the following optimization targets by solvingIs +.>：

（4）；

In the method, in the process of the invention,for->Is an optimization objective of (1); />；

Step 3.3, calculating to obtainIs->And residual matrix->The calculation formulas are respectively as follows:

（5）；

（6）；

step 3.4, take the residual matrix as the new input matrix and extract principal components by the same procedure as in steps 3.2 to 3.3, obtaining the 2nd principal component through the last principal component in turn, and finally the dimension-reduced dynamic latent variable set;

step 3.5, combine the dimension-reduced dynamic latent variable set and the real quality variable set to obtain the modeling data.
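The state mapping of step 3.1 can be sketched as follows, assuming NumPy, a zero initial state, and randomly generated fixed weights as in an echo state network. The function name `reservoir_states`, the weight ranges, the reservoir size of 50 and the spectral-radius rescaling are illustrative assumptions; the text specifies only a recursion driven by an input weight matrix, a reservoir state weight matrix and a hyperbolic tangent activation.

```python
import numpy as np

def reservoir_states(samples, w_in, w_res):
    """Map samples (m x N, one per column, in sampling-time order) to states (n x N)."""
    state = np.zeros(w_res.shape[0])          # assumed zero initial state
    states = []
    for x_t in samples.T:                     # iterate over samples in time order
        # New state depends on the current sample and the previous state
        state = np.tanh(w_in @ x_t + w_res @ state)
        states.append(state)
    return np.column_stack(states)

rng = np.random.default_rng(0)
m, n_hidden, n_samples = 6, 50, 200           # illustrative sizes
w_in = rng.uniform(-0.5, 0.5, size=(n_hidden, m))
w_res = rng.uniform(-0.5, 0.5, size=(n_hidden, n_hidden))
# Rescale so the spectral radius is 0.9 (a common echo-state heuristic, assumed here)
w_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(w_res)))

samples = rng.normal(size=(m, n_samples))     # stand-in for standardized process data
states = reservoir_states(samples, w_in, w_res)
```

Because each state depends on the previous one, the samples must be supplied in sampling-time order; shuffling them would destroy the dynamic information the mapping is meant to capture.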

Further, the specific process of step 4 is as follows: for the query data collected online, first standardize it in the same way as formula (1) to obtain the standard query data; then, using the time sequence feature mapping method based on a recurrent neural network, extract from the standard query data the query data high-dimensional latent variable set; finally, reduce the dimension of the high-dimensional latent variable set to obtain the query data low-dimensional latent variable set.

Further, the specific process of step 5 is as follows:

step 5.1, calculate, according to the distance metric, the spatial distances between the query data low-dimensional latent variable set and all modeling samples in the modeling data, rank them, and select a preset number of samples with the largest spatial similarity to the query data to form the spatial neighbor sample set of the query data, which includes both labeled samples and unlabeled samples; the distance metric expression is as follows:

（7）；

in the method, in the process of the invention,is->Similarity of the individual modeling samples and the query samples; />Representing an exponential function;indicate->Dynamic latent variable set of dimension reduction of individual modeling samples, < ->The width is the kernel mapping;

step 5.2, selecting the latest acquiredOriginal time neighbor sample set with individual labels as query dataThe method comprises the steps of carrying out a first treatment on the surface of the Spatial neighbor sample set for query data +.>The%>Samples, selecting the +.>Sample +.>Time neighbor samples->Wherein->Thereby obtaining an extended time neighbor sample set of query data +.>；

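step 5.3, construct the local neighbor sample set of the query data low-dimensional latent variable set by combining the spatial neighbor sample set, the original time neighbor sample set and the extended time neighbor sample set.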
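The neighbor construction of step 5 can be sketched as follows. The sketch does not reproduce the patent's similarity formula (7) or its exact rule for choosing time neighbors: it assumes a Gaussian-kernel similarity on squared Euclidean distance and takes, for each spatial neighbor, the `lag` samples immediately preceding it in time. The function name `local_neighbors` and all parameter names are hypothetical.

```python
import numpy as np

def local_neighbors(z_query, Z, labeled, k_space, k_time, lag, sigma=1.0):
    """Return sorted indices of the local neighbor set for one query.

    Z: (N, d) modeling latent variables, one sample per row in time order.
    labeled: (N,) boolean mask marking samples that have a quality label.
    """
    dist2 = np.sum((Z - z_query) ** 2, axis=1)
    sim = np.exp(-dist2 / (2.0 * sigma ** 2))            # assumed kernel form
    spatial = np.argsort(-sim)[:k_space].tolist()        # most similar, labeled or not
    original_time = np.flatnonzero(labeled)[-k_time:].tolist()  # latest labeled samples
    # Assumed extension rule: the `lag` samples just before each spatial neighbor
    extended_time = {j - s for j in spatial for s in range(1, lag + 1) if j - s >= 0}
    merged = set(spatial) | set(original_time) | extended_time
    return np.array(sorted(merged))

rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 3))
labeled = np.zeros(30, dtype=bool)
labeled[::5] = True                                      # every fifth sample has a lab value
idx = local_neighbors(Z[12], Z, labeled, k_space=4, k_time=3, lag=2)
```

Merging the three sets as a union removes duplicates, so a sample that is both a spatial and a time neighbor appears only once in the local neighbor set.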

Further, the specific process of step 6 is as follows:

step 6.1, take the labeled samples in the original time neighbor sample set of the query data as training data and establish a local supervision model by the instant learning unified optimization modeling method; the optimization objective of the local supervision model is as follows:

（8）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />The method is a weight coefficient of two subtasks of a balanced instant learning unified optimization modeling method; />Reconstruction coefficient vector +.>Weight matrix of>，/>，Is->Reconstruction coefficients of the individual samples with respect to the query data; />For regression coefficient vector->Weight matrix of>，，/>Is->Regression coefficient vectors corresponding to principal components; />Weight matrix for Euclidean distance vector, < ->；/>For regression coefficient vector->Regularization coefficient of>Reconstruction coefficient vector +.>Is included in the regularization coefficients of (a);

、/>、/>the expression of (2) is as follows:

（9）；

（10）；

（11）；

in the method, in the process of the invention,diagonalizing the function for the vector; />For inquiring data and->Euclidean distance between samples;

step 6.2, fix the regression coefficient vector and solve for the reconstruction coefficient vector; equation (8) is then re-described as equation (12):

（12）；

in the method, in the process of the invention,for->Optimization objective of->For->Diagonal matrix of>Diagonal elements of (a) are training errors, < >>；/>Is->The training error of the individual samples is determined,，/>is->True quality variable corresponding to the individual samples, +.>Is->A dimension-reducing dynamic latent variable set of each sample; />Representing all and reconstruction coefficient vectors->An irrelevant constant term;

the reconstruction coefficient vector is then calculated by the expression shown in formula (13):

（13）；

step 6.3, fix the reconstruction coefficient vector and solve for the regression coefficient vector; equation (8) is then re-described as equation (14):

（14）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />For->Is a diagonal matrix of the (a),，/>is->Reconstruction error of principal component,/->Is the reconstruction error of the sample, defining a reconstruction error vector +.>；/>For about->Diagonal matrix of individual samples->，/>Is->Individual samples and query samples about +.>The difference between the principal components is calculated,diagonal element of->Distance difference values of each sample and query sample about each feature are defined as distance difference value vectors；

the regression coefficient vector is then calculated by the expression shown in formula (15):

（15）；

in the method, in the process of the invention,reconstruction coefficient vector +.>A related item;

from the finally solved regression coefficient vector, the local supervision model of the query data low-dimensional latent variable set is obtained, whose model coefficient is that regression coefficient vector.

Further, the specific process of the step 7 is as follows:

step 7.1, collecting query data low-dimensional latent variablesSample set of local neighbors thereof>Merging into one dataset +.>，/>Is->A number of samples; will->Dividing into label sample subsetsLabel-free sample subset->Wherein->Representing the +.sub.f in the tag sample subset>Sample number->Indicate->Labels corresponding to the individual samples->And->Respectively->Label sample and no label sample number, +.>Is->A label-free sample;

the optimization target of the instant learning unified optimization modeling method based on local label propagation is shown as a formula (16):

（16）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />Is thatPredicted value of mass variable for all samples in +.>Is->A quality variable predictor for each sample; />For the label sample subset +.>A quality variable predictor for each sample; />For sample set->Middle->Sample and->Similarity of the individual samples in the auxiliary variable space; />Is->A quality variable predictor for each sample; />Is calculated by monitoring model by instant learning unified optimization modeling method>Quality variable predictive value of (2); />、/>All represent weighting coefficients;

step 7.2, calculate by formula (17) the quality variable predicted values of all samples in the merged data set, thereby obtaining the quality variable predicted value corresponding to the query data low-dimensional latent variable set:

（17）；

in the method, in the process of the invention,for the diagonal matrix, the diagonal element corresponding to the label sample is 1, and the diagonal element corresponding to the label-free sample is 0, < >>Is Laplace matrix>For the similarity matrix, the matrix element is +.>Diagonal matrix->Is a contiguous matrix, diagonal element->；/>A unit matrix, namely a diagonal matrix with diagonal elements of 1; />Calculated by a local supervision modelPredicted values of all samples in (a)/(b)>For the +.>Predicted values for the individual samples; />Is->The label value of the unlabeled exemplar is 0.
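The closed-form prediction of step 7.2 can be illustrated with the sketch below. It assumes that objective (16) is the sum of a fitting term on the labeled samples, a graph-smoothness term weighted by `alpha`, and a term weighted by `beta` that ties the predictions to those of the local supervision model; the exact weighting in the patent's formulas (16) and (17) may differ. The function name `propagate_labels` and the toy data are hypothetical.

```python
import numpy as np

def propagate_labels(W, y, is_labeled, f_sup, alpha=1.0, beta=0.1):
    """Minimise (f-y)^T J (f-y) + alpha * f^T L f + beta * ||f - f_sup||^2 over f."""
    J = np.diag(is_labeled.astype(float))    # 1 for labeled samples, 0 for unlabeled
    D = np.diag(W.sum(axis=1))               # diagonal matrix, d_ii = sum_j w_ij
    L = D - W                                # Laplacian matrix
    A = J + alpha * L + beta * np.eye(len(y))
    b = J @ y + beta * f_sup
    return np.linalg.solve(A, b)             # solves the first-order optimality condition

# Three samples on a chain; the middle one plays the role of the unlabeled query
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
y = np.array([1.0, 0.0, 3.0])                # label of the unlabeled sample set to 0
is_labeled = np.array([True, False, True])
f_sup = np.array([1.0, 2.0, 3.0])            # predictions of the local supervision model
f = propagate_labels(W, y, is_labeled, f_sup)
```

With `beta > 0` the system matrix is positive definite, so `np.linalg.solve` can be used instead of forming an explicit inverse.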

Further, the specific process of step 8 is as follows: when the real quality variable of the query data is obtained through laboratory assay analysis, the sample is added to the initial training data set, expanding the working range covered by the modeling data set; otherwise, the samples contained in the initial training data set remain unchanged.
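The update rule of step 8 amounts to a conditional append, sketched below with NumPy. The function name `update_training_set`, the argument names and the row-per-sample layout are assumptions made for the illustration.

```python
import numpy as np

def update_training_set(X_train, y_train, x_query, y_assay=None):
    """Append the query sample once its assay value exists; otherwise return the set unchanged."""
    if y_assay is None:
        return X_train, y_train               # no laboratory value yet: keep the set as is
    X_new = np.vstack([X_train, np.atleast_2d(x_query)])
    y_new = np.append(y_train, y_assay)
    return X_new, y_new

X_train = np.array([[0.1, 0.2], [0.3, 0.4]])  # one sample per row (assumed layout)
y_train = np.array([1.0, 2.0])
X_same, y_same = update_training_set(X_train, y_train, [0.5, 0.6])
X_more, y_more = update_training_set(X_train, y_train, [0.5, 0.6], y_assay=3.0)
```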

The beneficial technical effects of the invention are as follows. The instant learning soft measurement modeling method based on time sequence feature extraction first projects the modeling samples and the query sample into a high-dimensional space using time sequence feature extraction based on a recurrent neural network, and extracts the high-dimensional dynamic latent variables in the time series data to obtain a rich feature representation of complex nonlinear dynamic data, so that the nonlinear and dynamic information in the samples can be mined. Then, to reduce modeling complexity, a dimension reduction algorithm simplifies the hidden layer feature dimension and eliminates redundant variables. Finally, a soft measurement model is established by the instant learning unified optimization modeling method based on local label propagation, so that the information in a large number of unlabeled samples is extracted effectively, the computational efficiency of the algorithm is maintained, and the prediction accuracy of the soft measurement model is improved.

Drawings

FIG. 1 is a flow chart of a method for modeling soft measurements based on instant learning of time series feature extraction according to the present invention.

Fig. 2 is a graph showing the actual output of process data from a low temperature conversion unit according to an embodiment of the present invention.

FIG. 3 is a schematic diagram of the prediction error on the low temperature conversion unit data of the conventional locally weighted partial least squares soft measurement modeling method.

Fig. 4 is a schematic diagram of prediction error of low-temperature conversion unit data by the instant learning soft measurement modeling method based on time series feature extraction according to the present invention.

Detailed Description

The invention is described in further detail below with reference to the attached drawings and detailed description:

To address the time sequence correlation, nonlinearity and time-varying characteristics of industrial processes, the invention provides a unified optimization method for instant learning based on time sequence feature extraction. Using time sequence feature extraction based on a recurrent neural network, it mines the dynamic and nonlinear information in the process data, overcoming the time sequence correlation and nonlinearity problems of the process data; a dimension reduction algorithm then extracts the principal component features of the high-dimensional data, improving computational efficiency and eliminating redundant variables. In addition, a soft measurement model is established by a semi-supervised instant learning modeling method based on local label propagation, which overcomes the time-varying characteristic of the process and the limited prediction performance caused by insufficient labeled samples. The instant learning soft measurement modeling method based on time sequence feature extraction according to the present invention is described in detail below.

As shown in fig. 1, the invention provides a soft measurement modeling method based on instant learning of time sequence feature extraction, which combines a recurrent neural network with an instant learning algorithm based on local label propagation to effectively extract dynamic information in industrial data, and specifically comprises the following steps:

Step 1, selecting auxiliary variables to acquire samples to obtain a sample set, and analyzing to obtain a corresponding real quality variable set. The specific process is as follows:

First, according to knowledge of the process mechanism, analyze the correlation between the process data and the quality variable, sort the process data in descending order of correlation, and select the top $m$ process data as auxiliary variables; then record the values of the $m$ auxiliary variables at different times, obtaining the sample set $X_0 = [x_1, x_2, \ldots, x_N]$, $x_i = [x_{i1}, x_{i2}, \ldots, x_{im}]^{\mathrm{T}}$, $x_i \in \mathbb{R}^{m}$, where $N$ is the total number of samples, $m$ is the dimension of a sample, and $x_i$ is the $i$-th sample; T is the matrix transposition operator;

then obtain, through offline assay analysis, the real quality variable set $Y_0 = [y_1, y_2, \ldots, y_N]$ corresponding to the samples, where $y_i$ is the real quality variable corresponding to the $i$-th sample.

Step 2, constructing an initial training data set based on the sample set and the real quality variable set, and performing standardization processing to obtain a standard data set. The specific process is as follows:

Based on $X_0$ and $Y_0$, construct the initial training data set $D_0 = \{X_0, Y_0\}$, and standardize $D_0$ according to formula (1) to obtain the standard data set $D = \{X, Y\}$, where $X$ is the standardized sample set and $Y$ is the standardized real quality variable set; formula (1) is expressed as:

$$D = \frac{D_0 - \operatorname{mean}(D_0)}{\operatorname{std}(D_0)} \quad (1);$$

Step 3, using the time sequence feature mapping method based on a recurrent neural network, extract from $X$ the high-dimensional dynamic latent variable set $S = [s_1, s_2, \ldots, s_N]$, $s_i \in \mathbb{R}^{n}$, where $n$ is the number of high-dimensional features. Then use a dimension reduction algorithm to simplify the hidden layer feature dimension and eliminate redundant variables, obtaining the dimension-reduced dynamic latent variable set $Z$, where $d$ is the number of features after dimension reduction; combine the dimension-reduced dynamic latent variable set $Z$ and the real quality variable set $Y$ to obtain the modeling data, which reduces the influence of problems such as multicollinearity. The specific process is as follows:

（2）；

（3）；

step 3.2, calculating the following optimization targets by solvingIs +.>：

（4）；

In the method, in the process of the invention,for->Is an optimization objective of (1); />。

（5）；

（6）；

step 3.4, take the residual matrix as the new input matrix and extract principal components by the same procedure as in steps 3.2 to 3.3, obtaining the 2nd principal component through the last principal component in turn, and finally the dimension-reduced dynamic latent variable set;
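The component-by-component extraction of steps 3.2 to 3.4 can be sketched as below, under the assumption that the optimization objective of formula (4) is the usual principal component criterion, that is, the unit direction that maximizes the variance of the projected data. The function name `pca_by_deflation`, the use of a singular value decomposition to solve each sub-problem, and the row-per-sample layout are choices made for this sketch, not details taken from the patent.

```python
import numpy as np

def pca_by_deflation(states, n_components):
    """states: (N, n), one state vector per row. Returns scores (N, k) and loadings (n, k)."""
    residual = states - states.mean(axis=0)       # centre the data first
    scores, loadings = [], []
    for _ in range(n_components):
        # Direction of maximum variance = leading right-singular vector of the residual
        _, _, vt = np.linalg.svd(residual, full_matrices=False)
        p = vt[0]
        t = residual @ p                          # principal component (score vector)
        residual = residual - np.outer(t, p)      # residual matrix for the next round
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings)

rng = np.random.default_rng(1)
states = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))   # correlated toy features
T, P = pca_by_deflation(states, 3)
```

Each pass removes the variance explained by the component just found, so the next pass extracts a direction orthogonal to the previous ones; this is how redundant, collinear features are discarded.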

Step 4, collecting query data online and standardizing it, extracting the high-dimensional latent variables of the query data by the time sequence feature mapping method based on a recurrent neural network, and then reducing the dimension to obtain a query data low-dimensional latent variable set. The specific process is as follows:

For the query data collected online, first standardize it in the same way as formula (1) to obtain the standard query data. Then, using the time sequence feature mapping method based on a recurrent neural network, extract from the standard query data the query data high-dimensional latent variable set. Finally, reduce the dimension of the high-dimensional latent variable set to obtain the query data low-dimensional latent variable set.

Step 5, construct the spatial neighbor sample set, the original time neighbor sample set and the extended time neighbor sample set of the query data, and combine them to obtain the local neighbor sample set of the query data low-dimensional latent variable set; the local neighbor sample set then contains both temporal and spatial information. The specific process is as follows:

step 5.1, calculate, according to the distance metric (Euclidean distance), the spatial distances between the query data low-dimensional latent variable set and all modeling samples in the modeling data, rank them, and select a preset number of samples with the largest spatial similarity to the query data to form the spatial neighbor sample set of the query data, which includes both labeled samples and unlabeled samples. The distance metric expression is as follows:

（7）；

step 5.2, selecting the latest acquiredOriginal time neighbor sample set with individual labels as query data. Spatial neighbor sample set for query data +.>The%>Samples, selecting the +.>Sample +.>Time neighbor samples->Wherein->Thereby obtaining an extended time neighbor sample set of query data +.>。

Step 6, take the labeled samples in the original time neighbor sample set of the query data as training data, and establish a local supervision model of the query data low-dimensional latent variable set by the instant learning unified optimization modeling method. The specific process is as follows:

step 6.1, take the labeled samples in the original time neighbor sample set as training data and establish a local supervision model by the instant learning unified optimization modeling method; the optimization objective of the local supervision model is as follows:

（8）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />The method is a weight coefficient of two subtasks (query sample reconstruction and local model construction) of a balanced instant learning unified optimization modeling method; />Reconstruction coefficient vector +.>Is used for the weight matrix of the (c),，/>，/>is->Reconstruction coefficients of the individual samples with respect to the query data; />For regression coefficient vector->Weight matrix of>，/>，/>Is->Regression coefficient vectors corresponding to principal components; />Weight matrix for Euclidean distance vector, < ->；/>For regression coefficient vector->Regularization coefficient of>Reconstruction coefficient vector +.>Is included in the regularization coefficients of (a);

、/>、/>the expression of (2) is as follows:

（9）；

（10）；

（11）；

step 6.2, fix the regression coefficient vector and solve for the reconstruction coefficient vector; equation (8) can then be re-described as equation (12):

（12）；

in the method, in the process of the invention,for->Optimization objective of->For->Diagonal matrix of>Diagonal elements of (a) are training errors, < >>；/>Is->The training error of the individual samples is determined,，/>is->True quality variable corresponding to the individual samples, +.>Indicate->A sample dimension-reducing dynamic latent variable set; />Representing all and reconstruction coefficient vectors->An irrelevant constant term.

The reconstruction coefficient vector can then be calculated by the expression shown in formula (13):

（13）；

step 6.3, fix the reconstruction coefficient vector and solve for the regression coefficient vector; equation (8) can then be re-described as equation (14):

（14）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />For->Is a diagonal matrix of the (a),，/>is->Reconstruction error of principal component,/->Is the reconstruction error of the sample, defining a reconstruction error vector +.>；/>For about->Diagonal matrix of individual samples->，/>Is->Individual samples and query samples about +.>The difference between the principal components is calculated,diagonal element of->Distance difference between each sample and the query sample about each feature, distance difference vector is defined>；

The regression coefficient vector can then be calculated by the expression shown in formula (15):

（15）；

in the method, in the process of the invention,reconstruction coefficient vector +.>Related items.

From the finally solved regression coefficient vector, the local supervision model of the query data low-dimensional latent variable set is obtained, whose model coefficient is that regression coefficient vector.

Step 7, based on the local neighbor sample set and the local supervision model of the query data low-dimensional latent variable set, calculate the quality variable predicted value corresponding to the query data low-dimensional latent variable set by the instant learning unified optimization modeling method based on local label propagation. The specific process is as follows:

step 7.1, collecting query data low-dimensional latent variablesSample set of local neighbors thereof>Merging into one dataset +.>，/>Is->Is the number of samples. Due to->Comprises both tag samples and unlabeled samples, thus further dividing them into tag sample subsets +.>Label-free sample subset->Wherein->Representing the +.sub.f in the tag sample subset>Sample number->Indicate->Labels corresponding to the individual samples->And->Respectively->Label sample and no label sample number, +.>Is->And unlabeled exemplars.

（16）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />Is thatPredicted value of mass variable for all samples in +.>Is->A quality variable predictor for each sample; />For the label sample subset +.>A quality variable predictor for each sample; />For sample set->Middle->Sample and->Similarity of the individual samples in the auxiliary variable space; />Is->A quality variable predictor for each sample; />Is calculated by monitoring model by instant learning unified optimization modeling method>Quality variable predictive value of (2); />、/>All represent weighting coefficients.

Step 7.2, calculate by formula (17) the quality variable predicted values of all samples in the merged data set, thereby obtaining the quality variable predicted value corresponding to the query data low-dimensional latent variable set:

（17）；

Step 8, updating the training data set by using the latest acquired process data. The specific process is as follows: when the real quality variable of the query data is obtained through laboratory assay analysis, the sample is added to the initial training data set, expanding the working range covered by the modeling data set; otherwise, the samples contained in the initial training data set remain unchanged.

The method of the invention first uses the feature learning capability of the recurrent neural network to extract the high-dimensional dynamic latent variables in the time series data, obtaining a rich feature representation of complex nonlinear dynamic data. Next, a dimension reduction algorithm simplifies the hidden layer feature dimension and eliminates redundant variables, which reduces the influence of problems such as multicollinearity. Finally, based on the extracted representative features, a soft measurement model is established by the instant learning unified optimization modeling method based on local label propagation. The method provided by the embodiment of the invention can not only fully mine the dynamic and nonlinear information in the process data and overcome the long-term dependence problem of the process, but can also use a large number of unlabeled samples to assist in establishing the soft measurement model, alleviating the limited prediction performance caused by insufficient labeled samples.

To illustrate the feasibility and advantages of the invention, the invention is further described below with reference to a specific example.

Example: the process data of a low-temperature conversion unit (LTTU) are taken as an example.

The low-temperature conversion unit is part of the ammonia synthesis process and is used to remove impurity components from the produced gas. The reaction mixture from the high-temperature conversion unit passes through a separator and then enters the low-temperature conversion unit, where CO is converted into CO₂, which is absorbed and removed in a subsequent processing stage. To control the CO concentration and reduce environmental pollution during production, the operating state of the LTTU must be monitored and controlled in real time. At present, the CO concentration is commonly measured with a mass spectrometer; however, mass spectrometers are expensive and complicated to install, and the accuracy of their measurements cannot be guaranteed. A soft measurement model can therefore be established to estimate the CO concentration in real time.

The specific steps of the invention are described below in connection with the low-temperature conversion unit process:

Step 1, when the soft measurement model of CO concentration is established, six easily measured process variables are selected from the actual industrial production process as auxiliary variables, based on process mechanism knowledge and correlation analysis. The six auxiliary variables are the flow of process gas to the LTTU, the pressure at the LTTU outlet, the upper-layer temperature of the LTTU, the middle-layer temperature of the LTTU, the lower-layer temperature of the LTTU, and the temperature of the process gas at the LTTU outlet. The values of these six auxiliary variables and of the carbon monoxide concentration are then collected at different times, giving a low-temperature conversion unit process data sample set containing 2000 samples. The actual output curve of this sample set is shown in fig. 2; because of the large data volume, only 600 carbon monoxide concentration samples are shown, arranged by sampling time.

Step 2, the acquired data samples are preprocessed and abnormal samples are deleted. In addition, to account for the dynamic characteristics of the process, the modeling data are dimension-expanded according to the following formula, so that the sample dimension after expansion is 30. Finally, the data are standardized to obtain the standard data set.

In this embodiment, the dynamic characteristics of the process are taken into account: the current CO concentration is related not only to the auxiliary variables in the current state but also to the historical auxiliary variables. The input variable of the model is therefore constructed from the auxiliary variables at the current time and at several previous times, and the specific construction formula is as follows:

；

in the method, in the process of the invention,representing the predictive value of the soft measurement model for the concentration of carbon monoxide,/->Represents the concentration of carbon monoxide and->Is a potential relation of->The input data is represented by a representation of the input data,。
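The dimension expansion and standardization of Step 2 can be sketched as follows. The construction formula itself is not reproduced in this text; the sketch assumes that each input stacks the six auxiliary variables at the current sampling time and the four preceding ones, which is consistent with the stated expanded dimension of 30 (6 × 5) but is not confirmed by the source. Samples are stored as rows here, so standardization is applied per column. The names `build_lagged_inputs`, `standardize`, and `n_lags` are hypothetical.

```python
import numpy as np

def build_lagged_inputs(U, n_lags=5):
    """Stack each sample with its n_lags - 1 predecessors.

    U : (T, m) array of auxiliary variables ordered by sampling time.
    Returns an array of shape (T - n_lags + 1, m * n_lags); row j holds
    [u(k), u(k-1), ..., u(k-n_lags+1)] with k = j + n_lags - 1.
    """
    U = np.asarray(U, dtype=float)
    T = U.shape[0]
    rows = [U[k - n_lags + 1:k + 1][::-1].ravel() for k in range(n_lags - 1, T)]
    return np.vstack(rows)

def standardize(X):
    """Z-score each column; also return the statistics for reuse on query data."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - mean) / std, mean, std

# Synthetic stand-in for 2000 samples of six auxiliary variables.
rng = np.random.default_rng(0)
U = rng.normal(size=(2000, 6))
X = build_lagged_inputs(U, n_lags=5)  # shape (1996, 30)
Xs, mean, std = standardize(X)
```

The quality variable must be aligned with the expanded inputs: row `j` of `X` corresponds to the CO concentration measured at time index `j + n_lags - 1`.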

Step 3, the high-dimensional dynamic latent variables in the standard data set are extracted using a time series feature mapping method based on a recurrent neural network. A dimension reduction algorithm is then used to simplify the hidden-layer feature dimension and eliminate redundant variables, which yields the modeling data and reduces the influence of problems such as multicollinearity.
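The two stages of Step 3 can be sketched in NumPy as follows. This is only an illustration of the data flow: the recurrent network below is a plain Elman-style recurrence with random, untrained weights, whereas the patent trains its network according to formulas that are not reproduced in this text. Principal component analysis is assumed as the dimension reduction algorithm, which this passage does not name. The names `rnn_hidden_states`, `pca_reduce`, `n_hidden`, and `n_components` are hypothetical.

```python
import numpy as np

def rnn_hidden_states(X, W_in, W_h, b):
    """Elman-style recurrence h_t = tanh(W_in x_t + W_h h_{t-1} + b)."""
    h = np.zeros(W_h.shape[0])
    H = np.empty((X.shape[0], W_h.shape[0]))
    for t, x_t in enumerate(X):
        h = np.tanh(W_in @ x_t + W_h @ h + b)
        H[t] = h
    return H

def pca_reduce(H, n_components):
    """Project centred hidden states onto the leading principal directions."""
    mean = H.mean(axis=0)
    Hc = H - mean
    _, _, Vt = np.linalg.svd(Hc, full_matrices=False)
    P = Vt[:n_components].T  # (n_hidden, n_components) loading matrix
    return Hc @ P, mean, P

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 30))  # stand-in for standardized, expanded inputs
n_hidden = 16
W_in = rng.normal(scale=0.1, size=(n_hidden, 30))    # untrained weights
W_h = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)

H = rnn_hidden_states(X, W_in, W_h, b)        # high-dimensional latent variables
Z, h_mean, P = pca_reduce(H, n_components=4)  # low-dimensional latent variables
```

Query data would be mapped with the same weights and projected with the stored statistics, for example `(H_query - h_mean) @ P`, so that modeling data and query data share one latent space.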

Step 4, the query data collected online are first standardized to obtain the standard query data. Then, using the time series feature mapping method based on the recurrent neural network, the query data high-dimensional latent variable set is extracted from the standard query data. Finally, dimension reduction is applied to the high-dimensional latent variable set to obtain the query data low-dimensional latent variable set.

Step 5, the spatial neighbor sample set, the original temporal neighbor sample set, and the extended temporal neighbor sample set of the query data are constructed, giving the local neighbor sample set of the query data low-dimensional latent variable set.
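The neighbor construction of Step 5 can be sketched as follows. The similarity formula is not reproduced in this text; a Gaussian kernel on the Euclidean distance in the latent space is assumed. The sketch also assumes that rows are ordered by sampling time, that the original temporal neighbors are the most recent labeled samples, and that the extended temporal neighbors are the samples within a small time window around each spatial neighbor. The function names and the parameters `k_space`, `k_time`, `half_window`, and `sigma` are hypothetical.

```python
import numpy as np

def gaussian_similarity(Z, z_q, sigma=1.0):
    """Assumed kernel: exp(-||z_i - z_q||^2 / (2 sigma^2))."""
    d2 = np.sum((Z - z_q) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def local_neighbors(Z, labeled, z_q, k_space=2, k_time=2, half_window=1, sigma=1.0):
    n = Z.shape[0]
    sim = gaussian_similarity(Z, z_q, sigma)
    # Spatial neighbors: the k_space samples most similar to the query.
    spatial = np.argsort(-sim)[:k_space]
    # Original temporal neighbors: the k_time most recent labeled samples.
    temporal = np.flatnonzero(labeled)[-k_time:]
    # Extended temporal neighbors: a time window around each spatial neighbor.
    extended = set()
    for i in spatial.tolist():
        extended.update(range(max(0, i - half_window), min(n, i + half_window + 1)))
    local = sorted(set(spatial.tolist()) | set(temporal.tolist()) | extended)
    return spatial, temporal, np.array(sorted(extended)), np.array(local)

# Ten one-dimensional latent samples; every third sample carries a label.
Z = np.arange(10.0).reshape(-1, 1)
labeled = np.zeros(10, dtype=bool)
labeled[[0, 3, 6, 9]] = True
z_q = np.array([4.2])
spatial, temporal, extended, local = local_neighbors(Z, labeled, z_q)
```

The union in `local` mixes labeled and unlabeled samples, which is what allows unlabeled samples to take part in the later label propagation step.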

Step 6, the labeled samples in the original temporal neighbor sample set of the query data are used as training data, and a local supervised model for the query data low-dimensional latent variable set, together with its model coefficients, is established by the instant learning unified optimization modeling method.

Step 7, based on the local neighbor sample set of the query data low-dimensional latent variable set and the local supervised model, the predicted quality variable value corresponding to the query data low-dimensional latent variable set is calculated by the instant learning unified optimization modeling method based on local label propagation.

Step 8, when the real quality variables of the query data are obtained through standard laboratory assay analysis, these samples are added to the initial training data set to expand the operating range covered by the modeling data set.
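The update rule of Step 8 can be sketched as a small helper. The sketch assumes that laboratory results which have not yet arrived are marked as NaN; this convention and the name `update_training_set` are hypothetical.

```python
import numpy as np

def update_training_set(X_train, y_train, X_query, y_lab):
    """Append query samples whose laboratory value is available (not NaN)."""
    y_lab = np.asarray(y_lab, dtype=float)
    has_label = ~np.isnan(y_lab)
    if not has_label.any():
        # No new laboratory results: keep the training set unchanged.
        return X_train, y_train
    X_new = np.vstack([X_train, X_query[has_label]])
    y_new = np.concatenate([y_train, y_lab[has_label]])
    return X_new, y_new

X_train = np.zeros((3, 2))
y_train = np.zeros(3)
X_query = np.ones((2, 2))
y_lab = np.array([np.nan, 0.7])  # only the second query sample has been assayed

X_upd, y_upd = update_training_set(X_train, y_train, X_query, y_lab)
```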

The prediction errors of the traditional locally weighted partial least squares (LWPS) soft measurement modeling method and of the method of the invention (TSFE-JITL) on the output variable of the low-temperature conversion unit data are shown in fig. 3 and 4; because of the large data volume, only the prediction errors of 600 samples are shown. As can be seen from fig. 3 and 4, the method of the invention has a smaller prediction error and a higher prediction accuracy than the traditional method.

It should be understood that the above description does not limit the invention, and the invention is not limited to the particular embodiments disclosed; modifications, adaptations, additions and alternatives made within the spirit and scope of the invention also fall within its scope of protection.

Claims

1. The instant learning soft measurement modeling method based on the time sequence feature extraction is characterized by comprising the following steps of:

the specific process of the step 1 is as follows: firstly, according to the knowledge of the process mechanism, analyzing the correlation between the process data and the quality variable, and sorting the process data and the quality variable according to the sequence from high to low of the correlation, and selecting the process data and the quality variable before sortingThe individual process data are used as auxiliary variables, which are then recorded>The values of the auxiliary variables at different times, resulting in a sample set +.>，/>，/>，For the total number of samples, +.>For the dimension of the sample, +.>Is->A sample number; t is a matrix transposition operator; then obtaining a real quality variable set corresponding to each sample through offline assay analysis>；/>Is->Real quality variables corresponding to the individual samples;

when the soft measurement model of the CO concentration is established, six easily measured process variables are selected from the actual industrial production process as auxiliary variables, based on process mechanism knowledge and correlation analysis, the six auxiliary variables being the flow of process gas to the LTTU, the pressure at the LTTU outlet, the upper-layer temperature of the LTTU, the middle-layer temperature of the LTTU, the lower-layer temperature of the LTTU, and the temperature of the process gas at the LTTU outlet; the values of the six auxiliary variables and of the carbon monoxide concentration are then collected at different times to obtain a low-temperature conversion unit process data sample set containing 2000 samples;

the specific process of the step 2 is as follows: the initial training data set is constructed from the sample set and the real quality variable set, and is standardized according to formula (1) to obtain the standard data set, which consists of the normalized sample set and the normalized set of real quality variables; formula (1) is expressed as:

（1）；

in the formula, one function computes the mean of each row of the matrix, and the other function computes the standard deviation of each row of the matrix;

the specific process of the step 3 is as follows:

（2）；

（3）；

step 3.2, calculating the following optimization targets by solvingIs +.>：

（4）；

（5）；

（6）；

step 3.5, combining the dimension-reduced dynamic latent variable set with the real quality variable set to obtain the modeling data;

2. The instant learning soft measurement modeling method based on time sequence feature extraction according to claim 1, wherein the specific process of step 4 is: the query data collected online are first standardized in the same way as in formula (1) to obtain the standard query data; then, using the time series feature mapping method based on the recurrent neural network, the query data high-dimensional latent variable set is extracted from the standard query data; finally, dimension reduction is applied to the high-dimensional latent variable set to obtain the query data low-dimensional latent variable set.

3. The method for modeling soft measurement based on instant learning of time series feature extraction of claim 2, wherein the specific process of step 5 is:

（7）；

in the method, in the process of the invention,is->Similarity of the individual modeling samples and the query samples; />Representing an exponential function; />Indicate->Dynamic latent variable set of dimension reduction of individual modeling samples, < ->The width is the kernel mapping;

step 5.2, selecting the most recently acquired labeled samples as the original temporal neighbor sample set of the query data; for each sample in the spatial neighbor sample set of the query data, selecting the temporal neighbor samples of that sample, thereby obtaining the extended temporal neighbor sample set of the query data;

4. The method for modeling soft measurement based on instant learning of time series feature extraction of claim 3, wherein the specific process of step 6 is as follows:

step 6.1, the labeled samples in the original temporal neighbor sample set of the query data are used as training data, a local supervised model is established by the instant learning unified optimization modeling method, and the optimization objective of the local supervised model is as follows:

（8）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />The method is a weight coefficient of two subtasks of a balanced instant learning unified optimization modeling method; />Reconstruction coefficient vector +.>Weight matrix of>，/>，/>Is the firstReconstruction coefficients of the individual samples with respect to the query data; />For regression coefficient vector->Weight matrix of>，，/>Is->Regression coefficient vectors corresponding to principal components; />Weight matrix for Euclidean distance vector, < ->；/>For regression coefficient vector->Regularization coefficient of>Reconstruction coefficient vector +.>Is included in the regularization coefficients of (a);

、/>、/>the expression of (2) is as follows:

（9）；

（10）；

（11）；

step 6.2, fixing the regression coefficient vector and solving for the reconstruction coefficient vector, formula (8) is rewritten as formula (12):

（12）；

in the method, in the process of the invention,for->Optimization objective of->For->Diagonal matrix of>The diagonal elements of (a) are the training errors,；/>is->The training error of the individual samples is determined,，/>is->True quality variable corresponding to the individual samples, +.>Is->A dimension-reducing dynamic latent variable set of each sample; />Representing all and reconstruction coefficient vectors->An irrelevant constant term;

（13）；

（14）；

in the method, in the process of the invention,for->Is an optimization objective of (1);/>for->Diagonal matrix of>，Is->Reconstruction error of principal component,/->Is the reconstruction error of the sample, defining a reconstruction error vector；/>For about->A diagonal matrix of the individual samples is used,，/>is->Individual samples and query samples about +.>Difference of principal components, ++>Diagonal element of->Distance difference values of each sample and query sample about each feature are defined as distance difference value vectors；

（15）；

5. The method for modeling soft measurements based on instant learning of time series feature extraction of claim 4, wherein the specific process of step 7 is:

step 7.1, collecting query data low-dimensional latent variablesSample set of local neighbors thereof>Merging into one dataset +.>，/>Is->A number of samples; will->Dividing into label sample subsetsLabel-free sample subset->Wherein->Representing the +.sub.f in the tag sample subset>Sample number->Indicate->Corresponding marks of each sampleSign (I)>And->Respectively->Label sample and no label sample number, +.>Is->A label-free sample;

（16）；

in the method, in the process of the invention,for->Is an optimization objective of (1); />Is->Predicted value of mass variable for all samples in +.>Is->The mass of the individual samplesA quantity variable predictive value; />Is the label sample subsetA quality variable predictor for each sample; />For sample set->Middle->Sample and->Similarity of the individual samples in the auxiliary variable space; />Is->A quality variable predictor for each sample; />Is calculated by monitoring model by instant learning unified optimization modeling method>Quality variable predictive value of (2); />、/>All represent weighting coefficients;

（17）；

in the method, in the process of the invention,for a diagonal matrix, the diagonal element corresponding to the label sample is 1, the diagonal element corresponding to the unlabeled sample is 0,is Laplace matrix>For the similarity matrix, the matrix element is +.>Diagonal matrixIs a contiguous matrix, diagonal element->；/>A unit matrix, namely a diagonal matrix with diagonal elements of 1; />Calculated for the local supervision model +.>Predicted values of all samples in (a)/(b)>For the +.>Predicted values for the individual samples; />Is->The label value of the unlabeled exemplar is 0.

6. The instant learning soft measurement modeling method based on time sequence feature extraction according to claim 5, wherein the specific process of step 8 is: when the real quality variables of the query data are obtained through standard laboratory assay analysis, these samples are added to the initial training data set, expanding the operating range covered by the modeling data set; otherwise, the samples contained in the initial training data set remain unchanged.