CN114528764A

CN114528764A - Soft measurement modeling method and device based on integral optimization and instant learning

Info

Publication number: CN114528764A
Application number: CN202210151538.7A
Authority: CN
Inventors: 王智权; 袁志宏; 白玮; 李秀洁; 吴昂山; 徐飞; 许恒微; 宋垚
Original assignee: Jiangsu Sailboat Petrochemical Co ltd; Tsinghua University
Current assignee: Jiangsu Sailboat Petrochemical Co ltd; Tsinghua University
Priority date: 2022-02-18
Filing date: 2022-02-18
Publication date: 2022-05-24

Abstract

The method obtains a weight matrix of the historical samples through a collaborative representation algorithm, establishes a weighted ridge regression model through a weighted ridge regression algorithm, fuses the two algorithms into a unified instant learning optimization objective, and finally solves the problem by alternating iteration. In this way, the nonlinearity, time-varying behavior and multicollinearity of industrial processes are handled, the selection of similar samples and the construction of the local model are fused into a single optimization function, the selection of similar samples is guided by information from the local model, and both the reliability of the similar samples and the prediction accuracy of the local model are improved. The problem of poor prediction accuracy in the prior art is thereby addressed.

Description

Soft measurement modeling method and device based on integral optimization and instant learning

Technical Field

The application relates to the technical field of industrial process detection, and in particular to a soft measurement modeling method and device based on integral optimization and instant learning, an electronic device, and a storage medium.

Background

In modern industrial production, many important quality variables (such as oil viscosity and composition) are difficult to measure in real time, which greatly affects the control and optimization of chemical processes. Because on-site sampling is difficult, analytical instruments are expensive, and analysis introduces time lag, it is often impractical to measure quality variables in real time with online analyzers or offline laboratory tests, so closed-loop control of the quality variables cannot be formed. How to acquire quality variables in real time has therefore become the first problem to be solved in process control, and soft measurement has attracted research attention in the field of process industry control.

Common data-driven soft measurement modeling methods include Principal Component Regression (PCR), Partial Least Squares Regression (PLSR), and Artificial Neural Networks (ANN). Models built with these algorithms are offline models: once established, they cannot adapt to changes in the production process or track changes in the production state, so their prediction accuracy gradually degrades. Automatic maintenance of the soft measurement model has therefore become a focus of algorithm research and improvement. To adapt to the multi-modal and time-varying characteristics of modern chemical production, various online modeling algorithms have been widely applied to process monitoring and quality variable prediction.

Currently, mainstream online soft measurement modeling algorithms include sliding window algorithms, recursive algorithms, time difference algorithms, and instant learning algorithms. The first three update the model according to temporal correlation and are time-adaptive algorithms; the instant learning algorithm updates and maintains the model based on spatial correlation and is a space-adaptive algorithm. Compared with the other algorithms, instant learning adapts better to abrupt changes in the production process, and because it builds a local model for each sample, it can describe the nonlinear relationships among process variables well.

The selection of similar samples, or the calculation of sample weights, is the core step of the instant learning algorithm and strongly influences its prediction accuracy. Conventional instant learning algorithms have two drawbacks. On the one hand, some tunable parameters are difficult to choose, such as the kernel width in the LWPLS algorithm and the number of similar samples in the LWLS algorithm; there is no clear theoretical or empirical guidance for choosing them, yet they strongly affect model performance. On the other hand, the two core steps of the algorithm, selecting similar samples and building the local model, are independent of each other, so the selected similar samples may be sub-optimal for the local model: the selected samples are used to build the local model, but the information obtained while building the local model is not used to guide sample selection. Models built by conventional instant learning algorithms therefore suffer from poor prediction accuracy, a problem that urgently needs to be solved.

Disclosure of Invention

The application provides a soft measurement modeling method and device based on integral optimization and instant learning, an electronic device, and a storage medium, so as to solve the problem of poor prediction accuracy in the prior art.

The embodiment of the first aspect of the application provides a soft measurement modeling method based on integral optimization and instant learning, which comprises the following steps: calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix; fusing the weighted Euclidean distances into a collaborative representation regularization term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model, and calculating the weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data by using the weighted ridge regression model coefficients.

Optionally, in an embodiment of the present application, before the establishing a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further includes: constructing and storing the preset auxiliary variable data set in the industrial process; analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.

Optionally, in an embodiment of the present application, the normalization process is:

wherein the argument of the functions is the data set to be normalized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
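As an illustration of the row-wise standardization described above, the following is a minimal NumPy sketch. It assumes the data matrix stores one variable per row and one sample per column, which is the layout implied by the row-wise mean(·) and std(·); the function names `standardize_rows` and `apply_standardization` are hypothetical and do not come from the application.

```python
import numpy as np

def standardize_rows(data):
    """Z-score every row of `data` (assumed layout: variables x samples).

    Returns the standardized matrix plus the row means and standard
    deviations so the same scaling can be reused for new query data.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    # Assumption: a constant row is left unscaled to avoid division by zero.
    std = np.where(std == 0.0, 1.0, std)
    return (data - mean) / std, mean, std

def apply_standardization(x, mean, std):
    """Standardize new data with statistics taken from the training set."""
    return (np.asarray(x, dtype=float) - mean) / std
```

Keeping the training-set statistics is a design choice that lets a newly collected query sample be scaled in the same way as the historical data.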

Optionally, in an embodiment of the present application, before the calculating, according to the preset input variable weight matrix, weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set, the method further includes:

establishing an offline ridge regression model according to the target training data set, wherein the optimization target is as follows:

wherein W₀ is the ridge regression coefficient of the offline ridge regression model, λ₀ is the regularization coefficient of the offline ridge regression model, X_L is the normalized auxiliary variable data, and Y_L is the standardized real quality variable value;

solving the optimization target of the offline ridge regression model to obtain the ridge regression coefficient W₀ of the offline ridge regression model:

W₀ = (X_L X_L^T + λ₀I)^-1 X_L Y_L

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;

calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:

wherein W₀(1) is the first element of the ridge regression coefficient W₀, and W₀(m) is the m-th element of the ridge regression coefficient W₀.
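The closed-form solution for W₀ can be sketched in NumPy as follows. The sketch assumes X_L is an m × n matrix (one sample per column), since that is the layout for which the dimensions of the formula agree. The formula that maps W₀ to the input variable weight matrix is not reproduced in this text, so `variable_weight_matrix` uses an assumed form (a diagonal matrix of normalized coefficient magnitudes) purely for illustration; both function names are hypothetical.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """W0 = (X_L X_L^T + lam0 * I)^-1 X_L Y_L, with X_L of shape (m, n)."""
    m = X_L.shape[0]
    # Solving the linear system avoids forming an explicit inverse.
    return np.linalg.solve(X_L @ X_L.T + lam0 * np.eye(m), X_L @ Y_L)

def variable_weight_matrix(W0):
    """ASSUMED form of the input variable weight matrix:
    diag(|W0(1)|, ..., |W0(m)|) scaled so that the diagonal sums to 1.
    The source only states that it is computed from the elements of W0.
    """
    magnitude = np.abs(np.ravel(W0))
    total = magnitude.sum()
    if total == 0.0:
        raise ValueError("all ridge coefficients are zero")
    return np.diag(magnitude / total)
```

Under this assumed form, variables with larger regression coefficients contribute more to the weighted distance used later for sample selection.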

Optionally, in an embodiment of the present application, the calculating, according to a preset input variable weight matrix, weighted euclidean distances between acquired query data and all samples in a preset auxiliary variable data set, fusing the weighted euclidean distances to a collaborative representation regular term to obtain a target collaborative representation model, and calculating, by using the target collaborative representation model, a weight matrix of each historical sample in the preset auxiliary variable data set, includes:

calculating the weighted Euclidean distance between the query data and all samples in a standardized preset auxiliary variable data set according to the preset input variable weight matrix:

Dx_q＝W_var(x_q×1-X_L)

wherein x_q is the query data, D is a diagonal matrix, 1 is a vector whose elements are all 1, the element-wise product symbol represents multiplication of the corresponding elements of two matrices, and the function sum(·) represents summation over the rows of a matrix;

establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein an optimization target is as follows:

wherein b is the collaborative representation coefficient, λ₁ is the regularization coefficient, and ‖·‖₂ denotes the two-norm;

calculating the collaborative representation coefficient b of the standardized preset auxiliary variable data set and the query data:

b = (X_L^T W_var X_L + λ₁D)^-1 X_L^T W_var x_q

obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:

wherein b₁ is the first element of the vector b, and b_n is the n-th element of the vector b.
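The closed-form expression for b can be sketched as below. This is a minimal illustration under stated assumptions: X_L is m × n (one sample per column), D is taken to be the diagonal matrix built from the weighted Euclidean distances d, and the sample weight matrix is assumed to be diag(b₁, …, b_n), because the corresponding formula is not reproduced in this text. The function names and the optional `jitter` safeguard are hypothetical.

```python
import numpy as np

def collaborative_coefficients(x_q, X_L, W_var, d, lam1, jitter=0.0):
    """b = (X_L^T W_var X_L + lam1 * D)^-1 X_L^T W_var x_q.

    x_q: (m,) query, X_L: (m, n) samples, W_var: (m, m) variable weights,
    d: (n,) weighted distances between the query and each sample.
    `jitter` is a hypothetical numerical safeguard for d_i == 0; it is
    not part of the source formula and defaults to 0.
    """
    D = np.diag(np.asarray(d, dtype=float) + jitter)  # ASSUMPTION: D = diag(d)
    A = X_L.T @ W_var @ X_L + lam1 * D
    return np.linalg.solve(A, X_L.T @ W_var @ x_q)

def sample_weight_matrix(b):
    """ASSUMED form of the sample weight matrix: diag(b_1, ..., b_n)."""
    return np.diag(np.ravel(b))
```

Because each coefficient is penalized in proportion to its sample's distance from the query, distant historical samples are discouraged from receiving large coefficients.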

Optionally, in an embodiment of the present application, the building a weighted ridge regression model according to a target training data set and a weight matrix of each historical sample, fusing an optimization target of the target collaborative representation model and the weighted ridge regression model, calculating weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients includes:

establishing the weighted ridge regression model according to the weight matrix of the target training data set and the historical samples, wherein the optimization target is as follows:

wherein w is the coefficient vector of the weighted ridge regression model, and λ₂ is the ridge regression regularization coefficient;

fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:

wherein a is the weight coefficient that balances the optimization objectives of the two algorithms, b is the collaborative representation coefficient, and λ₁ is the collaborative representation regularization coefficient;

calculating the weighted ridge regression model coefficient w:

computing the predicted value of the query data using the weighted ridge regression model coefficients:

wherein x_q^T is the transpose of the query data x_q.
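The local model step can be illustrated with the standard closed form of weighted ridge regression followed by the linear prediction x_q^T w. This is a sketch under assumptions: the application's own formula for w, which is derived from the fused objective, is not reproduced in this text and may differ (for example in how the weight coefficient a enters); X_L is taken as m × n, and the function names are hypothetical.

```python
import numpy as np

def weighted_ridge(X_L, Y_L, sample_weights, lam2):
    """Standard weighted ridge solution (ASSUMED form):
    w = (X_L W X_L^T + lam2 * I)^-1 X_L W Y_L, with W = diag(sample_weights).

    Non-negative sample weights keep the system positive definite; with
    negative weights the matrix can become ill-conditioned.
    """
    m = X_L.shape[0]
    XW = X_L * np.asarray(sample_weights, dtype=float)  # scales column i by weight i
    return np.linalg.solve(XW @ X_L.T + lam2 * np.eye(m), XW @ Y_L)

def predict(x_q, w):
    """Predicted (standardized) output for the query: x_q^T w."""
    return float(np.dot(x_q, w))
```

In the alternating scheme described in the application, a step of this kind would be repeated each time the sample weights are updated; the prediction is in standardized units and would need to be mapped back with the training-set mean and standard deviation.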

Optionally, in an embodiment of the present application, the method further includes: obtaining an actual value of the query data, and adding the actual value and the query data to the target training data set.

The embodiment of the second aspect of the present application provides a soft measurement modeling apparatus based on integral optimization and instant learning, including: a calculation module, used for calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regularization term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; and a prediction module, used for establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model, calculating the weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients.

Optionally, in an embodiment of the present application, the apparatus further includes: a construction module, used for constructing and storing the preset auxiliary variable data set in the industrial process; an analysis module, used for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and a preprocessing module, used for constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.

Optionally, in an embodiment of the present application, the apparatus further includes:

the modeling module is used for building an offline ridge regression model according to the target training data set before calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, and the optimization target is as follows:

wherein W₀ is the ridge regression coefficient of the offline ridge regression model, λ₀ is the regularization coefficient of the offline ridge regression model, X_L is the normalized auxiliary variable data, and Y_L is the standardized real quality variable value;

a solving module, used for solving the optimization target of the offline ridge regression model to obtain the ridge regression coefficient W₀ of the offline ridge regression model:

W₀ = (X_L X_L^T + λ₀I)^-1 X_L Y_L

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;

a weight calculation module, configured to calculate a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:

Optionally, in an embodiment of the present application, the calculation module is specifically configured to:

Dx_q＝W_var(x_q×1-X_L)

wherein x_q is the query data, D is a diagonal matrix, 1 is a vector whose elements are all 1, the element-wise product symbol represents multiplication of the corresponding elements of two matrices, and the function sum(·) represents summation over the rows of a matrix;

wherein b is the collaborative representation coefficient, λ₁ is the regularization coefficient, and ‖·‖₂ denotes the two-norm;

b = (X_L^T W_var X_L + λ₁D)^-1 X_L^T W_var x_q

Optionally, in an embodiment of the present application, the prediction module is specifically configured to:

fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target which is:

calculating the weighted ridge regression model coefficient w:

calculating a predicted value of the query data using the weighted ridge regression model coefficients:

wherein x_q^T is the transpose of the query data x_q.

Optionally, in an embodiment of the present application, the apparatus further includes: an expansion module, used for obtaining an actual value of the query data and adding the actual value and the query data to the target training data set.

An embodiment of a third aspect of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to perform the soft measurement modeling method based on integral optimization and instant learning as described in the above embodiments.

An embodiment of a fourth aspect of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the soft measurement modeling method based on integral optimization and instant learning as described in the foregoing embodiments.

Therefore, the application has at least the following beneficial effects:

A data self-representation algorithm is introduced into instant learning and adapted to that setting. On the one hand, the weights of the input variables are taken into account when the sample weights are calculated through the self-representation algorithm; on the other hand, the weighted Euclidean distance between the query sample and the historical samples is calculated and used as the regularization term of the algorithm, so that local spatial distance information of the data is incorporated. Compared with existing algorithms, the application converts the selection of similar samples, or the calculation of sample weights, into an optimization problem, which improves the rationality and reliability of the sample weights. In addition, whereas conventional algorithms select similar samples and build the local model independently, the application carries out both simultaneously through a unified optimization objective, which improves model optimization efficiency and prediction accuracy. The problem of poor prediction accuracy in the prior art is thereby addressed.

Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.

Drawings

The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flowchart of a soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application;

FIG. 2 is a process schematic of a debutanizer column (DCP) provided according to an embodiment of the present application;

FIG. 3 is a graph of the real output of the debutanizer process data provided according to an embodiment of the present application;

FIG. 4 is a graph of the prediction deviation on the debutanizer data of the soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application;

FIG. 5 is a graph of the prediction deviation on the debutanizer data of the existing locally weighted partial least squares algorithm, provided according to an embodiment of the present application;

FIG. 6 is an example diagram of a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Description of reference numerals: a calculation module-100, a prediction module-200, a memory-701, a processor-702, and a communication interface-703.

Detailed Description

Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.

A soft measurement modeling method and apparatus based on integral optimization and instant learning, an electronic device, and a storage medium according to embodiments of the present application are described below with reference to the accompanying drawings. To address the problems mentioned in the background, namely the time-varying and multi-modal characteristics of industrial processes and the multicollinearity of industrial data, the application provides a soft measurement modeling method based on integral optimization and instant learning. In the method, the soft measurement model is established through an instant learning algorithm, which handles the time-varying and multi-modal problems; the models (namely the offline ridge regression model and the weighted ridge regression model) are established through a ridge regression algorithm, which handles the multicollinearity of process data with high computational efficiency. In addition, the selection of similar samples is converted into an optimization problem and fused with the optimization objective of the local model, which streamlines the modeling process and improves the reliability of the sample weights and the prediction accuracy of the soft measurement model. The problem of poor prediction accuracy in the prior art is thereby addressed.

Specifically, FIG. 1 is a flowchart of a soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application.

As shown in FIG. 1, the soft measurement modeling method based on integral optimization and instant learning includes the following steps:

In step S101, the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix.

It should be noted that the preset auxiliary variable data set is obtained by normalizing the target training data set, and the specific process is as follows.

Optionally, in an embodiment of the present application, constructing the target training data set includes: constructing a preset auxiliary variable data set in the industrial process; analyzing a preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; establishing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set, wherein the standardization processing comprises the following steps:

wherein the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix. The target training data set is thereby obtained.

In particular, in embodiments of the present application, the quality-related auxiliary variable data X = [x₁, x₂, …, x_n]^T in the industrial process are collected and stored in real time by on-site sensors and storage devices, where n is the number of samples and m is the dimension of each sample; the real quality variable value corresponding to each sample is obtained by laboratory analysis. The collected data are used as the initial training data set. The initial training data set is normalized according to formula (1) so that its mean is 0 and its variance is 1, which yields the training data set; X_L is the data obtained by standardizing the data X, and Y_L is the quality variable value obtained after standardization.

Optionally, in an embodiment of the present application, before calculating weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to the preset input variable weight matrix, the method further includes:

An offline ridge regression model is established using the training data set, and the weight matrix W_var of each input variable is calculated from the regression coefficients of the model. The specific steps are as follows:

an offline ridge regression model is established using the training data set, with the following optimization objective:

wherein W₀ is the ridge regression coefficient of the offline ridge regression model, and λ₀ is the regularization coefficient of the offline ridge regression model; solving the optimization objective gives the analytical expression of the ridge regression coefficient W₀:

W₀ = (X_L X_L^T + λ₀I)^-1 X_L Y_L (3)

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;

the weight matrix of each input variable is calculated from the ridge regression coefficient W₀ of the offline ridge regression model by formula (4), which is expressed as:
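Formulas (2) and (4) are not reproduced in this text. As a hedged reconstruction, an objective for formula (2) that is consistent with the closed-form solution (3) is the standard ridge regression problem below, with X_L stored as an m × n matrix (one sample per column), which is the layout for which the dimensions in formula (3) agree. The normalized-magnitude expression shown for formula (4) is only an assumption about how W_var might be built; the source states only that W_var is computed from the elements W₀(1), …, W₀(m).

```latex
% Objective consistent with formula (3), and its stationarity condition
\min_{W_0}\; \lVert Y_L - X_L^{T} W_0 \rVert_2^2 + \lambda_0 \lVert W_0 \rVert_2^2
\;\Longrightarrow\;
X_L \left( X_L^{T} W_0 - Y_L \right) + \lambda_0 W_0 = 0
\;\Longrightarrow\;
W_0 = \left( X_L X_L^{T} + \lambda_0 I \right)^{-1} X_L Y_L

% ASSUMED form of formula (4), not confirmed by the source
W_{var} = \operatorname{diag}\!\left(
  \frac{\lvert W_0(1) \rvert}{\sum_{j=1}^{m} \lvert W_0(j) \rvert},\;
  \dots,\;
  \frac{\lvert W_0(m) \rvert}{\sum_{j=1}^{m} \lvert W_0(j) \rvert}
\right)
```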

Optionally, in one embodiment of the present application, the newly collected query data x_q is normalized according to formula (1); based on the weight matrix W_var, the weighted Euclidean distances d between the query data x_q and all samples in the data X_L are calculated by formulas (5) and (6), which are expressed as:

Dx_q = W_var(x_q×1 - X_L) (5)

wherein D is a diagonal matrix, 1 is a vector whose elements are all 1, the element-wise product symbol represents multiplication of the corresponding elements of two matrices, and the function sum(·) represents summation over the rows of a matrix.
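The distance computation of step S101 can be sketched as follows. Formula (5) is taken from the text; formula (6) is not reproduced, so the sketch assumes that d is the square root of the column-wise sum of the element-wise square of Dx_q, which matches the description of sum(·) and of the element-wise product. X_L is assumed to be m × n (one sample per column), and the function name is hypothetical.

```python
import numpy as np

def weighted_distances(x_q, X_L, W_var):
    """Weighted Euclidean distance from the query to every sample.

    Formula (5): Dx_q = W_var (x_q * 1 - X_L), where 1 is a row of ones.
    ASSUMED formula (6): d = sqrt(sum(Dx_q * Dx_q)) taken per sample.
    """
    x_col = np.asarray(x_q, dtype=float).reshape(-1, 1)
    # Broadcasting the (m, 1) query against (m, n) plays the role of x_q * 1.
    Dx_q = W_var @ (x_col - X_L)
    return np.sqrt(np.sum(Dx_q * Dx_q, axis=0))
```

For example, with an identity weight matrix, a query at the origin and samples at (0, 0) and (3, 4), the function returns distances 0 and 5.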

In step S102, the weighted euclidean distance is fused into the collaborative representation regular term to obtain a target collaborative representation model, and a weight matrix of each historical sample in the preset auxiliary variable data set is calculated by using the target collaborative representation model.

Optionally, in an embodiment of the present application, a collaborative representation model of the query data x_q and the training data set is established, fusing the weight matrix and the weighted Euclidean distance; the optimization objective is as follows:

wherein b is the collaborative representation coefficient, and λ₁ is the regularization coefficient.

The collaborative representation coefficient b of the data X_L and the query data x_q is calculated by formula (8), which is expressed as:

b = (X_L^T W_var X_L + λ₁D)^-1 X_L^T W_var x_q (8)

Using the collaborative representation coefficient, the weight matrix W_sample of the historical samples in the training data set is obtained by formula (9), which is expressed as:
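Formulas (7) and (9) are not reproduced in this text. As a hedged reconstruction, an objective for formula (7) that is consistent with the closed-form solution (8) is shown below, assuming W_var and D are symmetric (both are described as diagonal). The diagonal form given for formula (9) is an assumption suggested by the definitions of b₁ and b_n; the actual formula may normalize or otherwise transform the coefficients.

```latex
% Objective consistent with formula (8), and its stationarity condition
\min_{b}\; (x_q - X_L b)^{T} W_{var}\, (x_q - X_L b) + \lambda_1\, b^{T} D\, b
\;\Longrightarrow\;
X_L^{T} W_{var} (X_L b - x_q) + \lambda_1 D\, b = 0
\;\Longrightarrow\;
b = \left( X_L^{T} W_{var} X_L + \lambda_1 D \right)^{-1} X_L^{T} W_{var}\, x_q

% ASSUMED form of formula (9), not confirmed by the source
W_{sample} = \operatorname{diag}(b_1, b_2, \dots, b_n)
```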

In step S103, a weighted ridge regression model is established according to the target training data set and the weight matrix of each historical sample, the optimization objective of the collaborative representation algorithm is fused with the optimization objective of the weighted ridge regression algorithm, and the collaborative representation coefficients and local model coefficients of the data X_L and the query data x_q are calculated.

Specifically, in the embodiment of the present application, the collaborative representation coefficients and the local model coefficients are solved through the unified optimization objective as follows:

a weighted ridge regression model is established according to the training set and the weight matrix W_sample, with the following optimization objective:

wherein w is the coefficient vector of the weighted ridge regression model, and λ₂ is the ridge regression regularization coefficient.

The improved collaborative representation optimization objective and the weighted ridge regression optimization objective, namely formula (7) and formula (10), are combined in a weighted manner to obtain a unified instant learning optimization objective, shown as formula (11):

wherein a is the weight coefficient between the improved collaborative representation algorithm and the weighted ridge regression algorithm.

First, the local model coefficient w is fixed and the collaborative representation coefficient b is solved; formula (11) can then be rewritten as formula (12):

wherein cst(b) represents the terms independent of b. The improved collaborative representation coefficient b is calculated by formula (13), which is expressed as:

Then, the collaborative representation coefficient b is fixed and the local model coefficient w is solved; formula (11) can then be rewritten as formula (14):

wherein cst(w) represents the terms independent of w.

The coefficient w of the local model is calculated by formula (15), which is expressed as:

Specifically, in an embodiment of the present application, the output value of the query data x_q is calculated by formula (16) using the weighted ridge regression model coefficient w; formula (16) is expressed as:

When the real output value y_q is obtained through laboratory analysis, the sample [x_q, y_q] is added to the training data set to expand the operating range covered by the training data set; otherwise, the range covered by the training data set remains unchanged.
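The database update rule above can be sketched as follows. The sketch assumes that X_L stores one sample per column, that Y_L is a one-dimensional array, and that x_q and y_q have already been standardized in the same way as the training data; the function name `update_training_set` is hypothetical.

```python
import numpy as np

def update_training_set(X_L, Y_L, x_q, y_q=None):
    """Append [x_q, y_q] to the training set when a laboratory value exists.

    If y_q is None (no laboratory analysis available), the training set is
    returned unchanged.
    """
    if y_q is None:
        return X_L, Y_L
    x_col = np.asarray(x_q, dtype=float).reshape(-1, 1)
    if x_col.shape[0] != X_L.shape[0]:
        raise ValueError("query dimension does not match the training data")
    return np.hstack([X_L, x_col]), np.append(Y_L, float(y_q))
```

Returning new arrays instead of modifying the inputs keeps earlier snapshots of the training set intact, which can be convenient when comparing predictions before and after an update.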

The soft measurement modeling method based on integral optimization and instant learning is explained below, taking the process data of a debutanizer as an example, with reference to the accompanying drawings.

The debutanizer column (DCP) is part of a desulfurization and naphtha splitting plant; its task is to reduce the butane concentration in the bottoms as far as possible. The process flow of the DCP is shown in FIG. 2. The bottom butane concentration is generally measured online by a gas chromatograph installed at the top of the column. Because it takes time for the butane vapor at the bottom to reach the top, and the chromatographic analysis itself also takes time, the online measurement lags considerably, so a soft measurement model is needed to estimate the bottom butane concentration online in real time. To build the soft measurement model, seven variables measured on the debutanizer (see FIG. 2) were selected as auxiliary variables; they are explained in Table 1. The data set comes from an actual industrial process and contains 2382 samples; the real output curve is shown in FIG. 3.

Table 1 description of auxiliary variables

The following description of the specific steps of the present application is made in conjunction with the debutanizer process:

1. The acquired data is used as the training data set and is preprocessed.

First, all samples are preprocessed and abnormal samples are deleted; then, taking the dynamic characteristics of the process into account, the dimension of all samples is expanded according to the following formula, so that each expanded sample has 30 dimensions; finally, standardization is carried out to obtain the final training data set.

Then:

wherein the left-hand side of the formula represents the predicted value of the soft measurement model for the bottom butane concentration, and f_DCP(·) represents the underlying relationship between the butane concentration and X₁ to X₇.

Further obtaining:

2. An offline ridge regression model is established using the training data set, and a weight matrix for each input variable is calculated.

The offline ridge regression model is established from the training data set, and the weight matrix of each input variable is calculated from the ridge regression coefficients of the model.
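Step 2 can be illustrated with a minimal NumPy sketch. It uses the closed-form solution W₀ = (X_L X_L^T + λ₀I)^-1 X_L Y_L stated in this description, with X_L holding one sample per column. The formula that maps W₀ to the input variable weight matrix is not reproduced in this text, so `variable_weight_matrix` below is only an assumption (a diagonal matrix of normalized absolute coefficients); the function names are hypothetical as well.

```python
import numpy as np


def offline_ridge(X_L, Y_L, lam0):
    """Closed-form ridge solution W0 = (X_L X_L^T + lam0 * I)^-1 X_L Y_L.

    X_L has shape (m, n): m input variables, n samples stored as columns.
    Y_L has shape (n,).
    """
    m = X_L.shape[0]
    A = X_L @ X_L.T + lam0 * np.eye(m)
    # Solving the linear system is numerically safer than forming the inverse.
    return np.linalg.solve(A, X_L @ Y_L)


def variable_weight_matrix(W0):
    """ASSUMPTION: diagonal matrix of normalized absolute ridge coefficients.

    The exact mapping used by the application is not given in this text.
    """
    a = np.abs(W0)
    total = a.sum()
    if total == 0.0:
        # Degenerate case: fall back to equal weights.
        return np.eye(len(a)) / len(a)
    return np.diag(a / total)
```

With this choice, variables with larger offline coefficients contribute more to the weighted distance computed later; other mappings would fit the same interface.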

3. New data is collected and standardized.

The newly collected query data is normalized according to the normalization method of the training data set.
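A short sketch of this step, assuming the usual z-score form built from the row-wise mean(·) and std(·) functions that the claims mention. The statistics are fitted once on the training matrix and reused for every query sample. The helper names and the choice of the sample standard deviation (`ddof=1`) are assumptions, not taken from the application.

```python
import numpy as np


def fit_standardizer(X):
    """Row-wise mean and standard deviation of X, shape (m, n), samples as columns."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, ddof=1, keepdims=True)  # ASSUMPTION: sample std (ddof=1)
    sigma = np.where(sigma == 0.0, 1.0, sigma)    # avoid division by zero for constant rows
    return mu, sigma


def standardize(X, mu, sigma):
    """Apply training-set statistics to training data or to a query column."""
    return (X - mu) / sigma
```

A query sample stored as an (m, 1) column is passed through `standardize` with the `mu` and `sigma` obtained from the training data, so that it lives on the same scale as the training samples.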

4. The sample collaborative representation coefficients and the weighted ridge regression model coefficients are calculated simultaneously according to the unified optimization objective.

First, the weighted Euclidean distance d between the collected query data x_q and the training samples is calculated; then, d is fused into the regularization term of the collaborative representation, and the collaborative representation is fused with the weighted ridge regression algorithm to obtain a unified instant learning optimization target; finally, the sample collaborative representation coefficients and the local model coefficients are calculated by alternating iterative optimization.
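The first two operations of this step can be sketched with the formulas given elsewhere in this description, Dx_q = W_var(x_q×1 − X_L) and b = (X_L^T W_var X_L + λ₁D)^-1 X_L^T W_var x_q. The sketch assumes that d_i is the two-norm of the i-th column of Dx_q and that D is the diagonal matrix of these distances. The alternating iteration with the weighted ridge model is not shown, because its update equations are not reproduced in this text. Function names are hypothetical.

```python
import numpy as np


def weighted_distances(x_q, X_L, W_var):
    """Distances d_i between x_q (m,) and every column of X_L (m, n).

    ASSUMPTION: d_i is the two-norm of column i of W_var (x_q * 1 - X_L).
    """
    ones = np.ones((1, X_L.shape[1]))
    Dxq = W_var @ (x_q.reshape(-1, 1) @ ones - X_L)
    return np.linalg.norm(Dxq, axis=0)


def collaborative_coefficients(x_q, X_L, W_var, d, lam1):
    """b = (X_L^T W_var X_L + lam1 * D)^-1 X_L^T W_var x_q with D = diag(d)."""
    D = np.diag(d)
    A = X_L.T @ W_var @ X_L + lam1 * D
    # The system can become ill-conditioned when some d_i is close to zero,
    # that is, when the query nearly coincides with a training sample.
    return np.linalg.solve(A, X_L.T @ W_var @ x_q)
```

Because the distance enters as a per-sample penalty, training samples far from the query receive coefficients shrunk toward zero, which is what lets the coefficients act as similarity weights.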

5. The output value of the query data is predicted based on the weighted ridge regression model coefficients.

The predicted output value of the query data x_q is calculated from the weighted ridge regression coefficients w. When the true value y_q of the output variable is obtained, the sample [x_q; y_q] is added to the training data set.
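A minimal sketch of this last step, assuming that the prediction is the inner product of x_q with w (the description refers to the transpose of x_q in the prediction formula) and that the training data is stored with one sample per column. The function names are hypothetical.

```python
import numpy as np


def predict(x_q, w):
    """Predicted output for query x_q (m,) with local model coefficients w (m,)."""
    return float(x_q @ w)


def append_sample(X_L, Y_L, x_q, y_q):
    """Return a new training set with [x_q; y_q] appended as the last sample."""
    X_new = np.hstack([X_L, x_q.reshape(-1, 1)])
    Y_new = np.append(Y_L, y_q)
    return X_new, Y_new
```

`append_sample` returns new arrays instead of modifying its inputs, so the caller decides when the enlarged training set replaces the old one, for example only after the laboratory value y_q has arrived.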

The prediction errors of the method described herein (UniJITL) and of the conventional locally weighted partial least squares (LWPLS) algorithm for the output variable of the debutanizer data are shown in fig. 4 and fig. 5. As can be seen from fig. 4 and fig. 5, the method of the present application has higher prediction accuracy than the conventional method.

According to the soft measurement modeling method based on integral optimization and instant learning of the embodiment of the present application, a weight matrix of the historical samples is obtained through a collaborative representation algorithm, a weighted ridge regression model is established through a weighted ridge regression algorithm, and the two algorithms are fused to form a unified optimization target. For collected query data, the weighted Euclidean distance between the query data and the training samples is first calculated and fused into the regularization term of the collaborative representation; the unified optimization target then selects similar samples and builds the local model at the same time. The method provided by the embodiment of the application can therefore handle the nonlinearity, time-varying behavior and multicollinearity of industrial processes. Because similar sample selection and local model construction are integrated into one optimization function, the selection of similar samples is guided by the information of the local model, which improves both the reliability of the similar samples and the prediction accuracy of the local model.

Next, a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application will be described with reference to the drawings.

Fig. 6 is a block diagram of a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application.

As shown in fig. 6, the soft measurement modeling apparatus 10 based on integral optimization and instant learning includes: a calculation module 100 and a prediction module 200.

The calculation module 100 is configured to calculate the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to a preset input variable weight matrix, fuse the weighted Euclidean distances into the collaborative representation regularization term to obtain a target collaborative representation model, and calculate a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model. The prediction module 200 is configured to establish a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, fuse the optimization targets of the target collaborative representation model and the weighted ridge regression model, calculate the weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculate the predicted value of the query data by using the weighted ridge regression model coefficients.

Optionally, in an embodiment of the present application, the apparatus further includes: a construction module, used for constructing and storing the preset auxiliary variable data set in the industrial process; an analysis module, used for analyzing the preset auxiliary variable data set to obtain the real quality variable value corresponding to each sample in the auxiliary variable data set; and a preprocessing module, used for constructing an initial training data set according to the auxiliary variable data and the real quality variable values, and standardizing the training data set to obtain the target training data set.

The modeling module is used for establishing an offline ridge regression model according to the target training data set before the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix, wherein the optimization target is as follows:

wherein W₀ is the ridge regression coefficient of the offline ridge regression model, λ₀ is the regularization coefficient of the offline ridge regression model, X_L is the normalized auxiliary variable data, and Y_L is the normalized real quality variable value;

$$W_0 = (X_L X_L^{T} + \lambda_0 I)^{-1} X_L Y_L$$

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;
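The optimization target itself is not reproduced in this text. One objective that is consistent with the closed-form expression for W₀ given above is the standard ridge criterion. This is an assumption: X_L is taken to be m×n with one sample per column, and Y_L to be n×1. Setting the gradient of this criterion to zero recovers the stated solution:

```latex
\begin{aligned}
J(W) &= \lVert Y_L - X_L^{T} W \rVert_2^{2} + \lambda_0 \lVert W \rVert_2^{2} \\
\nabla_W J(W) &= -2\, X_L \left( Y_L - X_L^{T} W \right) + 2 \lambda_0 W = 0 \\
\left( X_L X_L^{T} + \lambda_0 I \right) W &= X_L Y_L \\
W_0 &= \left( X_L X_L^{T} + \lambda_0 I \right)^{-1} X_L Y_L
\end{aligned}
```

For λ₀ > 0 the matrix X_L X_L^T + λ₀I is positive definite, so the inverse exists even when the auxiliary variables are collinear.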

the weight calculation module is used for calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain a preset input variable weight matrix:

Optionally, in an embodiment of the present application, the calculation module 100 is specifically used for:

calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to the preset input variable weight matrix:

$$Dx_q = W_{var}\,(x_q \times \mathbf{1} - X_L)$$

wherein x_q is the query data, D is a diagonal matrix with D_{i,i} = d_i, i = 1, 2, …, n, 1 is a vector whose elements are all 1, and the symbol ‖·‖₂ denotes the two-norm operation;

establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein the optimization target is as follows:

calculating the collaborative representation coefficient b of the normalized preset auxiliary variable data set and the query data:

$$b = (X_L^{T} W_{var} X_L + \lambda_1 D)^{-1} X_L^{T} W_{var}\, x_q$$

and obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:

Optionally, in an embodiment of the present application, the prediction module 200 is specifically used for:

establishing a weighted ridge regression model according to the target training data set and the weight matrix of the historical samples, wherein the optimization target is as follows:

wherein w is the coefficient of the weighted ridge regression model, and λ₂ is the ridge regression regularization coefficient;

calculating the weighted ridge regression model coefficient w:

wherein,

calculating the predicted value of the query data by using the coefficients of the weighted ridge regression model:

wherein x_q^T is the transpose of the query data x_q.
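The coefficient formula used by the prediction module is not reproduced in this text. As a point of reference only, the textbook weighted ridge solution w = (X_L Ω X_L^T + λ₂I)^-1 X_L Ω Y_L can be sketched as follows, where Ω is a diagonal matrix of non-negative sample weights. Using this form and deriving Ω from the collaborative representation coefficients are both assumptions, and the function name is hypothetical.

```python
import numpy as np


def weighted_ridge(X_L, Y_L, omega, lam2):
    """Textbook weighted ridge regression (an assumed form, see the lead-in).

    X_L: (m, n) samples as columns, Y_L: (n,), omega: (n,) sample weights.
    Returns w of shape (m,).
    """
    m = X_L.shape[0]
    XW = X_L * omega                      # scales column j of X_L by omega[j]
    A = XW @ X_L.T + lam2 * np.eye(m)
    return np.linalg.solve(A, XW @ Y_L)
```

With all weights equal to one this reduces to ordinary ridge regression, and a sample with weight zero has no influence on w, which matches the role of the sample weights as a soft selection of similar samples.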

Optionally, in an embodiment of the present application, the apparatus further includes: an expansion module, used for obtaining the actual value of the query data and adding the actual value and the query data to the target training data set.

It should be noted that the foregoing explanation of the embodiment of the soft measurement modeling method based on integral optimization and instant learning also applies to the soft measurement modeling apparatus based on integral optimization and instant learning of this embodiment, and details are not repeated here.

According to the soft measurement modeling apparatus based on integral optimization and instant learning of the embodiment of the present application, the selection of similar samples is converted into an optimization problem and fused with the optimization target of the local model. Similar sample selection and local model establishment are thus optimized as a whole, which improves the rationality and reliability of the sample weights as well as the optimization efficiency and prediction accuracy of the model.

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:

memory 701, processor 702, and a computer program stored on memory 701 and executable on processor 702.

The processor 702, when executing the program, implements the soft measurement modeling method based on integral optimization and instant learning provided in the embodiments described above.

Further, the electronic device further includes:

a communication interface 703 for communication between the memory 701 and the processor 702.

A memory 701 for storing computer programs operable on the processor 702.

The memory 701 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.

If the memory 701, the processor 702 and the communication interface 703 are implemented independently, the communication interface 703, the memory 701 and the processor 702 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.

Optionally, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may complete mutual communication through an internal interface.

The processor 702 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.

The present embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the soft measurement modeling method based on integral optimization and instant learning described above.

In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

Any process or method description in a flow chart or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes alternate implementations in which functions may be executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application belong.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having logic gates for implementing a logic function on a data signal, an application specific integrated circuit having appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

Claims

1. A soft measurement modeling method based on integral optimization and instant learning, characterized by comprising the following steps:

calculating the weighted Euclidean distance between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix;

fusing the weighted Euclidean distance into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;

establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model, and calculating the coefficients of the weighted ridge regression model of the preset auxiliary variable data set and the query data;

and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.

2. The method of claim 1, wherein before establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further comprises:

constructing and storing the preset auxiliary variable data set in the industrial process;

analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;

and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.

3. The method of claim 2, wherein the normalization process is:

wherein,

4. The method of claim 1, wherein before calculating the weighted euclidean distances between the collected query data and all samples in the predetermined auxiliary variable data set according to the predetermined input variable weight matrix, the method further comprises:

wherein W₀ is the ridge regression coefficient of the offline ridge regression model, λ₀ is the regularization coefficient of the offline ridge regression model, X_L is the normalized auxiliary variable data, and Y_L is the normalized real quality variable value;

$$W_0 = (X_L X_L^{T} + \lambda_0 I)^{-1} X_L Y_L$$

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;

5. The method according to claim 4, wherein the calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model includes:

$$Dx_q = W_{var}\,(x_q \times \mathbf{1} - X_L)$$

wherein x_q is the query data, D is a diagonal matrix with D_{i,i} = d_i, i = 1, 2, …, n, 1 is a vector whose elements are all 1, and the symbol ‖·‖₂ denotes the two-norm operation;

$$b = (X_L^{T} W_{var} X_L + \lambda_1 D)^{-1} X_L^{T} W_{var}\, x_q$$

6. The method as claimed in claim 5, wherein the establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model, calculating weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients comprises:

establishing the weighted ridge regression model according to the target training data set and the weight matrix of the historical samples, wherein the optimization target is as follows:

wherein w is the coefficient of the weighted ridge regression model, and λ₂ is the ridge regression regularization coefficient;

calculating the weighted ridge regression model coefficient w:

wherein,

wherein x_q^T is the transpose of the query data x_q.

7. The method according to any one of claims 1 to 6, further comprising:

calculating an actual value of the query data, adding the actual value and the query data to the target training data set.

8. A soft measurement modeling apparatus based on integral optimization and instant learning, comprising:

the calculation module is used for calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;

the prediction module is used for establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model, calculating a weighted ridge regression model coefficient of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.

9. The apparatus of claim 8, further comprising:

the construction module is used for constructing and storing the preset auxiliary variable data set in the industrial process;

the analysis module is used for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;

and the preprocessing module is used for constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.

10. The apparatus of claim 9, wherein the normalization process is:

wherein the operand is the data set to be normalized, the function mean(·) computes the mean of each row of a matrix, and the function std(·) computes the standard deviation of each row of a matrix.

11. The apparatus of claim 8, further comprising:

wherein Y_L is the normalized real quality variable value;

$$W_0 = (X_L X_L^{T} + \lambda_0 I)^{-1} X_L Y_L$$

wherein X_L^T is the transpose of the data X_L, and I is an identity matrix;

12. The apparatus of claim 11, wherein the computing module is specifically configured to:

$$Dx_q = W_{var}\,(x_q \times \mathbf{1} - X_L)$$

wherein x_q is the query data, D is a diagonal matrix with D_{i,i} = d_i, i = 1, 2, …, n, 1 is a vector whose elements are all 1, and the symbol ‖·‖₂ denotes the two-norm operation;

$$b = (X_L^{T} W_{var} X_L + \lambda_1 D)^{-1} X_L^{T} W_{var}\, x_q$$

13. The apparatus of claim 12, wherein the prediction module is specifically configured to:

wherein w is the coefficient of the weighted ridge regression model, and λ₂ is the ridge regression regularization coefficient;

calculating the weighted ridge regression model coefficient w:

wherein,

wherein x_q^T is the transpose of the query data x_q.

14. The apparatus of any one of claims 8-13, further comprising:

and the expansion module is used for calculating an actual value of the query data and adding the actual value and the query data to the target training data set.

15. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the soft measurement modeling method based on integral optimization and instant learning of any one of claims 1-7.

16. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the soft measurement modeling method based on integral optimization and instant learning according to any one of claims 1-7.