CN114528764A - Soft measurement modeling method and device based on integral optimization and instant learning - Google Patents
- Publication number
- CN114528764A (application CN202210151538.7A)
- Authority
- CN
- China
- Prior art keywords
- ridge regression
- data set
- weighted
- target
- coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F30/00—Computer-aided design [CAD]
- G06F30/20—Design optimisation, verification or simulation
- G06F30/27—Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Evolutionary Computation (AREA)
- Medical Informatics (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Algebra (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Computer Hardware Design (AREA)
- Geometry (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The method obtains a weight matrix for the historical samples through a collaborative representation algorithm, builds a weighted ridge regression model through a weighted ridge regression algorithm, fuses the two into a unified just-in-time (instant) learning optimization objective, and solves that objective by alternating iteration. This addresses nonlinearity, time variation, and multicollinearity in industrial processes. Because similar-sample selection and local-model construction are merged into a single optimization function, information from the local model guides the selection of similar samples, which improves both the reliability of the selected samples and the prediction accuracy of the local model. The poor prediction accuracy of the prior art is thereby addressed.
Description
Technical Field
The application relates to the technical field of industrial process detection, and in particular to a soft measurement modeling method and apparatus, an electronic device, and a storage medium for just-in-time learning based on integral optimization.
Background
In modern industrial production, many important quality variables (such as oil viscosity and composition) are difficult to measure in real time, which seriously hinders the control and optimization of chemical processes. Because on-site sampling is difficult, analytical instruments are expensive, and analysis results lag behind the process, online analyzers and offline laboratory tests often cannot measure quality variables in real time during actual production, so closed-loop control of those variables cannot be established. Acquiring quality variables in real time is therefore the first problem to be solved in process control, and this has brought soft measurement to the attention of researchers in process industry control.
Common data-driven soft measurement modeling methods include principal component regression (PCR), partial least squares regression (PLSR), and artificial neural networks (ANN). Models built with these algorithms are offline models: once established, they cannot adapt to changes in the production process or track changes in the production state, so their prediction accuracy gradually degrades. Automatic maintenance of the soft measurement model has therefore become a focus of algorithm research and improvement. To cope with the multi-modal and time-varying characteristics of modern chemical production, various online modeling algorithms have been widely applied to process monitoring and quality variable prediction.
Currently, mainstream online soft measurement modeling algorithms include sliding window algorithms, recursive algorithms, time difference algorithms, and just-in-time learning algorithms. The first three update the model according to temporal correlation and are time-adaptive algorithms; the just-in-time learning algorithm updates and maintains the model according to spatial correlation and is a space-adaptive algorithm. Compared with the other algorithms, just-in-time learning adapts better to abrupt changes in the production process, and because it builds a local model for each sample, it can describe the nonlinear relationships among process variables well.
The selection of similar samples, or the calculation of sample weights, is the core step of a just-in-time learning algorithm and strongly affects its prediction accuracy. Conventional just-in-time learning algorithms have two weaknesses. First, some tunable parameters are hard to choose, such as the kernel width in the LWPLS algorithm and the number of similar samples in the LWLS algorithm; there is no clear theoretical or empirical guidance for choosing them, yet they strongly affect model performance. Second, the two core steps of the algorithm, selecting similar samples and building the local model, are independent of each other, so the selected similar samples may be suboptimal for the local model: the selected samples are used to build the local model, but the information obtained while building the local model is not used to guide sample selection. As a result, models built by conventional just-in-time learning algorithms suffer from poor prediction accuracy, a problem that urgently needs to be solved.
Disclosure of Invention
The application provides a soft measurement modeling method and apparatus, an electronic device, and a storage medium for just-in-time learning based on integral optimization, to address the poor prediction accuracy of the prior art.
An embodiment of the first aspect of the application provides a soft measurement modeling method for just-in-time learning based on integral optimization, including the following steps: calculating the weighted Euclidean distances between acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix; incorporating the weighted Euclidean distances into the regularization term of a collaborative representation model to obtain a target collaborative representation model, and using the target collaborative representation model to calculate a weight matrix of each historical sample in the preset auxiliary variable data set; establishing a weighted ridge regression model from a target training data set and the weight matrix of each historical sample, fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model, and calculating the weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data using the weighted ridge regression model coefficients.
Optionally, in an embodiment of the present application, before the establishing a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further includes: constructing and storing the preset auxiliary variable data set in the industrial process; analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
Optionally, in an embodiment of the present application, the standardization process is:
$$X_L = \frac{X - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
where $X$ is the data set to be standardized, the function mean(·) computes the mean of each row of a matrix, and the function std(·) computes the standard deviation of each row of a matrix.
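The row-wise standardization described above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function name `standardize_rows` is hypothetical, the layout (one row per variable, one column per sample) is inferred from the row-wise definitions of mean(·) and std(·), and the use of the sample standard deviation (`ddof=1`) is an assumption, since the text does not say which estimator is used.

```python
import numpy as np

def standardize_rows(X):
    """Z-score each row of an m x n matrix (rows = variables, columns = samples)."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=1, keepdims=True)            # per-row mean
    sigma = X.std(axis=1, ddof=1, keepdims=True)  # per-row std (ddof=1 is an assumption)
    sigma[sigma == 0.0] = 1.0                     # assumption: leave constant rows unscaled
    return (X - mu) / sigma, mu, sigma

# Toy data: 2 variables, 4 samples.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
X_L, mu, sigma = standardize_rows(X)
```

Returning `mu` and `sigma` alongside the standardized data lets the same statistics be applied later to a query sample.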
Optionally, in an embodiment of the present application, before calculating the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to the preset input variable weight matrix, the method further includes:
establishing an offline ridge regression model from the target training data set, with the following optimization objective:
$$\min_{W_0} \; \|Y_L - X_L^T W_0\|_2^2 + \lambda_0 \|W_0\|_2^2$$
where $W_0$ is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and $Y_L$ is the standardized real quality variable value;
solving the optimization objective of the offline ridge regression model to obtain its ridge regression coefficient $W_0$:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
where $X_L^T$ is the transpose of $X_L$ and $I$ is the identity matrix;
calculating the weight of each input variable from the ridge regression coefficient to obtain the preset input variable weight matrix:
where $W_0(1)$ is the first element of the ridge regression coefficient $W_0$ and $W_0(m)$ is its $m$-th element.
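As a rough sketch of the offline ridge regression step, the closed form $W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$ can be evaluated directly with NumPy. The formula for the input variable weight matrix is not reproduced in the text, so `variable_weight_matrix` below is purely an assumption: it places the normalized absolute coefficients $|W_0(1)|, \ldots, |W_0(m)|$ on a diagonal. Both function names and the synthetic data are hypothetical.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """Closed-form ridge coefficients; X_L is m x n (variables x samples), Y_L has length n."""
    m = X_L.shape[0]
    A = X_L @ X_L.T + lam0 * np.eye(m)
    # Solve the linear system rather than forming the inverse explicitly.
    return np.linalg.solve(A, X_L @ Y_L)

def variable_weight_matrix(W0):
    """ASSUMPTION: diagonal matrix of normalized absolute ridge coefficients."""
    w = np.abs(np.asarray(W0, dtype=float)).ravel()
    return np.diag(w / w.sum())

# Synthetic example: 3 variables, 50 samples, second variable irrelevant.
rng = np.random.default_rng(0)
X_L = rng.standard_normal((3, 50))
true_w = np.array([2.0, 0.0, -1.0])
Y_L = X_L.T @ true_w + 0.01 * rng.standard_normal(50)

W0 = offline_ridge(X_L, Y_L, lam0=1e-3)
W_var = variable_weight_matrix(W0)
```

Under this assumed weighting, variables with larger ridge coefficients contribute more to the weighted distance, and the irrelevant variable receives a weight close to zero.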
Optionally, in an embodiment of the present application, calculating the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to the preset input variable weight matrix, incorporating the weighted Euclidean distances into the regularization term of the collaborative representation model to obtain the target collaborative representation model, and using the target collaborative representation model to calculate the weight matrix of each historical sample in the preset auxiliary variable data set, includes:
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
where $x_q$ is the query data, $D$ is a diagonal matrix, $\mathbf{1}$ is a vector whose elements are all 1, the element-wise product operator multiplies corresponding elements of two matrices, and the function sum(·) adds the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set that incorporates the preset input variable weight matrix and the weighted Euclidean distance, with the following optimization objective:
where $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and $\|\cdot\|_2$ denotes the two-norm;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
obtaining the weight matrix of each historical sample in the target training data set from the collaborative representation coefficient:
where $b_1$ is the first element of the vector $b$ and $b_n$ is its $n$-th element.
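The collaborative representation step can be sketched as below, using the two formulas that do survive in the text: $D_{x_q} = W_{var}(x_q \times \mathbf{1} - X_L)$ and $b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var} x_q$. Two parts are assumptions, because the corresponding formulas are missing: the diagonal of $D$ is taken to be the squared weighted distance of each sample (the column sums of $D_{x_q}$ multiplied element-wise by itself), and the sample weight matrix is taken to be the normalized $|b_1|, \ldots, |b_n|$ on a diagonal. The function name and toy data are hypothetical. For reference, the closed form for $b$ is the stationary point of $(x_q - X_L b)^T W_{var} (x_q - X_L b) + \lambda_1 b^T D b$.

```python
import numpy as np

def collaborative_weights(X_L, x_q, W_var, lam1):
    """X_L: m x n (columns are historical samples); x_q: length-m query vector."""
    m, n = X_L.shape
    ones = np.ones((1, n))
    # Weighted differences between the query and every historical sample (from the text).
    D_xq = W_var @ (x_q.reshape(m, 1) @ ones - X_L)
    # ASSUMPTION: D holds each sample's squared weighted distance on its diagonal.
    D = np.diag(np.sum(D_xq * D_xq, axis=0))
    # Closed-form collaborative representation coefficients (from the text).
    A = X_L.T @ W_var @ X_L + lam1 * D
    b = np.linalg.solve(A, X_L.T @ W_var @ x_q)
    # ASSUMPTION: sample weights are normalized absolute coefficients on a diagonal.
    S = np.diag(np.abs(b) / np.abs(b).sum())
    return b, S, D

# Toy data: 2 variables, 3 historical samples; the query lies close to the first sample.
X_L = np.array([[1.0, 0.0, -1.0],
                [0.0, 1.0, 0.0]])
x_q = np.array([0.9, 0.1])
W_var = np.eye(2)

b, S, D = collaborative_weights(X_L, x_q, W_var, lam1=1.0)
```

Because the distance-based regularizer penalizes coefficients of far-away samples more heavily, the first sample, which is nearest to the query, receives the largest coefficient in this example.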
Optionally, in an embodiment of the present application, establishing the weighted ridge regression model from the target training data set and the weight matrix of each historical sample, fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model, calculating the weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data using the weighted ridge regression model coefficients, includes:
establishing the weighted ridge regression model from the target training data set and the weight matrix of the historical samples, with the following optimization objective:
where $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization objective:
where $a$ is the weighting coefficient between the optimization objectives of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient, $w$ is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating the weighted ridge regression model coefficient $w$:
calculating the predicted value of the query data using the weighted ridge regression model coefficient.
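The formulas for the weighted ridge regression objective, its solution, and the prediction are not reproduced in the text above, so the following is only a sketch under the textbook weighted ridge formulation, which is an assumption: minimize $(Y_L - X_L^T w)^T S (Y_L - X_L^T w) + \lambda_2 \|w\|_2^2$ with a diagonal sample weight matrix $S$, giving $w = (X_L S X_L^T + \lambda_2 I)^{-1} X_L S Y_L$ and the prediction $x_q^T w$. It does not implement the fused objective or its alternating iteration. The function names and data are hypothetical.

```python
import numpy as np

def weighted_ridge(X_L, Y_L, S, lam2):
    """ASSUMED standard weighted ridge; X_L is m x n, S is an n x n diagonal weight matrix."""
    m = X_L.shape[0]
    A = X_L @ S @ X_L.T + lam2 * np.eye(m)
    return np.linalg.solve(A, X_L @ S @ Y_L)

def predict(x_q, w):
    """Prediction for one (standardized) query sample under the assumed linear local model."""
    return float(np.asarray(x_q, dtype=float) @ w)

# Noise-free synthetic data so that the true coefficients are recoverable.
rng = np.random.default_rng(1)
X_L = rng.standard_normal((2, 30))
Y_L = X_L.T @ np.array([1.5, -0.5])
S = np.diag(rng.uniform(0.1, 1.0, size=30))  # arbitrary positive sample weights

w = weighted_ridge(X_L, Y_L, S, lam2=1e-6)
y_hat = predict(np.array([1.0, 2.0]), w)
```

The prediction is in standardized units; mapping it back to the original scale would reuse the mean and standard deviation computed during standardization.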
Optionally, in an embodiment of the present application, an actual value of the query data is calculated, and the actual value and the query data are added to the target training data set.
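A minimal sketch of this data set update, assuming the training data are stored as an m x n matrix with one column per sample (the layout implied by the ridge regression formulas) and a length-n vector of quality values. The function name `append_sample` is hypothetical, and whether the data set is re-standardized after each update is not stated in the text.

```python
import numpy as np

def append_sample(X, Y, x_q, y_q):
    """Return copies of the data set with the query sample and its actual value appended."""
    x_col = np.asarray(x_q, dtype=float).reshape(-1, 1)
    X_new = np.hstack([X, x_col])    # new sample becomes the last column
    Y_new = np.append(Y, float(y_q))
    return X_new, Y_new

X = np.zeros((2, 3))
Y = np.zeros(3)
X_new, Y_new = append_sample(X, Y, x_q=[0.9, 0.1], y_q=1.2)
```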
An embodiment of the second aspect of the present application provides a soft measurement modeling apparatus for just-in-time learning based on integral optimization, including: a calculation module, configured to calculate the weighted Euclidean distances between acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, incorporate the weighted Euclidean distances into the regularization term of a collaborative representation model to obtain a target collaborative representation model, and use the target collaborative representation model to calculate a weight matrix of each historical sample in the preset auxiliary variable data set; and a prediction module, configured to establish a weighted ridge regression model from a target training data set and the weight matrix of each historical sample, fuse the optimization objectives of the target collaborative representation model and the weighted ridge regression model, calculate the weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data, and calculate the predicted value of the query data using the weighted ridge regression model coefficients.
Optionally, in an embodiment of the present application, the apparatus further includes: a construction module, configured to construct and store the preset auxiliary variable data set of the industrial process; an analysis module, configured to analyze the preset auxiliary variable data set to obtain the real quality variable value corresponding to each sample in the auxiliary variable data set; and a preprocessing module, configured to construct an initial training data set from the auxiliary variable data and the real quality variable values, and to standardize the training data set to obtain the target training data set.
Optionally, in an embodiment of the present application, the standardization process is:
$$X_L = \frac{X - \mathrm{mean}(X)}{\mathrm{std}(X)}$$
where $X$ is the data set to be standardized, the function mean(·) computes the mean of each row of a matrix, and the function std(·) computes the standard deviation of each row of a matrix.
Optionally, in an embodiment of the present application, the apparatus further includes:
a modeling module, configured to establish an offline ridge regression model from the target training data set before the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix, with the following optimization objective:
$$\min_{W_0} \; \|Y_L - X_L^T W_0\|_2^2 + \lambda_0 \|W_0\|_2^2$$
where $W_0$ is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and $Y_L$ is the standardized real quality variable value;
a solving module, configured to solve the optimization objective of the offline ridge regression model to obtain its ridge regression coefficient $W_0$:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
where $X_L^T$ is the transpose of $X_L$ and $I$ is the identity matrix;
a weight calculation module, configured to calculate the weight of each input variable from the ridge regression coefficient to obtain the preset input variable weight matrix:
where $W_0(1)$ is the first element of the ridge regression coefficient $W_0$ and $W_0(m)$ is its $m$-th element.
Optionally, in an embodiment of the present application, the calculation module is specifically configured to perform the following:
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
where $x_q$ is the query data, $D$ is a diagonal matrix, $\mathbf{1}$ is a vector whose elements are all 1, the element-wise product operator multiplies corresponding elements of two matrices, and the function sum(·) adds the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set that incorporates the preset input variable weight matrix and the weighted Euclidean distance, with the following optimization objective:
where $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and $\|\cdot\|_2$ denotes the two-norm;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
obtaining the weight matrix of each historical sample in the target training data set from the collaborative representation coefficient:
where $b_1$ is the first element of the vector $b$ and $b_n$ is its $n$-th element.
Optionally, in an embodiment of the present application, the prediction module is specifically configured to perform the following:
establishing the weighted ridge regression model from the target training data set and the weight matrix of the historical samples, with the following optimization objective:
where $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization objectives of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization objective:
where $a$ is the weighting coefficient between the optimization objectives of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient, $w$ is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating the weighted ridge regression model coefficient $w$:
calculating the predicted value of the query data using the weighted ridge regression model coefficient.
Optionally, in an embodiment of the present application, the apparatus further includes: an expansion module, configured to calculate an actual value of the query data and add the actual value and the query data to the target training data set.
An embodiment of the third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the program to perform the soft measurement modeling method for just-in-time learning based on integral optimization described in the above embodiments.
An embodiment of the fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the soft measurement modeling method for just-in-time learning based on integral optimization described in the above embodiments.
Therefore, the application has at least the following beneficial effects:
A data self-representation algorithm is introduced into just-in-time learning and adapted to that setting. Specifically, on the one hand, the weights of the input variables are taken into account while the sample weights are calculated through the self-representation algorithm; on the other hand, the weighted Euclidean distance between the query sample and the historical samples is calculated and used as the regularization term of the algorithm, so that local spatial distance information of the data is incorporated. Compared with existing algorithms, the application turns the selection of similar samples, or the calculation of sample weights, into an optimization problem, which makes the sample weights more reasonable and reliable. In addition, whereas conventional algorithms select similar samples and build the local model independently, the application performs both at once through a unified optimization objective, improving model optimization efficiency and prediction accuracy. The poor prediction accuracy of the prior art is thereby addressed.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flowchart of a soft measurement modeling method for just-in-time learning based on integral optimization according to an embodiment of the present application;
FIG. 2 is a process schematic of a debutanizer (DCP) according to an embodiment of the present application;
FIG. 3 is a graph of the real output of the debutanizer process data according to an embodiment of the present application;
FIG. 4 is a graph of the prediction deviation on the debutanizer data of the soft measurement modeling method for just-in-time learning based on integral optimization according to an embodiment of the present application;
FIG. 5 is a graph of the prediction deviation on the debutanizer data of the existing locally weighted partial least squares algorithm according to an embodiment of the present application;
FIG. 6 is an example diagram of a soft measurement modeling apparatus for just-in-time learning based on integral optimization according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of reference numerals: a calculation module-100, a prediction module-200, a memory-701, a processor-702, and a communication interface-703.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A soft measurement modeling method and apparatus, an electronic device, and a storage medium for just-in-time learning based on integral optimization according to embodiments of the present application are described below with reference to the accompanying drawings. To address the problems mentioned in the background, namely the time-varying and multi-modal characteristics of industrial processes and the multicollinearity of industrial data, the application provides a soft measurement modeling method for just-in-time learning based on integral optimization. In this method, the soft measurement model is built by a just-in-time learning algorithm, which handles the time-varying and multi-modal problems; the local models (an offline ridge regression model and a weighted ridge regression model) are built by a ridge regression algorithm, which handles the multicollinearity of process data with high computational efficiency. In addition, the selection of similar samples is converted into an optimization problem and fused with the local-model optimization objective, which streamlines the modeling process and improves the reliability of the sample weights and the prediction accuracy of the soft measurement model. The poor prediction accuracy of the prior art is thereby addressed.
Specifically, FIG. 1 is a flowchart of a soft measurement modeling method for just-in-time learning based on integral optimization according to an embodiment of the present application.
As shown in FIG. 1, the soft measurement modeling method for just-in-time learning based on integral optimization includes the following steps:
In step S101, the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix.
It should be noted that the preset auxiliary variable data set is obtained by normalizing the target training data set, and the specific process is as follows.
Optionally, in an embodiment of the present application, constructing the target training data set includes: constructing a preset auxiliary variable data set in the industrial process; analyzing the preset auxiliary variable data set to obtain the real quality variable value corresponding to each sample in the auxiliary variable data set; and establishing an initial training data set from the auxiliary variable data and the real quality variable values and standardizing it according to formula (1):
where the function mean(·) computes the mean of each row of a matrix and the function std(·) computes the standard deviation of each row of a matrix. The target training data set is thereby obtained.
Specifically, in embodiments of the present application, the quality-related auxiliary variable data $X = [x_1, x_2, \ldots, x_n]^T$ of the industrial process are collected and stored in real time by field sensors and storage devices, where n is the number of samples and m is the dimension of each sample. The real quality variable value corresponding to each sample is obtained by laboratory analysis. The collected data are used as the initial training data set, which is standardized according to formula (1) to zero mean and unit variance to obtain the training data set, where $X_L$ is the data obtained by standardizing $X$ and $Y_L$ is the standardized quality variable values.
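The standardization described above can be sketched as follows. This is a minimal NumPy illustration, assuming variables are stored in rows and samples in columns (consistent with the row-wise mean(·) and std(·) of formula (1)); the function name `standardize_rows`, the use of the population standard deviation, and the guard for constant rows are assumptions of this sketch, not details of the application.

```python
import numpy as np

def standardize_rows(data):
    """Scale each row (one variable) to zero mean and unit variance."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=1, keepdims=True)
    std = data.std(axis=1, keepdims=True)
    std[std == 0.0] = 1.0  # assumption: leave constant variables unscaled
    return (data - mean) / std, mean, std

# Two variables (rows), four samples (columns); illustrative values only.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
X_L, x_mean, x_std = standardize_rows(X)
```

The returned mean and standard deviation would be reused to standardize newly collected query data in the same way as the training data.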
Optionally, in an embodiment of the present application, before calculating weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to the preset input variable weight matrix, the method further includes:
An offline ridge regression model is established using the training data set, and the weight matrix $W_{var}$ of the input variables is calculated from the regression coefficients of the model. The specific steps are as follows:
An offline ridge regression model is established using the training data set, with the following optimization target:
where $W_0$ is the ridge regression coefficient of the offline ridge regression model and $\lambda_0$ is its regularization coefficient. Solving the optimization target gives the analytical expression of the ridge regression coefficient $W_0$:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L \quad (3)$$
where $X_L^T$ is the transpose of $X_L$ and $I$ is the identity matrix.
From the ridge regression coefficient $W_0$ of the offline ridge regression model, the weight matrix of the input variables is calculated by formula (4):
where $W_0(1)$ is the first element of the ridge regression coefficient $W_0$ and $W_0(m)$ is its m-th element.
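As an illustration of formula (3), the following sketch solves the offline ridge regression in closed form with NumPy, taking $X_L$ as an m×n matrix (variables in rows, samples in columns). Formula (4) is not reproduced in this text, so `variable_weights` is a hypothetical stand-in that places normalized absolute coefficients on a diagonal; the weighting actually used in the application may differ.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """Closed-form ridge coefficients, formula (3)."""
    m = X_L.shape[0]
    A = X_L @ X_L.T + lam0 * np.eye(m)
    return np.linalg.solve(A, X_L @ Y_L)

def variable_weights(W0):
    """ASSUMPTION: stand-in for formula (4), not taken from the source."""
    a = np.abs(W0)
    return np.diag(a / a.sum())

# Synthetic data: the second variable has no influence on the output.
rng = np.random.default_rng(0)
X_L = rng.standard_normal((3, 50))
Y_L = X_L.T @ np.array([2.0, 0.0, -1.0])
W0 = offline_ridge(X_L, Y_L, lam0=1e-8)
W_var = variable_weights(W0)
```

Using `np.linalg.solve` rather than forming the inverse explicitly is a numerical choice of the sketch; it yields the same coefficients as formula (3).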
Optionally, in one embodiment of the present application, the newly collected query data $x_q$ are standardized according to formula (1), and, based on the weight matrix $W_{var}$, the weighted Euclidean distances $d$ between the query data $x_q$ and all samples in $X_L$ are calculated by formulas (5) and (6):
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L) \quad (5)$$
where $D$ is a diagonal matrix with $D_{i,i} = d_i$, $i = 1, 2, \ldots, n$; $\mathbf{1}$ is a vector whose elements are all 1; the elementwise-product symbol denotes multiplication of the corresponding elements of two matrices; and the function sum(·) adds the rows of a matrix.
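The distance calculation can be sketched as below. Formula (5) is implemented as written; formula (6) is not reproduced in this text, so the sketch assumes that the distance of each sample is the square root of the column sum of the elementwise product of $D_{x_q}$ with itself. The function name `weighted_distances` and that square root are assumptions.

```python
import numpy as np

def weighted_distances(x_q, X_L, W_var):
    """Weighted Euclidean distance from x_q to every column of X_L."""
    x_q = np.asarray(x_q, dtype=float).reshape(-1, 1)
    ones = np.ones((1, X_L.shape[1]))
    D_xq = W_var @ (x_q @ ones - X_L)      # formula (5)
    # ASSUMPTION for formula (6): add the rows of D_xq * D_xq, then take the root.
    return np.sqrt(np.sum(D_xq * D_xq, axis=0))

# Two variables, two samples; with identity weights this is the plain distance.
X_L = np.array([[0.0, 3.0],
                [0.0, 4.0]])
d = weighted_distances([0.0, 0.0], X_L, np.eye(2))
```

With a non-identity $W_{var}$, variables that received small weights contribute little to the distance, which is the purpose of weighting the input variables.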
In step S102, the weighted euclidean distance is fused into the collaborative representation regular term to obtain a target collaborative representation model, and a weight matrix of each historical sample in the preset auxiliary variable data set is calculated by using the target collaborative representation model.
Optionally, in an embodiment of the present application, a collaborative representation model of the query data $x_q$ and the training data set is established, and the weight matrix and the weighted Euclidean distance are incorporated into it. The optimization target is as follows:
where $b$ is the collaborative representation coefficient and $\lambda_1$ is the regularization coefficient.
The collaborative representation coefficient $b$ of $X_L$ and the query data $x_q$ is calculated by formula (8):
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q \quad (8)$$
Using the collaborative representation coefficient, the weight matrix $W_{sample}$ of the historical samples in the training data set is obtained by formula (9):
where $b_1$ is the first element of the vector $b$ and $b_n$ is its n-th element.
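A minimal sketch of formula (8) follows. The closed form is taken from the text; formula (9) is not reproduced here, so `sample_weights` is a hypothetical stand-in that puts the absolute coefficients on a diagonal. The data values, `lam1`, and the identity $W_{var}$ are illustrative only.

```python
import numpy as np

def collaborative_coefficients(x_q, X_L, W_var, d, lam1):
    """Distance-regularized collaborative representation, formula (8)."""
    A = X_L.T @ W_var @ X_L + lam1 * np.diag(d)
    return np.linalg.solve(A, X_L.T @ W_var @ x_q)

def sample_weights(b):
    """ASSUMPTION: stand-in for formula (9), not taken from the source."""
    return np.diag(np.abs(b))

# Three historical samples (columns); the first is closest to the query.
X_L = np.array([[1.0, 0.0, 5.0],
                [0.0, 1.0, 5.0]])
x_q = np.array([1.0, 0.1])
W_var = np.eye(2)
d = np.linalg.norm(X_L - x_q[:, None], axis=0)
b = collaborative_coefficients(x_q, X_L, W_var, d, lam1=1.0)
W_sample = sample_weights(b)
```

Because the penalty on each coefficient grows with the distance of its sample from the query, distant samples tend to receive small coefficients and hence small weights in the local model.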
In step S103, a weighted ridge regression model is established according to the target training data set and the weight matrix of the historical samples, the optimization target of the collaborative representation algorithm is fused with the optimization target of the weighted ridge regression algorithm, and the collaborative representation coefficient and the local model coefficient for $X_L$ and the query data $x_q$ are calculated.
Specifically, in the embodiment of the present application, the collaborative representation coefficient and the local model coefficient are solved through the unified optimization target as follows:
A weighted ridge regression model is established according to the training data set and the weight matrix $W_{sample}$, with the following optimization target:
where $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization coefficient.
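For orientation, a weighted ridge regression on its own has a standard closed-form solution, sketched below with NumPy. Formula (10) is not reproduced in this text, so the objective assumed here is the textbook one (sample-weighted squared error plus $\lambda_2$ times the squared norm of $w$); the function name and data are hypothetical, and the sketch does not include the fusion with the collaborative representation target.

```python
import numpy as np

def weighted_ridge(X_L, Y_L, W_sample, lam2):
    """Textbook weighted ridge solution (ASSUMED form of the local model)."""
    m = X_L.shape[0]
    A = X_L @ W_sample @ X_L.T + lam2 * np.eye(m)
    return np.linalg.solve(A, X_L @ W_sample @ Y_L)

# One variable, four samples; the last sample is off-trend.
X_L = np.array([[1.0, 2.0, 3.0, 4.0]])
Y_L = np.array([1.0, 2.0, 3.0, 40.0])
W_sample = np.diag([1.0, 1.0, 1.0, 0.0])   # zero weight removes the last sample
w = weighted_ridge(X_L, Y_L, W_sample, lam2=1e-9)
y_hat = float(w @ np.array([2.5]))
```

The example shows why reliable sample weights matter: a sample with zero weight has no influence on the fitted local coefficient.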
The improved collaborative representation optimization target of formula (7) and the weighted ridge regression optimization target of formula (10) are combined by weighting to obtain the unified instant learning optimization target of formula (11):
where $a$ is the weight coefficient between the improved collaborative representation algorithm and the weighted ridge regression algorithm.
First, the local model coefficient $w$ is fixed and the collaborative representation coefficient $b$ is solved; formula (11) can then be rewritten as formula (12):
where cst(b) denotes a term independent of $b$. The improved collaborative representation coefficient $b$ is calculated by formula (13):
Then, the collaborative representation coefficient $b$ is fixed and the local model coefficient $w$ is solved; formula (11) can then be rewritten as formula (14):
The local model coefficient $w$ is calculated by formula (15):
Optionally, in an embodiment of the present application, the actual value of the query data is obtained, and the actual value and the query data are added to the target training data set.
Specifically, in an embodiment of the present application, the output value of the query data $x_q$ is calculated by formula (16) using the weighted ridge regression model coefficient $w$:
When the real output value $y_q$ is obtained through laboratory analysis, the sample $[x_q, y_q]$ is added to the training data set to expand the operating range it covers; otherwise, the range covered by the training data set remains unchanged.
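The prediction and database update just described can be sketched as follows. Formula (16) is not reproduced in this text; the sketch assumes the local model output is the inner product of $w$ and $x_q$. The function names and the use of `None` for a missing laboratory value are hypothetical.

```python
import numpy as np

def predict(w, x_q):
    """ASSUMPTION for formula (16): linear local model output."""
    return float(np.dot(w, x_q))

def update_training_set(X_L, Y_L, x_q, y_q=None):
    """Append [x_q, y_q] as a new column when a laboratory value exists."""
    if y_q is None:
        return X_L, Y_L                      # training set unchanged
    return np.column_stack([X_L, x_q]), np.append(Y_L, y_q)

X_L = np.zeros((2, 3))                       # two variables, three samples
Y_L = np.zeros(3)
x_q = np.array([1.0, 2.0])
y_pred = predict(np.array([0.5, 0.25]), x_q)
X_new, Y_new = update_training_set(X_L, Y_L, x_q, y_q=0.9)
X_same, Y_same = update_training_set(X_L, Y_L, x_q)
```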
The soft measurement modeling method based on integral optimization and instant learning is explained below, with reference to the drawings, using process data from a debutanizer column as an example.
The debutanizer column (DCP) is part of a desulfurization and naphtha splitting plant; its task is to reduce the butane concentration in the column bottoms as far as possible. The principle of the DCP is shown in fig. 2. The bottom butane concentration is generally measured online by a gas chromatograph installed at the top of the column. Because the butane vapor needs time to travel from the bottom to the top of the column, and the gas chromatograph analysis itself also takes time, the online measurement of the bottom butane concentration has a large lag, so a soft measurement model is needed to estimate it online in real time. To establish the soft measurement model of the bottom butane concentration, seven variables measured on the debutanizer column (see fig. 2) are selected as auxiliary variables; they are described in table 1. The data set comes from an actual industrial process and contains 2382 samples; the actual output curve is shown in fig. 3.
Table 1 description of auxiliary variables
The following description of the specific steps of the present application is made in conjunction with the debutanizer process:
1. The acquired data are used as the training data set and preprocessed.
First, all samples are preprocessed and abnormal samples are deleted. Then, taking the dynamic characteristics of the process into account, the dimension of all samples is expanded according to the following formula, giving expanded samples of dimension 30. Finally, standardization is performed to obtain the final training data set. Then:
where the model output is the soft measurement model's predicted value of the bottom butane concentration, and $f_{DCP}(\cdot)$ denotes the latent relationship between the butane concentration and $X_1$ to $X_7$.
Further obtaining:
2. An offline ridge regression model is established using the training data set, and the weight matrix of the input variables is calculated.
An offline ridge regression model is established from the training data set, and the weight matrix of the input variables is calculated from the ridge regression coefficients of the model.
3. New data are collected and standardized.
The newly collected query data are standardized in the same way as the training data set.
4. The sample collaborative representation coefficient $b$ and the weighted ridge regression model coefficient $w$ are calculated simultaneously according to the unified optimization target.
First, the weighted Euclidean distance $d$ between the collected query data $x_q$ and the training samples is calculated; then $d$ is fused into the regularization term of the collaborative representation, and the collaborative representation is fused with the weighted ridge regression algorithm to obtain the unified instant learning optimization target; finally, the sample collaborative representation coefficient $b$ and the local model coefficient $w$ are calculated by alternating iterative optimization.
5. The output value of the query data is predicted from the weighted ridge regression model coefficient.
The predicted output value of the query data $x_q$ is computed from the weighted ridge regression coefficient $w$. When the true value $y_q$ of the output variable is obtained, the sample $[x_q; y_q]$ is added to the training data set.
The prediction errors of the method described herein (UniJITL) and of the conventional locally weighted partial least squares (LWPLS) algorithm for the output variable of the debutanizer data are shown in fig. 4 and fig. 5. As can be seen from fig. 4 and fig. 5, the method of the present application has higher prediction accuracy than the conventional method.
According to the soft measurement modeling method based on integral optimization and instant learning, the weight matrix of the historical samples is obtained through a collaborative representation algorithm, a weighted ridge regression model is established through a weighted ridge regression algorithm, and the two algorithms are fused into a unified optimization target. For the collected query data, the weighted Euclidean distance between the query data and the training samples is first calculated and fused into the regularization term of the collaborative representation; through the unified optimization target, the selection of similar samples and the establishment of the local model are then achieved at the same time. The method provided by the embodiment of the application can handle the nonlinearity, time variation, and multicollinearity of industrial processes, and it integrates similar sample selection and local model construction into one optimization function, so that the selection of similar samples is guided by the information of the local model and the reliability of the similar samples and the prediction accuracy of the local model are improved.
Next, a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application will be described with reference to the drawings.
Fig. 6 is a block diagram of a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application.
As shown in fig. 6, the soft measurement modeling apparatus 10 based on integral optimization and instant learning includes: a calculation module 100 and a prediction module 200.
The calculation module 100 is configured to calculate weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to a preset input variable weight matrix, fuse the weighted euclidean distances to a collaborative representation regular term to obtain a target collaborative representation model, and calculate a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; the prediction module 200 is configured to establish a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, fuse an optimized target of the target collaborative representation model and the weighted ridge regression model, and calculate a weighted ridge regression model coefficient of the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data by using the coefficients of the weighted ridge regression model.
Optionally, in an embodiment of the present application, the method further includes: the construction module is used for constructing and storing a preset auxiliary variable data set in the industrial process; the analysis module is used for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and the preprocessing module is used for constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain a target training data set.
Optionally, in an embodiment of the present application, the normalization process is:
where the argument is the data set to be standardized, the function mean(·) computes the mean of each row of a matrix, and the function std(·) computes the standard deviation of each row of a matrix.
Optionally, in an embodiment of the present application, the method further includes:
the modeling module is used for establishing an offline ridge regression model according to a target training data set before calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, and the optimization target is as follows:
where $W_0$ is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and $Y_L$ is the standardized real quality variable values;
a solving module for solving the optimization target of the off-line ridge regression model to obtain the ridge regression coefficient W of the off-line ridge regression model0:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
where $X_L^T$ is the transpose of $X_L$ and $I$ is the identity matrix;
the weight calculation module is used for calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain a preset input variable weight matrix:
wherein, W0(1) Is a ridge regression coefficient W0The first element of (1), W0(m) is the ridge regression coefficient W0The mth element of (1).
Optionally, in an embodiment of the present application, the computing module 100 is specifically configured to,
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to a preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
where $x_q$ is the query data; $D$ is a diagonal matrix with $D_{i,i} = d_i$, $i = 1, 2, \ldots, n$; $\mathbf{1}$ is a vector whose elements are all 1; the elementwise-product symbol denotes multiplication of the corresponding elements of two matrices; and the function sum(·) adds the rows of a matrix;
establishing a collaborative representation model of the query data and a target training data set, and fusing a preset input variable weight matrix and a weighted Euclidean distance, wherein the optimization target is as follows:
where $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and $\|\cdot\|_2$ denotes the two-norm;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
and obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
wherein, b1Is the first element of the vector b, bnThe nth element of the vector b.
Optionally, in one embodiment of the present application, the prediction module 200, in particular for,
establishing a weighted ridge regression model according to the weight matrix of the target training data set and the historical samples, wherein the optimization target is as follows:
where $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
where $a$ is the weight coefficient between the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient, $w$ is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating a weighted ridge regression model coefficient w:
calculating the predicted value of the query data by using the coefficients of the weighted ridge regression model:
Optionally, in an embodiment of the present application, the method further includes: and the expansion module is used for calculating the actual value of the query data and adding the actual value and the query data to the target training data set.
It should be noted that the foregoing explanation of the embodiment of the soft measurement modeling method based on integral optimization and instant learning also applies to the soft measurement modeling apparatus based on integral optimization and instant learning of this embodiment, and details are not repeated here.
According to the soft measurement modeling apparatus based on integral optimization and instant learning, the selection of similar samples is converted into an optimization problem and fused with the optimization target of the local model, so that similar sample selection and local model establishment are optimized as a whole, the rationality and reliability of the sample weights are improved, and the optimization efficiency and prediction accuracy of the model are improved.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
The processor 702, when executing the program, implements the soft measurement modeling method based on integral optimization and instant learning provided in the embodiments described above.
Further, the electronic device further includes:
a communication interface 703 for communication between the memory 701 and the processor 702.
A memory 701 for storing computer programs operable on the processor 702.
The memory 701 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
If the memory 701, the processor 702 and the communication interface 703 are implemented independently, the communication interface 703, the memory 701 and the processor 702 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may complete mutual communication through an internal interface.
The processor 702 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the soft measurement modeling method based on integral optimization and instant learning described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present application includes alternative implementations in which functions are executed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or a combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having logic gates for implementing a logic function on a data signal, an application specific integrated circuit having appropriate combinational logic gates, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
Claims (16)
1. A soft measurement modeling method based on integral optimization and instant learning, characterized by comprising the following steps:
calculating the weighted Euclidean distance between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix;
fusing the weighted Euclidean distance into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;
establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the target collaborative representation model and the optimized target of the weighted ridge regression model, and calculating the coefficients of the weighted ridge regression model of the preset auxiliary variable data set and the query data;
and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.
2. The method of claim 1, wherein before establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further comprises:
constructing and storing the preset auxiliary variable data set in the industrial process;
analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;
and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
3. The method of claim 2, wherein the normalization process is:
4. The method of claim 1, wherein before calculating the weighted euclidean distances between the collected query data and all samples in the predetermined auxiliary variable data set according to the predetermined input variable weight matrix, the method further comprises:
establishing an offline ridge regression model according to the target training data set, wherein the optimization target is as follows:
where $W_0$ is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and $Y_L$ is the standardized real quality variable values;
solving the optimization target of the off-line ridge regression model to obtain a ridge regression coefficient W of the off-line ridge regression model0:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
where $X_L^T$ is the transpose of $X_L$ and $I$ is the identity matrix;
calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain a preset input variable weight matrix:
wherein, W0(1) Is a ridge regression coefficient W0The first element of (1), W0(m) is the ridge regression coefficient W0The mth element of (1).
5. The method according to claim 4, wherein the calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model includes:
calculating the weighted Euclidean distance between the query data and all samples in a standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
where $x_q$ is the query data; $D$ is a diagonal matrix with $D_{i,i} = d_i$, $i = 1, 2, \ldots, n$; $\mathbf{1}$ is a vector whose elements are all 1; the elementwise-product symbol denotes multiplication of the corresponding elements of two matrices; and the function sum(·) adds the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein an optimization target is as follows:
where $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and $\|\cdot\|_2$ denotes the two-norm;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
wherein, b1Is the first element of the vector b, bnThe nth element of the vector b.
6. The method as claimed in claim 5, wherein the building a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the target co-representation model and an optimization target of the weighted ridge regression model, calculating weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients comprises:
establishing the weighted ridge regression model according to the weight matrix of the target training data set and the historical sample, wherein the optimization target is as follows:
where $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
where $a$ is the weight coefficient between the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient, $w$ is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating the weighted ridge regression model coefficient w:
computing predicted values for the query data using the weighted ridge regression model coefficients
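A weighted ridge regression fit and the resulting prediction can be sketched as below. This is only an illustration under assumptions: it uses the standard closed form $w = (X_L S X_L^T + \lambda_2 I)^{-1} X_L S Y_L$ for a standalone weighted ridge regression with a diagonal sample weight matrix $S$, and the prediction $\hat{y} = w^T x_q$. The solution of the unified optimization target with the weight coefficient $a$ is not reproduced in the text and may differ. The function name and the toy data are hypothetical.

```python
import numpy as np

def weighted_ridge_predict(x_q, X_L, Y_L, sample_weights, lam2):
    """Fit a weighted ridge regression and predict the query output.

    X_L: (m, n) samples as columns; Y_L: (n,) outputs;
    sample_weights: (n,) non-negative weight of each historical sample.
    ASSUMPTION: standard standalone weighted ridge closed form.
    """
    S = np.diag(sample_weights)
    m = X_L.shape[0]
    A = X_L @ S @ X_L.T + lam2 * np.eye(m)
    w = np.linalg.solve(A, X_L @ S @ Y_L)
    return w, float(w @ x_q)

# Toy data (hypothetical): noiseless linear outputs from a known coefficient.
X_L = np.array([[1.0, 0.0, 1.0, 2.0, -1.0],
                [0.0, 1.0, 1.0, -1.0, 2.0]])
true_w = np.array([1.5, -0.5])
Y_L = true_w @ X_L
weights = np.array([0.5, 0.1, 0.2, 0.1, 0.1])
x_q = np.array([0.9, 0.1])
w, y_hat = weighted_ridge_predict(x_q, X_L, Y_L, weights, lam2=1e-8)
```

With noiseless data and a very small `lam2`, the fitted `w` is close to `true_w` regardless of the sample weights; with noisy data the weights determine which historical samples dominate the local fit.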
7. The method according to any one of claims 1 to 6, further comprising:
calculating an actual value of the query data, and adding the actual value and the query data to the target training data set.
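The data set expansion step can be illustrated with a short sketch. It assumes, hypothetically, that samples are stored as columns of `X_L`, that the actual value `y_q` becomes available after the prediction (for example from a laboratory analysis), and that it is already on the standardized scale. The function name `append_sample` is not from the source.

```python
import numpy as np

def append_sample(X_L, Y_L, x_q, y_q):
    """Return copies of the training set with (x_q, y_q) appended as a new column."""
    X_new = np.hstack([X_L, x_q.reshape(-1, 1)])
    Y_new = np.append(Y_L, y_q)
    return X_new, Y_new

X_L = np.array([[1.0, 0.0],
                [0.0, 1.0]])
Y_L = np.array([1.5, -0.5])
x_q = np.array([0.9, 0.1])
y_q = 1.3  # hypothetical actual value for the query
X_L, Y_L = append_sample(X_L, Y_L, x_q, y_q)
```

Later queries then select and weight their local samples from the enlarged set, which lets the local models follow process drift.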
8. A soft measurement modeling device based on integral optimization and instant learning, comprising:
the calculation module is used for calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;
the prediction module is used for establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model, and calculating a weighted ridge regression model coefficient of the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.
9. The apparatus of claim 8, further comprising:
the construction module is used for constructing and storing the preset auxiliary variable data set in the industrial process;
the analysis module is used for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;
and the preprocessing module is used for constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
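The preprocessing step can be sketched as follows. The standardization formula itself is not reproduced in the text, so this example assumes z-score scaling (zero mean and unit variance per variable), which is a common choice and may differ from the claimed formula. The function name `standardize` and the toy data are hypothetical, and the sketch assumes no variable is constant.

```python
import numpy as np

def standardize(X, Y):
    """Z-score the auxiliary variables (rows of X) and the quality values Y.

    ASSUMPTION: z-score scaling; the source's exact formula is not shown.
    Returns the scaled data and the statistics needed to scale new queries.
    """
    mu_x = X.mean(axis=1, keepdims=True)
    sd_x = X.std(axis=1, keepdims=True)
    mu_y, sd_y = Y.mean(), Y.std()
    return (X - mu_x) / sd_x, (Y - mu_y) / sd_y, (mu_x, sd_x, mu_y, sd_y)

# Toy data (hypothetical): 2 auxiliary variables, 4 samples as columns.
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
Y = np.array([1.0, 3.0, 5.0, 7.0])
X_L, Y_L, stats = standardize(X, Y)
```

The stored statistics would be applied to each new query sample, and inverted on the predicted value, so that query and training data share one scale.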
10. The apparatus of claim 9, wherein the normalization process is:
11. The apparatus of claim 8, further comprising:
the modeling module is used for building an offline ridge regression model according to the target training data set before calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, and the optimization target is as follows:
wherein $W_0$ is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization term coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and $Y_L$ is the standardized real quality variable value;
a solving module for solving the optimization target of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$
wherein $X_L^T$ is the transpose of the data $X_L$ and $I$ is an identity matrix;
a weight calculation module, configured to calculate a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$ and $W_0(m)$ is the mth element of the ridge regression coefficient $W_0$.
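The offline ridge regression solution and a variable weight matrix derived from it can be sketched as below. The closed form for the ridge coefficient follows the formula stated in this claim, with samples stored as columns of `X_L`. The weight matrix formula is not reproduced in the text, so `variable_weight_matrix` uses a labeled assumption: a diagonal matrix of the normalized absolute coefficients. The function names and the toy data are hypothetical.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """Solve W0 = (X_L X_L^T + lam0 * I)^{-1} X_L Y_L.

    X_L: (m, n) standardized auxiliary variables, one column per sample.
    Y_L: (n,) standardized quality values.
    """
    m = X_L.shape[0]
    A = X_L @ X_L.T + lam0 * np.eye(m)
    return np.linalg.solve(A, X_L @ Y_L)

def variable_weight_matrix(W0):
    """ASSUMPTION: diagonal matrix of |W0(j)| normalized to sum to 1."""
    mag = np.abs(W0)
    return np.diag(mag / mag.sum())

# Toy data (hypothetical): the second variable has no influence on the output.
rng = np.random.default_rng(0)
X_L = rng.standard_normal((3, 50))
true_w = np.array([2.0, 0.0, -1.0])
Y_L = true_w @ X_L
W0 = offline_ridge(X_L, Y_L, lam0=1e-8)
W_var = variable_weight_matrix(W0)
```

Under this assumption, variables with larger ridge coefficients contribute more to the weighted distance, and an irrelevant variable receives a weight near zero.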
12. The apparatus of claim 11, wherein the computing module is specifically configured to:
calculating the weighted Euclidean distance between the query data and all samples in a standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$D_{x_q} = W_{var}(x_q \times \mathbf{1} - X_L)$
wherein $x_q$ is the query data; $D$ is a diagonal matrix with $D_{i,i} = d_i$, $i = 1, 2, \ldots, n$; $\mathbf{1}$ is a vector whose elements are all 1; the element-wise product symbol denotes multiplication of the corresponding elements of two matrices; and the function sum(·) denotes addition over the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein an optimization target is as follows:
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization term coefficient, and $\|\cdot\|_2$ denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ between the standardized preset auxiliary variable data set and the query data:
$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var} x_q$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
wherein $b_1$ is the first element of the vector $b$ and $b_n$ is the nth element of the vector $b$.
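The weighted Euclidean distance step at the start of this claim can be illustrated with a short sketch. The weighted difference matrix follows the stated formula, with the query replicated across all n samples by a row of ones. The reduction to one distance per sample is an assumption: the text mentions an element-wise product and a sum over rows, so the sketch takes the square root of the column-wise sum of squares. The function name and the toy data are hypothetical.

```python
import numpy as np

def weighted_distances(x_q, X_L, W_var):
    """Weighted Euclidean distance from x_q to every column of X_L.

    x_q: (m,) query; X_L: (m, n) samples as columns; W_var: (m, m) weights.
    """
    ones = np.ones((1, X_L.shape[1]))
    # Weighted difference matrix: W_var (x_q x 1 - X_L), shape (m, n).
    D_xq = W_var @ (x_q.reshape(-1, 1) @ ones - X_L)
    # ASSUMPTION: d = sqrt(sum over rows of the element-wise square).
    return np.sqrt(np.sum(D_xq * D_xq, axis=0))

# Toy data (hypothetical): identity weights reduce to plain Euclidean distance.
X_L = np.array([[0.0, 3.0, 1.0],
                [0.0, 4.0, 1.0]])
x_q = np.array([0.0, 0.0])
W_var = np.eye(2)
d = weighted_distances(x_q, X_L, W_var)
D = np.diag(d)  # diagonal matrix with D[i, i] = d[i]
```

Note that a sample identical to the query gives a zero on the diagonal of `D`, so that sample is not penalized at all in the collaborative representation.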
13. The apparatus of claim 12, wherein the prediction module is specifically configured to:
establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, wherein the optimization target is as follows:
wherein $w$ is the coefficient of the weighted ridge regression model and $\lambda_2$ is the ridge regression regularization term coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
wherein $a$ is the weight coefficient between the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization term coefficient, $w$ is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization term coefficient;
calculating the weighted ridge regression model coefficient w:
calculating a predicted value of the query data using the weighted ridge regression model coefficients:
14. The apparatus of any one of claims 8-13, further comprising:
and the expansion module is used for calculating an actual value of the query data and adding the actual value and the query data to the target training data set.
15. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the soft measurement modeling method based on integral optimization and instant learning of any of claims 1-7.
16. A computer-readable storage medium, on which a computer program is stored, characterized in that the program is executed by a processor to implement the soft measurement modeling method based on integral optimization and instant learning according to any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151538.7A CN114528764A (en) | 2022-02-18 | 2022-02-18 | Soft measurement modeling method and device based on integral optimization and instant learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210151538.7A CN114528764A (en) | 2022-02-18 | 2022-02-18 | Soft measurement modeling method and device based on integral optimization and instant learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114528764A (en) | 2022-05-24 |
Family ID: 81623426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210151538.7A Pending CN114528764A (en) | 2022-02-18 | 2022-02-18 | Soft measurement modeling method and device based on integral optimization and instant learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114528764A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738866A (en) * | 2023-08-11 | 2023-09-12 | 中国石油大学(华东) | Instant learning soft measurement modeling method based on time sequence feature extraction |
CN116738866B (en) * | 2023-08-11 | 2023-10-27 | 中国石油大学(华东) | Instant learning soft measurement modeling method based on time sequence feature extraction |
CN117272244A (en) * | 2023-11-21 | 2023-12-22 | 中国石油大学(华东) | Soft measurement modeling method integrating feature extraction and self-adaptive composition |
CN117272244B (en) * | 2023-11-21 | 2024-03-15 | 中国石油大学(华东) | Soft measurement modeling method integrating feature extraction and self-adaptive composition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107765347B (en) | | Short-term wind speed prediction method based on Gaussian process regression and particle filtering |
CN114528764A (en) | | Soft measurement modeling method and device based on integral optimization and instant learning |
CN103389472B (en) | | A kind of Forecasting Methodology of the cycle life of lithium ion battery based on ND-AR model |
Kaya et al. | | Process capability analyses with fuzzy parameters |
CN109389314B (en) | | Quality soft measurement and monitoring method based on optimal neighbor component analysis |
CN114117919B (en) | | Instant learning soft measurement modeling method based on sample collaborative representation |
CN109523077B (en) | | Wind power prediction method |
CN117312816B (en) | | Special steel smelting effect evaluation method and system |
CN114841073A (en) | | Instant learning semi-supervised soft measurement modeling method based on local label propagation |
CN117594164A (en) | | Metal structure residual fatigue life calculation and evaluation method and system based on digital twin |
CN114970341B (en) | | Method for establishing low-orbit satellite orbit prediction precision improvement model based on machine learning |
CN111626359A (en) | | Data fusion method and device, control terminal and ship |
CN101446828A (en) | | Nonlinear process quality prediction method |
CN116821695B (en) | | Semi-supervised neural network soft measurement modeling method |
Wang et al. | | Research on construction cost estimation based on artificial intelligence technology |
CN116644655A (en) | | Industrial process soft measurement method based on weighted target feature regression neural network |
CN115631804A (en) | | Method for predicting outlet concentration of sodium aluminate solution in evaporation process based on data coordination |
CN115577856A (en) | | Method and system for predicting construction cost and controlling balance of power transformation project |
JP7020500B2 (en) | | Prediction model generation method, corrosion amount prediction method for metal materials, prediction model generation program and prediction model generation device |
CN109858699B (en) | | Water quality quantitative simulation method and device, electronic equipment and storage medium |
CN114240006A (en) | | Water resource bearing capacity assessment method |
CN110866638A (en) | | Traffic volume prediction model construction method and device, computer equipment and storage medium |
CN110705187A (en) | | Method for checking and diagnosing real-time online instrument through least square algorithm |
CN110188433B (en) | | Ridge regression soft measurement modeling method based on distributed parallel local modeling mechanism |
CN113971372B (en) | | Wind speed prediction method and device based on neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||