CN114528764A - Soft measurement modeling method and device based on integral optimization and instant learning - Google Patents

Info

Publication number
CN114528764A
CN114528764A CN202210151538.7A CN202210151538A CN114528764A CN 114528764 A CN114528764 A CN 114528764A CN 202210151538 A CN202210151538 A CN 202210151538A CN 114528764 A CN114528764 A CN 114528764A
Authority
CN
China
Prior art keywords
ridge regression
data set
weighted
target
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210151538.7A
Other languages
Chinese (zh)
Inventor
王智权
袁志宏
白玮
李秀洁
吴昂山
徐飞
许恒微
宋垚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Sailboat Petrochemical Co ltd
Tsinghua University
Original Assignee
Jiangsu Sailboat Petrochemical Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Sailboat Petrochemical Co ltd, Tsinghua University filed Critical Jiangsu Sailboat Petrochemical Co ltd
Priority to CN202210151538.7A priority Critical patent/CN114528764A/en
Publication of CN114528764A publication Critical patent/CN114528764A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00Computer-aided design [CAD]
    • G06F30/20Design optimisation, verification or simulation
    • G06F30/27Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Algebra (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The method obtains a weight matrix of the historical samples through a collaborative representation algorithm, establishes a weighted ridge regression model through a weighted ridge regression algorithm, fuses the two algorithms into a unified instant learning optimization target, and finally solves the problem by alternating iteration. The problems of nonlinearity, time variation and multicollinearity in industrial processes are thereby addressed: the selection of similar samples and the construction of the local model are fused into one optimization function, so that the selection of similar samples is guided by the information of the local model, which improves the reliability of the similar samples and the prediction accuracy of the local model. Problems such as poor prediction accuracy in the prior art are thus solved.

Description

Soft measurement modeling method and device based on integral optimization and instant learning
Technical Field
The application relates to the technical field of industrial process detection, and in particular to a soft measurement modeling method and device based on integral optimization and instant learning, an electronic device, and a storage medium.
Background
In modern industrial production, many important quality variables (such as oil viscosity and composition) are difficult to measure in real time, which greatly affects the control and optimization of chemical processes. Because on-site sampling is difficult, analytical instruments are expensive, and analysis results lag in time, real-time measurement of quality variables by online analyzers or offline laboratory tests is often impractical in actual production, so closed-loop control of the quality variables cannot be formed. How to acquire quality variables in real time is therefore the first problem to be solved in process control, and soft measurement has accordingly attracted research attention in the field of process industry control.
Common data-driven soft measurement modeling methods include Principal Component Regression (PCR), Partial Least Squares Regression (PLSR), and Artificial Neural Networks (ANN). The models established by these algorithms are offline models: once established, a model cannot adapt to changes in the production process or track changes in the production state, so its prediction accuracy gradually degrades. Automatic maintenance of the soft measurement model has therefore become a focus of algorithm research and improvement. To adapt to the multi-modal and time-varying characteristics of modern chemical production processes, various online modeling algorithms have been widely applied to process monitoring and quality variable prediction.
Currently, mainstream online soft measurement modeling algorithms include sliding window algorithms, recursive algorithms, time difference algorithms, and instant learning algorithms. The first three update the model according to temporal correlation and are time-adaptive algorithms; the instant learning algorithm updates and maintains the model based on spatial correlation and is a space-adaptive algorithm. Compared with the other algorithms, the instant learning algorithm adapts better to abrupt changes in the production process, and because it establishes a local model for each sample, it can describe the nonlinear relationships among process variables well.
The selection of similar samples, or the calculation of sample weights, is the core step of the instant learning algorithm and greatly influences its prediction accuracy. Conventional instant learning algorithms have two shortcomings. On the one hand, some adjustable parameters are difficult to select, such as the kernel width parameter in the LWPLS algorithm and the number of similar samples in the LWLS algorithm; there is no clear theoretical or empirical guidance for choosing these parameters, yet they strongly affect model performance. On the other hand, the two core steps of the algorithm, selecting similar samples and building a local model, are independent of each other, so the selected similar samples may be sub-optimal for the local model: the selected similar samples are used to build the local model, but the information obtained from building the local model is not used to guide the selection of similar samples. Consequently, models established by conventional instant learning algorithms suffer from poor prediction accuracy, a problem that urgently needs to be solved.
Disclosure of Invention
The application provides a soft measurement modeling method and device based on integral optimization and instant learning, an electronic device, and a storage medium, so as to solve problems such as poor prediction accuracy in the prior art.
An embodiment of the first aspect of the application provides a soft measurement modeling method based on integral optimization and instant learning, which comprises the following steps: calculating weighted Euclidean distances between acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix; fusing the weighted Euclidean distances into a collaborative representation regularization term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model, and calculating weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data; and calculating a predicted value of the query data by using the weighted ridge regression model coefficients.
Optionally, in an embodiment of the present application, before the establishing a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further includes: constructing and storing the preset auxiliary variable data set in the industrial process; analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
Optionally, in an embodiment of the present application, the normalization process is:
Figure BDA0003510558310000021
wherein
Figure BDA0003510558310000022
is the data set to be normalized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
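The normalization step can be sketched as a row-wise z-score. This is only an illustration under stated assumptions: each row of the matrix is taken to hold one variable and each column one sample, the population standard deviation is used, and the function name `standardize_rows` and the guard for constant rows are hypothetical additions that do not come from the application.

```python
import numpy as np

def standardize_rows(data):
    """Z-score each row of a 2-D array (assumed layout: one variable per row)."""
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=1, keepdims=True)   # mean of each row
    std = data.std(axis=1, keepdims=True)     # population std of each row (assumption)
    std = np.where(std == 0.0, 1.0, std)      # hypothetical guard for constant rows
    return (data - mean) / std, mean, std
```

Returning `mean` and `std` alongside the normalized data allows a query sample to be scaled with the same statistics and a prediction to be mapped back to original units.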
Optionally, in an embodiment of the present application, before the calculating, according to the preset input variable weight matrix, of the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set, the method further includes:
establishing an offline ridge regression model according to the target training data set, wherein the optimization target is:
Figure BDA0003510558310000023
wherein
Figure BDA0003510558310000024
is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the normalized auxiliary variable data, and
Figure BDA0003510558310000025
is the normalized real quality variable value;
solving the optimization target of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
wherein $X_L^T$ is the transpose of the data $X_L$, and $I$ is an identity matrix;
calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:
Figure BDA0003510558310000031
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of the ridge regression coefficient $W_0$.
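The closed-form solution for the offline ridge regression coefficient can be sketched as follows. The layout of `X_L` as an m-by-n matrix (variables in rows, samples in columns) is inferred from the shapes the closed form requires. The formula for the input variable weight matrix is given only as a figure in the source, so `variable_weight_matrix` is a hypothetical stand-in that places normalized absolute coefficients on the diagonal; it should not be read as the application's definition.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """Solve W0 = (X_L X_L^T + lam0 * I)^(-1) X_L Y_L for X_L of shape (m, n)."""
    m = X_L.shape[0]
    # Solving the linear system avoids forming the explicit inverse.
    return np.linalg.solve(X_L @ X_L.T + lam0 * np.eye(m), X_L @ Y_L)

def variable_weight_matrix(W0):
    """ASSUMPTION: diagonal matrix of normalized |W0| entries (not from the source)."""
    a = np.abs(W0)
    return np.diag(a / a.sum())
```

Under this assumed weighting, variables with larger regression coefficients contribute more to the later distance calculation.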
Optionally, in an embodiment of the present application, the calculating, according to a preset input variable weight matrix, weighted euclidean distances between acquired query data and all samples in a preset auxiliary variable data set, fusing the weighted euclidean distances to a collaborative representation regular term to obtain a target collaborative representation model, and calculating, by using the target collaborative representation model, a weight matrix of each historical sample in the preset auxiliary variable data set, includes:
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
Figure BDA0003510558310000032
wherein $x_q$ is the query data,
Figure BDA0003510558310000033
is a diagonal matrix,
Figure BDA0003510558310000034
has all elements equal to 1, the symbol
Figure BDA0003510558310000035
denotes element-wise multiplication of two matrices, and the function sum(·) denotes summation over the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance into it, wherein the optimization target is:
Figure BDA0003510558310000036
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and
Figure BDA0003510558310000037
denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ of the normalized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
Figure BDA0003510558310000038
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
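The distance and collaborative representation steps can be sketched as below. The expression for the diagonal matrix is given only as a figure in the source, so this sketch assumes it holds, for each historical sample, the square root of the column sum of the element-wise squared weighted differences. The function name and the m-by-n layout of `X_L` are likewise assumptions made for illustration.

```python
import numpy as np

def collaborative_coefficients(X_L, x_q, W_var, lam1):
    """Return (b, d) with b = (X_L^T W_var X_L + lam1 * D)^(-1) X_L^T W_var x_q."""
    m, n = X_L.shape
    ones = np.ones((1, n))                                # row vector of ones
    D_xq = W_var @ (x_q.reshape(m, 1) @ ones - X_L)       # weighted differences, (m, n)
    # ASSUMPTION: per-sample distance = sqrt of the column sums of D_xq * D_xq.
    d = np.sqrt(np.sum(D_xq * D_xq, axis=0))
    D = np.diag(d)                                        # distance-based regularizer
    A = X_L.T @ W_var @ X_L + lam1 * D
    b = np.linalg.solve(A, X_L.T @ W_var @ x_q)
    return b, d
```

Because distant samples receive a larger penalty on the diagonal, their coefficients in `b` are shrunk more strongly, which is how the distance information enters the sample weights.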
Optionally, in an embodiment of the present application, the building a weighted ridge regression model according to a target training data set and a weight matrix of each historical sample, fusing an optimization target of the target collaborative representation model and the weighted ridge regression model, calculating weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients includes:
establishing the weighted ridge regression model according to the target training data set and the weight matrix of the historical samples, wherein the optimization target is:
Figure BDA0003510558310000041
wherein
Figure BDA0003510558310000042
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
Figure BDA0003510558310000043
wherein $a$ is the weight coefficient between the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient,
Figure BDA0003510558310000044
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating the weighted ridge regression model coefficient $w$:
Figure BDA0003510558310000045
wherein
Figure BDA0003510558310000046
computing the predicted value of the query data using the weighted ridge regression model coefficients:
Figure BDA0003510558310000047
Figure BDA0003510558310000048
wherein
Figure BDA0003510558310000049
is the transpose of the query data $x_q$.
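A plain weighted ridge regression followed by prediction can be sketched as follows. The application's own expression for the coefficient, which follows from the unified optimization target with the weight coefficient between the two objectives, is given only as a figure and is not reproduced here. This sketch shows the standard weighted ridge closed form instead, with the sample weights passed in by the caller; the function name and argument layout are assumptions.

```python
import numpy as np

def weighted_ridge_predict(X_L, Y_L, x_q, sample_weights, lam2):
    """Standard weighted ridge: w = (X_L S X_L^T + lam2 * I)^(-1) X_L S Y_L."""
    m = X_L.shape[0]
    S = np.diag(sample_weights)            # one non-negative weight per historical sample
    w = np.linalg.solve(X_L @ S @ X_L.T + lam2 * np.eye(m), X_L @ S @ Y_L)
    y_hat = float(x_q @ w)                 # prediction is x_q^T w
    return y_hat, w
```

One possible choice, again an assumption, is to pass the absolute values of the collaborative representation coefficients as `sample_weights`, so that samples that represent the query well dominate the local model.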
Optionally, in an embodiment of the present application, the method further includes: calculating an actual value of the query data, and adding the actual value and the query data to the target training data set.
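Extending the training set can be sketched as appending one column and one target value. The sketch assumes the same m-by-n layout as above and that the query sample and its actual value have already been scaled with the statistics of the existing training data; the function name is hypothetical.

```python
import numpy as np

def append_sample(X_L, Y_L, x_q, y_q):
    """Append a query sample (as a new column) and its actual value to the training set."""
    X_new = np.hstack([X_L, np.asarray(x_q, dtype=float).reshape(-1, 1)])
    Y_new = np.append(Y_L, y_q)
    return X_new, Y_new
```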
An embodiment of the second aspect of the present application provides a soft measurement modeling apparatus based on integral optimization and instant learning, including: a calculation module, configured to calculate weighted Euclidean distances between acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fuse the weighted Euclidean distances into a collaborative representation regularization term to obtain a target collaborative representation model, and calculate a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; and a prediction module, configured to establish a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fuse the optimization targets of the target collaborative representation model and the weighted ridge regression model, calculate weighted ridge regression model coefficients for the preset auxiliary variable data set and the query data, and calculate a predicted value of the query data by using the weighted ridge regression model coefficients.
Optionally, in an embodiment of the present application, the apparatus further includes: a construction module, configured to construct and store the preset auxiliary variable data set in the industrial process; an analysis module, configured to analyze the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and a preprocessing module, configured to construct an initial training data set according to the auxiliary variable data and the real quality variable values, and carry out standardization processing on the training data set to obtain the target training data set.
Optionally, in an embodiment of the present application, the normalization process is:
Figure BDA0003510558310000051
wherein
Figure BDA0003510558310000052
is the data set to be normalized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
Optionally, in an embodiment of the present application, the apparatus further includes:
a modeling module, configured to establish an offline ridge regression model according to the target training data set before the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix, wherein the optimization target is:
Figure BDA0003510558310000053
wherein
Figure BDA0003510558310000054
is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the normalized auxiliary variable data, and
Figure BDA0003510558310000055
is the normalized real quality variable value;
a solving module, configured to solve the optimization target of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
wherein $X_L^T$ is the transpose of the data $X_L$, and $I$ is an identity matrix;
a weight calculation module, configured to calculate a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:
Figure BDA0003510558310000056
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of the ridge regression coefficient $W_0$.
Optionally, in an embodiment of the present application, the calculation module is specifically configured to:
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}\,(x_q \times \mathbf{1} - X_L)$$
Figure BDA0003510558310000061
wherein $x_q$ is the query data,
Figure BDA0003510558310000062
is a diagonal matrix,
Figure BDA0003510558310000063
has all elements equal to 1, the symbol
Figure BDA0003510558310000064
denotes element-wise multiplication of two matrices, and the function sum(·) denotes summation over the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance into it, wherein the optimization target is:
Figure BDA0003510558310000065
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and
Figure BDA0003510558310000066
denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ of the normalized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var}\, x_q$$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
Figure BDA0003510558310000067
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
Optionally, in an embodiment of the present application, the prediction module is specifically configured to:
establishing the weighted ridge regression model according to the target training data set and the weight matrix of the historical samples, wherein the optimization target is:
Figure BDA0003510558310000068
wherein
Figure BDA0003510558310000069
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
Figure BDA00035105583100000610
wherein $a$ is the weight coefficient between the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient,
Figure BDA0003510558310000071
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating the weighted ridge regression model coefficient $w$:
Figure BDA0003510558310000072
wherein
Figure BDA0003510558310000073
calculating a predicted value of the query data using the weighted ridge regression model coefficients:
Figure BDA0003510558310000074
wherein
Figure BDA0003510558310000075
is the transpose of the query data $x_q$.
Optionally, in an embodiment of the present application, the apparatus further includes: an expansion module, configured to calculate an actual value of the query data and add the actual value and the query data to the target training data set.
An embodiment of the third aspect of the present application provides an electronic device, including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to perform the soft measurement modeling method based on integral optimization and instant learning described in the above embodiments.
An embodiment of the fourth aspect of the present application provides a computer-readable storage medium on which a computer program is stored, wherein the computer program is executed by a processor to perform the soft measurement modeling method based on integral optimization and instant learning described in the foregoing embodiments.
Therefore, the application has at least the following beneficial effects:
A data self-representation algorithm is introduced into instant learning and improved for the application background of instant learning. Specifically, on the one hand, the weights of the input variables are taken into account when the sample weights are calculated through the self-representation algorithm; on the other hand, the weighted Euclidean distance between the query sample and the historical samples is calculated and used as a regularization term of the algorithm, so that local spatial distance information of the data is fused. Compared with other existing algorithms, the present application converts the selection of similar samples, or the calculation of sample weights, into an optimization problem, improving the rationality and reliability of the sample weights. In addition, whereas conventional algorithms select similar samples and establish local models independently, the present application performs both simultaneously through a unified optimization target, improving model optimization efficiency and prediction accuracy. Problems such as poor prediction accuracy in the prior art are thereby solved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flowchart of a soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application;
FIG. 2 is a process schematic of a Debutanizer (DCP) provided according to an embodiment of the present application;
FIG. 3 is a graph of a real output of debutanizer process data provided in accordance with an embodiment of the present application;
FIG. 4 is a graph of the prediction deviation, on the debutanizer data, of the soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application;
FIG. 5 is a graph of the prediction deviation of the existing locally weighted partial least squares algorithm on the debutanizer data, provided according to an embodiment of the present application;
FIG. 6 is an example diagram of a soft measurement modeling apparatus based on integral optimization and instant learning according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Description of reference numerals: a calculation module-100, a prediction module-200, a memory-701, a processor-702, and a communication interface-703.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
A soft measurement modeling method and apparatus based on integral optimization and instant learning, an electronic device, and a storage medium according to embodiments of the present application are described below with reference to the accompanying drawings. To address the problems mentioned in the background, namely the time-varying and multi-modal characteristics of industrial processes and the multicollinearity of industrial data, the application provides a soft measurement modeling method based on integral optimization and instant learning. In the method, a soft measurement model is established through an instant learning algorithm, which addresses the time-varying and multi-modal problems; local models (namely an offline ridge regression model and a weighted ridge regression model) are established through a ridge regression algorithm, which addresses the multicollinearity of the process data with high computational efficiency. In addition, the selection of similar samples is converted into an optimization problem and fused with the local model optimization target, which streamlines the modeling process and improves the reliability of the sample weights and the prediction accuracy of the soft measurement model. Problems such as poor prediction accuracy in the prior art are thereby solved.
Specifically, FIG. 1 is a flowchart of a soft measurement modeling method based on integral optimization and instant learning according to an embodiment of the present application.
As shown in FIG. 1, the soft measurement modeling method based on integral optimization and instant learning includes the following steps:
In step S101, the weighted Euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set are calculated according to the preset input variable weight matrix.
It should be noted that the preset auxiliary variable data set is obtained by normalizing the target training data set, and the specific process is as follows.
Optionally, in an embodiment of the present application, constructing the target training data set includes: constructing a preset auxiliary variable data set in the industrial process; analyzing a preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; establishing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set, wherein the standardization processing comprises the following steps:
Figure BDA0003510558310000091
wherein the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix. The target training data set is thereby obtained.
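The row-wise standardization described for equation (1) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the helper name `standardize_rows` is hypothetical, the data matrix is assumed to hold one variable per row and one sample per column, and the population standard deviation (NumPy's default) is an assumption because the text does not say which estimator is used.

```python
import numpy as np

def standardize_rows(X, mean=None, std=None):
    """Standardize each row (variable) of X to zero mean and unit variance.

    If mean/std are supplied (e.g. taken from the training set), they are
    reused, so that query data can be scaled in the same way.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=1, keepdims=True)   # mean of each row
    if std is None:
        std = X.std(axis=1, keepdims=True)     # population std of each row (assumption)
    std = np.where(std == 0, 1.0, std)         # guard against constant rows
    return (X - mean) / std, mean, std

# Toy data: 2 variables (rows), 4 samples (columns).
X = np.array([[1.0, 2.0, 3.0, 4.0],
              [10.0, 20.0, 30.0, 40.0]])
X_L, mu, sigma = standardize_rows(X)
```

Returning `mu` and `sigma` lets a later query sample be scaled with the training statistics, which matches the statement that query data are standardized according to equation (1).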
In particular, in embodiments of the present application, the quality-related auxiliary variable data $X=[x_1,x_2,\ldots,x_n]^T$ in the industrial process are collected and stored in real time by in-situ sensors and storage devices, where $n$ is the number of samples and $m$ is the dimension of each sample; the real quality variable value corresponding to each sample is obtained by laboratory analysis of the collected data:
Figure BDA0003510558310000092
Using the collected data as an initial training data set
Figure BDA0003510558310000093
For the initial training data set
Figure BDA0003510558310000094
The normalization process is performed according to the formula (1) to make the mean value 0 and the variance 1, and a training data set is obtained
Figure BDA0003510558310000095
$X_L$ is the data obtained after standardizing the data $X$, and
Figure BDA0003510558310000096
is the quality variable value obtained after standardization.
Optionally, in an embodiment of the present application, before calculating weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to the preset input variable weight matrix, the method further includes:
using training data sets
Figure BDA0003510558310000097
Establishing an off-line ridge regression model, and calculating a weight matrix of each input variable through a regression coefficient of the model
Figure BDA0003510558310000098
The method comprises the following specific steps:
using training data sets
Figure BDA0003510558310000099
An off-line ridge regression model is established, and the optimization goal is as follows:
Figure BDA00035105583100000910
wherein
Figure BDA00035105583100000911
is the ridge regression coefficient of the offline ridge regression model, and $\lambda_0$ is the regularization coefficient of the offline ridge regression model. Solving the optimization objective gives the analytical expression of the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0=(X_L X_L^{T}+\lambda_0 I)^{-1}X_L Y_L \tag{3}$$
wherein $X_L^{T}$ is the transpose of the data $X_L$, and $I$ is the identity matrix.
From the ridge regression coefficient $W_0$ of the offline ridge regression model, the weight matrix of the input variables is calculated by equation (4), which is expressed as:
Figure BDA00035105583100000912
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of $W_0$.
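As an illustration of equations (3) and (4), the sketch below computes the closed-form ridge solution and then builds a diagonal variable-weight matrix from it. The ridge formula follows equation (3) with $X_L$ of shape (m, n). The exact form of equation (4) is not recoverable from the text, so `variable_weight_matrix` is a hypothetical stand-in that places the normalized absolute coefficients on the diagonal; the function names and toy data are likewise assumptions.

```python
import numpy as np

def offline_ridge(X_L, Y_L, lam0):
    """Equation (3): W0 = (X_L X_L^T + lam0 * I)^{-1} X_L Y_L, with X_L of shape (m, n)."""
    m = X_L.shape[0]
    A = X_L @ X_L.T + lam0 * np.eye(m)
    # Solving the linear system is numerically safer than forming the inverse.
    return np.linalg.solve(A, X_L @ Y_L)

def variable_weight_matrix(W0):
    """ASSUMED form of equation (4): diag(|W0(1)|, ..., |W0(m)|), normalized to sum to 1."""
    a = np.abs(W0)
    return np.diag(a / a.sum())

# Toy data: 3 variables, 50 samples, noise-free linear relation.
rng = np.random.default_rng(0)
X_L = rng.standard_normal((3, 50))
true_w = np.array([2.0, 0.0, -1.0])
Y_L = X_L.T @ true_w
W0 = offline_ridge(X_L, Y_L, lam0=1e-8)
W_var = variable_weight_matrix(W0)
```

With a very small `lam0` and noise-free data, `W0` should be close to `true_w`, and the second variable, which does not influence the output, receives a weight near zero in `W_var`.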
Optionally, in one embodiment of the present application, the newly collected query data
Figure BDA00035105583100000913
is standardized according to equation (1). Based on the weight matrix $W_{var}$, the weighted Euclidean distances $d$ between the query data $x_q$ and all samples in the data $X_L$ are calculated by equations (5) and (6), which are expressed as:
$$D_{x_q}=W_{var}\,(x_q\times\mathbf{1}-X_L) \tag{5}$$
Figure BDA0003510558310000101
In the formulas,
Figure BDA0003510558310000102
is a diagonal matrix,
Figure BDA0003510558310000103
has all elements equal to 1, the symbol
Figure BDA0003510558310000104
denotes element-wise multiplication of two matrices, and the function sum(·) adds the rows of a matrix together.
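The distance computation of equations (5) and (6) can be sketched as below. Equation (5) is taken from the text; equation (6) is only available as an image, so the form used here, the square root of the row-sum of the element-wise squared differences, is an assumption consistent with the description of the element-wise product and sum(·). The function name and toy values are hypothetical.

```python
import numpy as np

def weighted_distances(x_q, X_L, W_var):
    """Weighted Euclidean distance between x_q (m,) and every column of X_L (m, n)."""
    n = X_L.shape[1]
    ones = np.ones((1, n))                                # row vector of ones
    D_xq = W_var @ (x_q.reshape(-1, 1) @ ones - X_L)      # equation (5), shape (m, n)
    # ASSUMED form of equation (6): element-wise square, add rows together, square root.
    return np.sqrt(np.sum(D_xq * D_xq, axis=0))

x_q = np.array([0.0, 0.0])
X_L = np.array([[3.0, 0.0, 1.0],
                [4.0, 0.0, 1.0]])
W_var = np.eye(2)            # identity weights reduce to the ordinary Euclidean distance
d = weighted_distances(x_q, X_L, W_var)
D = np.diag(d)               # diagonal matrix with D[i, i] = d[i]
```

With identity weights the three distances are 5, 0, and the square root of 2, which makes the sketch easy to check by hand.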
In step S102, the weighted Euclidean distance is fused into the collaborative representation regularization term to obtain a target collaborative representation model, and a weight matrix of each historical sample in the preset auxiliary variable data set is calculated by using the target collaborative representation model.
Optionally, in an embodiment of the present application, a collaborative representation model of the query data $x_q$ and the training data set
Figure BDA0003510558310000105
is established, incorporating the weight matrix and the weighted Euclidean distance; the optimization objective is as follows:
Figure BDA0003510558310000106
wherein
Figure BDA0003510558310000107
is the collaborative representation coefficient, and $\lambda_1$ is the regularization coefficient.
The collaborative representation coefficient $b$ of the data $X_L$ and the query data $x_q$ is calculated by equation (8), which is expressed as:
$$b=(X_L^{T}W_{var}X_L+\lambda_1 D)^{-1}X_L^{T}W_{var}\,x_q \tag{8}$$
Using the collaborative representation coefficient, equation (9) gives, for the training data set
Figure BDA0003510558310000108
the weight matrix of the historical samples
Figure BDA0003510558310000109
Equation (9) is expressed as:
Figure BDA00035105583100001010
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
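A minimal sketch of equation (8) follows. The linear system is taken from the text; the construction of the sample weight matrix in equation (9) is only available as an image, so `sample_weight_matrix` is a hypothetical stand-in that puts the absolute values of $b_1,\ldots,b_n$ on the diagonal to keep the weights non-negative. All names and toy values are assumptions.

```python
import numpy as np

def collaborative_coefficients(x_q, X_L, W_var, d, lam1):
    """Equation (8): b = (X_L^T W_var X_L + lam1 * D)^{-1} X_L^T W_var x_q, with D = diag(d)."""
    D = np.diag(d)
    A = X_L.T @ W_var @ X_L + lam1 * D      # shape (n, n)
    rhs = X_L.T @ W_var @ x_q               # shape (n,)
    return np.linalg.solve(A, rhs)

def sample_weight_matrix(b):
    """ASSUMED form of equation (9): diag(|b_1|, ..., |b_n|)."""
    return np.diag(np.abs(b))

# Toy data: 2 variables, 4 historical samples (columns).
X_L = np.array([[1.0, 0.0, -1.0, 5.0],
                [0.0, 1.0,  0.0, 5.0]])
x_q = np.array([0.9, 0.1])
W_var = np.eye(2)
# Distances from the query to each sample (ordinary Euclidean here, since W_var = I).
d = np.linalg.norm(W_var @ (x_q[:, None] - X_L), axis=0)
lam1 = 0.1
b = collaborative_coefficients(x_q, X_L, W_var, d, lam1)
W_sample = sample_weight_matrix(b)
```

Because each sample's penalty is scaled by its distance to the query, distant samples are discouraged from taking part in the representation, which is the role the text gives to the distance-weighted regularization term.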
In step S103, a weighted ridge regression model is established according to the target training data set and the weight matrix of each historical sample, the optimization objective of the collaborative representation algorithm is fused with the optimization objective of the weighted ridge regression algorithm, and the collaborative representation coefficients and local model coefficients of the data $X_L$ and the query data $x_q$ are calculated.
Specifically, in the embodiment of the present application, the collaborative representation coefficients and the local model coefficients are solved through the unified optimization objective as follows:
According to the training set
Figure BDA00035105583100001011
and the weight matrix $W_{sample}$, a weighted ridge regression model is established, with the following optimization objective:
Figure BDA00035105583100001012
wherein
Figure BDA00035105583100001013
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient.
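For reference, a weighted ridge regression with a diagonal sample weight matrix has a well-known closed-form solution, sketched below. Equation (10) itself is only available as an image, so the objective assumed here is the standard one, $(Y_L - X_L^T w)^T W_{sample} (Y_L - X_L^T w) + \lambda_2 \|w\|^2$, and the function name and toy data are hypothetical.

```python
import numpy as np

def weighted_ridge(X_L, Y_L, W_sample, lam2):
    """Closed-form weighted ridge solution (standard form, assumed for equation (10)):
    w = (X_L W_sample X_L^T + lam2 * I)^{-1} X_L W_sample Y_L, with X_L of shape (m, n).
    """
    m = X_L.shape[0]
    A = X_L @ W_sample @ X_L.T + lam2 * np.eye(m)
    return np.linalg.solve(A, X_L @ W_sample @ Y_L)

# Toy data: one variable, four samples; the last sample is an outlier with zero weight.
X_L = np.array([[1.0, 2.0, 3.0, 4.0]])
Y_L = np.array([2.0, 4.0, 6.0, 100.0])
W_sample = np.diag([1.0, 1.0, 1.0, 0.0])
w = weighted_ridge(X_L, Y_L, W_sample, lam2=1e-10)
```

The zero-weighted outlier has no influence on the fit, so the recovered slope is close to 2; this is how the sample weights localize the model around the query.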
The improved collaborative representation optimization objective of equation (7) and the weighted ridge regression optimization objective of equation (10) are combined by weighting, which gives the unified just-in-time learning optimization objective of equation (11):
Figure BDA00035105583100001014
wherein a is a weight coefficient of the improved collaborative representation algorithm and the weighted ridge regression algorithm.
First, the local model coefficient $w$ is fixed and the collaborative representation coefficient $b$ is solved, so that equation (11) can be rewritten as equation (12):
Figure BDA0003510558310000111
wherein
Figure BDA0003510558310000112
cst(b) denotes the terms independent of $b$. The improved collaborative representation coefficient $b$ is calculated by equation (13):
Figure BDA00035105583100001112
Then, the collaborative representation coefficient $b$ is fixed and the local model coefficient $w$ is solved, so that equation (11) can be rewritten as equation (14):
Figure BDA0003510558310000113
wherein
Figure BDA0003510558310000114
cst(w) denotes the terms independent of $w$.
The coefficient w of the local model is calculated by equation (15), where equation (15) is expressed as:
Figure BDA0003510558310000115
Optionally, in an embodiment of the present application, an actual value of the query data is calculated, and the actual value and the query data are added to the target training data set.
Specifically, in an embodiment of the present application, the output value of the query data $x_q$
Figure BDA0003510558310000116
is calculated by equation (16) using the weighted ridge regression model coefficient $w$. Equation (16) is expressed as:
Figure BDA0003510558310000117
When the real output value $y_q$ is obtained through laboratory analysis, the sample $[x_q, y_q]$ is added to the training data set
Figure BDA0003510558310000118
to expand the operating range covered by the training data set
Figure BDA0003510558310000119
Otherwise, the range covered by the training data set
Figure BDA00035105583100001110
remains unchanged.
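The prediction and data set update steps can be sketched as follows. The prediction is assumed to be the inner product of the transposed query data with $w$, which matches the later statement that the transpose of $x_q$ appears in the prediction formula; both helper names are hypothetical, and everything is assumed to stay in standardized units (converting the prediction back to physical units is not shown).

```python
import numpy as np

def predict(x_q, w):
    """Assumed form of equation (16): y_hat = x_q^T w."""
    return float(x_q @ w)

def append_sample(X_L, Y_L, x_q, y_q=None):
    """Add [x_q, y_q] to the training set when the laboratory value y_q is available.

    X_L has shape (m, n) with one sample per column; Y_L has shape (n,).
    If y_q is None, the training set is returned unchanged.
    """
    if y_q is None:
        return X_L, Y_L
    return np.column_stack([X_L, x_q]), np.append(Y_L, y_q)

X_L = np.zeros((2, 3))
Y_L = np.zeros(3)
x_q = np.array([1.0, 2.0])
w = np.array([0.5, 0.25])

y_hat = predict(x_q, w)                                    # 1*0.5 + 2*0.25
X_same, Y_same = append_sample(X_L, Y_L, x_q)              # no laboratory value yet
X_new, Y_new = append_sample(X_L, Y_L, x_q, y_q=1.1)       # laboratory value arrived
```

Keeping the update optional mirrors the text: the training set grows only when a laboratory value becomes available.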
The soft measurement modeling method for just-in-time learning based on overall optimization is explained below with reference to the drawings, taking debutanizer process data as an example.
The debutanizer column (DCP) is part of a desulfurization and naphtha splitting plant; its task is to reduce the butane concentration in the bottoms as far as possible. The principle of the DCP is shown in fig. 2. The bottom butane concentration is generally measured online by a gas chromatograph installed at the top of the column. Because the butane vapor needs a certain time to travel from the bottom to the top of the column, and the gas chromatograph analysis itself also takes time, the online measurement of the bottom butane concentration has a large lag. A soft measurement model is therefore needed to estimate the bottom butane concentration online in real time. To establish this model, seven variables measured on the debutanizer column (see fig. 2) were selected as auxiliary variables; they are described in table 1. The data set comes from an actual industrial process and contains 2382 samples; the actual output curve is shown in fig. 3.
Table 1 description of auxiliary variables
Figure BDA00035105583100001111
Figure BDA0003510558310000121
The following description of the specific steps of the present application is made in conjunction with the debutanizer process:
1. The acquired data are used as a training data set and are preprocessed.
First, all samples are preprocessed and abnormal samples are deleted; then, taking the dynamic characteristics of the process into account, all samples are expanded in dimension according to the following formula, the dimension of the expanded sample being 30; finally, standardization is carried out to obtain the final training data set
Figure BDA0003510558310000122
Then:
Figure BDA0003510558310000123
wherein
Figure BDA0003510558310000124
represents the soft measurement model's predicted value of the bottom butane concentration, and $f_{DCP}(\cdot)$ represents the underlying relationship between the butane concentration and $X_1 \sim X_7$.
Further obtaining:
Figure BDA0003510558310000125
2. An offline ridge regression model is established using the training data set, and a weight matrix for each input variable is calculated.
From the training data set
Figure BDA0003510558310000126
an offline ridge regression model is established, and from the ridge regression coefficient of the model
Figure BDA0003510558310000127
the weight matrix of each input variable is calculated:
Figure BDA0003510558310000128
3. New data are collected and standardized.
The newly collected query data
Figure BDA0003510558310000129
is standardized in the same way as the training data set.
4. Calculating sample collaborative representation coefficients simultaneously according to unified optimization objective
Figure BDA00035105583100001210
And weighted ridge regression model coefficients
Figure BDA00035105583100001211
First, the collected query data x is calculatedqWeighted Euclidean distance from training samples
Figure BDA00035105583100001212
Then, d is fused into the regularization term of the collaborative representation, and the collaborative representation is fused with the weighted ridge regression algorithm to obtain a unified just-in-time learning optimization objective; finally, alternating iterative optimization is used to calculate the sample collaborative representation coefficient
Figure BDA00035105583100001213
And local model coefficients
Figure BDA00035105583100001214
5. The output values of the query data are predicted based on the weighted ridge regression model coefficients.
The predicted output value of the query data $x_q$ is computed from the weighted ridge regression coefficient $w$:
Figure BDA00035105583100001215
When the true value $y_q$ of the output variable is obtained, the sample $[x_q; y_q]$ is added to the training data set
Figure BDA00035105583100001216
The prediction errors of the method described herein (UniJITL) and of the conventional locally weighted partial least squares (LWPLS) algorithm for the output variable of the debutanizer data are shown in fig. 4 and fig. 5. As can be seen from fig. 4 and fig. 5, the method of the present application achieves higher prediction accuracy than the conventional method.
According to the soft measurement modeling method for just-in-time learning based on overall optimization of the embodiment of the present application, the weight matrix of the historical samples is obtained through a collaborative representation algorithm, a weighted ridge regression model is established through a weighted ridge regression algorithm, and the two algorithms are fused into a unified optimization objective. For collected query data, the weighted Euclidean distance between the query data and the training samples is first calculated and fused into the regularization term of the collaborative representation; the unified optimization objective then achieves the selection of similar samples and the establishment of the local model at the same time. The method not only handles the nonlinearity, time-varying behavior, and multicollinearity of industrial processes, but also integrates similar-sample selection and local-model construction into one optimization function, so that information from the local model guides the selection of similar samples, improving both the reliability of the similar samples and the prediction accuracy of the local model.
Next, a soft measurement modeling apparatus for just-in-time learning based on overall optimization according to an embodiment of the present application will be described with reference to the drawings.
Fig. 6 is a block diagram of a soft measurement modeling apparatus for just-in-time learning based on overall optimization according to an embodiment of the present application.
As shown in fig. 6, the soft measurement modeling apparatus 10 for just-in-time learning based on overall optimization includes: a calculation module 100 and a prediction module 200.
The calculation module 100 is configured to calculate weighted euclidean distances between the acquired query data and all samples in the preset auxiliary variable data set according to a preset input variable weight matrix, fuse the weighted euclidean distances to a collaborative representation regular term to obtain a target collaborative representation model, and calculate a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model; the prediction module 200 is configured to establish a weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, fuse an optimized target of the target collaborative representation model and the weighted ridge regression model, and calculate a weighted ridge regression model coefficient of the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data by using the coefficients of the weighted ridge regression model.
Optionally, in an embodiment of the present application, the apparatus further includes: a construction module for constructing and storing a preset auxiliary variable data set in the industrial process; an analysis module for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set; and a preprocessing module for constructing an initial training data set according to the auxiliary variable data and the real quality variable values, and standardizing the training data set to obtain a target training data set.
Optionally, in an embodiment of the present application, the normalization process is:
Figure BDA0003510558310000131
wherein
Figure BDA0003510558310000132
is the data set to be normalized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
Optionally, in an embodiment of the present application, the apparatus further includes:
the modeling module is used for establishing an offline ridge regression model according to a target training data set before calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, and the optimization target is as follows:
Figure BDA0003510558310000141
wherein
Figure BDA0003510558310000142
is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and
Figure BDA0003510558310000143
is the standardized real quality variable value;
a solving module for solving the optimization objective of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0=(X_L X_L^{T}+\lambda_0 I)^{-1}X_L Y_L$$
wherein $X_L^{T}$ is the transpose of the data $X_L$, and $I$ is the identity matrix;
the weight calculation module is used for calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain a preset input variable weight matrix:
Figure BDA0003510558310000144
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of $W_0$.
Optionally, in an embodiment of the present application, the computing module 100 is specifically configured to,
calculating the weighted Euclidean distance between the query data and all samples in the standardized preset auxiliary variable data set according to a preset input variable weight matrix:
$$D_{x_q}=W_{var}\,(x_q\times\mathbf{1}-X_L)$$
Figure BDA0003510558310000145
wherein $x_q$ is the query data,
Figure BDA0003510558310000146
is a diagonal matrix,
Figure BDA0003510558310000147
has all elements equal to 1, the symbol
Figure BDA0003510558310000148
denotes element-wise multiplication of two matrices, and the function sum(·) adds the rows of a matrix together;
establishing a collaborative representation model of the query data and a target training data set, and fusing a preset input variable weight matrix and a weighted Euclidean distance, wherein the optimization target is as follows:
Figure BDA0003510558310000149
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization coefficient, and
Figure BDA00035105583100001410
denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b=(X_L^{T}W_{var}X_L+\lambda_1 D)^{-1}X_L^{T}W_{var}\,x_q$$
and obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
Figure BDA0003510558310000151
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
Optionally, in one embodiment of the present application, the prediction module 200 is specifically configured for:
establishing a weighted ridge regression model according to the weight matrix of the target training data set and the historical samples, wherein the optimization target is as follows:
Figure BDA0003510558310000152
wherein
Figure BDA0003510558310000153
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
Figure BDA0003510558310000154
wherein a is the weight coefficient of the two algorithms' optimization objectives, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization coefficient,
Figure BDA0003510558310000155
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization coefficient;
calculating a weighted ridge regression model coefficient w:
Figure BDA0003510558310000156
wherein
Figure BDA0003510558310000157
calculating the predicted value of the query data by using the coefficients of the weighted ridge regression model:
Figure BDA0003510558310000158
wherein
Figure BDA0003510558310000159
is the transpose of the query data $x_q$.
Optionally, in an embodiment of the present application, the apparatus further includes: an expansion module for calculating the actual value of the query data and adding the actual value and the query data to the target training data set.
It should be noted that the foregoing explanation of the embodiment of the soft measurement modeling method for just-in-time learning based on overall optimization also applies to the soft measurement modeling apparatus of this embodiment, and details are not repeated here.
According to the soft measurement modeling apparatus for just-in-time learning based on overall optimization, the selection of similar samples is converted into an optimization problem and fused with the optimization objective of the local model, so that the selection of similar samples and the establishment of the local model are optimized as a whole; this improves the reasonableness and reliability of the sample weights as well as the optimization efficiency and prediction accuracy of the model.
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device may include:
memory 701, processor 702, and a computer program stored on memory 701 and executable on processor 702.
The processor 702, when executing the program, implements the soft measurement modeling method for just-in-time learning based on overall optimization provided in the embodiments described above.
Further, the electronic device further includes:
a communication interface 703 for communication between the memory 701 and the processor 702.
A memory 701 for storing computer programs operable on the processor 702.
The memory 701 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
If the memory 701, the processor 702 and the communication interface 703 are implemented independently, the communication interface 703, the memory 701 and the processor 702 may be connected to each other through a bus and perform communication with each other. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 7, but this is not intended to represent only one bus or type of bus.
Optionally, in a specific implementation, if the memory 701, the processor 702, and the communication interface 703 are integrated on a chip, the memory 701, the processor 702, and the communication interface 703 may complete mutual communication through an internal interface.
The processor 702 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits configured to implement embodiments of the present Application.
The present embodiment also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the soft measurement modeling method for just-in-time learning based on overall optimization described above.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples" means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the various embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they do not contradict each other.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality of" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those skilled in the art to which the embodiments of the present application pertain.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

Claims (16)

1. A soft measurement modeling method for just-in-time learning based on overall optimization, characterized by comprising the following steps:
calculating the weighted Euclidean distance between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix;
fusing the weighted Euclidean distance into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;
establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the target collaborative representation model and the optimized target of the weighted ridge regression model, and calculating the coefficients of the weighted ridge regression model of the preset auxiliary variable data set and the query data;
and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.
2. The method of claim 1, wherein before establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, the method further comprises:
constructing and storing the preset auxiliary variable data set in the industrial process;
analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;
and constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
3. The method of claim 2, wherein the normalization process is:
Figure FDA0003510558300000011
wherein
Figure FDA0003510558300000012
is the data set to be normalized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
4. The method of claim 1, wherein before calculating the weighted euclidean distances between the collected query data and all samples in the predetermined auxiliary variable data set according to the predetermined input variable weight matrix, the method further comprises:
establishing an offline ridge regression model according to the target training data set, wherein the optimization target is as follows:
Figure FDA0003510558300000013
wherein
Figure FDA0003510558300000014
is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and
Figure FDA0003510558300000015
is the standardized real quality variable value;
solving the optimization objective of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0=(X_L X_L^{T}+\lambda_0 I)^{-1}X_L Y_L$$
wherein $X_L^{T}$ is the transpose of the data $X_L$, and $I$ is the identity matrix;
calculating a weight matrix of each input variable according to the ridge regression coefficient to obtain a preset input variable weight matrix:
Figure FDA0003510558300000021
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of $W_0$.
5. The method according to claim 4, wherein the calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model includes:
calculating the weighted Euclidean distance between the query data and all samples in a standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q}=W_{var}\,(x_q\times\mathbf{1}-X_L)$$
Figure FDA0003510558300000022
wherein x isqIn order to query the data for it,
Figure FDA0003510558300000023
is a diagonal matrix Di,i=di,i=1,2,…,n,
Figure FDA0003510558300000024
And the elements are all 1, symbols
Figure FDA0003510558300000025
Representing the multiplication of the corresponding elements of the two matrices, the function sum (-) representing the addition of the rows of the matrices;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein an optimization target is as follows:
Figure FDA0003510558300000026
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization term coefficient, and
Figure FDA0003510558300000027
denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var} x_q$$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
Figure FDA0003510558300000028
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
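The distance and collaborative representation steps of claim 5 can be sketched as below. This is a hedged illustration, not the patented implementation: the function name and demo data are hypothetical, $X_L$ is assumed to hold one sample per column, and the formula for $d$ is a figure in the source, so the code assumes that $d_i$ is the two-norm of column $i$ of $D_{x_q}$ (the element-wise product summed over rows, then square-rooted). The final mapping from $b$ to the sample weight matrix is also a figure and is not reproduced here.

```python
import numpy as np


def collaborative_weights(X_L, x_q, W_var, lam1):
    """Return the collaborative representation coefficient b and distances d.

    X_L: (m, n) standardized samples, one per column.
    x_q: (m,) standardized query sample.
    W_var: (m, m) input variable weight matrix.
    """
    n = X_L.shape[1]
    # D_xq = W_var (x_q x 1 - X_L), where 1 is a (1, n) row of ones.
    D_xq = W_var @ (x_q[:, None] * np.ones((1, n)) - X_L)
    # ASSUMPTION: d_i is the two-norm of column i of D_xq.
    d = np.sqrt(np.sum(D_xq * D_xq, axis=0))
    D = np.diag(d)
    # b = (X_L^T W_var X_L + lam1 * D)^-1 X_L^T W_var x_q
    A = X_L.T @ W_var @ X_L + lam1 * D
    b = np.linalg.solve(A, X_L.T @ W_var @ x_q)
    return b, d


# Hypothetical demo: the query is a slightly shifted copy of sample 0.
rng = np.random.default_rng(1)
X_L = rng.standard_normal((3, 20))
x_q = X_L[:, 0] + 0.01
W_var = np.eye(3) / 3.0

b, d = collaborative_weights(X_L, x_q, W_var, lam1=0.1)
```

Because each $d_i$ scales the penalty on $b_i$, samples far from the query are penalized more heavily than nearby ones. Note that the system can become singular if the query coincides exactly with a stored sample while $n > m$, so a practical implementation may need a small floor on $d$.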
6. The method as claimed in claim 5, wherein the building a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the target co-representation model and an optimization target of the weighted ridge regression model, calculating weighted ridge regression model coefficients of the preset auxiliary variable data set and the query data, and calculating the predicted value of the query data by using the weighted ridge regression model coefficients comprises:
establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, wherein the optimization target is as follows:
Figure FDA0003510558300000031
wherein
Figure FDA0003510558300000032
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization term coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
Figure FDA0003510558300000033
wherein $a$ is the weight coefficient balancing the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization term coefficient,
Figure FDA0003510558300000034
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization term coefficient;
calculating the weighted ridge regression model coefficient w:
Figure FDA0003510558300000035
wherein
Figure FDA0003510558300000036
computing predicted values for the query data using the weighted ridge regression model coefficients
Figure FDA0003510558300000037
Figure FDA0003510558300000038
wherein
Figure FDA0003510558300000039
is the transpose of the query data $x_q$.
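For orientation, the prediction step can be illustrated with the textbook closed form of weighted ridge regression. This is only a sketch under stated assumptions: the claimed coefficient formula, which results from the fused optimization target with the weight coefficient $a$, is given as a figure in the source and is not implemented here. The function name, the per-sample weight vector `s`, and the demo data are hypothetical.

```python
import numpy as np


def weighted_ridge_predict(X_L, Y_L, s, x_q, lam2):
    """Textbook weighted ridge regression (NOT the fused formula of the claim).

    w = (X_L S X_L^T + lam2 * I)^-1 X_L S Y_L with S = diag(s),
    y_hat = x_q^T w.
    X_L: (m, n) samples as columns, Y_L: (n,), s: (n,) non-negative weights.
    """
    m = X_L.shape[0]
    XS = X_L * s  # scales column i by s_i, equivalent to X_L @ diag(s)
    w = np.linalg.solve(XS @ X_L.T + lam2 * np.eye(m), XS @ Y_L)
    return w, float(x_q @ w)


# Hypothetical noise-free demo with arbitrary positive sample weights.
rng = np.random.default_rng(2)
X_L = rng.standard_normal((3, 40))
true_w = np.array([1.0, -2.0, 0.5])
Y_L = X_L.T @ true_w
s = rng.uniform(0.1, 1.0, size=40)
x_q = np.array([0.3, -0.1, 0.8])

w, y_hat = weighted_ridge_predict(X_L, Y_L, s, x_q, lam2=1e-6)
```

In the claimed method the sample weights would come from the collaborative representation coefficient rather than from random draws as in this demo.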
7. The method according to any one of claims 1 to 6, further comprising:
calculating an actual value of the query data, and adding the actual value and the query data to the target training data set.
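The data set expansion of claim 7 amounts to appending one column and one label. A minimal sketch follows, assuming the samples are stored as columns and that the new sample and its actual value have already been standardized with the same statistics as the training data; the function name and values are hypothetical.

```python
import numpy as np


def append_sample(X_L, Y_L, x_q, y_q):
    """Return copies of the training data with the query sample appended."""
    X_new = np.column_stack([X_L, x_q])  # (m, n) -> (m, n + 1)
    Y_new = np.append(Y_L, y_q)          # (n,) -> (n + 1,)
    return X_new, Y_new


X_L = np.zeros((3, 5))
Y_L = np.zeros(5)
x_q = np.array([1.0, 2.0, 3.0])

X_new, Y_new = append_sample(X_L, Y_L, x_q, y_q=0.7)
```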
8. A soft measurement modeling device based on integral optimization and instant learning, comprising:
the calculation module is used for calculating weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, fusing the weighted Euclidean distances into a collaborative representation regular term to obtain a target collaborative representation model, and calculating a weight matrix of each historical sample in the preset auxiliary variable data set by using the target collaborative representation model;
the prediction module is used for establishing a weighted ridge regression model according to a target training data set and the weight matrix of each historical sample, fusing the target collaborative representation model and an optimized target of the weighted ridge regression model, and calculating a weighted ridge regression model coefficient of the preset auxiliary variable data set and the query data; and calculating the predicted value of the query data by using the weighted ridge regression model coefficient.
9. The apparatus of claim 8, further comprising:
the construction module is used for constructing and storing the preset auxiliary variable data set in the industrial process;
the analysis module is used for analyzing the preset auxiliary variable data set to obtain a real quality variable value corresponding to each sample in the auxiliary variable data set;
and the preprocessing module is used for constructing an initial training data set according to the auxiliary variable data and the real quality variable value, and carrying out standardization processing on the training data set to obtain the target training data set.
10. The apparatus of claim 9, wherein the standardization processing is:
Figure FDA0003510558300000041
wherein
Figure FDA0003510558300000042
is the data set to be standardized, the function mean(·) computes the mean of each row of the matrix, and the function std(·) computes the standard deviation of each row of the matrix.
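A row-wise standardization of this kind can be sketched as below. The claimed formula is a figure in the source, so the code assumes the usual z-score form, (X − mean(X)) / std(X), applied to each row, with NumPy's default population standard deviation; the function name and demo data are hypothetical.

```python
import numpy as np


def standardize_rows(X):
    """Z-score each row of X (one variable per row, one sample per column).

    ASSUMPTION: standard z-score with population standard deviation (ddof=0).
    Returns the standardized data together with the row means and deviations,
    which are needed later to standardize query data consistently.
    """
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    return (X - mu) / sigma, mu, sigma


rng = np.random.default_rng(3)
X = rng.normal(loc=5.0, scale=2.0, size=(4, 100))

X_std, mu, sigma = standardize_rows(X)
```

A row with zero variance would cause a division by zero, so constant variables should be removed or handled separately before this step.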
11. The apparatus of claim 8, further comprising:
the modeling module is used for building an offline ridge regression model according to the target training data set before calculating the weighted Euclidean distances between the acquired query data and all samples in a preset auxiliary variable data set according to a preset input variable weight matrix, and the optimization target is as follows:
Figure FDA0003510558300000043
wherein
Figure FDA0003510558300000044
is the ridge regression coefficient of the offline ridge regression model, $\lambda_0$ is the regularization term coefficient of the offline ridge regression model, $X_L$ is the standardized auxiliary variable data, and
Figure FDA0003510558300000045
is the standardized real quality variable value;
a solving module, configured to solve the optimization target of the offline ridge regression model to obtain the ridge regression coefficient $W_0$ of the offline ridge regression model:
$$W_0 = (X_L X_L^T + \lambda_0 I)^{-1} X_L Y_L$$
wherein $X_L^T$ is the transpose of the data $X_L$, and $I$ is an identity matrix;
a weight calculation module, configured to calculate a weight matrix of each input variable according to the ridge regression coefficient to obtain the preset input variable weight matrix:
Figure FDA0003510558300000046
wherein $W_0(1)$ is the first element of the ridge regression coefficient $W_0$, and $W_0(m)$ is the $m$-th element of the ridge regression coefficient $W_0$.
12. The apparatus of claim 11, wherein the computing module is specifically configured to:
calculating the weighted Euclidean distance between the query data and all samples in a standardized preset auxiliary variable data set according to the preset input variable weight matrix:
$$D_{x_q} = W_{var}(x_q \times \mathbf{1} - X_L)$$
Figure FDA0003510558300000051
wherein $x_q$ is the query data,
Figure FDA0003510558300000052
is a diagonal matrix with $D_{i,i} = d_i$, $i = 1, 2, \ldots, n$,
Figure FDA0003510558300000053
has all elements equal to 1, the symbol
Figure FDA0003510558300000054
denotes element-wise multiplication of two matrices, and the function sum(·) denotes summation over the rows of a matrix;
establishing a collaborative representation model of the query data and the target training data set, and fusing the preset input variable weight matrix and the weighted Euclidean distance, wherein an optimization target is as follows:
Figure FDA0003510558300000055
wherein $b$ is the collaborative representation coefficient, $\lambda_1$ is the regularization term coefficient, and
Figure FDA0003510558300000056
denotes the two-norm operation;
calculating the collaborative representation coefficient $b$ of the standardized preset auxiliary variable data set and the query data:
$$b = (X_L^T W_{var} X_L + \lambda_1 D)^{-1} X_L^T W_{var} x_q$$
obtaining a weight matrix of each historical sample in the target training data set by using the collaborative representation coefficient:
Figure FDA0003510558300000057
wherein $b_1$ is the first element of the vector $b$, and $b_n$ is the $n$-th element of the vector $b$.
13. The apparatus of claim 12, wherein the prediction module is specifically configured to:
establishing the weighted ridge regression model according to the target training data set and the weight matrix of each historical sample, wherein the optimization target is as follows:
Figure FDA0003510558300000058
wherein
Figure FDA0003510558300000059
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization term coefficient;
fusing the optimization targets of the target collaborative representation model and the weighted ridge regression model to obtain a unified optimization target:
Figure FDA00035105583000000510
wherein $a$ is the weight coefficient balancing the optimization targets of the two algorithms, $b$ is the collaborative representation coefficient, $\lambda_1$ is the collaborative representation regularization term coefficient,
Figure FDA0003510558300000061
is the coefficient of the weighted ridge regression model, and $\lambda_2$ is the ridge regression regularization term coefficient;
calculating the weighted ridge regression model coefficient w:
Figure FDA0003510558300000062
wherein
Figure FDA0003510558300000063
calculating a predicted value of the query data using the weighted ridge regression model coefficients:
Figure FDA0003510558300000064
wherein
Figure FDA0003510558300000065
is the transpose of the query data $x_q$.
14. The apparatus of any one of claims 8-13, further comprising:
and the expansion module is used for calculating an actual value of the query data and adding the actual value and the query data to the target training data set.
15. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the soft measurement modeling method based on integral optimization and instant learning according to any one of claims 1-7.
16. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the soft measurement modeling method based on integral optimization and instant learning according to any one of claims 1-7.
CN202210151538.7A 2022-02-18 2022-02-18 Soft measurement modeling method and device based on integral optimization and instant learning Pending CN114528764A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210151538.7A CN114528764A (en) 2022-02-18 2022-02-18 Soft measurement modeling method and device based on integral optimization and instant learning

Publications (1)

Publication Number Publication Date
CN114528764A true CN114528764A (en) 2022-05-24

Family

ID=81623426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210151538.7A Pending CN114528764A (en) 2022-02-18 2022-02-18 Soft measurement modeling method and device based on integral optimization and instant learning

Country Status (1)

Country Link
CN (1) CN114528764A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738866A (en) * 2023-08-11 2023-09-12 中国石油大学(华东) Instant learning soft measurement modeling method based on time sequence feature extraction
CN116738866B (en) * 2023-08-11 2023-10-27 中国石油大学(华东) Instant learning soft measurement modeling method based on time sequence feature extraction
CN117272244A (en) * 2023-11-21 2023-12-22 中国石油大学(华东) Soft measurement modeling method integrating feature extraction and self-adaptive composition
CN117272244B (en) * 2023-11-21 2024-03-15 中国石油大学(华东) Soft measurement modeling method integrating feature extraction and self-adaptive composition

Similar Documents

Publication Publication Date Title
CN107765347B (en) Short-term wind speed prediction method based on Gaussian process regression and particle filtering
CN114528764A (en) Soft measurement modeling method and device based on integral optimization and instant learning
CN103389472B (en) A kind of Forecasting Methodology of the cycle life of lithium ion battery based on ND-AR model
Kaya et al. Process capability analyses with fuzzy parameters
CN109389314B (en) Quality soft measurement and monitoring method based on optimal neighbor component analysis
CN114117919B (en) Instant learning soft measurement modeling method based on sample collaborative representation
CN109523077B (en) Wind power prediction method
CN117312816B (en) Special steel smelting effect evaluation method and system
CN114841073A (en) Instant learning semi-supervised soft measurement modeling method based on local label propagation
CN117594164A (en) Metal structure residual fatigue life calculation and evaluation method and system based on digital twin
CN114970341B (en) Method for establishing low-orbit satellite orbit prediction precision improvement model based on machine learning
CN111626359A (en) Data fusion method and device, control terminal and ship
CN101446828A (en) Nonlinear process quality prediction method
CN116821695B (en) Semi-supervised neural network soft measurement modeling method
Wang et al. Research on construction cost estimation based on artificial intelligence technology
CN116644655A (en) Industrial process soft measurement method based on weighted target feature regression neural network
CN115631804A (en) Method for predicting outlet concentration of sodium aluminate solution in evaporation process based on data coordination
CN115577856A (en) Method and system for predicting construction cost and controlling balance of power transformation project
JP7020500B2 (en) Prediction model generation method, corrosion amount prediction method for metal materials, prediction model generation program and prediction model generation device
CN109858699B (en) Water quality quantitative simulation method and device, electronic equipment and storage medium
CN114240006A (en) Water resource bearing capacity assessment method
CN110866638A (en) Traffic volume prediction model construction method and device, computer equipment and storage medium
CN110705187A (en) Method for checking and diagnosing real-time online instrument through least square algorithm
CN110188433B (en) Ridge regression soft measurement modeling method based on distributed parallel local modeling mechanism
CN113971372B (en) Wind speed prediction method and device based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination