Background
In modern industrial processes, measurement techniques and equipment play an important role in the overall production system, and the resulting measurement data provide a solid basis for production planning and scheduling, process monitoring, and other industrial "big data" applications. Advanced instrumentation has developed rapidly in recent years: information such as flow, liquid level, pressure, and temperature can be measured easily in an industrial process, and information that directly or indirectly reflects product quality can also be measured by instruments to obtain real-time data. However, compared with flow or temperature meters, equipment that analyzes product quality online in real time is generally expensive to purchase and to maintain. If offline analysis is used instead, the product quality data are obtained only after a time delay, so the operator cannot know the product quality in a timely and accurate manner. Quality monitoring nevertheless depends on these data, which reflect product quality. In recent decades, with the widespread use of data-driven methods, soft measurement techniques have emerged. Soft measurement estimates quality data in real time by establishing a regression model between data that are easy to measure in the production process and the product quality data. In recent years, research on soft measurement techniques has received increasing attention from both industry and academia.
The core of soft measurement techniques is to establish a regression model between input data (typically information that is easily measured in an industrial process, such as pressure, temperature, and flow) and output data (typically measurement indicators that directly or indirectly reflect quality information, such as concentration). In the existing literature and patent documents, the methods commonly used to establish the regression model include statistical regression, neural networks, support vector machines, and the like. In addition, as modern industrial processes, in particular in the process industry, grow in scale, many production units become interconnected. If a single soft measurement model is established directly, the achievable soft measurement accuracy is often poor. In recent years, distributed soft measurement models have gained importance, since the generalization performance of multiple models is generally superior to that of a single model. Generally speaking, multi-model modeling first requires decomposing the process object. However, because production units and control-system feedback loops are intricately interconnected, an effective process decomposition has to rely on sufficient process mechanism knowledge and on the prior knowledge of the operator.
To facilitate the popularization and application of soft measurement technology, the first difficulty in implementing a distributed soft measurement scheme is how to decompose the whole production process object without requiring process mechanism knowledge. In addition, even if the process object can be decomposed successfully, the second difficulty is how to integrate the distributed soft measurement models to obtain the final estimate of the product quality. The idea of integration here is that, although the soft measurement models are distributed, the estimation or prediction of product quality must yield a single result; that is, the results of the multiple soft measurement models have to be integrated.
Disclosure of Invention
The main technical problems to be solved by the invention are: how to decompose a large-scale production process from a data perspective in order to build distributed soft measurement models, and how to integrate the distributed soft measurement results to obtain the estimated value of the product quality data.
The technical scheme adopted by the invention for solving the technical problems is as follows: a distributed soft measurement method based on a Girvan-Newman algorithm comprises the following steps:
(1): From the historical database of the production process object, find the data corresponding to the indicators that reflect product quality to form an output matrix $Y \in R^{n \times f}$; the sampled data corresponding to the output $Y$ form an input matrix $X \in R^{n \times m}$, where $n$ is the number of training samples, $m$ is the number of process measurement variables, $f$ is the number of quality indicators, $R$ is the set of real numbers, and $R^{n \times m}$ denotes a real matrix of dimension $n \times m$.
(2): Calculate the mean values $\mu_1, \mu_2, \dots, \mu_f$ and the standard deviations $\delta_1, \delta_2, \dots, \delta_f$ of the column vectors of the output matrix $Y$, then standardize each row vector of $Y$ according to the formula $\bar{y} = (y - \mu)\Phi^{-1}$ to obtain the standardized output matrix $\bar{Y}$, where the row vectors $y$ and $\bar{y}$ denote any one row of the matrices $Y$ and $\bar{Y}$ respectively, the output mean vector is $\mu = [\mu_1, \mu_2, \dots, \mu_f]$, and the diagonal elements of the output standard deviation diagonal matrix $\Phi$ are $\delta_1, \delta_2, \dots, \delta_f$.
(3): Standardize the matrix $X$ to obtain the standardized input matrix $\bar{X}$.
(4): After the correlation coefficient matrix $C$ is calculated according to the formula $C = \bar{X}^T \bar{X}/(n-1)$, set the neighbor parameter $c$ and initialize $j = 1$.
(5): Set the $c$ largest elements of the $j$-th row vector of the matrix $C$ to 1, and set the remaining elements of that row to 0.
(6): Check whether the condition $j < m$ holds. If so, set $j = j + 1$ and return to step (5); if not, the updated correlation coefficient matrix $C$ is obtained.
(7): Calculate the matrix $\Xi$ according to the formula $\Xi = \max\{C, C^T\}$, where $\max\{C, C^T\}$ denotes taking, at each position, the larger of the corresponding elements of the matrix $C$ and the matrix $C^T$, and the superscript $T$ denotes the transpose of a matrix or vector. The matrix $\Xi$ can be regarded as the connection network matrix among the $m$ variables: each variable is a node in the network, an element equal to 1 indicates that a connecting edge exists between two nodes, and an element equal to 0 indicates that no connecting edge exists between them.
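The construction of the connection network in steps (3) to (7) can be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `connection_network` and the synthetic data are hypothetical, the correlation is taken with its sign exactly as the text reads, and the diagonal entry (the self-correlation, equal to 1) is allowed to count among the $c$ largest elements of each row.

```python
import numpy as np

def connection_network(X, c):
    """Return the symmetric 0/1 connection matrix Xi for the columns of X."""
    # Step (3): standardize every column to zero mean and unit variance.
    X_bar = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    n, m = X_bar.shape
    # Step (4): correlation coefficient matrix of the m variables.
    C = X_bar.T @ X_bar / (n - 1)
    # Steps (5)-(6): in each row, mark the c largest entries with 1, the rest with 0.
    A = np.zeros((m, m), dtype=int)
    for j in range(m):
        largest = np.argsort(C[j])[-c:]
        A[j, largest] = 1
    # Step (7): the element-wise maximum with the transpose makes the matrix symmetric.
    return np.maximum(A, A.T)

# Hypothetical data: variables 0 and 1 move together, as do variables 2 and 3.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
X = np.column_stack([
    base[:, 0], base[:, 0] + 0.1 * rng.normal(size=200),
    base[:, 1], base[:, 1] + 0.1 * rng.normal(size=200),
])
Xi = connection_network(X, c=2)
print(Xi)
```

Because the diagonal is retained here, code that treats $\Xi$ as a graph would normally ignore the self-loops; that choice is an assumption of this sketch rather than something the text specifies.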
(8): Perform hierarchical clustering on the $m$ variables using the Girvan-Newman algorithm. The specific implementation process is as follows:
① Calculate the edge betweenness of all connecting edges in the connection network of the $m$ nodes.
② Find the connecting edge with the highest edge betweenness and remove it from the network, that is, set the corresponding element of the connection network matrix $\Xi$ to 0.
③ Recalculate the edge betweenness of the remaining connecting edges in the network.
④ Repeat steps ② and ③ until all connecting edges in the connection network have been removed, that is, until the matrix $\Xi$ becomes a zero matrix.
The edge betweenness of a connecting edge in step ① is defined as follows: starting from a given node, count the number of shortest paths to the other nodes that pass through the connecting edge; repeat the same calculation with every node as the starting node, and add up the values obtained for the different starting nodes to obtain the edge betweenness of the connecting edge.
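The edge-removal procedure of step (8) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: the names `edge_betweenness`, `components`, and `girvan_newman` are hypothetical, self-loops on the diagonal of the matrix are ignored, and when several shortest paths connect the same pair of nodes each path receives a fractional share (the usual Brandes-style accumulation), which the text does not spell out.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected graph given as {node: set_of_neighbours}."""
    score = {(u, v): 0.0 for u in adj for v in adj[u] if u < v}
    for s in adj:
        # Breadth-first search from s, counting shortest paths to every node.
        dist, sigma = {s: 0}, {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on a shortest path.
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + credit[w])
                score[(min(v, w), max(v, w))] += share
                credit[v] += share
    return score

def components(adj):
    """Connected components as sorted lists of node indices."""
    seen, parts = set(), []
    for s in adj:
        if s in seen:
            continue
        part, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(sorted(part))
    return parts

def girvan_newman(xi):
    """Remove edges by highest betweenness; record each new split of the variables."""
    m = len(xi)
    adj = {i: {j for j in range(m) if j != i and xi[i][j]} for i in range(m)}
    levels = [components(adj)]
    while any(adj.values()):                      # until Xi is a zero matrix
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)          # edge with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels

# Hypothetical network: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
xi = [[0, 1, 1, 0, 0, 0],
      [1, 0, 1, 0, 0, 0],
      [1, 1, 0, 1, 0, 0],
      [0, 0, 1, 0, 1, 1],
      [0, 0, 0, 1, 0, 1],
      [0, 0, 0, 1, 1, 0]]
levels = girvan_newman(xi)
print(levels[1])
```

In this toy network the bridging edge carries every shortest path between the two triangles, so it is removed first and the second recorded level separates the two groups of variables.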
(9): According to the hierarchical clustering result of step (8), the $m$ variables can be divided into $D$ blocks; correspondingly, the input matrix $\bar{X}$ can be divided into $D$ matrices: $X_1, X_2, \dots, X_D$.
Steps (8) and (9) complete the decomposition of the process object through the Girvan-Newman algorithm: the process measurement variables are decomposed into $D$ sub-blocks on the basis of the correlation among the variables, which completes the first step of establishing the distributed soft measurement model.
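How a clustering hierarchy is turned into $D$ column blocks can be sketched as follows. The text does not state how $D$ is chosen from the hierarchy, so the rule used here (take the first level that has at least $D$ blocks) is purely an assumption, and the names `pick_level`, `split_into_blocks`, and the example `levels` are hypothetical.

```python
import numpy as np

def pick_level(levels, D):
    """Assumed rule: return the first level of the hierarchy with at least D blocks."""
    for parts in levels:
        if len(parts) >= D:
            return parts
    return levels[-1]

def split_into_blocks(X_bar, blocks):
    """Return [X_1, ..., X_D], one column sub-matrix per variable block."""
    return [X_bar[:, idx] for idx in blocks]

# Hypothetical hierarchy for m = 5 variables, from one block down to three.
levels = [[[0, 1, 2, 3, 4]],
          [[0, 1, 2], [3, 4]],
          [[0], [1, 2], [3, 4]]]
X_bar = np.arange(20.0).reshape(4, 5)      # stand-in for the standardized input matrix

blocks = pick_level(levels, D=2)
X_blocks = split_into_blocks(X_bar, blocks)
print([X_d.shape for X_d in X_blocks])
```

Keeping `blocks` as lists of column indices means that the same indices can later be applied to a new online sample.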
(10): Use the partial least squares algorithm to establish the soft measurement model between $X_d$ and the output matrix $\bar{Y}$: $\bar{Y} = X_d B_d + E_d$, where $d = 1, 2, \dots, D$, $B_d$ is the regression coefficient matrix, and $E_d$ is the error matrix.
It should be emphasized that, although step (10) establishes the soft measurement model with the partial least squares algorithm, the method of the invention may also use a neural network, support vector regression, or the kernel partial least squares algorithm to establish the soft measurement model.
(11): Repeat step (10) until $D$ soft measurement models are obtained, and use the regression coefficient matrices $B_1, B_2, \dots, B_D$ to calculate the output estimate of each soft measurement model according to the formula $\hat{Y}_d = X_d B_d$, giving $\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D$.
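(12): Combine the output estimates $\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D$ into a matrix $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D]$, and use the partial least squares algorithm to establish the soft measurement model between $\hat{Y}$ and the output $\bar{Y}$: $\bar{Y} = \hat{Y}\Theta + E$, where $\Theta$ is the regression coefficient matrix and $E$ is the error matrix.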
Steps (1) to (12) above constitute the off-line modeling stage of the method of the invention, in which step (11) establishes the $D$ distributed soft measurement models and step (12) integrates the distributed soft measurement results. Steps (13) to (18) below are the online soft measurement process of the method of the invention.
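The two-level structure of steps (10) to (12), one model per block followed by an integrating model, can be sketched as follows. To keep the example short, ordinary least squares (`numpy.linalg.lstsq`) is used here as a stand-in for the partial least squares algorithm named in the text; the function names, the symbol `Theta` for the integrating coefficient matrix, and the synthetic data are assumptions of this sketch.

```python
import numpy as np

def fit_linear(A, Y):
    """Least-squares coefficient matrix B with A @ B ~ Y (stand-in for PLS)."""
    B, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return B

def fit_distributed(X_blocks, Y_bar):
    # Steps (10)-(11): one regression model per variable block.
    B_list = [fit_linear(X_d, Y_bar) for X_d in X_blocks]
    # Step (12): stack the block estimates side by side ...
    Y_hat = np.hstack([X_d @ B_d for X_d, B_d in zip(X_blocks, B_list)])
    # ... and regress the output on them to integrate the D models.
    Theta = fit_linear(Y_hat, Y_bar)
    return B_list, Theta

# Hypothetical data: two blocks with 3 and 2 variables, f = 2 quality indicators.
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(50, 3)), rng.normal(size=(50, 2))
Y_bar = (X1 @ rng.normal(size=(3, 2)) + X2 @ rng.normal(size=(2, 2))
         + 0.05 * rng.normal(size=(50, 2)))

B_list, Theta = fit_distributed([X1, X2], Y_bar)
print(Theta.shape)   # one row per block and quality indicator, one column per indicator
```

With least squares at the second level, the integrated fit on the training data cannot be worse than the fit of any single block model, because the integrating matrix could always reproduce one block's estimate unchanged.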
(13): Acquire the sample data $z \in R^{1 \times m}$ of the process object at the new sampling instant, and apply to it the same standardization as was applied to the matrix $X$ to obtain the vector $\bar{z}$.
(14): According to the hierarchical clustering result of step (8), divide the vector $\bar{z}$ into $D$ row vectors $z_1, z_2, \dots, z_D$, and initialize $d = 1$.
(15): Calculate the output estimate $\hat{y}_d$ of the $d$-th soft measurement model according to the formula $\hat{y}_d = z_d B_d$.
(16): Check whether the condition $d < D$ holds. If so, set $d = d + 1$ and return to step (15); if not, combine the output estimates of the $D$ soft measurement models into a vector $\hat{y} = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_D]$.
(17): Integrate according to the formula $\tilde{y} = \hat{y}\Theta$ to obtain the final output estimate $\tilde{y}$ of the distributed soft measurement model; the estimate of the quality indicators at the current sampling instant is then $\tilde{y}\Phi + \mu$, where $\mu$ and $\Phi$ are the output mean vector and the output standard deviation diagonal matrix obtained in step (2).
(18): Return to step (13) to continue the soft measurement of the quality indicators at the next sampling instant.
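The online stage of steps (13) to (17) reduces to a few matrix products once the offline quantities are available. The sketch below assumes they are passed in as NumPy arrays; the function name `soft_measure`, the argument names (`x_mean` and `x_std` for the statistics used to standardize $X$, `Theta` for the integrating coefficient matrix, `Phi` for the output standard deviation diagonal matrix), and the toy numbers are all hypothetical.

```python
import numpy as np

def soft_measure(z, x_mean, x_std, blocks, B_list, Theta, mu, Phi):
    """Estimate the quality indicators for one new sample z of shape (1, m)."""
    # Step (13): standardize z with the same statistics as the training inputs.
    z_bar = (z - x_mean) / x_std
    # Steps (14)-(16): split z_bar by block and collect the D block estimates.
    y_hat = np.hstack([z_bar[:, idx] @ B_d for idx, B_d in zip(blocks, B_list)])
    # Step (17): integrate the block estimates, then undo the output standardization.
    y_tilde = y_hat @ Theta
    return y_tilde @ Phi + mu

# Toy numbers: m = 3 variables in two blocks, f = 1 quality indicator.
blocks = [[0, 1], [2]]
B_list = [np.array([[1.0], [1.0]]), np.array([[2.0]])]
Theta = np.array([[0.5], [0.5]])
x_mean, x_std = np.zeros(3), np.ones(3)
mu, Phi = np.array([10.0]), np.array([[2.0]])

z = np.array([[1.0, 2.0, 3.0]])
estimate = soft_measure(z, x_mean, x_std, blocks, B_list, Theta, mu, Phi)
print(estimate)
```

Here the two block estimates are 3 and 6, their integration gives 4.5, and scaling by 2 and shifting by 10 yields 19.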
Compared with the prior art, the method has the advantages that:
The method of the invention can be said to be the first to use the Girvan-Newman algorithm to divide process variables into blocks, and doing so requires no process mechanism knowledge or prior operating experience. The method uses the Girvan-Newman algorithm to complete the process decomposition of the monitored object, which lays the groundwork for establishing the distributed soft measurement model. In addition, although the method uses the partial least squares algorithm to build the soft measurement models in step (10), it is not limited to partial least squares; other methods such as neural networks, support vector machines, and kernel partial least squares can also be used. Furthermore, to integrate the distributed soft measurement results, the method applies the partial least squares algorithm a second time. The method is therefore a novel distributed soft measurement method suitable for soft measurement of the quality indicators of the target product in large-scale production processes.
Detailed Description
The method of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in FIG. 1, the invention provides a distributed soft measurement method based on Girvan-Newman algorithm, and the specific implementation mode of the method is as follows:
Step (1): From the historical database of the production process object, find the data corresponding to the indicators that reflect product quality to form an output matrix $Y \in R^{n \times f}$; the sampled data corresponding to the output $Y$ form an input matrix $X \in R^{n \times m}$, where $n$ is the number of training samples, $m$ is the number of process measurement variables, $f$ is the number of quality indicators, $R$ is the set of real numbers, and $R^{n \times m}$ denotes a real matrix of dimension $n \times m$.
Step (2): Calculate the mean values $\mu_1, \mu_2, \dots, \mu_f$ and the standard deviations $\delta_1, \delta_2, \dots, \delta_f$ of the column vectors of the output matrix $Y$, then standardize each row vector of $Y$ according to the formula $\bar{y} = (y - \mu)\Phi^{-1}$ to obtain the standardized output matrix $\bar{Y}$, where the row vectors $y$ and $\bar{y}$ denote any one row of the matrices $Y$ and $\bar{Y}$ respectively, the output mean vector is $\mu = [\mu_1, \mu_2, \dots, \mu_f]$, and the diagonal elements of the output standard deviation diagonal matrix $\Phi$ are $\delta_1, \delta_2, \dots, \delta_f$.
Step (3): Standardize the matrix $X$ to obtain the standardized input matrix $\bar{X}$.
Step (4): After the correlation coefficient matrix $C$ is calculated according to the formula $C = \bar{X}^T \bar{X}/(n-1)$, set the neighbor parameter $c$ and initialize $j = 1$.
Step (5): Set the $c$ largest elements of the $j$-th row vector of the matrix $C$ to 1, and set the remaining elements of that row to 0.
Step (6): Check whether the condition $j < m$ holds. If so, set $j = j + 1$ and return to step (5); if not, the updated correlation coefficient matrix $C$ is obtained.
Step (7): Calculate the matrix $\Xi$ according to the formula $\Xi = \max\{C, C^T\}$, where $\max\{C, C^T\}$ denotes taking, at each position, the larger of the corresponding elements of the matrix $C$ and the matrix $C^T$, and the superscript $T$ denotes the transpose of a matrix or vector. The matrix $\Xi$ can be regarded as the connection network matrix among the $m$ variables: each variable is a node in the network, an element equal to 1 indicates that a connecting edge exists between two nodes, and an element equal to 0 indicates that no connecting edge exists between them.
Step (8): Perform hierarchical clustering on the $m$ variables using the Girvan-Newman algorithm. The implementation flow of the Girvan-Newman algorithm is shown in FIG. 2, and the specific implementation comprises the following steps (8.1) to (8.4).
Step (8.1): Calculate the edge betweenness of all connecting edges in the connection network of the $m$ nodes.
Step (8.2): Find the connecting edge with the highest edge betweenness and remove it from the network, that is, set the corresponding element of the connection network matrix $\Xi$ to 0.
Step (8.3): Recalculate the edge betweenness of the remaining connecting edges in the network.
Step (8.4): Repeat steps (8.2) and (8.3) until all connecting edges in the connection network have been removed, that is, until the matrix $\Xi$ becomes a zero matrix.
Step (9): According to the hierarchical clustering result of step (8), the $m$ variables can be divided into $D$ blocks; correspondingly, the input matrix $\bar{X}$ can be divided into $D$ matrices: $X_1, X_2, \dots, X_D$.
Step (10): Use the partial least squares algorithm to establish the soft measurement model between $X_d$ and the output matrix $\bar{Y}$: $\bar{Y} = X_d B_d + E_d$. The specific implementation is as follows:
① Initialize $h = 1$, set the matrix $Z = X_d$ and the matrix $Y_0 = \bar{Y}$, and initialize the vector $u$ as the first column of the matrix $Y_0$.
② Calculate the input weight vector $w_h$ according to the formula $w_h = Z^T u/(u^T u)$, and normalize it to unit length using the formula $w_h = w_h/\|w_h\|$.
③ Calculate the score vector $s_h$ according to the formula $s_h = Z w_h/(w_h^T w_h)$.
④ Calculate the output weight vector $g_h$ according to the formula $g_h = Y_0^T s_h/(s_h^T s_h)$.
⑤ Update the vector $u$ according to the formula $u = Y_0 g_h$.
⑥ Repeat steps ② to ⑤ until $s_h$ converges; the convergence criterion is that the elements of the vector $s_h$ no longer change.
⑦ Retain the input weight vector $w_h$ and the output weight vector $g_h$, and calculate the projection vector $p_h$ according to the formula $p_h = Z^T s_h/(s_h^T s_h)$.
⑧ Update the input matrix $Z$ and the matrix $Y_0$ according to the following two formulas:
$Z = Z - s_h p_h^T$ (7)
$Y_0 = Y_0 - s_h g_h^T$ (8)
⑨ Check whether the condition $h < M_d$ holds, where $M_d$ is the number of columns of the matrix $X_d$. If so, set $h = h + 1$ and return to step ②; if not, form the matrix $W_0 = [w_1, w_2, \dots, w_h]$ from all the input weight vectors obtained, the matrix $G_0 = [g_1, g_2, \dots, g_h]$ from all the output weight vectors, and the matrix $P_0 = [p_1, p_2, \dots, p_h]$ from all the projection vectors.
⑩ Determine by cross-validation the number $k_d$ of projection vectors retained in the partial least squares model. The soft measurement model established by the partial least squares algorithm can then be expressed as $\bar{Y} = X_d B_d + E_d$, where the regression coefficient matrix is $B_d = W(P^T W)^{-1} G^T$, and the matrices $W$, $P$, and $G$ consist of the column vectors from the 1st column to the $k_d$-th column of the matrices $W_0$, $P_0$, and $G_0$ respectively.
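The iteration in steps ① to ⑩ can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions rather than a definitive implementation: the function name `pls_nipals` is hypothetical, the number of retained components `k` is passed in directly instead of being chosen by cross-validation, and the convergence test "the elements of $s_h$ no longer change" is approximated with a tolerance `tol` and an iteration cap `max_iter`.

```python
import numpy as np

def pls_nipals(X_d, Y_bar, k, tol=1e-10, max_iter=500):
    """Return the regression coefficient matrix B_d = W (P^T W)^{-1} G^T."""
    Z = np.array(X_d, dtype=float)          # step 1: Z = X_d
    Y0 = np.array(Y_bar, dtype=float)       # step 1: Y0 = standardized output
    u = Y0[:, [0]].copy()                   # step 1: u = first column of Y0
    W, G, P = [], [], []
    for h in range(k):
        s_prev = np.zeros((Z.shape[0], 1))
        for it in range(max_iter):
            w = Z.T @ u / (u.T @ u)         # step 2: input weight vector
            w = w / np.linalg.norm(w)       #         normalized to unit length
            s = Z @ w / (w.T @ w)           # step 3: score vector
            g = Y0.T @ s / (s.T @ s)        # step 4: output weight vector
            u = Y0 @ g                      # step 5: update u
            if np.linalg.norm(s - s_prev) < tol:   # step 6: s has converged
                break
            s_prev = s
        p = Z.T @ s / (s.T @ s)             # step 7: projection vector
        Z = Z - s @ p.T                     # step 8, formula (7)
        Y0 = Y0 - s @ g.T                   # step 8, formula (8)
        W.append(w)
        G.append(g)
        P.append(p)
    W, G, P = np.hstack(W), np.hstack(G), np.hstack(P)
    return W @ np.linalg.inv(P.T @ W) @ G.T  # step 10 with k retained components

# Hypothetical block with 3 variables and f = 2 quality indicators.
rng = np.random.default_rng(2)
X_d = rng.normal(size=(40, 3))
Y_bar = X_d @ rng.normal(size=(3, 2)) + 0.1 * rng.normal(size=(40, 2))

B_d = pls_nipals(X_d, Y_bar, k=3)
print(B_d.shape)
```

A useful sanity check is that, when all components of a full-column-rank block are retained, the resulting coefficient matrix coincides with the ordinary least-squares solution; retaining fewer components is what gives partial least squares its regularizing effect.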
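Step (11): Repeat step (10) until $D$ soft measurement models are obtained, and use the regression coefficient matrices $B_1, B_2, \dots, B_D$ to calculate the output estimate of each soft measurement model according to the formula $\hat{Y}_d = X_d B_d$, giving $\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D$.
Step (12): Combine the output estimates $\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D$ into a matrix $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \dots, \hat{Y}_D]$, and use the partial least squares algorithm to establish the soft measurement model between $\hat{Y}$ and the output $\bar{Y}$: $\bar{Y} = \hat{Y}\Theta + E$, where $\Theta$ is the regression coefficient matrix and $E$ is the error matrix. The implementation is the same as in steps ① to ⑩ above.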
Steps (1) to (12) above constitute the off-line modeling stage of the method of the invention. The following must be retained for use during online distributed soft measurement: the output mean vector $\mu$ and the output standard deviation diagonal matrix $\Phi$ from step (2), the hierarchical clustering result from step (8), the regression coefficient matrices from step (11), and the regression coefficient matrix from step (12).
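One way to keep the retained quantities together is a small container with a consistency check, sketched below. The class name `OfflineModel`, its field names, and the `check` method are hypothetical. The input statistics `x_mean` and `x_std` are not in the list above, but they are included on the assumption that step (13) needs them to standardize a new sample in the same way as the matrix $X$.

```python
from dataclasses import dataclass
from typing import List

import numpy as np

@dataclass
class OfflineModel:
    mu: np.ndarray             # output mean vector, shape (f,)
    Phi: np.ndarray            # output standard deviation diagonal matrix, shape (f, f)
    x_mean: np.ndarray         # column means of X (assumed to be needed in step 13)
    x_std: np.ndarray          # column standard deviations of X (same assumption)
    blocks: List[List[int]]    # clustering result: column indices of each block
    B_list: List[np.ndarray]   # one regression coefficient matrix per block
    Theta: np.ndarray          # integrating regression coefficient matrix

    def check(self) -> bool:
        """Return True when all stored shapes are mutually consistent."""
        f, D = self.mu.shape[0], len(self.blocks)
        return (self.Phi.shape == (f, f)
                and len(self.B_list) == D
                and all(B.shape == (len(idx), f)
                        for idx, B in zip(self.blocks, self.B_list))
                and self.Theta.shape == (D * f, f))

# Hypothetical model with m = 3 variables, D = 2 blocks and f = 1 quality indicator.
model = OfflineModel(
    mu=np.array([10.0]), Phi=np.array([[2.0]]),
    x_mean=np.zeros(3), x_std=np.ones(3),
    blocks=[[0, 1], [2]],
    B_list=[np.ones((2, 1)), np.ones((1, 1))],
    Theta=np.full((2, 1), 0.5),
)
print(model.check())
```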
Step (13): Acquire the sample data $z \in R^{1 \times m}$ of the process object at the latest sampling instant, and apply to it the same standardization as was applied to the matrix $X$ to obtain the vector $\bar{z}$.
Step (14): According to the hierarchical clustering result of step (8), divide the vector $\bar{z}$ into $D$ row vectors $z_1, z_2, \dots, z_D$, and initialize $d = 1$.
Step (15): Calculate the output estimate $\hat{y}_d$ of the $d$-th soft measurement model according to the formula $\hat{y}_d = z_d B_d$.
Step (16): Check whether the condition $d < D$ holds. If so, set $d = d + 1$ and return to step (15); if not, combine the output estimates of the $D$ soft measurement models into a vector $\hat{y} = [\hat{y}_1, \hat{y}_2, \dots, \hat{y}_D]$.
Step (17): Integrate according to the formula $\tilde{y} = \hat{y}\Theta$ to obtain the final output estimate $\tilde{y}$ of the distributed soft measurement model; the estimate of the quality indicators at the current sampling instant is then $\tilde{y}\Phi + \mu$, where $\mu$ and $\Phi$ are the output mean vector and the output standard deviation diagonal matrix obtained in step (2).
Step (18): Return to step (13) to continue the soft measurement of the quality indicators at the next sampling instant.
The embodiments described above are only preferred embodiments of the present invention; any modification or change made within the spirit of the present invention and the scope of the claims falls within the scope of protection of the present invention.