CN109376337B

CN109376337B - Distributed soft measurement method based on Girvan-Newman algorithm

Info

Publication number: CN109376337B
Application number: CN201811213322.9A
Authority: CN
Inventors: 宋励嘉; 童楚东; 俞海珍
Original assignee: Ningbo University
Current assignee: Dragon Totem Technology Hefei Co ltd; Hangzhou Woyi Digital Technology Co ltd
Priority date: 2018-10-09
Filing date: 2018-10-09
Publication date: 2021-10-01
Anticipated expiration: 2038-10-09
Also published as: CN109376337A

Abstract

The invention discloses a distributed soft measurement method based on the Girvan-Newman algorithm. It addresses two problems: how to decompose a large-scale production process from a data perspective so that a distributed soft measurement model can be established, and how to integrate the distributed soft measurement results to obtain an estimate of the product quality data. First, the method uses the Girvan-Newman algorithm to decompose the process object, which lays the groundwork for establishing a distributed soft measurement model. Second, the method integrates the distributed soft measurement results with the partial least squares algorithm, completing the path from distributed modeling to integrated soft measurement. The method is a novel distributed soft measurement method and is suitable for soft measurement of the quality indicators of target products in large-scale production processes.

Description

Distributed soft measurement method based on Girvan-Newman algorithm

Technical Field

The invention relates to an industrial process soft measurement method, in particular to a distributed soft measurement method based on a Girvan-Newman algorithm.

Background

In modern industrial processes, measurement techniques and equipment play an important role in the overall production system, and the resulting measurement data provide a solid basis for production planning and scheduling, process monitoring, and other industrial "big data" applications. Advanced instrumentation has developed rapidly in recent years: flow, liquid level, pressure, temperature and similar information can be measured easily, and quantities that directly or indirectly reflect product quality can also be measured by instruments in real time. However, equipment for real-time online analysis of product quality is generally expensive to buy and to maintain compared with flow or temperature meters. If offline analysis is used instead, the product quality data arrive with a time delay, so the operator cannot know the product quality in a timely and accurate manner. Quality monitoring, however, cannot do without the data that reflect product quality. In recent decades, with the widespread use of data-driven methods, soft measurement techniques have emerged. Such a technique estimates quality data in real time by establishing a regression model between data that are easy to measure in the production process and the product quality data. In recent years, research on soft measurement has received increasing attention from both industry and academia.

The core of soft measurement is to establish a regression model between input data (typically information that is easy to measure in an industrial process, such as pressure, temperature, and flow) and output data (typically measured indicators that directly or indirectly reflect quality information, such as concentration). In the existing literature and patent documents, the methods commonly used to establish the regression model include statistical regression, neural networks, and support vector machines. In addition, as modern industrial processes, the process industries in particular, grow in scale, many production units become interconnected. If a single soft measurement model is established directly for such a process, the achievable soft measurement accuracy is generally poor. In recent years, decentralized soft measurement models have gained importance, since the generalization performance of multiple models is generally superior to that of a single model. Implementing multi-model modeling first requires decomposing the process object. However, because the production units and the feedback relationships of the control system are intricately interleaved, an effective process decomposition usually relies on sufficient process mechanism knowledge and on the prior knowledge of the operators.

To make soft measurement easier to popularize and apply, the first difficulty in implementing a distributed soft measurement scheme is how to decompose the whole production process object without requiring process mechanism knowledge. The second difficulty is that, even once the process object has been decomposed successfully, the distributed soft measurement models must still be integrated to obtain the final estimate of product quality. The notion of "distributed" as used here means first of all that the soft measurement models are decentralized; the estimate or prediction of product quality, however, must be a single result, so the results of the multiple soft measurement models have to be integrated.

Disclosure of Invention

The main technical problems to be solved by the invention are how to decompose a large-scale production process from a data perspective so as to build a distributed soft measurement model, and how to integrate the distributed soft measurement results to obtain an estimate of the product quality data.

The technical scheme adopted by the invention for solving the technical problems is as follows: a distributed soft measurement method based on a Girvan-Newman algorithm comprises the following steps:

(1): Find the data corresponding to the indicators that reflect product quality in the historical database of the production process object to form an output matrix $Y \in \mathbb{R}^{n \times f}$; the sampled data corresponding to the output $Y$ form an input matrix $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of training samples, $m$ is the number of process measurement variables, $f$ is the number of quality indicators, $\mathbb{R}$ is the set of real numbers, and $\mathbb{R}^{n \times m}$ denotes the set of real matrices of dimension $n \times m$.

(2): Calculate the mean values $\mu_1, \mu_2, \ldots, \mu_f$ and the standard deviations $\delta_1, \delta_2, \ldots, \delta_f$ of the column vectors of the output matrix $Y$, then standardize each row vector of $Y$ according to the formula

$$\bar{y} = (y - \mu)\,\Delta^{-1}$$

to obtain the standardized output matrix $\bar{Y}$, where the row vectors $y$ and $\bar{y}$ denote any corresponding row of the matrices $Y$ and $\bar{Y}$ respectively, $\mu = [\mu_1, \mu_2, \ldots, \mu_f]$ is the output mean vector, and the output standard deviation diagonal matrix $\Delta$ has $\delta_1, \delta_2, \ldots, \delta_f$ on its diagonal.

(3): Standardize the matrix $X$ to obtain the standardized input matrix $\bar{X}$.

(4): According to the formula

$$C = \frac{\bar{X}^T \bar{X}}{n - 1}$$

calculate the correlation coefficient matrix $C$, where $\bar{X}$ is the standardized input matrix from step (3); then set the number of neighbors $c$ and initialize $j = 1$.

(5): Set the $c$ largest elements of the $j$-th row vector of the matrix $C$ to 1 and set the remaining elements to 0.

(6): Check whether the condition $j < m$ holds. If yes, set $j = j + 1$ and return to step (5); if not, the updated correlation coefficient matrix $C$ is obtained.

(7): Calculate the matrix $\Xi$ according to the formula $\Xi = \max\{C, C^T\}$, where $\max\{C, C^T\}$ takes the larger of the elements at the same position in $C$ and $C^T$, and the superscript $T$ denotes the transpose of a matrix or vector. The matrix $\Xi$ can be regarded as the connection network matrix among the $m$ variables: each variable is a node in the network, an element equal to 1 means that a connecting edge exists between the two nodes, and an element equal to 0 means that no connecting edge exists between them.

(8): Perform hierarchical clustering on the $m$ variables using the Girvan-Newman algorithm. The specific implementation process is as follows:

① Calculate the edge betweenness of all connecting edges in the connection network of the $m$ nodes.

② Find the connecting edge with the highest edge betweenness and remove it from the network, that is, set the corresponding elements of the connection network matrix $\Xi$ to 0.

③ Recalculate the edge betweenness of the remaining connecting edges in the network.

④ Repeat steps ② and ③ until all connecting edges in the connection network have been removed, that is, until the matrix $\Xi$ becomes a zero matrix.

The edge betweenness of a connecting edge in step ① is defined as follows: starting from a given node, count the shortest paths to the other nodes that pass through the connecting edge; repeat the same calculation with every node as the starting node, and add up the values obtained for the different starting nodes to give the edge betweenness of the connecting edge.
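The edge-removal loop of step (8) and the edge betweenness definition above can be sketched in plain Python as follows. This is only an illustrative sketch under stated assumptions: the function names and the `n_blocks` argument are hypothetical, shortest paths that tie between a pair of nodes share their count fractionally (the usual Girvan-Newman convention, which the text above does not spell out), self-connections on the diagonal of the network matrix are ignored, and the loop stops as soon as a requested number of blocks appears, whereas the text removes edges until none remain and does not state how the number of blocks $D$ is read off the hierarchy.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness summed over every start node.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for source in adj:
        # Breadth-first search: distances and shortest-path counts from source.
        dist, n_paths, parents = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], n_paths[v], parents[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    n_paths[v] += n_paths[u]
                    parents[v].append(u)
        # Walk back from the farthest nodes, passing path credit to parents.
        credit = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in parents[v]:
                share = n_paths[u] / n_paths[v] * (1.0 + credit[v])
                score[frozenset((u, v))] += share
                credit[u] += share
    return score

def connected_components(adj):
    """Return the node groups that are still linked, each as a sorted list."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            u = stack.pop()
            group.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        groups.append(sorted(group))
    return groups

def girvan_newman_blocks(xi, n_blocks):
    """Remove highest-betweenness edges until n_blocks groups appear (assumed stop rule)."""
    m = len(xi)
    adj = {i: {j for j in range(m) if j != i and xi[i][j]} for i in range(m)}
    blocks = connected_components(adj)
    while len(blocks) < n_blocks and any(adj.values()):
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
        blocks = connected_components(adj)
    return blocks
```

For a network of two triangles joined by a single edge, the joining edge carries the highest betweenness, so it is removed first and the two triangles come out as separate blocks.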

(9): According to the hierarchical clustering result in step (8), the $m$ variables can be divided into $D$ blocks, and the standardized input matrix $\bar{X}$ is correspondingly divided into $D$ matrices: $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_D$.

Steps (8) and (9) complete the decomposition of the process object by means of the Girvan-Newman algorithm: the process measurement variables are decomposed into $D$ sub-blocks on the basis of the correlation among the variables alone, which completes the first step of establishing the distributed soft measurement model.
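The connection network that this decomposition operates on is built from data alone in steps (3) to (7). A minimal NumPy sketch of that construction is given below. The function name `build_network_matrix` is hypothetical, and the correlation formula $C = \bar{X}^T\bar{X}/(n-1)$ is an assumption, because the original formula image did not survive extraction. The sketch ranks signed correlations exactly as computed, so a strongly negative correlation is not linked unless absolute values are taken first.

```python
import numpy as np

def build_network_matrix(X, c):
    """Standardise X, then link each variable to its c largest correlations."""
    X = np.asarray(X, dtype=float)
    n, m = X.shape
    # Step (3): zero mean and unit (sample) standard deviation per column.
    X_bar = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Step (4): m x m correlation coefficient matrix (assumed formula).
    C = X_bar.T @ X_bar / (n - 1)
    # Steps (5)-(6): keep the c largest entries of every row as 1, others 0.
    top = np.zeros((m, m), dtype=int)
    for j in range(m):
        largest = np.argsort(C[j])[-c:]
        top[j, largest] = 1
    # Step (7): symmetrise by taking the element-wise maximum with the transpose.
    return np.maximum(top, top.T)
```

Because each variable correlates perfectly with itself, the diagonal of the result is 1 whenever $c \ge 1$; those self-connections carry no information for the clustering and can be ignored.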

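(10): Use the partial least squares algorithm to establish the soft measurement model between $\bar{X}_d$ and the output matrix $\bar{Y}$:

$$\bar{Y} = \bar{X}_d B_d + E_d$$

where $d = 1, 2, \ldots, D$, $\bar{X}_d$ is the $d$-th block of the standardized input matrix, $\bar{Y}$ is the standardized output matrix, $B_d$ is the regression coefficient matrix, and $E_d$ is the error matrix.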

It is to be emphasized that although the soft measurement model is established by using the partial least square algorithm in step (10), the method of the present invention may also use a neural network, support vector regression, or kernel partial least square algorithm to establish the soft measurement model.

(11): Repeat step (10) until $D$ soft measurement models are obtained, and use the regression coefficient matrices $B_1, B_2, \ldots, B_D$ to calculate the output estimate of each soft measurement model according to the formula

$$\hat{Y}_d = \bar{X}_d B_d, \quad d = 1, 2, \ldots, D.$$

(12): Combine the output estimates $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D$ into one matrix $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D]$, and use the partial least squares algorithm to establish the soft measurement model between $\hat{Y}$ and the standardized output $\bar{Y}$:

$$\bar{Y} = \hat{Y}\,\Theta + F$$

where $\Theta$ is the regression coefficient matrix and $F$ is the error matrix.

The above steps (1) to (12) are the off-line modeling stages of the method of the present invention, wherein step (11) establishes the distributed D soft measurement models, and step (12) integrates the distributed soft measurement results. The steps (13) to (18) shown below are the on-line soft measurement implementation process of the method of the present invention.
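The two-level structure of the offline stage (one model per block, then one integrating model on the stacked block estimates) can be sketched as below. This is an illustration rather than the patented procedure: ordinary least squares via `numpy.linalg.lstsq` is swapped in for the partial least squares algorithm to keep the sketch short, and the names `fit_distributed_model`, `blocks`, and `Theta` are hypothetical.

```python
import numpy as np

def _least_squares(A, B):
    """Stand-in regressor (ordinary least squares, NOT partial least squares)."""
    return np.linalg.lstsq(A, B, rcond=None)[0]

def fit_distributed_model(X_bar, Y_bar, blocks):
    """Fit one model per variable block, then one model that integrates them.

    X_bar  : standardised inputs, shape (n, m)
    Y_bar  : standardised outputs, shape (n, f)
    blocks : list of D lists of column indices (the decomposition result)
    """
    # Steps (10)-(11): a regression coefficient matrix B_d for every block.
    B_list = [_least_squares(X_bar[:, cols], Y_bar) for cols in blocks]
    # Step (12): stack the D block estimates side by side, shape (n, D * f).
    Y_hat = np.hstack([X_bar[:, cols] @ B for cols, B in zip(blocks, B_list)])
    # Integration model between the stacked estimates and the true output.
    Theta = _least_squares(Y_hat, Y_bar)
    return B_list, Theta
```

With a least-squares integration step, the integrated fit on the training data can never be worse than the best single block model, since selecting one block's estimate is itself a candidate for `Theta`.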

(13): Acquire the sample data $z \in \mathbb{R}^{1 \times m}$ of the process object at the new sampling time and subject it to the same standardization as the matrix $X$ to obtain the vector $\bar{z}$.

(14): According to the hierarchical clustering result in step (8), divide the vector $\bar{z}$ into $D$ row vectors $\bar{z}_1, \bar{z}_2, \ldots, \bar{z}_D$, and initialize $d = 1$.

(15): According to the formula

$$\hat{y}_d = \bar{z}_d B_d$$

calculate the output estimate $\hat{y}_d$ of the $d$-th soft measurement model.

(16): Check whether the condition $d < D$ holds. If yes, set $d = d + 1$ and return to step (15); if not, combine the output estimates of the $D$ soft measurement models into one vector $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_D]$.

(17): According to the formula

$$\tilde{y} = \hat{y}\,\Theta$$

integrate the distributed results to obtain the final estimate $\tilde{y}$ of the standardized output, where $\Theta$ is the regression coefficient matrix from step (12). The estimate of the quality indicators at the current sampling instant is then

$$y_{\mathrm{new}} = \tilde{y}\,\Delta + \mu$$

where $\mu$ and $\Delta$ are the output mean vector and the output standard deviation diagonal matrix from step (2).

(18): Return to step (13) and continue soft measurement of the quality indicators at the new sampling time.
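The online stage in steps (13) to (17) amounts to a few matrix products per new sample. The sketch below assumes the quantities from the offline stage are already available. All names are hypothetical (`x_mean` and `x_std` for the column statistics of $X$, `blocks` for the variable grouping, `B_list` for the block regression matrices, `Theta` for the integration matrix, `mu` and `delta` for the output means and standard deviations), and multiplication by the diagonal standard deviation matrix is written as an element-wise product.

```python
import numpy as np

def soft_measure_online(z, x_mean, x_std, blocks, B_list, Theta, mu, delta):
    """Estimate the quality indicators for one new sample z of length m."""
    # Step (13): same standardisation as was applied to the training inputs.
    z_bar = (np.asarray(z, dtype=float) - x_mean) / x_std
    # Steps (14)-(16): split by block, predict with each block model, and stack.
    y_hat = np.concatenate([z_bar[cols] @ B for cols, B in zip(blocks, B_list)])
    # Step (17): integrate the D block estimates into one standardised estimate.
    y_tilde = y_hat @ Theta
    # Undo the output standardisation to return to engineering units.
    return y_tilde * delta + mu
```

As a worked check with three variables in two blocks: `z = [2, 4, 5]`, `x_mean = [1, 2, 3]`, and `x_std = [1, 2, 1]` give the standardized sample `[1, 1, 2]`; block coefficients `[[1], [1]]` and `[[0.5]]` give block estimates 2 and 1; an integration matrix `[[0.5], [0.5]]` gives 1.5; and with `delta = [2]`, `mu = [10]` the final estimate is 13.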

Compared with the prior art, the method has the advantages that:

First, the method of the invention can be said to be the first to use the Girvan-Newman algorithm to partition process variables into blocks, and it does so without any process mechanism knowledge or prior operating experience. The method uses the Girvan-Newman algorithm to carry out the process decomposition of the monitored object, which lays the groundwork for establishing the distributed soft measurement model. Second, although the method uses the partial least squares algorithm to build the soft measurement models in step (10), it is not limited to that algorithm; other methods such as neural networks, support vector machines, or the kernel partial least squares algorithm can also be used. Finally, to integrate the distributed soft measurement results, the method applies the partial least squares algorithm a second time. The method is therefore a novel distributed soft measurement method, suitable for soft measurement of the quality indicators of target products in large-scale production processes.

Drawings

FIG. 1 is a flow chart of an embodiment of the method of the present invention.

FIG. 2 is a flow chart of an implementation of the Girvan-Newman algorithm.

Detailed Description

The method of the present invention will be described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, the invention provides a distributed soft measurement method based on Girvan-Newman algorithm, and the specific implementation mode of the method is as follows:

Step (1): Find the data corresponding to the indicators that reflect product quality in the historical database of the production process object to form an output matrix $Y \in \mathbb{R}^{n \times f}$; the sampled data corresponding to the output $Y$ form an input matrix $X \in \mathbb{R}^{n \times m}$, where $n$ is the number of training samples, $m$ is the number of process measurement variables, $f$ is the number of quality indicators, $\mathbb{R}$ is the set of real numbers, and $\mathbb{R}^{n \times m}$ denotes the set of real matrices of dimension $n \times m$.

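Step (2): Calculate the mean values $\mu_1, \mu_2, \ldots, \mu_f$ and the standard deviations $\delta_1, \delta_2, \ldots, \delta_f$ of the column vectors of the output matrix $Y$, then standardize each row vector of $Y$ according to the formula

$$\bar{y} = (y - \mu)\,\Delta^{-1}$$

to obtain the standardized output matrix $\bar{Y}$, where the row vectors $y$ and $\bar{y}$ denote any corresponding row of the matrices $Y$ and $\bar{Y}$ respectively, $\mu = [\mu_1, \mu_2, \ldots, \mu_f]$ is the output mean vector, and the output standard deviation diagonal matrix $\Delta$ has $\delta_1, \delta_2, \ldots, \delta_f$ on its diagonal.

Step (3): Standardize the matrix $X$ to obtain the standardized input matrix $\bar{X}$.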

Step (4): According to the formula

$$C = \frac{\bar{X}^T \bar{X}}{n - 1}$$

calculate the correlation coefficient matrix $C$, where $\bar{X}$ is the standardized input matrix from step (3); then set the neighbor parameter $c$ and initialize $j = 1$.

Step (5): Set the $c$ largest elements of the $j$-th row vector of the matrix $C$ to 1 and set the remaining elements to 0.

Step (6): Check whether the condition $j < m$ holds. If yes, set $j = j + 1$ and return to step (5); if not, the updated correlation coefficient matrix $C$ is obtained.

Step (7): Calculate the matrix $\Xi$ according to the formula $\Xi = \max\{C, C^T\}$, where $\max\{C, C^T\}$ takes the larger of the elements at the same position in $C$ and $C^T$, and the superscript $T$ denotes the transpose of a matrix or vector. The matrix $\Xi$ can be regarded as the connection network matrix among the $m$ variables: each variable is a node in the network, an element equal to 1 means that a connecting edge exists between the two nodes, and an element equal to 0 means that no connecting edge exists between them.

Step (8): Perform hierarchical clustering on the $m$ variables using the Girvan-Newman algorithm. The implementation flow of the Girvan-Newman algorithm is shown in FIG. 2, and the specific implementation comprises the following steps (8.1) to (8.4).

Step (8.1): Calculate the edge betweenness of all connecting edges in the connection network of the $m$ nodes.

Step (8.2): Find the connecting edge with the highest edge betweenness and remove it from the network, that is, set the corresponding elements of the connection network matrix $\Xi$ to 0.

Step (8.3): Recalculate the edge betweenness of the remaining connecting edges in the network.

Step (8.4): Repeat steps (8.2) and (8.3) until all connecting edges in the connection network have been removed, that is, until the matrix $\Xi$ becomes a zero matrix.

Step (9): According to the hierarchical clustering result in step (8), the $m$ variables can be divided into $D$ blocks, and the standardized input matrix $\bar{X}$ is correspondingly divided into $D$ matrices: $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_D$.

Step (10): Use the partial least squares algorithm to establish the soft measurement model between the $d$-th input block $\bar{X}_d$ and the standardized output matrix $\bar{Y}$. The specific implementation is as follows:

① Initialize $h = 1$, set the matrix $Z = \bar{X}_d$ and the matrix $Y_0 = \bar{Y}$, and initialize the vector $u$ as the first column of the matrix $Y_0$.

② Calculate the input weight vector $w_h$ according to the formula $w_h = Z^T u / (u^T u)$, and unitize it with the formula $w_h = w_h / \|w_h\|$.

③ Calculate the score vector $s_h$ according to the formula $s_h = Z w_h / (w_h^T w_h)$.

④ Calculate the output weight vector $g_h$ according to the formula $g_h = Y_0^T s_h / (s_h^T s_h)$.

⑤ Update the vector $u$ according to the formula $u = Y_0 g_h$.

⑥ Repeat steps ② to ⑤ until $s_h$ converges; the criterion is that the elements of the vector $s_h$ no longer change.

⑦ Keep the input weight vector $w_h$ and the output weight vector $g_h$, and calculate the projection vector $p_h$ according to the formula $p_h = Z^T s_h / (s_h^T s_h)$.

⑧ Update the input matrix $Z$ and the matrix $Y_0$ according to the following two formulas:

$$Z = Z - s_h p_h^T \tag{7}$$

$$Y_0 = Y_0 - s_h g_h^T \tag{8}$$

⑨ Check whether the condition $h < M_d$ holds, where $M_d$ is the number of columns of the matrix $\bar{X}_d$. If yes, set $h = h + 1$ and return to step ②; if not, form the matrix $W_0 = [w_1, w_2, \ldots, w_h]$ from all the input weight vectors obtained, the matrix $G_0 = [g_1, g_2, \ldots, g_h]$ from all the output weight vectors, and the matrix $P_0 = [p_1, p_2, \ldots, p_h]$ from all the projection vectors.

⑩ Determine the number $k_d$ of projection vectors to be retained in the partial least squares model by cross validation. The soft measurement model established by the partial least squares algorithm can then be expressed as

$$\bar{Y} = \bar{X}_d B_d + E_d$$

where the regression coefficient matrix is $B_d = W (P^T W)^{-1} G^T$, $E_d$ is the error matrix, and the matrices $W$, $P$, $G$ consist of columns 1 to $k_d$ of the matrices $W_0$, $P_0$, and $G_0$ respectively.
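The iterative procedure of step (10) can be sketched in NumPy as follows. Two simplifications are assumptions of this sketch rather than part of the text: the number of retained components `k_d` is passed in by the caller instead of being chosen by cross validation, and only the first `k_d` components are computed (they coincide with the first `k_d` columns of the full matrices, because components are extracted one after another). The convergence test uses a small tolerance `tol` in place of "the elements no longer change"; the function name and its arguments are hypothetical.

```python
import numpy as np

def pls_coefficients(X_d, Y_bar, k_d, tol=1e-10, max_iter=500):
    """Regression coefficient matrix of a k_d-component PLS model."""
    Z = np.array(X_d, dtype=float)       # working copy of the input block
    Y0 = np.array(Y_bar, dtype=float)    # working copy of the output matrix
    u = Y0[:, 0].copy()                  # initial u: first column of Y0
    W, G, P = [], [], []
    for _ in range(k_d):
        s_prev = None
        for _ in range(max_iter):
            w = Z.T @ u / (u @ u)        # input weight vector
            w = w / np.linalg.norm(w)    # scale to unit length
            s = Z @ w / (w @ w)          # score vector
            g = Y0.T @ s / (s @ s)       # output weight vector
            u = Y0 @ g                   # update u
            if s_prev is not None and np.linalg.norm(s - s_prev) < tol:
                break                    # scores have stopped changing
            s_prev = s
        p = Z.T @ s / (s @ s)            # projection (loading) vector
        Z = Z - np.outer(s, p)           # deflate the inputs
        Y0 = Y0 - np.outer(s, g)         # deflate the outputs
        W.append(w)
        G.append(g)
        P.append(p)
    W, G, P = np.column_stack(W), np.column_stack(G), np.column_stack(P)
    return W @ np.linalg.inv(P.T @ W) @ G.T
```

A convenient sanity check is that, when every component is retained and the input block has full column rank, the resulting coefficients agree with the ordinary least squares solution; retaining fewer components is what gives partial least squares its robustness to correlated inputs.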

Step (11): Repeat step (10) until $D$ soft measurement models are obtained, and use the regression coefficient matrices $B_1, B_2, \ldots, B_D$ to calculate the output estimate of each soft measurement model according to the formula

$$\hat{Y}_d = \bar{X}_d B_d, \quad d = 1, 2, \ldots, D.$$

Step (12): Combine the output estimates $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D$ into one matrix $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D]$, and use the partial least squares algorithm to establish the soft measurement model between $\hat{Y}$ and the standardized output $\bar{Y}$:

$$\bar{Y} = \hat{Y}\,\Theta + F$$

where $\Theta$ is the regression coefficient matrix and $F$ is the error matrix. The implementation is the same as in sub-steps ① to ⑩ of step (10).

Steps (1) to (12) constitute the off-line modeling stage of the method of the present invention. The following must be retained for use during on-line distributed soft measurement: the output mean vector $\mu$ and the output standard deviation diagonal matrix $\Delta$ from step (2), the hierarchical clustering result from step (8), the regression coefficient matrices from step (11), and the regression coefficient matrix from step (12).

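Step (13): Acquire the sample data $z \in \mathbb{R}^{1 \times m}$ of the process object at the latest sampling time and subject it to the same standardization as the matrix $X$ to obtain the vector $\bar{z}$.

Step (14): According to the hierarchical clustering result in step (8), divide the vector $\bar{z}$ into $D$ row vectors $\bar{z}_1, \bar{z}_2, \ldots, \bar{z}_D$, and initialize $d = 1$.

Step (15): According to the formula

$$\hat{y}_d = \bar{z}_d B_d$$

calculate the output estimate $\hat{y}_d$ of the $d$-th soft measurement model.

Step (16): Check whether the condition $d < D$ holds. If yes, set $d = d + 1$ and return to step (15); if not, combine the output estimates of the $D$ soft measurement models into one vector $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_D]$.

Step (17): According to the formula

$$\tilde{y} = \hat{y}\,\Theta$$

integrate the distributed results to obtain the final estimate $\tilde{y}$ of the standardized output, where $\Theta$ is the regression coefficient matrix from step (12). The estimate of the quality indicators at the current sampling instant is then

$$y_{\mathrm{new}} = \tilde{y}\,\Delta + \mu$$

where $\mu$ and $\Delta$ are the output mean vector and the output standard deviation diagonal matrix from step (2).

Step (18): Return to step (13) and continue soft measurement of the quality indicators at the new sampling time.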

The above-described embodiments are only preferred embodiments of the present invention, and any modifications and changes made to the present invention within the spirit of the present invention and the scope of the claims should not be excluded from the scope of the present invention.

Claims

1. A distributed soft measurement method based on a Girvan-Newman algorithm is characterized by comprising the following steps:

firstly, the off-line modeling stage comprises the following steps (1) to (12);

Step (1): finding the data corresponding to the indicators that reflect product quality in the historical database of the production process object to form an output matrix $Y \in \mathbb{R}^{n \times f}$, the sampled data corresponding to the output matrix $Y$ forming an input matrix $X \in \mathbb{R}^{n \times m}$, wherein $n$ is the number of training samples, $m$ is the number of process measurement variables, $f$ is the number of quality indicators, $\mathbb{R}$ is the set of real numbers, and $\mathbb{R}^{n \times m}$ denotes the set of real matrices of dimension $n \times m$;

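Step (2): calculating the mean values $\mu_1, \mu_2, \ldots, \mu_f$ and the standard deviations $\delta_1, \delta_2, \ldots, \delta_f$ of the column vectors of the output matrix $Y$, then standardizing each row vector of $Y$ according to the formula

$$\bar{y} = (y - \mu)\,\Delta^{-1}$$

to obtain the standardized output matrix $\bar{Y}$, wherein the row vectors $y$ and $\bar{y}$ denote any corresponding row of $Y$ and $\bar{Y}$ respectively, $\mu = [\mu_1, \mu_2, \ldots, \mu_f]$ is the output mean vector, and the output standard deviation diagonal matrix $\Delta$ has $\delta_1, \delta_2, \ldots, \delta_f$ on its diagonal;

Step (3): standardizing the input matrix $X$ to obtain the standardized input matrix $\bar{X}$;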

Step (4): according to the formula

$$C = \frac{\bar{X}^T \bar{X}}{n - 1}$$

calculating the correlation coefficient matrix $C$, wherein $\bar{X}$ is the standardized input matrix from step (3), then setting the number of neighbors $c$ and initializing $j = 1$;

Step (5): setting the $c$ largest elements of the $j$-th row vector of the correlation coefficient matrix $C$ to 1 and setting the remaining elements to 0;

Step (6): checking whether the condition $j < m$ holds; if yes, setting $j = j + 1$ and returning to step (5); if not, obtaining the updated correlation coefficient matrix $C$;

Step (7): calculating the matrix $\Xi$ according to the formula $\Xi = \max\{C, C^T\}$, wherein $\max\{C, C^T\}$ takes the larger of the elements at the same position in $C$ and $C^T$, the superscript $T$ denotes the transpose of a matrix or vector, the matrix $\Xi$ can be regarded as the connection network matrix among the $m$ variables, each variable is a node in the network, an element equal to 1 means that a connecting edge exists between the two nodes, and an element equal to 0 means that no connecting edge exists between them;

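Step (8): performing hierarchical clustering on the $m$ variables using the Girvan-Newman algorithm, the specific implementation process being as follows:

Step (8.1): calculating the edge betweenness of all connecting edges in the connection network of the $m$ nodes;

Step (8.2): finding the connecting edge with the highest edge betweenness and removing it from the network, that is, setting the corresponding elements of the connection network matrix $\Xi$ to 0;

Step (8.3): recalculating the edge betweenness of the remaining connecting edges in the network;

Step (8.4): repeating steps (8.2) and (8.3) until all connecting edges in the connection network have been removed, that is, until the matrix $\Xi$ becomes a zero matrix;

Step (9): according to the hierarchical clustering result in step (8), dividing the $m$ variables into $D$ blocks and correspondingly dividing the standardized input matrix $\bar{X}$ into $D$ matrices: $\bar{X}_1, \bar{X}_2, \ldots, \bar{X}_D$;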

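Step (10): using the partial least squares algorithm to establish the soft measurement model between $\bar{X}_d$ and the output matrix $\bar{Y}$:

$$\bar{Y} = \bar{X}_d B_d + E_d$$

wherein $d = 1, 2, \ldots, D$, $\bar{X}_d$ is the $d$-th block of the standardized input matrix, $\bar{Y}$ is the standardized output matrix, $B_d$ is the regression coefficient matrix, and $E_d$ is the error matrix;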

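Step (11): repeating step (10) until $D$ soft measurement models are obtained, and using the regression coefficient matrices $B_1, B_2, \ldots, B_D$ to calculate the output estimate of each soft measurement model according to the formula $\hat{Y}_d = \bar{X}_d B_d$;

Step (12): combining the output estimates $\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D$ into one matrix $\hat{Y} = [\hat{Y}_1, \hat{Y}_2, \ldots, \hat{Y}_D]$, and using the partial least squares algorithm to establish the soft measurement model between $\hat{Y}$ and the standardized output $\bar{Y}$:

$$\bar{Y} = \hat{Y}\,\Theta + F$$

wherein $\Theta$ is the regression coefficient matrix and $F$ is the error matrix;

secondly, after the off-line modeling stage is finished, implementing on-line soft measurement, comprising the following steps (13) to (18);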

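Step (13): acquiring the sample data $z \in \mathbb{R}^{1 \times m}$ of the process object at the latest sampling time and subjecting it to the same standardization as the matrix $X$ to obtain the vector $\bar{z}$;

Step (14): according to the hierarchical clustering result in step (8), dividing the vector $\bar{z}$ into $D$ row vectors $\bar{z}_1, \bar{z}_2, \ldots, \bar{z}_D$, and initializing $d = 1$;

Step (15): according to the formula

$$\hat{y}_d = \bar{z}_d B_d$$

calculating the output estimate $\hat{y}_d$ of the $d$-th soft measurement model;

Step (16): checking whether the condition $d < D$ holds; if yes, setting $d = d + 1$ and returning to step (15); if not, combining the output estimates of the $D$ soft measurement models into one vector $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_D]$;

Step (17): according to the formula

$$\tilde{y} = \hat{y}\,\Theta$$

integrating the distributed results to obtain the final estimate $\tilde{y}$ of the standardized output, wherein $\Theta$ is the regression coefficient matrix from step (12); the estimate of the quality indicators at the current sampling instant is then

$$y_{\mathrm{new}} = \tilde{y}\,\Delta + \mu$$

wherein $\mu$ and $\Delta$ are the output mean vector and the output standard deviation diagonal matrix from step (2);

Step (18): returning to step (13) to continue soft measurement of the quality indicators at the new sampling time.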

2. The distributed soft measurement method based on Girvan-Newman algorithm according to claim 1, wherein the soft measurement model is established by partial least squares algorithm in the step (10), but the method can also adopt neural network, support vector regression, and kernel partial least squares algorithm to establish the soft measurement model.