CN111191726A

CN111191726A - Fault classification method based on weak supervised learning multi-layer perceptron

Info

Publication number: CN111191726A
Application number: CN201911418196.5A
Authority: CN
Inventors: 葛志强; 廖思奋
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-12-31
Filing date: 2019-12-31
Publication date: 2020-05-22
Anticipated expiration: 2039-12-31
Also published as: CN111191726B

Abstract

The invention discloses a process data fault classification method based on a weak supervised learning multilayer sensor, which consists of a supervised classification network consisting of the multilayer sensor, a BatchNormalization layer, a Dropout layer and a Softmax output layer and a Gaussian mixture model for acquiring inaccurate sample labels; the multi-layer perceptron can learn the feature representation of data from inaccurate label data, in addition, a Gaussian mixture model carries out unsupervised clustering on the features extracted by the multi-layer perceptron, the clustering result can be used for estimating the relation between inaccurate labels of various samples and potential real labels of the samples, namely a label probability transition matrix, and the estimated label probability transition matrix is used for correcting a network loss function to carry out secondary training on the classification network, so that the classification precision of the network on the inaccurate label samples is improved. The method can be suitable for the situation that labels of the industrial process data sample part are labeled wrongly, namely the fault classification of the labels is not accurate.

Description

Fault classification method based on weak supervised learning multi-layer perceptron

Technical Field

The invention belongs to the field of fault diagnosis and classification in industrial processes, and particularly relates to a fault classification method based on a weak supervised learning multi-layer sensor.

Background

In the industrial process monitoring, when a fault is detected, the fault information needs to be further analyzed, and fault classification is an important link in the fault information, so that the fault classification is obtained, and the recovery of the industrial process is facilitated.

In the traditional fault classification, the acquired data sample labels are assumed to be accurate, so that model training is performed, however, the labels of the industrial process data are generated in the modes of an external knowledge base, a rule base or manual calibration, and the labels of the samples may be inaccurate. In addition, inaccurate label samples are more readily available and less costly than accurate label samples. Sample label inaccuracy has become a non-negligible feature of the model. Therefore, the modeling of the inaccurate label samples through weak supervised learning is carried out in practice, and the classification precision of the model to the fault samples can be improved.

Disclosure of Invention

The invention provides a fault classification method based on a weak supervised learning multi-layer sensor, aiming at the problems that a sample label is possibly inaccurate in the current industrial process and the like.

The purpose of the invention is realized by the following technical scheme: a method for process data fault classification for a multi-layered perceptron based on weak supervised learning, the multi-layered perceptron based on weak supervised learning comprising: two layers of perceptrons MLP, a Softmax output layer and a Gaussian mixture model GMM. The process data fault classification method specifically comprises the following steps:

the method comprises the following steps: collecting samples containing inaccurate labels in historical industrial processes as training data sets

Wherein the content of the first and second substances,

in order to not accurately label the data sample,

is a label for the sample to be tested,

n represents the number of samples in the training data set, and K is the number of sample classes.

Step two: standardizing the training data set D collected in the step one, namely mapping each variable of the labeled sample set X into a sample set X with the mean value of 0 and the variance of 1_stdAnd converting each sample label of the label set Y into a one-dimensional vector through one-hot coding to obtain a standardized data set

Step three: first a new data set D is created_stdAs input, carrying out first supervised training on the network of the sensor MLP, and obtaining a sample set X at a Softmax output layer_stdBelong to its label

The posterior probability of (d).

Step four: taking the posterior probability obtained in the third step as the input of the Gaussian mixture model GMM, training the Gaussian mixture model, and using the trained parameters of the Gaussian mixture model

Estimating a label probability transition matrix T to obtain an estimation matrix

Step five: according to

Correcting the loss function of the inaccurate label sample fitted by the MLP of the step three, and obtaining a data set D by the step two_stdAs input, a second supervised training step, namely a third sensor MLP, completes weak supervised learning to obtain a trained WS-MLP network;

step six: collecting new industrial process data of unknown fault types, standardizing the process data according to the method of the second step to obtain a data set d_stdInputting the data into the WS-MLP network trained in the fifth step, calculating the posterior probability of each fault category corresponding to the sample, and taking the category with the maximum posterior probability as the category of the sample to realize the fault classification of the sample.

Further, the third step specifically comprises the following steps:

(3.1) constructing a network of the sensor MLP, wherein the network of the sensor MLP consists of a first hidden layer, a BatchNormalization layer, a Dropout layer, a second hidden layer, a BatchNormalization layer, a Dropout layer and a Softmax layer which are connected in sequence. Wherein, the weight matrix and bias of the first hidden layer and the second hidden layerThe location vectors are respectively W₁,b₁,W₂,b₂The weight matrix and the offset vector from the hidden layer of the second layer to the Softmax layer are respectively W₃,b₃These network parameters are denoted as θ ═ W₁,b₁,W₂,b₂,W₃,b₃}。

(3.2) normalized sample set D_stdAs input, the network of perceptrons MLP is supervised trained, using a cross entropy loss function:

wherein the content of the first and second substances,

is a representation of the last layer of the MLP network.

And the loss function adjusts parameters of the whole sensor MLP network through a back propagation algorithm (BP), and obtains parameters of the whole network after repeated iteration loss convergence to finish training.

Further, the fourth step specifically includes the following steps:

(4.1) each type of exemplar of the inaccurate label exemplar set consists of an exact-label exemplar and an incorrect-label exemplar, making the following assumptions: it is assumed that the generation of inaccurate labels is independent of the input, i.e. the probability that a certain class of samples is labeled into other classes is the same. And the MLP network is assumed to have perceptual consistency, i.e. the MLP network obeys a gaussian distribution for the characteristic representation of the labeled accurate samples and labeled erroneous samples in each class, respectively.

According to assumptions, it is possible to obtain:

wherein the content of the first and second substances,

is a sample set D_stdIs a sample representation, y is a potential true label of the sample, p (-) represents a probability, eⁱI ∈ {1,2, L, K } is represented in

Spatially, the ith element is a vector with 1 and other elements are 0, θ represents all weight matrix and offset vector parameters in the MLP network, μ, Σ represents the mean vector and covariance matrix of unknown gaussian distribution, respectively,

and

respectively representing the Gaussian distribution density of all samples and the class of i samples, T represents a label probability transition matrix, and defines

(4.2) for different classes of sample subsets

Modeling using a gaussian mixture model:

wherein x isⁱRepresentation of belonging toData set

The sample data of (a) is stored in the memory,

to represent

┐ i denote other categories than category i.

(4.3) establishing a two-component Gaussian mixture model, using a maximum Expectation (EM) algorithm to complete parameter estimation of the Gaussian mixture model, and solving

Namely, it is

When step (E step) is desired, the Q function is calculated:

where t is the number of iterations.

Calculation model for observed data

Degree of responsibility of

Wherein the content of the first and second substances,

denotes xⁱThe nth sample of (1).

At very big step (M steps), the mean value mu of Gaussian distribution is estimated_mAnd a mixing coefficient α_m。

Wherein S is_iTo represent

The number of samples.

And E, alternately iterating the step E and the step M until the model parameters are converged or the preset maximum iteration times. Solve out

Namely, it is

(4.4) according to the formula

Solving to obtain a mixed coefficient

And uses this to derive an estimate of the label probability transition matrix T

Wherein the content of the first and second substances,

representation estimation matrix

Row i and column k.

Further, in step five, the network of perceptrons MLP is trained for the second time using the modified loss function as:

compared with the prior art, the method has the advantages that modeling can be carried out on the scene with the inaccurate label of the labeled sample, label probability transition matrix evaluation is carried out on the inaccurate label sample, the label probability transition matrix evaluation is used for correcting the loss function of the classification network, weak supervision learning is completed, and therefore the classification precision of the model on the inaccurate label sample is improved.

Drawings

FIG. 1 is a TennesseeEastman (TE) process flow diagram;

fig. 2 is a classification accuracy comparison graph of an MLP network and a weak supervised learning based multi-layer perceptron (WS-MLP) for 9 TE process fault cases at 5 label noise ratios.

Detailed Description

The method for classifying faults based on the weakly supervised learning multi-layer perceptron of the present invention is further described in detail with reference to the following embodiments.

A process data fault classification method of a multilayer perceptron based on weak supervised learning is characterized in that the multilayer perceptron based on the weak supervised learning comprises the following steps: two layers of perceptrons MLP, a Softmax output layer and a Gaussian mixture model GMM. The process data fault classification method specifically comprises the following steps:

Wherein the content of the first and second substances,

in order to not accurately label the data sample,

is a label for the sample to be tested,

The posterior probability of (d). The process specifically comprises the following substeps:

(3.1) constructing a network of the sensor MLP, wherein the network of the sensor MLP consists of a first hidden layer, a BatchNormalization layer, a Dropout layer, a second hidden layer, a BatchNormalization layer, a Dropout layer and a Softmax layer which are connected in sequence. Wherein, the weight matrix and the offset vector of the first layer hidden layer and the second layer hidden layer are respectively W₁,b₁,W₂,b₂The weight matrix and the offset vector from the hidden layer of the second layer to the Softmax layer are respectively W₃,b₃These network parameters are denoted as θ ═ W₁,b₁,W₂,b₂,W₃,b₃}。

(3.2) normalized sample set D_stdAs input, sense of oppositionThe network of perceptron MLPs is supervised trained, using a cross-entropy loss function:

wherein the content of the first and second substances,

is a representation of the last layer of the MLP network.

The general label probability transition matrix is difficult to obtain, the generation and the input of the label are independent according to the assumption of inaccuracy, the label probability transition matrix has perception consistency with an MLP network, and unsupervised learning can be carried out on the first training result of the MLP network by utilizing a Gaussian mixture model, so that the mixing coefficient learned by the Gaussian mixture model is approximate to the elements in the label probability transition matrix, and the method specifically comprises the following steps:

According to assumptions, it is possible to obtain:

wherein the content of the first and second substances,

is a sample set D_stdIs a sample representation, y is a potential true label of the sample, p (-) represents a probability, eⁱI e {1,2, …, K } is shown in

and

(4.2) for different classes of sample subsets

Modeling using a gaussian mixture model:

wherein x isⁱRepresentation belonging to a data set

The sample data of (a) is stored in the memory,

to represent

┐ i denote other categories than category i.

Namely, it is

When step (E step) is desired, the Q function is calculated:

where t is the number of iterations.

Calculation model for observed data

Degree of responsibility of

Wherein the content of the first and second substances,

denotes xⁱThe nth sample of (1).

Wherein S is_iTo represent

The number of samples.

Namely, it is

(4.4) according to the formula

Solving to obtain a mixed coefficient

Wherein the content of the first and second substances,

representation estimation matrix

Row i and column k.

Step five: according to

Correcting the loss function of the inaccurate label sample fitted by the MLP of the step three, and obtaining a data set D by the step two_stdAnd (4) as input, performing second supervised training on the network of the three sensors MLP to finish weak supervised learning to obtain a trained WS-MLP network.

The network training of the second perceptron MLP uses a modified loss function as:

In order to evaluate the classification effect of the fault classification model, a classification F corresponding to a certain fault is defined₁Index, the calculation formula is as follows:

precision＝TP/(TP+FP)

recall＝TP/(TP+FN)

wherein, TP is the number of samples with correct classification of the fault samples; FP is the number of samples for classifying other category samples into the category faults by mistake, and FN is the number of samples for classifying the category fault samples by mistake.

Examples

The performance of the fault classification method for the multi-layered perceptron based on weakly supervised learning is described below with reference to a specific TE procedure example. The TE process is a standard data set commonly used in the field of fault diagnosis and fault classification, and the whole data set includes 53 process variables, and the process flow thereof is shown in fig. 1. The process consists of 5 operation units, namely a gas-liquid separation tower, a continuous stirring type reaction kettle, a partial condenser, a centrifugal compressor, a reboiler and the like.

9 faults in the TE process are selected, and the specific conditions of the 9 selected faults are given in table 1.

Table 1: TE Process Fault Listing

For the process, 34 variables of 22 process measurement variables and 12 control variables are used as modeling variables, and classification performance is tested on 9 types of fault condition data.

The MLP network consists of a first hidden layer, a BatchNornavigation layer, a Dropout layer, a second hidden layer, a BatchNornavigation layer, a Dropout layer and a Softmax layer which are connected in sequence. The number of input nodes of the MLP network is 34, the number of nodes of two hidden layers is 200 and 100 respectively, the number of nodes of a last Softmax layer is 9, the momentum value of a BatchNormalization layer is set to be 0.5, the loss proportion of nodes of a Dropout layer is 0.5, an Adam optimizer with an initial learning rate of 0.001 is used, the batch size is 110, and the iteration times are 30.

In fig. 2, a comparison of classification effects of an MLP network and a weak supervised learning-based multi-layer perceptron (WS-MLP) model under an F1 index is shown, MLP hidden nodes of the two networks are kept consistent, and by adjusting the label inaccuracy rates of input samples, the labels of samples in proportions of 0%, 10%, 20%, 30%, 40% and 50% are set to be labeled incorrectly, so as to observe the change condition of a classification index F1. The WS-MLP is accurate (namely 0% of sample labels are marked incorrectly) except in sample labels, and has better classification effect than the MLP network in other situations, thereby verifying the improvement of classification performance brought by estimating the label probability transition matrix by the Gaussian mixture model and correcting the MLP network loss function by using the same in the method; meanwhile, the classification performance of the WS-MLP model is close to that of an MLP network under the condition of accurate labels, which shows that the WS-MLP model is not only suitable for inaccurate label samples, but also suitable for fault classification of accurate label samples.

Claims

1. A process data fault classification method of a multilayer perceptron based on weak supervised learning is characterized in that the multilayer perceptron based on the weak supervised learning comprises the following steps: two layers of perceptrons MLP, a Softmax output layer and a Gaussian mixture model GMM. The process data fault classification method specifically comprises the following steps:

Wherein the content of the first and second substances,

in order to not accurately label the data sample,

is a label for the sample to be tested,

The posterior probability of (d).

Step five: according to

step six: collecting new industrial process data of unknown fault category according to the step twoThe method of (1) standardizes the process data to obtain a data set d_stdInputting the data into the WS-MLP network trained in the fifth step, calculating the posterior probability of each fault category corresponding to the sample, and taking the category with the maximum posterior probability as the category of the sample to realize the fault classification of the sample.

2. The fault classification method according to claim 1, wherein step three specifically comprises the steps of:

wherein the content of the first and second substances,

is a representation of the last layer of the MLP network.

3. The fault classification method according to claim 1, wherein the fourth step specifically comprises the steps of:

According to assumptions, it is possible to obtain:

wherein the content of the first and second substances,

and

(4.2) for different classes of sample subsets

Modeling using a gaussian mixture model:

wherein x isⁱRepresentation belonging to a data set

The sample data of (a) is stored in the memory,

to represent

Representing other categories than category i.

Namely, it is

When step (E step) is desired, the Q function is calculated:

where t is the number of iterations.

Calculation model for observed data

Degree of responsibility of

Wherein the content of the first and second substances,

denotes xⁱThe nth sample of (1).

Wherein S is_iTo represent

The number of samples.

Namely, it is

(4.4) according to the formula

Solving to obtain a mixed coefficient

Wherein the content of the first and second substances,

representation estimation matrix

Row i and column k.

4. The fault classification method according to claim 1, wherein in step five, the network second training of the perceptron MLP uses a modified loss function as: