CN111105074A

CN111105074A - Fault prediction method based on improved deep belief learning

Info

Publication number: CN111105074A
Application number: CN201911156833.6A
Authority: CN
Inventors: 杨锦; 朱进源; 王玉峰; 姚会举; 梁超
Original assignee: JIANGNAN ELECTROMECHANICAL DESIGN RESEARCH INSTITUTE
Current assignee: JIANGNAN ELECTROMECHANICAL DESIGN RESEARCH INSTITUTE
Priority date: 2019-11-22
Filing date: 2019-11-22
Publication date: 2020-05-05

Abstract

The invention provides a fault prediction method based on improved deep belief learning, comprising the following steps: data acquisition, data preprocessing, test model acquisition, fault prediction, and result analysis. To address the problems that features in the data of the predicted object are difficult to discover and that the failure trend is not obvious, the method uses the strong feature extraction capability of deep learning to mine the data of the predicted object in depth and to learn how its failure develops. Indexes such as the remaining life or the health degree serve as the learning target of the model, so that dangerous parts are discovered before failure and the failure can be avoided.

Description

Fault prediction method based on improved deep belief learning

Technical Field

The invention relates to a fault prediction method based on improved deep belief learning, belongs to the technical field related to fault prediction, and particularly relates to a fault prediction method based on a neural network.

Background

Fault prediction is currently a popular research topic in universities and industry. With the development of science and technology, fault prediction is gradually replacing fault diagnosis, shifting the response from after a fault occurs to before it occurs. In addition, the amount of state data being acquired keeps growing, and traditional fault prediction methods struggle to make full use of the feature information in such large data, so the information is expressed incompletely and the reliability of the prediction result is low. Most fault prediction techniques based on machine learning and deep learning perform classification prediction, and few perform regression prediction. Among regression methods, simple regression has difficulty processing large amounts of data, while in regression with deep learning the output range of the prediction is limited by the range of the nonlinear activation function.

Disclosure of Invention

To solve the above technical problems, the invention provides a fault prediction method based on improved deep belief learning. The method improves on the deep belief network (DBN) model: the output layer of the model adopts extreme learning and has no nonlinear activation function, so the output range of the model is not limited and a credible output can be obtained during prediction.

The invention is realized by the following technical scheme.

The invention provides a fault prediction method based on improved deep belief learning, which comprises the following steps of:

① data acquisition: obtaining structured data that reflects changes in the state of the predicted object, and obtaining a learning model;

② data preprocessing: defining labels for the structured data to obtain label data, and preprocessing the structured data through feature engineering to obtain characteristic data;

③ test model acquisition: setting a prediction error value and an error function, initializing the learning model, and training the initialized learning model on the characteristic data; obtaining a fine error model from the prediction error value and the trained learning model, obtaining an error function value from the error function and the trained learning model, and obtaining the test model from the fine error model and the error function value;

④ fault prediction: obtaining a fault threshold through the test model, and obtaining prediction information according to the fault threshold;

⑤ result analysis: counting the prediction accuracy according to the test information.

The step ② is divided into the following steps:

(2.1) defining a label for the structured data, and acquiring label data;

(2.2) preprocessing the structured data;

(2.3) extracting characteristic parameters from the structured data;

(2.4) dividing the characteristic parameters to obtain a data set;

(2.5) vectorizing and normalizing the data set to obtain characteristic data.

In step (2.4), if the size of the data set exceeds five thousand, the data set is divided into a training set, a validation set and a test set; if it does not exceed five thousand, the data set is divided by k-fold cross-validation.
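The size rule for dividing the data can be sketched as follows. This is only an illustration: the function name `split_dataset`, the 70/15/15 hold-out ratio, the default `k=5` and the shuffling seed are assumptions that do not come from the patent; only the five-thousand boundary does.

```python
import random

def split_dataset(samples, threshold=5000, k=5, seed=0):
    """Pick a split strategy from the number of samples."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n = len(items)
    if n > threshold:
        # Large data set: fixed hold-out split (ratio is an assumption).
        n_train = n * 70 // 100
        n_val = n * 15 // 100
        return {
            "mode": "holdout",
            "train": items[:n_train],
            "val": items[n_train:n_train + n_val],
            "test": items[n_train + n_val:],
        }
    # Small data set: k roughly equal folds for k-fold cross-validation.
    folds = [items[i::k] for i in range(k)]
    return {"mode": "kfold", "folds": folds}
```

A data set of exactly five thousand samples does not exceed the boundary, so it takes the k-fold branch.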

The step ③ is divided into the following steps:

(3.1) setting a prediction error value and initializing a learning model;

(3.2) carrying out greedy unsupervised pre-learning layer by layer on the learning model according to the characteristic data to obtain a coarse error model;

(3.3) obtaining a fine error model through the coarse error model and the label data;

(3.4) acquiring the output of the learning model through the characteristic data and the fine error model;

(3.5) acquiring an error function value through the output of the learning model and the label data; finishing learning if the error function value is smaller than the set prediction error value; otherwise, correcting the learning model by error back propagation until the error function value is minimized;

and (3.6) obtaining a test model according to the fine error model and the minimum error function value.

In step ①, the structured data is data with each attribute having the same format.

In step ②, the label is the representation of the state of the predicted object that the structured data reflects.

In step (3.3), a matrix equation is solved using the coarse error model together with the label data, and the solution of the matrix equation is taken as the hyper-parameters of the fine error model.

The matrix equation is:

Hβ＝Y；

where H denotes the output matrix of the coarse error model, Y denotes the label data matrix, and β is taken as the minimum norm least squares solution of the equation.
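As a small illustration of the equation above, the minimum norm least squares solution can be computed with the Moore-Penrose pseudo-inverse. The sketch below uses NumPy; the helper name `solve_output_weights` and the toy matrices are assumptions made for the example.

```python
import numpy as np

def solve_output_weights(H, Y):
    """Return the minimum norm least squares solution of H @ beta = Y."""
    # np.linalg.pinv computes the Moore-Penrose pseudo-inverse via SVD.
    return np.linalg.pinv(H) @ Y

# Toy data: 4 samples, 2 hidden-layer outputs per sample, 1 label column.
H = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0],
              [2.0, 1.0]])
Y = np.array([[1.0], [2.0], [3.0], [4.0]])
beta = solve_output_weights(H, Y)
```

Here Y was built as H times the column (1, 2), so the system is consistent and the least squares solution recovers those weights.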

The invention has the beneficial effects that: aiming at the problems that the characteristics of the predicted object data are difficult to discover and the failure trend is not obvious, the predicted object data are deeply mined by utilizing the strong characteristic extraction capability of deep learning, the change information of the failure of the predicted object is learned, indexes such as the residual life or the health degree of the predicted object are used as the learning target of the model, and dangerous parts are discovered before the failure so as to avoid the failure.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a flow chart of the present invention for obtaining a test model.

Detailed Description

The technical solution of the present invention is further described below, but the scope of the claimed invention is not limited to the described.

As shown in FIG. 1 and FIG. 2, a fault prediction method based on improved deep belief learning includes the steps ① to ⑤ set out above, with step ② and step ③ divided into the sub-steps set out above, and with the fine error model obtained by solving the matrix equation Hβ＝Y.

Furthermore, the invention can effectively solve the problems of feature extraction of a large amount of data and limited range of prediction results by utilizing a prediction method combining a deep neural network and linear regression.

Example 1

As mentioned above, the fault prediction method based on the improved deep belief learning comprises the following implementation steps:

step 1, data acquisition: acquiring structured data capable of reflecting state changes of a predicted object by any method;

step 2, data preprocessing: after the structured data of step 1 is obtained, labels are defined for the data, the data is preprocessed by feature engineering, and feature parameters are extracted from the processed data. The data is then divided: if the data volume is large, it is divided into a training set, a validation set and a test set; if the data volume is small, it is divided by k-fold cross-validation. Finally, the data is vectorized and normalized;

step 3, obtaining a test model: first, a prediction error value is defined and a learning model is initialized, the input layer and output layer of the model being determined by the training data and the label data respectively. Some layers of the model then undergo greedy layer-wise unsupervised pre-learning on the training data, giving a learning model with a coarse error. A matrix equation is solved using the output of the coarse error model together with the label data, and the solution is taken as the hyper-parameters of the other part of the learning model. The output of the learning model is then a linear expression of the output of the coarse error model and the hyper-parameters of the other part.

Second, an error function of the learning model output and the label is defined. If the error function value is lower than the set prediction error value, learning is finished; otherwise the learning model is corrected by error back propagation until the error function is minimized.

Finally, the generalization ability of the learning model is verified on the validation set. If the generalization ability is good, the test set data can be used to test the performance of the model and to obtain the labels that the model outputs for the test set;

step 4, fault prediction: once a mapping between the remaining life or health degree and the data label has been established, fault thresholds are defined on the data label. After the labels output by the trained learning model on the test set are obtained, the prediction information of each label is determined according to the fault thresholds; a label that exceeds the fault threshold represents a fault;

step 5, result analysis: after the prediction information of step 4 is obtained, it is compared one by one with the original labels on the test set or in the actual application to evaluate the prediction accuracy of the model. A prediction result that falls within the given confidence interval is accepted as credible, and one that does not is rejected; the prediction accuracy of the model is then counted.

The structured data in step 1 is data whose records have the same format and length, arranged in rows and columns of attributes (features); it is also called quantitative data.

The label in step 2 is the representation of the state of the predicted object that the feature data reflects, where the state is a remaining life label, a feature degradation data label, or a health label of the predicted object.

The feature engineering in step 2 is a general term for a series of data preprocessing methods, including but not limited to data cleaning, missing value identification and filling, and dimension reduction by principal component analysis.

The feature parameter extraction in step 2 selects the data that strongly influences the prediction result, according to indexes such as the sensitivity and relevance of the data, so as to avoid overfitting of the model caused by data redundancy.

Overfitting means that the model performs well on the training set samples but poorly on the validation set and the test set.

The k-fold cross-validation in step 2 divides the original data set into k subsets, uses k-2 subsets as the training set and the remaining two subsets as the validation set and the test set respectively, and repeats this k times. The model errors of the k rounds (training error, validation error and test error) are calculated and taken as the true error of the learning model.
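The k-fold variant described here, with k-2 training subsets plus one validation and one test subset per round, might be sketched as below. The generator name `kfold_rounds`, the strided fold assignment and the rotation rule that picks the validation and test folds in each round are assumptions; the text only fixes the k-2 / 1 / 1 split and the k repetitions.

```python
def kfold_rounds(n_samples, k):
    """Yield (train, val, test) index lists for k rounds, with k >= 3."""
    if k < 3:
        raise ValueError("k must be at least 3 to leave k-2 training folds")
    # Fold f holds the sample indices f, f + k, f + 2k, ...
    folds = [list(range(f, n_samples, k)) for f in range(k)]
    for r in range(k):
        # Assumed rotation: fold r validates, the next fold tests.
        val_id, test_id = r, (r + 1) % k
        train = [i for f in range(k) if f not in (val_id, test_id)
                 for i in folds[f]]
        yield train, folds[val_id], folds[test_id]
```

Each round yields three disjoint index lists that together cover the whole data set, so the training, validation and test errors of every round are measured on separate samples.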

Initializing the learning model in step 3 includes the initial setting of the model topology and of the relevant parameters in the model, including the number of learning iterations, the learning rate, the connection weight matrices between layers, and the bias of each node; the weight matrices and biases are also called hyper-parameters.

The topology is the structure of the model: an input layer, several intermediate hidden layers, an output layer, and the number of nodes in each layer. The number of layers of the model is also called its depth.

The partial layers of the model in step 3 are the part of the learning model from the input layer to the last hidden layer.

The greedy layer-wise unsupervised pre-learning in step 3 starts from the input layer: the input layer and the first hidden layer are treated as a small model and trained on the unlabeled training data, which is also called encoding. Once the first hidden layer has been learned, the first and second hidden layers are treated as a small model and trained in the same way, and so on until the last hidden layer has been learned.

The small model is a Restricted Boltzmann Machine (RBM), whose visible layer and hidden layer are bidirectionally connected.
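A minimal NumPy sketch of greedy layer-wise pre-training with stacked RBMs is given below, assuming inputs scaled to [0, 1] and one-step contrastive divergence as the training rule for each small model. The function names, the small random weight initialization, the learning rate and the epoch count are illustrative assumptions, not values taken from the patent.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One contrastive-divergence (CD-1) update on a batch v0."""
    # Upward pass: hidden probabilities and a binary sample of them.
    h0_prob = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
    # Downward then upward pass through the same weights (bidirectional link).
    v1_prob = sigmoid(h0 @ W.T + b_vis)
    h1_prob = sigmoid(v1_prob @ W + b_hid)
    n = v0.shape[0]
    # Data-driven statistics minus reconstruction-driven statistics.
    W = W + lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / n
    b_vis = b_vis + lr * (v0 - v1_prob).mean(axis=0)
    b_hid = b_hid + lr * (h0_prob - h1_prob).mean(axis=0)
    return W, b_vis, b_hid

def pretrain_stack(X, layer_sizes, epochs=5, lr=0.1, seed=0):
    """Train one RBM per hidden layer; returns [(W, b_hid), ...]."""
    rng = np.random.default_rng(seed)
    params, data = [], X
    for n_hid in layer_sizes:
        # Small random start (an assumption) so hidden units can differ.
        W = 0.01 * rng.standard_normal((data.shape[1], n_hid))
        b_vis, b_hid = np.zeros(data.shape[1]), np.zeros(n_hid)
        for _ in range(epochs):
            W, b_vis, b_hid = cd1_step(data, W, b_vis, b_hid, lr, rng)
        params.append((W, b_hid))
        # The learned hidden activations become the next RBM's input.
        data = sigmoid(data @ W + b_hid)
    return params
```

No labels are used anywhere in this loop, which is what makes the pre-learning unsupervised; each pair of adjacent layers is trained on its own before the next pair is touched.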

In step 3, the matrix equation is Hβ = Y, where H denotes the output matrix of the coarse error model, Y denotes the label matrix, and β is required to be the minimum norm least squares solution of the matrix equation.

Another part of the model in said step 3 represents the part between the last hidden layer to the output layer.

The hyper-parameters in step 3 are the collective name for the inter-layer connection weight matrices and the node biases in the learning model.

The linear expression in step 3 is O = Hβ̂, where H denotes the output matrix of the coarse error model, β̂ denotes the minimum norm least squares solution of the matrix equation Hβ = Y, and O represents the learning model output layer's expression of the label data, also referred to as the actual output of the learning model.
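To illustrate why a linear output layer leaves the output range unrestricted, the sketch below propagates data through sigmoid hidden layers, fits the output weights by pseudo-inverse, and predicts labels far outside (0, 1). The random hidden weights stand in for pre-trained ones, and all names and data here are assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden_output(X, layers):
    """Propagate X through the sigmoid hidden layers and return H."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H

def fit_output_layer(H, Y):
    # Minimum norm least squares solution of H @ beta = Y.
    return np.linalg.pinv(H) @ Y

def predict(X, layers, beta):
    # Output layer has no activation and no bias: O = H @ beta.
    return hidden_output(X, layers) @ beta

rng = np.random.default_rng(0)
X = rng.random((50, 3))
Y = 100.0 * X.sum(axis=1, keepdims=True)   # labels well outside (0, 1)
layers = [(rng.standard_normal((3, 20)), rng.standard_normal(20))]
beta = fit_output_layer(hidden_output(X, layers), Y)
O = predict(X, layers, beta)
```

Every entry of H lies in (0, 1) because of the sigmoid, yet O can follow labels of any magnitude because the last step is a plain matrix product.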

The error function in step 3 measures the distance between the actual output O of the learning model and the label Y. In its formula, m denotes the feature dimension of the label matrix and n denotes the number of samples in the label matrix.

The error back propagation (BP) method in step 3 computes the influence of each hyper-parameter in the learning model on the error function L, that is, the partial derivative of L with respect to each hyper-parameter, using the chain rule for composite functions. The hyper-parameters are adjusted according to this influence together with parameters such as the learning rate of the learning model, and are then learned again. The BP method is applied repeatedly until the error of the learning model meets the requirement or the learning iterations are used up.

The generalization ability in step 3 refers to the performance of the learning model on new data: the better the performance, the stronger the generalization ability. An overfitted learning model has weak generalization ability.

The fault threshold division in step 4 defines the variation or usable range of the parameter according to the specific situation of the predicted object and the related knowledge. Two thresholds (δ₁, δ₂) divide the label range into three intervals representing no fault, general fault, and fault: a label value y ≤ δ₁ indicates no fault, δ₁ < y < δ₂ indicates a general fault, and y ≥ δ₂ indicates a fault.
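The three-interval rule translates directly into a small function. The name `fault_state` and the returned strings are assumptions made for this sketch; the comparisons follow the intervals stated above.

```python
def fault_state(y, delta1, delta2):
    """Map a label or output value to one of the three fault states."""
    if not delta1 < delta2:
        raise ValueError("delta1 must be smaller than delta2")
    if y <= delta1:
        return "no fault"
    if y < delta2:
        return "general fault"
    return "fault"
```

For example, with thresholds 0.3 and 0.7, the value 0.3 still counts as no fault while 0.7 already counts as a fault, because the boundaries belong to the outer intervals.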

The confidence interval in step 5 is a range that determines which standard label value an actual output value of the learning model belongs to.

The prediction accuracy in step 5 is the degree to which the label values assigned to the actual outputs of the learning model within the confidence interval, on the test set or in the actual application, match the original labels; a prediction that matches is accepted as credible and taken as the prediction result.

Example 2

step 1, data acquisition: structured data capable of reflecting state changes of the predicted object is acquired by any method. If the data is unstructured, it should first be processed into structured data. Data capable of reflecting state changes includes but is not limited to vibration, force, electromagnetism, temperature, current, voltage, level, flow, and displacement;

step 2, data preprocessing: because training uses supervised learning, the labels of the data, i.e., the target output of the model, are defined once the data is obtained. When the feature data is multi-dimensional, one or several feature dimensions can serve as the data label, or domain experts can define indexes that the feature data reflects, such as the remaining service life or health degree of the predicted object. The quality of the training data greatly influences the final prediction accuracy of the learning model;

further, once the label data exists, the feature data is analyzed and processed. Abnormal values in the feature data are removed first, and the data is checked for missing values. When missing values exist, the affected entries can be deleted if the amount of data is sufficient; if the amount of data is insufficient, the missing parts can be filled with 0 or with the average value;
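The missing-value rule can be sketched as follows, under the assumptions that a missing entry is marked with `None`, that "sufficient data" means at least `min_rows` complete rows remain, and that averages are taken per column. The function name and these conventions are illustrative only, and abnormal value removal is not covered.

```python
def handle_missing(rows, min_rows, fill="mean"):
    """rows: list of equal-length lists; None marks a missing value."""
    if not rows:
        return []
    complete = [r for r in rows if None not in r]
    if len(complete) >= min_rows:
        # Enough data: simply drop the rows that contain missing values.
        return complete
    # Not enough data: fill each gap with the column mean or with 0.
    fills = []
    for c in range(len(rows[0])):
        present = [r[c] for r in rows if r[c] is not None]
        if fill == "mean" and present:
            fills.append(sum(present) / len(present))
        else:
            fills.append(0.0)
    return [[fills[c] if v is None else v for c, v in enumerate(r)]
            for r in rows]
```

Dropping rows keeps only measured values, while filling keeps the sample count; the `min_rows` parameter makes that trade-off explicit.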

furthermore, the correlation coefficients, multicollinearity, variance and the like among the attributes of the data are checked to select good attributes as features and improve data quality. When the amount of data is small, k-fold cross-validation is used: the original data set is divided into k subsets, k-2 subsets are taken as the training set and the remaining two subsets as the validation set and the test set respectively, and this is repeated k times. Finally, the data features are vectorized and normalized so that the model correctly identifies the feature data and the importance of each feature;
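The text does not say which normalization is used. As one possible reading, the sketch below applies per-column min-max scaling to [0, 1], which suits sigmoid hidden units; both the scaling method and the name `min_max_normalize` are assumptions.

```python
def min_max_normalize(rows):
    """Scale each column of a list of numeric rows to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    out = []
    for r in rows:
        out.append([
            # A constant column carries no information; map it to 0.
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(r, lows, highs)
        ])
    return out
```

Scaling every feature to the same range keeps a feature with large raw values, such as a voltage in millivolts, from dominating one with small raw values.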

step 3, obtaining a test model: referring to FIG. 2, after the training data (feature data and label data) is prepared, the data is learned. First, the learning error requirement of the model is defined to evaluate the quality of the model; then the numbers of learning iterations for the two stages of the model, pre-training and fine-tuning, are set, and the structure of the model is set. To reduce the time spent in the model adjustment stage as much as possible, the number of intermediate hidden layers and the number of nodes are determined by empirical formulas. Further, the relevant parameters of the model are set, specifically: the number of samples per learning batch, the momentum parameter for hyper-parameter updates, the hyper-parameter expansion rate, the initial learning rates of pre-learning and fine-tuning, the learning rate attenuation coefficient, and the threshold for randomly setting layer nodes to zero;

further, the invention sets the learning rate to decrease as the number of iterations increases, i.e., the learning rate is the initial learning rate multiplied by the learning rate attenuation coefficient. Once these settings are ready, the feature data is used to pre-train the hyper-parameters of all layers of the learning model except the output layer: the hyper-parameters in each RBM are initialized to zero, the data is processed in batches, and the layers are pre-trained one by one with the 1-step contrastive divergence method. The trained hyper-parameters are then assigned to the learning model, and forward propagation is carried out as in an ordinary neural network up to the last hidden layer, the nonlinear activation function of each layer being the sigmoid function. The data matrix of the last hidden layer is recorded as H and the label matrix Y is introduced. The Moore-Penrose generalized inverse of H, written H⁺, is used to obtain the minimum norm least squares solution β̂ = H⁺Y of the matrix equation Hβ = Y.

β̂ is assigned as the connection weight between the last hidden layer and the output layer. The output layer needs neither a nonlinear activation nor a bias, so the output matrix of the learning model is O = Hβ̂.

The label data matrix Y is then used again to calculate the error L between the output O of the model and Y. Because the learning model is trained in batches, the gradient of the error with respect to the hyper-parameters is calculated after the error of one batch is obtained, the BP algorithm is executed, and the hyper-parameters are updated according to the gradient. The updated hyper-parameter is formed from two terms: (hyper-parameter expansion rate × hyper-parameter) and (hyper-parameter update momentum parameter × hyper-parameter update change rate + fine-tuned initial learning rate × hyper-parameter gradient).
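The operators joining the two terms of the update did not survive in the text, so the sketch below is one plausible reading rather than the patent's exact rule: a standard momentum update in which the second term is subtracted from the first so that the error decreases. The function name, the argument names and the default `scale=1.0` (standing in for the expansion rate) are assumptions.

```python
def update_parameter(w, grad, velocity, lr, momentum, scale=1.0):
    """Return (new_w, new_velocity) for one scalar parameter."""
    # Second term: momentum * previous change + learning rate * gradient.
    new_velocity = momentum * velocity + lr * grad
    # Assumed sign: step against the gradient to reduce the error.
    new_w = scale * w - new_velocity
    return new_w, new_velocity

# Usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w, v = 0.0, 0.0
for _ in range(200):
    w, v = update_parameter(w, 2.0 * (w - 3.0), v, lr=0.1, momentum=0.5)
```

With these settings the loop settles at the minimum w = 3, which is a quick sanity check that the assumed sign convention reduces the error.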

Specifically, forward propagation of the remaining batches is carried out with the updated model, and the error after one iteration is the average of the errors of the batches. Iteration continues after each pass over the batches: if the error does not meet the requirement, the learning model continues its iterative adjustment until the iterations are used up. If the error still does not meet the requirement but is decreasing, the number of iterations is increased so that the learning model is fully adjusted, and, if necessary, some parameters are modified to reduce the number of iterations needed. If the error does not meet the requirement and has levelled off, the structure and related parameters of the model need to be changed, including the depth of the model, the number of nodes, and the model parameters.

Further, if the error meets the requirement, the iteration loop is exited and learning stops. The verification stage follows: the validation set is used as the input data of the trained model to verify the generalization ability of the model. If the verification error does not meet the requirement, the test set data is reused to train the model; if the verification error meets the requirement, the test set data is used to test the model, and the actual output on the test set is used as the input of fault prediction;

step 4, fault prediction: labeling the data essentially establishes a mapping between the data and the fault information that the data reflects; when the fault information is not obvious, future data can be used as the label of historical data;

further, the fault state is judged by a fault threshold method: the variation or usable range of the parameters is defined according to the specific conditions of the predicted object and the related knowledge, and the resulting thresholds delimit the data states that reflect the fault information;

preferably, the invention defines two thresholds δ₁ and δ₂: δ₁ is the boundary between no fault and general fault, and δ₂ is the boundary between general fault and fault. A label value y ≤ δ₁ indicates no fault, δ₁ < y < δ₂ indicates a general fault, and y ≥ δ₂ indicates a fault. Each output value of the learning model on the test set is denoted o, and the interval to which o belongs is determined by the same rule to obtain the prediction result;

step 5, result analysis: the output value of the learning model is generally quantitative. Although the threshold method makes it possible to judge whether a fault prediction is correct, a quantitative output is difficult to compare directly with the original label when the accuracy of the model is evaluated;

furthermore, a small number ε is given. If y falls into the interval [o-ε, o+ε], the predicted value o is considered consistent with y and the prediction accurate; the prediction o is then accepted as credible and recorded as 1, and otherwise recorded as 0. The prediction accuracy of the model is the proportion of predictions recorded as 1.
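The ε-tolerance check can be sketched as below. Reading the accuracy as the share of predictions recorded as 1 is an assumption, as is the function name `prediction_accuracy`.

```python
def prediction_accuracy(outputs, labels, eps):
    """Share of outputs o whose label y lies in [o - eps, o + eps]."""
    if len(outputs) != len(labels) or not outputs:
        raise ValueError("outputs and labels must be non-empty and aligned")
    # Record 1 for an accepted prediction and 0 otherwise.
    hits = [1 if o - eps <= y <= o + eps else 0
            for o, y in zip(outputs, labels)]
    return sum(hits) / len(hits)
```

For instance, with outputs 1.0, 2.0, 3.0, 4.0, labels 1.05, 2.5, 2.95, 4.0 and ε = 0.1, three of the four labels fall inside their intervals, giving an accuracy of 0.75.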

In conclusion, the invention uses a deep neural network to extract features, suppresses data noise to a certain extent, is easy to implement in engineering, and can be widely applied to fault prediction for mechanical, electronic, electromechanical, hydraulic and similar objects. In a further scheme, because the output layer does not use a nonlinear activation function, the range of the model's predicted values is not restricted, which improves the performance of the model in fault prediction.

Claims

1. A fault prediction method based on improved deep belief learning, characterized by comprising the following steps:

2. The method of claim 1, wherein the step ② is divided into the following steps:

(2.1) defining a label for the structured data, and acquiring label data;

(2.2) preprocessing the structured data;

(2.3) extracting characteristic parameters from the structured data;

(2.4) dividing the characteristic parameters to obtain a data set;

3. The fault prediction method based on improved deep belief learning as claimed in claim 2, characterized in that: in step (2.4), if the size of the data set exceeds five thousand, the data set is divided into a training set, a validation set and a test set; if it does not exceed five thousand, the data set is divided by k-fold cross-validation.

4. The method of claim 3, wherein the step ③ is divided into the following steps:

(3.1) setting a prediction error value and initializing a learning model;

5. The method for predicting faults based on improved deep belief learning as claimed in claim 1, wherein the structured data is data with each attribute having the same format in step ①.

6. The fault prediction method based on improved deep belief learning as claimed in claim 1, characterized in that: in step ②, the label is the representation of the state of the predicted object that the structured data reflects.

7. The fault prediction method based on improved deep belief learning as claimed in claim 4, characterized in that: in step (3.3), a matrix equation is solved using the coarse error model together with the label data, and the solution of the matrix equation is taken as the hyper-parameters of the fine error model.

8. The improved deep belief learning-based failure prediction method of claim 7, characterized by: the matrix equation is:

Hβ＝Y；