CN117521512A

CN117521512A - Bearing residual service life prediction method based on multi-scale Bayesian convolution Transformer model

Info

Publication number: CN117521512A
Application number: CN202311568493.4A
Authority: CN
Inventors: 姜斌; 彭华超; 冒泽慧
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2023-11-22
Filing date: 2023-11-22
Publication date: 2024-02-06

Abstract

The invention discloses a bearing residual service life prediction method based on a multi-scale Bayesian convolution Transformer model. The method comprises the following steps: dividing the full life-cycle condition monitoring data of the bearing into health stages, normalizing them, and constructing a training set and a test set; constructing a multi-scale Bayesian convolution Transformer model for residual service life prediction; training the model with a back-propagation algorithm that takes uncertainty into account; inputting the test set into the trained model to obtain the predicted value and the probability distribution of the residual service life of the bearing; calculating the 95% confidence interval of the residual service life prediction; visualizing the kernel density of the predicted residual service life distribution; and calculating the over-prediction rate and the uncertainty estimate of the prediction results to evaluate their credibility. The method can effectively improve the reliability, accuracy and robustness of bearing residual service life prediction.

Description

Bearing residual service life prediction method based on multi-scale Bayesian convolution Transformer model

Technical Field

The invention belongs to the technical field of bearing residual service life prediction and relates to a method for predicting the residual service life of bearings based on a multi-scale Bayesian convolution Transformer model.

Background

Bearings are key mechanical components widely used in industrial equipment, especially in rotating machinery. After long-term operation in harsh environments, bearing performance inevitably degrades until complete failure, which can cause equipment shutdown, considerable economic loss and even casualties. Residual service life prediction is a core technology of prognostics and health management. It estimates how long a piece of equipment can keep working normally before it fails completely, so that cost-effective maintenance decisions can be made in advance to improve operating stability and reliability. It is therefore important to develop a reliable method for predicting the remaining useful life of bearings.

With the continuous progress of industrial data transmission and computing technologies, large amounts of condition monitoring signals are collected from industrial equipment, which has driven the widespread development of data-driven residual life prediction methods. These methods learn latent degradation features from the full life-cycle condition monitoring data of mechanical equipment and establish a nonlinear mapping between the degradation features and the residual service life, thereby predicting the residual service life. Data-driven residual life prediction methods generally fall into two types: shallow machine learning methods and deep learning methods. Shallow machine learning methods include support vector machines, relevance vector machines, hidden Markov models and the like. However, these methods learn only shallow features, which limits their prediction performance, and they require complex manual feature engineering, which is time-consuming and laborious. With the rapid development of deep learning and its powerful representation learning capability, models such as recurrent neural networks, convolutional neural networks and their variants, e.g. long short-term memory networks and gated recurrent units, have become more effective residual life prediction techniques. Deep learning models learn deep features directly from the raw full life-cycle condition monitoring data and predict the residual service life end to end without manual feature engineering, and they achieve good performance in residual life prediction.
However, recurrent neural networks suffer from vanishing and exploding gradients, and the receptive field of convolutional neural networks is limited. As a result, recurrent neural networks, convolutional neural networks and their variants lack sufficient long-range modeling capability and achieve only limited residual life prediction performance.

In addition, in industrial scenarios the bearing condition monitoring data contain considerable measurement uncertainty, such as random noise and signal interference, which makes residual service life predictions inaccurate and unreliable. Moreover, acquiring full life-cycle condition monitoring data of bearings is expensive and requires a great deal of time and labor, so it is difficult to collect enough data to train a deep learning model. This introduces model uncertainty and leads to overfitting and poor generalization of the prediction model. These different types of uncertainty can jointly cause the predicted residual life to deviate from the true value, making the corresponding maintenance decisions unreliable. Although existing deep learning models achieve a certain level of residual service life prediction performance, their network structures are usually deterministic: they provide only point estimates and cannot handle the uncertainty of the prediction. This lowers prediction accuracy and reliability and yields overconfident residual service life predictions, which may lead to unreliable maintenance decisions and compromise the reliability and safety of equipment operation. It is therefore of great significance to study a residual life prediction method that can handle uncertainty and provide more accurate and reliable residual life predictions.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art by providing a bearing residual service life prediction method based on a multi-scale Bayesian convolution Transformer model. The method combines a Transformer, which has strong long-range modeling capability, with a convolutional neural network, which has strong local modeling capability, to obtain a multi-scale convolution Transformer model that models both long-term and short-term dependencies at multiple scales. To handle uncertainty, the multi-scale convolution Transformer model is extended into a Bayesian deep learning framework, yielding a multi-scale Bayesian convolution Transformer model for bearing residual service life prediction that can extract more complete and accurate multi-scale degradation features and uncertainty information. In addition, over-prediction error and uncertainty are taken into account when training the model, which improves the robustness and generalization of the prediction and its ability to avoid over-prediction risk. The method can therefore handle various uncertainties effectively and provide residual service life predictions that are more accurate, more reliable and less prone to over-prediction.

The technical scheme of the invention is as follows.

A bearing residual service life prediction method based on a multi-scale Bayesian convolution Transformer model comprises the following steps:

Step 1: collect the full life-cycle condition monitoring data of the bearing, perform health-stage division and normalization preprocessing, and divide the data into a training set and a test set;

Step 2: construct and initialize the multi-scale Bayesian convolution Transformer model;

Step 3: input the training samples $D$ into the multi-scale Bayesian convolution Transformer model; draw the model parameters $W$ by Monte Carlo sampling from the variational distribution $Q_\theta(W)$ that approximates the true posterior $P(W \mid D)$ of the model parameters; then calculate and store the predicted residual service life $\hat{y}_i^{s}$ of the bearing together with the corresponding regression loss $\mathcal{L}_{\mathrm{reg}}$ and distribution loss $\mathcal{L}_{\mathrm{dis}}$;

Step 4: repeat step 3 until the maximum number of Monte Carlo samples $N_s = 10^3$ is reached, which yields the predicted residual service life distribution of each training sample; take the mean of all predictions of each training sample as its final predicted residual service life $\hat{y}_i = \frac{1}{N_s}\sum_{s=1}^{N_s}\hat{y}_i^{s}$;

Step 5: calculate the average regression loss, the average distribution loss and the uncertainty loss $\mathcal{L}_{\mathrm{unc}}$, and combine them into the total loss $\mathcal{L}$; update the parameters $\theta$ of the variational distribution $Q_\theta(W)$ by back-propagation to minimize $\mathcal{L}$;

Step 6: repeat steps 3 to 5 until the maximum number of training epochs $E_{\max} = 500$ is reached, and save the optimal multi-scale Bayesian convolution Transformer model;

Step 7: input the test samples into the optimal multi-scale Bayesian convolution Transformer model, and calculate and store the predicted residual service life of the bearing;

Step 8: repeat step 7 until the maximum number of predictions $P_{\max} = 10^4$ is reached, which yields the predicted residual service life distribution of each test sample for the subsequent credibility analysis; calculate the 95% confidence interval of the residual service life prediction;

Step 9: perform an uncertainty-aware credibility analysis of the test set predictions, comprising kernel density visualization of the predicted residual service life distribution, the over-prediction rate MOP and the uncertainty estimate MCE, to evaluate the credibility of the prediction results.
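Steps 7 to 9 summarize many stochastic forward passes per test sample. The Python sketch below assumes NumPy and an array `samples` of shape (P, n) holding P Monte Carlo predictions for n test samples; the function names are hypothetical. It computes the mean prediction, a 95% interval and a Gaussian kernel density estimate of one sample's predictive distribution. Taking the interval from the 2.5% and 97.5% empirical quantiles and choosing the bandwidth with Silverman's rule are assumptions, because the text specifies neither.

```python
import numpy as np


def predictive_summary(samples, level=0.95):
    """Mean prediction and equal-tailed interval for each test sample.

    samples: array of shape (P, n), P stochastic predictions for n test samples.
    """
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(samples, alpha, axis=0)
    upper = np.quantile(samples, 1.0 - alpha, axis=0)
    return mean, lower, upper


def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of one test sample's predictions on `grid`."""
    samples = np.asarray(samples, dtype=float)
    if bandwidth is None:
        # Silverman's rule of thumb (assumed; the text names no bandwidth rule).
        bandwidth = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (samples.size * bandwidth * np.sqrt(2 * np.pi))
```

With $P_{\max} = 10^4$ predictions per test sample, `samples` would have shape (10000, n). Quantile-based intervals do not assume that the predictive distribution is Gaussian. A mean ± 1.96 standard deviation interval is a common alternative.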

In step 1, preprocessing the acquired bearing condition monitoring data specifically comprises:

1.1 Divide the full life-cycle data of the bearing into health stages, namely a normal operation stage and a rapid degradation stage, according to the root mean square (RMS) value of the raw data;

1.2 According to the health stages, calculate the residual service life value corresponding to the condition monitoring data at each moment and use it as the label. The residual service life is calculated as:

$$\mathrm{RUL}(t) = \begin{cases} T_{\mathrm{total}} - T_c, & t \le T_c \\ T_{\mathrm{total}} - t, & t > T_c \end{cases}$$

where $\mathrm{RUL}(t)$ is the residual service life of the bearing at time $t$, $T_{\mathrm{total}}$ is the total life of the bearing, and $T_c$ is the start time of the rapid degradation stage;

1.3 Normalize the residual service life values to the interval $[0, 1]$ with the min-max formula $Y_N = \dfrac{Y - Y_{\min}}{Y_{\max} - Y_{\min}}$, where $Y$ is the data to be normalized, $Y_{\min}$ and $Y_{\max}$ are its minimum and maximum values, and $Y_N$ is the normalized data;

1.4 Segment the condition monitoring data with a sliding window of length 2560 that moves from the beginning to the end of the bearing condition monitoring data. Each window forms one sample, and the residual service life value of the last data point in the window is used as the sample's label. This produces the sample set used to train and test the deep learning model.
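Steps 1.2 to 1.4 can be sketched in Python with NumPy as follows. The sketch uses a piecewise-linear label that stays constant at $T_{\mathrm{total}} - T_c$ up to $T_c$ and then decreases linearly to 0. It assumes that $T_c$ is already known from the RMS-based division in 1.1 and measures time in sample indices. The window stride is not given in the text, so `step` is a hypothetical parameter, and all function names are illustrative.

```python
import numpy as np


def rul_labels(n_points: int, t_c: int) -> np.ndarray:
    """Piecewise-linear RUL label for each time index t = 1..n_points.

    Assumes T_total = n_points and a constant RUL of T_total - T_c during the
    normal operation stage (t <= T_c), followed by a linear decrease to 0.
    """
    t = np.arange(1, n_points + 1)
    t_total = n_points
    return np.where(t <= t_c, t_total - t_c, t_total - t).astype(float)


def min_max(y: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1]; assumes y is not constant."""
    return (y - y.min()) / (y.max() - y.min())


def sliding_windows(signal: np.ndarray, labels: np.ndarray,
                    window: int = 2560, step: int = 2560):
    """Cut the signal into windows labelled by the RUL of their last point.

    `step` (the stride) is a hypothetical parameter: the text does not state it.
    """
    xs, ys = [], []
    for end in range(window, len(signal) + 1, step):
        xs.append(signal[end - window:end])
        ys.append(labels[end - 1])
    return np.stack(xs), np.array(ys)


# Toy usage: 10 time points, rapid degradation starting after t = 4.
y = min_max(rul_labels(10, 4))
X, Y = sliding_windows(np.arange(10.0), y, window=4, step=2)
```

After normalization, every window from the normal operation stage carries the label 1, and the label falls to 0 at the end of life.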

Specifically, in step 2, the multi-scale Bayesian convolution Transformer model comprises a Bayesian convolutional token embedding layer, a multi-scale feature extractor and a Bayesian regression predictor. Its structure and construction are as follows:

First, the input sample passes through the Bayesian convolutional token embedding layer to generate the token sequence $X_t$. A positional encoder $E_{pos}$ then adds position information to $X_t$ to obtain $X_{pos}$, which is fed into the multi-scale feature extractor. Finally, the extracted information is fed into the Bayesian regression predictor to obtain the predicted residual service life $\hat{y}$.

The Bayesian convolutional token embedding layer comprises a Bayesian dilated causal convolutional neural network, a max-pooling layer and a Bayesian linear layer. It maps an input sample into a token sequence $X_t \in \mathbb{R}^{l \times d_{model}}$ without losing uncertainty information, where $l$ is the number of tokens and $d_{model}$ is the embedding dimension.

The multi-scale feature extractor comprises a multi-scale Bayesian convolutional sparse self-attention module, residual connections with layer normalization, a Bayesian temporal convolutional network and a fully connected layer. The input $X_{pos}$ is first fed into the multi-scale Bayesian convolutional sparse self-attention module. Its heads $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$, are different Bayesian convolutional sparse self-attentions that extract degradation features and uncertainty information at different time scales. The information extracted at the different time scales is fused by concatenation and a Bayesian linear layer to give the output $P_1$, which is added to the input $X_{pos}$ and layer-normalized to obtain $\mathrm{Output}_1$. $\mathrm{Output}_1$ is then fed into the Bayesian temporal convolutional network, which further strengthens the extraction of local and temporal features, to obtain $P_2$. $P_2$ is fed into the fully connected layer to obtain $F$. Finally, $F$ and $P_2$ are added and layer-normalized to obtain the final output $\mathrm{Output}_2$.

The Bayesian regression predictor comprises three Bayesian linear layers, each followed by a LeakyReLU activation layer. It establishes a nonlinear probabilistic mapping between the information extracted by the multi-scale feature extractor and the true residual service life.
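A Bayesian linear layer can be sketched with a mean-field Gaussian variational distribution and the reparameterization trick, as in Bayes by Backprop. This parameterization (means $\mu$ and standard deviations $\sigma = \mathrm{softplus}(\rho)$) is an assumption: the text only states that the weights are sampled from $Q_\theta(W)$. The Python/NumPy sketch below stacks three such layers, each followed by LeakyReLU, to mimic the Bayesian regression predictor. The hidden sizes, initial values and class names are hypothetical.

```python
import numpy as np


def softplus(x):
    return np.log1p(np.exp(x))


def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)


class BayesLinear:
    """Linear layer whose weight and bias are redrawn from a Gaussian Q_theta on every call."""

    def __init__(self, d_in, d_out, rng):
        self.rng = rng
        # Variational parameters theta = (mu, rho); sigma = softplus(rho) stays positive.
        self.w_mu = rng.normal(0.0, 0.1, size=(d_in, d_out))
        self.w_rho = np.full((d_in, d_out), -5.0)
        self.b_mu = np.zeros(d_out)
        self.b_rho = np.full(d_out, -5.0)

    def __call__(self, x):
        # Reparameterization: W = mu + sigma * eps with eps ~ N(0, I).
        w = self.w_mu + softplus(self.w_rho) * self.rng.standard_normal(self.w_mu.shape)
        b = self.b_mu + softplus(self.b_rho) * self.rng.standard_normal(self.b_mu.shape)
        return x @ w + b


class BayesRegressionPredictor:
    """Three Bayesian linear layers, each followed by LeakyReLU."""

    def __init__(self, d_in, hidden, rng):
        dims = [d_in, hidden, hidden // 2, 1]  # hidden sizes are illustrative
        self.layers = [BayesLinear(a, b, rng) for a, b in zip(dims[:-1], dims[1:])]

    def __call__(self, features):
        h = features
        for layer in self.layers:
            h = leaky_relu(layer(h))
        return h[..., 0]
```

Because the weights are redrawn on every call, two forward passes on the same features give different outputs. Repeating the call $N_s$ or $P_{\max}$ times yields the predictive distributions used in steps 4 and 8.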

Further, the multi-scale Bayesian convolutional sparse self-attention module comprises a Bayesian linear layer and six heads $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$. Each head is a Bayesian convolutional sparse self-attention with a different convolution kernel size. The module extracts multi-scale degradation features and uncertainty information from the input data at different time scales. The features and uncertainty information extracted by the six heads are concatenated and then fused by the Bayesian linear layer to obtain the output $P_1$:

$$P_1 = \mathrm{PReLU}\left(\mathrm{Concat}(\mathrm{Head}_1, \mathrm{Head}_2, \dots, \mathrm{Head}_6)\right) W_2 + b_2$$

where $W_2$ and $b_2$ are the weight and bias of the Bayesian linear layer. Both are random variables whose values are sampled from the variational distribution $Q_\theta(W)$ that approximates the true posterior $P(W \mid D)$ of the parameters $W$ of the multi-scale Bayesian convolution Transformer model. $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$, denotes the $i$-th Bayesian convolutional sparse self-attention.
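The fusion formula for $P_1$ can be checked with a small NumPy sketch. Six head outputs of shape $(l, d_{head})$ are concatenated, passed through PReLU, and multiplied by a weight $W_2$, to which a bias $b_2$ is added. Both $W_2$ and $b_2$ are drawn from a Gaussian with a given mean and standard deviation. The Gaussian form of $Q_\theta$, the fixed PReLU slope and the function name are assumptions for illustration.

```python
import numpy as np


def prelu(x, a=0.25):
    """PReLU(x) = x for x > 0, a * x otherwise (a is learnable in the model; fixed here)."""
    return np.where(x > 0, x, a * x)


def fuse_heads(heads, w2_mu, w2_sigma, b2_mu, b2_sigma, rng):
    """P1 = PReLU(Concat(Head_1, ..., Head_6)) W2 + b2, with W2 and b2 drawn from a Gaussian."""
    h = np.concatenate(heads, axis=-1)  # shape (l, 6 * d_head)
    w2 = w2_mu + w2_sigma * rng.standard_normal(np.shape(w2_mu))
    b2 = b2_mu + b2_sigma * rng.standard_normal(np.shape(b2_mu))
    return prelu(h) @ w2 + b2
```

Setting the standard deviations to zero recovers a deterministic linear fusion. This makes it easy to compare the Bayesian layer with its mean behaviour.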

The Bayesian convolutional sparse self-attention comprises a Bayesian dilated causal convolutional neural network, a sparsity measure of the query $Q$ and a Bayesian self-attention. First, the Bayesian dilated causal convolutional neural network maps the local context of the input data, segment by segment, into the query $Q$, the key $K$ and the value $V$. Next, the Kullback-Leibler divergence between the attention probability distribution $p(k_j \mid q_i)$ of the $i$-th query and the uniform distribution $U$ is used as the sparsity measure of $Q$. The $\mu$ dominant queries are selected with this measure to form a new query $\bar{Q}$. Finally, the Bayesian self-attention is computed from $\bar{Q}$, $K$ and $V$ using the nonlinear activation $\mathrm{PReLU}(\cdot)$ and a Bayesian linear layer with weight $W_1$ and bias $b_1$. Here $q_i$, $k_i$ and $v_i$ denote the $i$-th rows of $Q$, $K$ and $V$. The Bayesian convolutional sparse self-attention integrates local context information of the input data into the global attention calculation. Long-term and short-term degradation features and uncertainty information are therefore extracted simultaneously, and the influence of highly uncertain data on feature extraction is suppressed.

Further, the Bayesian temporal convolutional network comprises two parallel channels. One channel consists of two groups connected in series; each group is a Bayesian dilated causal convolutional neural network followed by a weight normalization layer, a ReLU activation layer and a dropout layer. The other channel is a single-layer Bayesian dilated causal convolutional neural network. The outputs of the two channels are added to form the output $P_2$ of the Bayesian temporal convolutional network.

Further, the Bayesian dilated causal convolutional neural network has three aspects: convolution, dilation and causality. Convolution means that six convolution kernels slide over the input data in parallel to perform the convolution. Dilation means that the input data are sampled at intervals given by the dilation rate during convolution, so that only part of the data enters each convolution. Causality means that the convolution at time $t$ uses only data no later than time $t$, which prevents leakage of future data. The weight and bias variables of this network take values sampled from the approximate variational distribution $Q_\theta(W)$.
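The dilation and causality properties, and the two-channel structure of the Bayesian temporal convolutional network, can be illustrated with the single-channel NumPy sketch below. It makes the following simplifications:

* Each convolution uses one fixed kernel instead of six kernels drawn from $Q_\theta(W)$.
* Weight normalization is applied to the kernel, as in common temporal convolutional networks.
* The parameter names (`v1`, `g1`, `skip`, ...) are hypothetical.

```python
import numpy as np


def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_k kernel[k] * x[t - k * dilation], with x[<0] treated as 0.

    y[t] depends only on x[0..t] (causal); taps are `dilation` steps apart (dilated).
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for k, w in enumerate(kernel):
        shift = k * dilation
        if shift < len(x):
            y[shift:] += w * x[:len(x) - shift]
    return y


def weight_norm(v, g):
    """Weight normalization: kernel = g * v / ||v||."""
    return g * np.asarray(v, dtype=float) / np.linalg.norm(v)


def tcn_block(x, params, dilation, rng, drop_p=0.1, train=False):
    """Two parallel channels whose outputs are summed (single input channel)."""
    h = np.asarray(x, dtype=float)
    # Channel 1: two groups of weight-normalized conv -> ReLU -> dropout.
    for v, g in ((params["v1"], params["g1"]), (params["v2"], params["g2"])):
        h = dilated_causal_conv1d(h, weight_norm(v, g), dilation)
        h = np.maximum(h, 0.0)
        if train:  # inverted dropout, active only during training
            h = h * (rng.random(h.shape) >= drop_p) / (1.0 - drop_p)
    # Channel 2: a single dilated causal convolution.
    skip = dilated_causal_conv1d(x, params["skip"], dilation)
    return h + skip
```

An impulse at time 5 only affects outputs at times 5, 5 + d, 5 + 2d, and so on, where d is the dilation rate. No earlier output changes, which is the causality property described above.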

Specifically, the regression loss $\mathcal{L}_{\mathrm{reg}}$ in step 3 is a piecewise weighted loss that accounts for over-prediction error. For the $i$-th sample, the error between the $s$-th prediction $\hat{y}_i^{s}$ and the true residual service life $y_i$ is weighted by $\gamma_1$ if $\hat{y}_i^{s} > y_i$ (over-prediction) and by $\gamma_2$ otherwise (under-prediction). The weighted errors are then averaged over the $n$ samples. $\gamma_1$ reflects how strongly over-prediction errors are suppressed; it is set to $\gamma_1 = a$ with $a \in \mathbb{R}$ and $a > 1$, and $\gamma_2 = 1$. The distribution loss $\mathcal{L}_{\mathrm{dis}}$ serves to optimize the approximate variational distribution $Q_\theta(W)$ of the model parameters $W$. It is based on the Kullback-Leibler divergence $\mathrm{KL}\left(Q_\theta(W) \,\|\, P(W \mid D)\right)$ between $Q_\theta(W)$ and the true posterior $P(W \mid D)$. An approximate solution of this divergence is used as the distribution loss; up to the constant $\log P(D)$, it is

$$\mathcal{L}_{\mathrm{dis}} \approx \frac{1}{N_s}\sum_{s=1}^{N_s}\left[\log Q_\theta(W_s) - \log P(W_s) - \log P(D \mid W_s)\right],$$

where $W_s$ denotes the model parameters drawn in the $s$-th Monte Carlo sample from $Q_\theta(W)$ and $\theta$ denotes the parameters of the approximate variational distribution.
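The two training losses of step 3 can be sketched in NumPy. The squared error inside the piecewise weighted regression loss and the value $\gamma_1 = 2$ are assumptions, since the text only requires $\gamma_1 > 1$ and $\gamma_2 = 1$. For the distribution loss, the sketch assumes a mean-field Gaussian $Q_\theta(W)$ and a zero-mean Gaussian prior $P(W)$. It estimates only the $\log Q_\theta(W_s) - \log P(W_s)$ part; the data-fit term $-\log P(D \mid W_s)$ would come from the model's likelihood. The function names are hypothetical.

```python
import numpy as np


def weighted_regression_loss(y_true, y_pred, gamma1=2.0, gamma2=1.0):
    """Mean of gamma * error^2, with gamma = gamma1 for over-predictions."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    gamma = np.where(err > 0, gamma1, gamma2)  # err > 0 means predicted RUL > true RUL
    return float(np.mean(gamma * err ** 2))


def log_normal(x, mu, sigma):
    """Element-wise log density of N(mu, sigma^2)."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def mc_kl_term(w_samples, q_mu, q_sigma, prior_sigma=1.0):
    """Monte Carlo estimate of E_Q[log Q_theta(W) - log P(W)].

    w_samples: shape (N_s, n_params), drawn from the mean-field Gaussian Q_theta.
    P(W) is assumed to be a zero-mean Gaussian with standard deviation prior_sigma.
    """
    log_q = log_normal(w_samples, q_mu, q_sigma).sum(axis=-1)
    log_p = log_normal(w_samples, 0.0, prior_sigma).sum(axis=-1)
    return float(np.mean(log_q - log_p))
```

For a Gaussian $Q_\theta$ and a Gaussian prior, this term also has a closed form. The Monte Carlo estimate, however, follows the sampling-based description in the text and works for non-Gaussian priors as well.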

Specifically, in step 5, the uncertainty loss $\mathcal{L}_{\mathrm{unc}}$ is built from the sum of the variance and the covariance of the predicted residual service life distribution. To keep $\mathcal{L}_{\mathrm{unc}}$ effective when this sum is close to 0, it is defined as an exponential function of the sum with base $e$. In this definition, $N$ is the number of samples and $N_s$ is the number of Monte Carlo samples. $\hat{y}_i^{s}$ is the $s$-th prediction of the $i$-th sample, and $\hat{y}_i$ is the mean prediction of the $i$-th sample. $\Lambda_{sm}$ is the covariance matrix between the true residual service life values and the residual service life predictions of all samples, where each sample's prediction is the mean of its $N_s$ repeated predictions. Finally, the mean regression loss, the mean distribution loss and the uncertainty loss are combined in a weighted sum to form the total loss $\mathcal{L}$, with the weight parameter $\lambda$ set to 0.01.

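Specifically, in step 9, the over-prediction rate MOP and the uncertainty estimate MCE are calculated from the residual service life predictions of the test samples. They quantify the over-prediction tendency and the uncertainty of the prediction results.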

Compared with the prior art, the invention has the following advantages and beneficial effects:

(1) The invention constructs a multi-scale Bayesian convolution Transformer model that can model long-term and short-term dependencies at multiple scales and can handle uncertainty. More accurate and richer multi-scale degradation features and uncertainty information can therefore be extracted from bearing condition monitoring data, which ensures complete information extraction. Existing deterministic deep learning models provide only point predictions of residual service life. In contrast, the proposed model represents all model parameters as probability distributions. It therefore effectively handles the prediction uncertainty caused by random noise and limited condition monitoring data, improves generalization, and yields residual service life predictions that are more accurate, more reliable and less prone to over-prediction.

(2) The invention constructs a Bayesian convolutional sparse self-attention module that fully integrates local context information into global dependency modeling. Long-term and short-term degradation features and uncertainty information are thus extracted simultaneously, and the influence of highly uncertain data on feature extraction is suppressed. In addition, several different Bayesian convolutional sparse self-attentions run in parallel to extract multi-scale degradation features and uncertainty information at different time scales, and this information is then fused. This ensures a complete information representation and improves the adaptability of the deep learning model and the robustness of residual service life prediction.

(3) The invention constructs a new loss function that integrates regression loss, distribution loss and uncertainty loss. Based on variational inference, it also proposes a new back-propagation training algorithm. When training the multi-scale Bayesian convolution Transformer model, this algorithm considers the regression error of the residual service life prediction as well as uncertainty information and over-prediction error. It thereby suppresses uncertainty and over-prediction and improves the robustness, the generalization and the over-prediction risk avoidance of residual service life prediction.

(4) The invention constructs an uncertainty-aware credibility analysis method comprising kernel density visualization of the predicted residual service life distribution, the over-prediction rate and the uncertainty estimate. With this method, the credibility of residual service life predictions can be evaluated effectively.

Drawings

FIG. 1 is a flow chart of the bearing residual service life prediction method based on the multi-scale Bayesian convolution Transformer model.

FIG. 2 is a block diagram of the multi-scale Bayesian convolution Transformer model.

FIG. 3 is a block diagram of the multi-scale Bayesian convolutional sparse self-attention module.

FIG. 4 is a structure diagram of the Bayesian convolutional sparse self-attention.

FIG. 5 is a structure diagram of the Bayesian temporal convolutional network.

FIG. 6 is a structure diagram of the Bayesian regression predictor.

FIG. 7 shows the PRONOSTIA test bench used to collect the bearing condition monitoring data.

FIG. 8 (a), (b) and (c) show the residual service life predictions for the test bearing under three different noise conditions. The predictions were obtained with the method of the invention, a Bayesian multi-scale convolutional neural network, a Bayesian long short-term memory network and a Bayesian gated recurrent unit.

FIG. 9 (a) and (b) show kernel density visualizations of the predicted residual service life distributions obtained with the method of the invention and with the Bayesian multi-scale convolutional neural network, respectively.

Detailed Description

In the bearing residual service life prediction method based on the multi-scale Bayesian convolution Transformer model, the neural network fuses a convolutional neural network with a Transformer and extends the result into a Bayesian deep learning framework. This establishes a reliable residual service life prediction method with uncertainty quantification. The core modules of the model are the multi-scale Bayesian convolutional sparse self-attention mechanism and the Bayesian temporal convolutional network. With them, complex multi-scale degradation features and uncertainty information can be extracted from the data both globally and locally and at different time scales. To improve the robustness, generalization and risk avoidance of the model, training uses a back-propagation algorithm that accounts for uncertainty and over-prediction error. The method addresses the poor generalization, low reliability and low prediction accuracy caused by complex random noise and by the uncertainty that insufficient condition monitoring data introduce. It thereby improves the reliability, accuracy and robustness of bearing residual service life prediction.

The invention is further described below with reference to the accompanying drawings.

The embodiment of the invention provides a bearing residual service life prediction method based on a multi-scale Bayesian convolution Transformer model. As shown in FIG. 1, the method comprises the following steps:

Step 1: the full life-cycle condition monitoring data of the bearing are vibration signals of the bearing in the vertical and horizontal directions.

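Specifically, health-stage division and normalization preprocessing are applied to the acquired full life-cycle condition monitoring data of the bearing, as follows:

1.1 Divide the full life-cycle data of the bearing into health stages, namely a normal operation stage and a rapid degradation stage, according to the root mean square (RMS) value of the raw data;

1.2 According to the health stages, calculate the residual service life value corresponding to the condition monitoring data at each moment and use it as the label. The residual service life is calculated as:

$$\mathrm{RUL}(t) = \begin{cases} T_{\mathrm{total}} - T_c, & t \le T_c \\ T_{\mathrm{total}} - t, & t > T_c \end{cases}$$

where $\mathrm{RUL}(t)$ is the residual service life of the bearing at time $t$, $T_{\mathrm{total}}$ is the total life of the bearing, and $T_c$ is the start time of the rapid degradation stage;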

1.3 Normalize the residual service life values to the interval $[0, 1]$ with the min-max formula $Y_N = \dfrac{Y - Y_{\min}}{Y_{\max} - Y_{\min}}$, where $Y$ is the data to be normalized, $Y_{\min}$ and $Y_{\max}$ are its minimum and maximum values, and $Y_N$ is the normalized data;

1.4 Segment the condition monitoring data with a sliding window of length 2560 that moves from the beginning to the end of the bearing condition monitoring data. Each window forms one sample, and the residual service life value of the last group of condition monitoring data in the window is used as the sample's label. This produces the sample set used to train and test the deep learning model.

Step 2: construct and initialize the multi-scale Bayesian convolution Transformer model.

As shown in FIG. 2, the multi-scale Bayesian convolution Transformer model comprises a Bayesian convolutional token embedding layer, a multi-scale feature extractor and a Bayesian regression predictor. Its specific structure and construction are as follows:

First, the input sample passes through the Bayesian convolutional token embedding layer to generate the token sequence $X_t$. A positional encoder $E_{pos}$ then adds position information to $X_t$ to obtain $X_{pos}$, which is fed into the multi-scale feature extractor. Finally, the extracted information is fed into the Bayesian regression predictor to obtain the predicted residual service life $\hat{y}$.

The multi-scale feature extractor comprises a multi-scale Bayesian convolutional sparse self-attention module, residual connections with layer normalization, a Bayesian temporal convolutional network and a fully connected layer. The input $X_{pos}$ is first fed into the multi-scale Bayesian convolutional sparse self-attention module. Its heads $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$, are different Bayesian convolutional sparse self-attentions that extract degradation features and uncertainty information at different time scales. The information extracted at the different time scales is fused by concatenation and a Bayesian linear layer to give the output $P_1$, which is added to the input $X_{pos}$ and layer-normalized to obtain $\mathrm{Output}_1$. $\mathrm{Output}_1$ is then fed into the Bayesian temporal convolutional network, which further strengthens the extraction of local and temporal features, to obtain $P_2$. $P_2$ is fed into the fully connected layer to obtain $F$. Finally, $F$ and $P_2$ are added and layer-normalized to obtain the final output $\mathrm{Output}_2$.

As shown in FIG. 3, the multi-scale Bayesian convolutional sparse self-attention module comprises a Bayesian linear layer and six heads $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$. Each head is a Bayesian convolutional sparse self-attention with a different convolution kernel size. The module extracts multi-scale degradation features and uncertainty information from the input data at different time scales. The features and uncertainty information extracted by the six heads are first concatenated and then fused by the Bayesian linear layer to obtain the output $P_1$. The whole process is described by:

$$P_1 = \mathrm{PReLU}\left(\mathrm{Concat}(\mathrm{Head}_1, \mathrm{Head}_2, \dots, \mathrm{Head}_6)\right) W_2 + b_2$$

where $W_2$ and $b_2$ are the weight and bias of the Bayesian linear layer. Both are random variables whose values are sampled from the variational distribution $Q_\theta(W)$ that approximates the true posterior $P(W \mid D)$ of the parameters $W$ of the multi-scale Bayesian convolution Transformer model. $\mathrm{Head}_i$, $i = 1, 2, \dots, 6$, denotes the $i$-th Bayesian convolutional sparse self-attention.

As shown in FIG. 4, the Bayesian convolutional sparse self-attention comprises a Bayesian dilated causal convolutional neural network, a sparsity measure of the query $Q$ and a Bayesian self-attention. First, the Bayesian dilated causal convolutional neural network maps the local context of the input data, segment by segment, into the query $Q$, the key $K$ and the value $V$. Next, the sparsity measure of $Q$ yields a new query $\bar{Q}$. Finally, the Bayesian self-attention integrates local context information of the input data into the global attention calculation. Long-term and short-term degradation features and uncertainty information are therefore extracted simultaneously, and the influence of highly uncertain data on feature extraction is suppressed.

The Bayesian dilated causal convolutional network combines three aspects: convolution, dilation and causality. Convolution: six convolution kernels slide over the input data in parallel and perform the convolution computation. Dilation: during the convolution, the input data are sampled at intervals given by the dilation rate, so that only part of the data enters each convolution. Causality: when the convolution is computed for the data at time $t$, only data at or before time $t$ are considered, which prevents leakage of future data. The weight and bias variables of the network are determined by sampling from the approximate variational distribution $Q_\theta(W)$.
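The three properties can be illustrated with a single-channel dilated causal convolution in pure Python. This sketch assumes implicit zero padding on the left and fixed weights; in the Bayesian version the weights and bias would instead be drawn from $Q_\theta(W)$ on each forward pass. The function name is hypothetical.

```python
def dilated_causal_conv1d(x, weights, bias=0.0, dilation=1):
    """y[t] = bias + sum_k weights[k] * x[t - k * dilation].

    Only x[t], x[t - d], x[t - 2d], ... contribute to y[t] (causality),
    taps are spaced by the dilation rate d (dilation), and samples before
    the start of the sequence are treated as zeros (left zero padding).
    """
    y = []
    for t in range(len(x)):
        acc = bias
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that fall before the first sample
                acc += w * x[idx]
        y.append(acc)
    return y
```

Because no tap reaches beyond index $t$, changing a later sample never changes earlier outputs.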

The sparsity measure of the query matrix $Q$ rests on the principle that a dominant query-key pair in self-attention makes the attention probability distribution $p(k_j \mid q_i)$ of the corresponding query deviate from the uniform distribution $U(a, b)$; on this basis, $u$ dominant queries are selected to form the new query matrix $\bar{Q}$. Specifically, the attention of the $i$-th query in $Q$ is defined as a kernel smoother in probability form,

$$\mathcal{A}(q_i, K, V) = \sum_j \frac{k(q_i, k_j)}{\sum_l k(q_i, k_l)} v_j = \mathbb{E}_{p(k_j \mid q_i)}[v_j], \qquad k(q_i, k_j) = \exp\left(\frac{q_i k_j^{\top}}{\sqrt{d}}\right),$$

where $p(k_j \mid q_i) = k(q_i, k_j) / \sum_l k(q_i, k_l)$ is the attention probability distribution of the $i$-th query, $q_i$, $k_i$, $v_i$ are the $i$-th rows of $Q$, $K$, $V$, and $\exp(\cdot)$ is the exponential function with base $e$. The Kullback-Leibler divergence between $p(k_j \mid q_i)$ and the uniform distribution serves as the sparsity measure of $Q$; dropping constant terms, it reads

$$M(q_i, K) = \ln \sum_{j=1}^{L_K} \exp\left(\frac{q_i k_j^{\top}}{\sqrt{d}}\right) - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}},$$

where $d$ is the dimension of the input data and $L_K$ is the number of keys (equal to the number of queries in self-attention). An empirical approximation of this measure is obtained by the max-mean measurement,

$$\bar{M}(q_i, K) = \max_j \left\{\frac{q_i k_j^{\top}}{\sqrt{d}}\right\} - \frac{1}{L_K} \sum_{j=1}^{L_K} \frac{q_i k_j^{\top}}{\sqrt{d}},$$

and the Top-$u$ values of $\bar{M}$ give the first $u$ dominant query-key pairs, yielding the query matrix $\bar{Q}$ that contains the $u$ dominant queries.
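A sketch of the Top-$u$ query selection using the max-mean measure $\bar{M}(q_i, K)$ is given below. It assumes small dense matrices stored as lists of rows and scores every query against all keys; an implementation for long sequences would typically score against a random subset of keys instead. The function names are hypothetical.

```python
import math


def sparsity_scores(Q, K):
    """Max-mean measure M(q_i, K) = max_j s_ij - mean_j s_ij, with s_ij = q_i . k_j / sqrt(d).

    A query whose scores are all equal (uniform attention) gets a score of 0.
    """
    d = len(Q[0])
    scores = []
    for q in Q:
        s = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        scores.append(max(s) - sum(s) / len(s))
    return scores


def select_dominant_queries(Q, K, u):
    """Indices of the u queries with the largest sparsity score (Top-u), best first."""
    m = sparsity_scores(Q, K)
    return sorted(range(len(Q)), key=lambda i: m[i], reverse=True)[:u]
```

The rows of $Q$ at the returned indices form $\bar{Q}$.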

The Bayesian self-attention is computed from $\bar{Q}$, $K$ and $V$, where $W_1$ and $b_1$ denote the weight and bias variables of its Bayesian linear layer, which are random variables whose values are determined by sampling from the approximate variational distribution $Q_\theta(W)$, and $\mathrm{PReLU}(\cdot)$ is the nonlinear activation function

$$\mathrm{PReLU}(x) = \begin{cases} x, & x > 0 \\ a x, & x \le 0 \end{cases}$$

with learnable parameter $a$.

The Bayesian temporal convolutional network, shown in FIG. 5, contains two parallel network channels. The first channel consists of two groups connected in series, each group comprising a Bayesian dilated causal convolutional network, a weight normalization layer, a ReLU activation layer and a dropout layer; the second channel is a single-layer Bayesian dilated causal convolutional network. The outputs of the two channels are added to give the output $P_2$ of the Bayesian temporal convolutional network.

The Bayesian regression predictor, shown in FIG. 6, comprises three Bayesian linear layers, each followed by a LeakyReLU activation layer, $\mathrm{LeakyReLU}(x) = x$ for $x \ge 0$ and $\alpha x$ otherwise, where $\alpha \in [0, 1]$ is the leakage rate; $W_3$, $W_4$ and $b_3$, $b_4$ denote the weight and bias variables of the Bayesian linear layers, whose values are determined by sampling from the approximate variational distribution $Q_\theta(W)$. This part establishes the nonlinear probabilistic mapping between the information $\mathrm{Output}_2$ extracted by the multi-scale feature extractor and the true remaining useful life.

The multi-scale Bayesian convolutional Transformer model is initialized as follows: random values are drawn from the standard normal distribution $\mathcal{N}(0, 1)$, and all model parameters are randomly initialized to these sampled values to promote the diversity and learning capability of the model.

Step 3: the training set samples $D$ are input into the multi-scale Bayesian convolutional Transformer model; model parameters $W$ are sampled by the Monte Carlo method from the approximate variational distribution $Q_\theta(W)$ of the true posterior distribution $P(W \mid D)$ of the model parameters $W$, and the predicted remaining useful life of the bearing, together with the corresponding regression loss and distribution loss, is computed and stored.

The multi-scale Bayesian convolutional Transformer model is a Bayesian deep learning framework: from the perspective of probability distributions, it incorporates an uncertainty characterization into the deep learning model through variational inference, so that uncertainty can be quantified. The model treats all its parameters $W$ as random variables rather than fixed values; in each forward pass it samples the values of all parameters $W$ from the approximate variational distribution $Q_\theta(W)$ of their true posterior distribution $P(W \mid D)$ and then computes the corresponding predicted remaining useful life.

Variational inference approximates the true posterior distribution $P(W \mid D)$ with the variational distribution $Q_\theta(W)$, using the Kullback-Leibler divergence $\mathrm{KL}(Q_\theta(W) \,\|\, P(W \mid D))$ to measure the distance between the two distributions. Since all model parameters satisfy Bayes' formula $P(W \mid D) = P(D \mid W)\,P(W) / P(D)$, the divergence can be rewritten as

$$\mathrm{KL}(Q_\theta(W) \,\|\, P(W \mid D)) = \mathrm{KL}(Q_\theta(W) \,\|\, P(W)) - \mathbb{E}_{Q_\theta(W)}\left[\log P(D \mid W)\right] + \log P(D),$$

where $P(W)$ is the prior distribution, $P(D \mid W)$ is the likelihood function and $P(D)$ is the marginal distribution.

The distribution loss is used to optimize the Kullback-Leibler divergence $\mathrm{KL}(Q_\theta(W) \,\|\, P(W \mid D))$ between the approximate variational distribution $Q_\theta(W)$ of the model parameters $W$ and their true posterior distribution $P(W \mid D)$. An approximate solution of this divergence is obtained by Monte Carlo sampling from $Q_\theta(W)$ and taken as the distribution loss value, where $W_s$ denotes the model parameters of the $s$-th Monte Carlo sample from $Q_\theta(W)$ and $\theta$ is the parameter of the approximate variational distribution.
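The text does not reproduce the exact distribution-loss formula. As an illustration only, the sketch below estimates the prior-related part of the divergence, $\mathrm{KL}(Q_\theta(W) \,\|\, P(W))$, for a single scalar weight by averaging $\log Q_\theta(W_s) - \log P(W_s)$ over Monte Carlo samples $W_s$, and compares it with the closed-form Gaussian result. The standard normal prior and the omission of the likelihood term are assumptions; the function names are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def mc_kl_gaussian(mu, sigma, prior_mu=0.0, prior_sigma=1.0, n_samples=10000, seed=0):
    """Monte Carlo estimate of KL(Q || P) = E_Q[log Q(W) - log P(W)] (assumed Gaussian prior)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        w = mu + sigma * rng.gauss(0.0, 1.0)  # reparameterized sample W_s ~ Q
        total += log_normal_pdf(w, mu, sigma) - log_normal_pdf(w, prior_mu, prior_sigma)
    return total / n_samples


def closed_form_kl_gaussian(mu, sigma, prior_mu=0.0, prior_sigma=1.0):
    """Exact KL(N(mu, sigma^2) || N(prior_mu, prior_sigma^2)) for comparison."""
    return (math.log(prior_sigma / sigma)
            + (sigma ** 2 + (mu - prior_mu) ** 2) / (2 * prior_sigma ** 2) - 0.5)
```

The Monte Carlo estimate approaches the closed form as the number of samples grows, which is why a sampling estimate can stand in for the divergence when no closed form is available.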

During training of the multi-scale Bayesian convolutional Transformer model, the distribution loss is continually reduced to obtain the approximate variational distribution $Q_\theta(W)$ closest to the true posterior distribution $P(W \mid D)$.

Specifically, the approximate variational distribution $Q_\theta(W)$ is set to a Gaussian distribution $\mathcal{N}(\mu, \sigma^2)$ with learnable distribution parameters $\theta = \{\mu, \sigma\}$. The reparameterization trick is then used to keep gradient information for the learned model parameters $W = \{w, b\}$ of the multi-scale Bayesian convolutional Transformer model: $W = \mu + \sigma \odot \varepsilon$, with $\varepsilon \sim \mathcal{N}(0, I)$.
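A minimal sketch of a Bayesian linear layer using this reparameterization is shown below. It is pure Python without automatic differentiation, so it only demonstrates the sampling $W = \mu + \sigma \odot \varepsilon$ in the forward pass; in a real framework the gradients of the loss would flow to $\mu$ and $\sigma$ through this expression. Initializing the means from $\mathcal{N}(0, 1)$ follows the initialization described above, while the fixed initial $\sigma$ of 0.05 and the class name `BayesianLinear` are assumptions.

```python
import random


class BayesianLinear:
    """Linear layer whose weights w and bias b follow N(mu, sigma^2) element-wise."""

    def __init__(self, n_in, n_out, init_sigma=0.05, seed=0):
        self.rng = random.Random(seed)
        # Means drawn from a standard normal distribution; the initial sigma is an assumption.
        self.w_mu = [[self.rng.gauss(0.0, 1.0) for _ in range(n_out)] for _ in range(n_in)]
        self.w_sigma = [[init_sigma] * n_out for _ in range(n_in)]
        self.b_mu = [self.rng.gauss(0.0, 1.0) for _ in range(n_out)]
        self.b_sigma = [init_sigma] * n_out

    def sample(self):
        """Reparameterization: w = mu + sigma * eps with eps ~ N(0, 1)."""
        g = self.rng.gauss
        w = [[m + s * g(0.0, 1.0) for m, s in zip(row_mu, row_sigma)]
             for row_mu, row_sigma in zip(self.w_mu, self.w_sigma)]
        b = [m + s * g(0.0, 1.0) for m, s in zip(self.b_mu, self.b_sigma)]
        return w, b

    def forward(self, x):
        """One stochastic forward pass: fresh weights are drawn on every call."""
        w, b = self.sample()
        return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]
```

Two calls to `forward` on the same input generally return different outputs; this spread is what the Monte Carlo predictions later summarize.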

The regression loss is a piecewise weighted loss function that accounts for over-prediction errors, where $n$ is the number of samples, $y_i$ is the true remaining useful life of the $i$-th sample, $\hat{y}_i^{s}$ is the $s$-th predicted remaining useful life of the $i$-th sample, and the parameters $\gamma_1$ and $\gamma_2$ weight the over-prediction error and the under-prediction error, respectively. $\gamma_1$ reflects how strongly over-prediction errors are suppressed; it is set to $\gamma_1 = a$ with $a \in \mathbb{R}$ and $a > 1$, and $\gamma_2 = 1$.
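Because the exact form of the piecewise loss is not reproduced here, the sketch below assumes a squared error weighted by $\gamma_1$ when the prediction exceeds the true value and by $\gamma_2$ otherwise, averaged over all $n$ samples and their $N_s$ predictions. The default $\gamma_1 = 2$ is only an example of a value $a > 1$; the function name is hypothetical.

```python
def piecewise_weighted_loss(y_true, y_pred_samples, gamma1=2.0, gamma2=1.0):
    """Mean gamma-weighted squared error over n samples and N_s predictions each.

    y_true: list of n true RUL values.
    y_pred_samples: list of n lists, each holding the N_s predicted RUL values.
    Over-prediction (pred > true) is weighted by gamma1 > 1, under-prediction by gamma2.
    """
    total, count = 0.0, 0
    for y, preds in zip(y_true, y_pred_samples):
        for p in preds:
            d = p - y
            weight = gamma1 if d > 0 else gamma2  # penalize over-prediction more heavily
            total += weight * d * d
            count += 1
    return total / count
```

For a true value of 1.0 with predictions 1.5 and 0.5, both errors have magnitude 0.5, but the over-prediction contributes twice as much to the loss.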

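Specifically, the predicted remaining useful life is computed from

$$P(Y^* \mid X^*, D) \approx \int P(Y^* \mid X^*, W)\, Q_\theta(W)\, dW \approx \frac{1}{N_s} \sum_{s=1}^{N_s} P(Y^* \mid X^*, W_s), \qquad W_s \sim Q_\theta(W),$$

where $W$ denotes the parameters of the multi-scale Bayesian convolutional Transformer model, $\{X^*, Y^*\}$ is the sample to be predicted, $\theta$ is the parameter of the approximate variational distribution $Q_\theta(W)$, and $N_s$ is the number of Monte Carlo samples, that is, the number of repeated predictions.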
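In practice, the Monte Carlo approximation amounts to running $N_s$ stochastic forward passes and summarizing the resulting predictions. The sketch below does this for a hypothetical stand-in model whose single weight is drawn from $\mathcal{N}(0.8, 0.1^2)$ on every call; both `mc_predict` and `toy_model` are illustrative assumptions, not part of the described model.

```python
import random
import statistics


def mc_predict(stochastic_model, x, n_samples=100):
    """Run N_s stochastic forward passes and summarize the predictive distribution."""
    preds = [stochastic_model(x) for _ in range(n_samples)]
    return statistics.fmean(preds), statistics.stdev(preds), preds


# Hypothetical stand-in for the network: y = w * x with a fresh w ~ N(0.8, 0.1^2) per pass.
_rng = random.Random(42)


def toy_model(x):
    return (0.8 + 0.1 * _rng.gauss(0.0, 1.0)) * x
```

The mean serves as the point prediction, while the spread of `preds` carries the uncertainty used in the later reliability analysis.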

Step 5: the average regression loss, the average distribution loss and the uncertainty loss are computed and added to give the overall loss value, and the parameter $\theta$ of the approximate variational distribution $Q_\theta(W)$ is updated by the back-propagation algorithm to minimize the overall loss. The computation proceeds as follows:

The uncertainty loss is the sum of the variance and covariance of the predicted remaining-useful-life distribution; to keep it active near 0, it is defined as an exponential function with base $e$, where $\hat{y}_i^{s}$ denotes the $s$-th predicted value of the $i$-th sample, $\bar{\hat{y}}_i$ is the average predicted value of the $i$-th sample, and $\Lambda_{sm}$ is the covariance matrix of the true remaining useful life values and the predicted values over all samples, each prediction being the average of $N_s$ repeated predictions for that sample. Finally, the means of the regression loss and the distribution loss and the uncertainty loss are added with different weights to give the overall loss value, where $\lambda$ is a weight parameter set to 0.01.

Specifically, in the back-propagation algorithm the parameter $\theta$ is updated using the gradient of the overall loss with respect to $\theta$.

Step 8: step 7 is repeated until the maximum number of predictions $P_{\max} = 10^4$ is reached, giving the probability distribution of the remaining-useful-life prediction for each test sample for the subsequent reliability analysis, which determines the credibility of the prediction. The 95% confidence interval of the remaining-useful-life prediction is computed as follows:

The 95% confidence interval is computed by quantile estimation: first, the mean $M$ and the standard deviation $ST$ of the $P_{\max}$ remaining-useful-life predictions are computed; the upper and lower limits $U$ and $L$ of the 95% confidence interval then follow from $U = M + 1.96\,ST$ and $L = M - 1.96\,ST$, where 1.96 is the 97.5% quantile of the standard normal distribution.
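The interval computation can be sketched as follows. `normal_ci` implements the $M \pm 1.96\,ST$ formula above (using the sample standard deviation, which is an assumption), and `quantile_ci` shows, as an alternative, the empirical 2.5% and 97.5% quantiles of the predictions taken from Python's `statistics.quantiles`. Both function names are hypothetical.

```python
import statistics


def normal_ci(preds, z=1.96):
    """95% interval [M - 1.96 * ST, M + 1.96 * ST] from repeated RUL predictions."""
    m = statistics.fmean(preds)
    st = statistics.stdev(preds)  # sample standard deviation
    return m - z * st, m + z * st


def quantile_ci(preds):
    """Alternative: empirical 2.5% and 97.5% quantiles of the predictions."""
    cuts = statistics.quantiles(preds, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]
```

The two intervals agree when the predictions are close to normally distributed; the empirical version does not assume symmetry.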

The kernel density visualization of the remaining-useful-life prediction probability distribution consists of plotting the kernel density estimation curve of the $P_{\max}$ predicted values, in order to analyze their distribution characteristics and judge their credibility: the more concentrated the distribution on the curve and the closer it lies to the true life, the more credible the predicted remaining useful life.
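A kernel density estimate of the $P_{\max}$ predictions can be computed with a Gaussian kernel, as in the sketch below. The Gaussian kernel and Silverman's rule-of-thumb bandwidth are assumptions, since the kernel and bandwidth used for the figures are not stated; the function name is hypothetical.

```python
import math
import statistics


def gaussian_kde(samples, bandwidth=None):
    """Return a function estimating the density of `samples` with a Gaussian kernel.

    The bandwidth defaults to Silverman's rule of thumb: 1.06 * std * n^(-1/5).
    """
    n = len(samples)
    h = bandwidth or 1.06 * statistics.stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2 * math.pi))

    def density(x):
        # Average of Gaussian bumps centred on each prediction.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density
```

Evaluating the returned function on a grid of RUL values gives the curve to plot; a narrow, tall peak near the true life indicates a concentrated, credible prediction.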

The over-prediction rate MOP and the uncertainty estimate MCE are computed over all samples, where $N$ denotes the number of samples and $N_s$ the number of Monte Carlo samples.

The method provided by the invention is further described below with a specific example.

This embodiment uses the full-life-cycle bearing condition monitoring dataset collected on the PRONOSTIA experimental platform of the FEMTO-ST Institute in France, shown in FIG. 7, to experimentally verify the bearing remaining-useful-life prediction method based on the multi-scale Bayesian convolutional Transformer model.

Accelerated life tests on the PRONOSTIA platform make the bearings degrade and fail within a short time. Acceleration sensors mounted in the horizontal and vertical directions of the bearing collect its vibration signals at a sampling frequency of 25.6 kHz; a 0.1 s signal is recorded every 10 s, so each group of sampled data contains 2560 data points. When the amplitude of the collected vibration signal exceeds 20 g, the bearing is considered completely failed and its remaining useful life is 0. The dataset contains full-life-cycle condition monitoring data for 17 bearings under 3 operating conditions (different rotational speeds and loads). The invention uses the data under the first operating condition (rotational speed 1800 r/min, load 4000 N) to verify the effectiveness and superiority of the method: bearings 1_1, 1_2, 1_5, 1_6 and 1_7 form the training set, and bearing 1_4 forms the test set.
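The snapshot size and a simple run-to-failure label can be checked with a few lines of code. The sketch assumes that each snapshot is summarized by its peak absolute amplitude in g and that the remaining useful life decreases linearly by the 10 s recording interval per snapshot until the first snapshot above 20 g; the labeling scheme and function names are assumptions, as the text does not specify how the labels are built.

```python
def snapshot_points(fs_hz=25_600, duration_s=0.1):
    """Number of samples per snapshot: 25.6 kHz * 0.1 s = 2560 points."""
    return round(fs_hz * duration_s)


def rul_labels(peak_amplitudes_g, threshold_g=20.0, interval_s=10.0):
    """Linear RUL label (in seconds) for each snapshot up to the failure snapshot.

    Failure is the first snapshot whose peak amplitude exceeds the threshold;
    its RUL is 0. If no snapshot exceeds it, the last snapshot is taken as failure.
    """
    fail = next((k for k, a in enumerate(peak_amplitudes_g) if a > threshold_g),
                len(peak_amplitudes_g) - 1)
    return [(fail - k) * interval_s for k in range(fail + 1)]
```

For example, peak amplitudes of 1, 2, 25 and 30 g give labels of 20 s, 10 s and 0 s; the snapshot after failure is discarded.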

A Bayesian multi-scale convolutional neural network, a Bayesian long short-term memory network and a Bayesian gated recurrent unit are selected as comparison methods. In addition, 50% of the training data are randomly selected to train the deep learning models, simulating a situation with insufficient condition monitoring data. To verify the effectiveness and superiority of the invention on data with different noise levels, noise is added to the test set under three data conditions: condition a is the original data without added noise; condition b adds to the original data a composite noise of Gaussian white noise and Laplacian noise with a signal-to-noise ratio of 10; condition c adds a composite noise of Gaussian white noise, Laplacian noise and random impulse disturbances with a signal-to-noise ratio of 1.

To evaluate the prediction performance of the models, four metrics are used: root mean square error (RMSE), mean absolute error (MAE), the score function Score, and the prediction interval coverage probability (PICP), where

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|, \qquad \mathrm{PICP} = \frac{1}{n}\sum_{i=1}^{n} C_i,$$

$n$ is the number of samples, $y_i$ and $\hat{y}_i$ are the true and predicted remaining useful life, $C_i = 1$ when $L_i \le y_i \le U_i$ and $C_i = 0$ otherwise, and $L_i$ and $U_i$ are the lower and upper limits of the confidence interval of the $i$-th sample's prediction.
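RMSE, MAE and PICP can be computed directly from their definitions, as sketched below; the Score function is omitted because its formula is not reproduced in the text. The function names are hypothetical.

```python
import math


def rmse(y, y_hat):
    """Root mean square error between true and predicted RUL values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error between true and predicted RUL values."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def picp(y, lower, upper):
    """Fraction of samples whose true RUL lies inside [L_i, U_i] (C_i = 1)."""
    return sum(1 for a, lo, hi in zip(y, lower, upper) if lo <= a <= hi) / len(y)
```

A well-calibrated 95% interval should give a PICP close to 0.95.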

In this embodiment, the main structural parameters of the multi-scale Bayesian convolutional Transformer model are set as follows: the convolution kernel sizes of the six Bayesian convolutional sparse self-attentions in the multi-scale Bayesian convolutional sparse self-attention module are 3, 5, 7, 9, 11 and 15, all with dilation rate 1; the convolution kernels of the Bayesian dilated causal convolutional networks in the Bayesian temporal convolutional network all have size 11 and dilation rate 2. The training hyperparameters are: batch size (number of samples per training step) 128, 500 training epochs, the Adam optimizer, and a learning rate of 0.0005.
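The settings above can be collected into a configuration object. The sketch also computes the receptive field of stacked causal convolutions, $1 + \sum_l (k_l - 1) d_l$, under the assumption that the longer channel of the Bayesian temporal convolutional network stacks two convolutions with kernel size 11 and dilation 2. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    attention_kernel_sizes: tuple = (3, 5, 7, 9, 11, 15)  # one per head
    attention_dilation: int = 1
    tcn_kernel_size: int = 11
    tcn_dilation: int = 2
    batch_size: int = 128
    epochs: int = 500
    learning_rate: float = 0.0005  # used with the Adam optimizer


def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked causal convolutions: 1 + sum((k - 1) * d)."""
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))
```

With these values, each attention head sees a window equal to its kernel size (3 to 15 time steps), while the two stacked temporal convolutions cover 41 time steps.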

Under the test sets of data conditions a, b and c, the remaining-useful-life predictions of the test bearing given by the proposed method, the Bayesian multi-scale convolutional neural network, the Bayesian long short-term memory network and the Bayesian gated recurrent unit are shown in FIG. 8, and the corresponding RMSE, MAE, Score and PICP values are listed in Table 1; Tables 2 and 3 list the corresponding uncertainty estimate MCE and over-prediction rate MOP, respectively. FIG. 9 (a) and (b) show the kernel density visualizations of the remaining-useful-life prediction probability distributions of the proposed method and of the Bayesian multi-scale convolutional neural network, respectively, for samples taken at 11470 s, 12020 s, 12810 s, 13080 s, 13620 s and 14140 s.

TABLE 1

TABLE 2

TABLE 3

Comparing the proposed method with the Bayesian multi-scale convolutional neural network, Bayesian long short-term memory network and Bayesian gated recurrent unit models in FIG. 8 and Tables 1, 2 and 3 shows that the remaining-useful-life predictions of the proposed method achieve the best values on all four metrics (RMSE, MAE, Score and PICP). This indicates that the method provides more accurate and more reliable predictions while avoiding the risk of over-prediction.

As FIG. 9 shows, compared with the Bayesian multi-scale convolutional neural network, the remaining-useful-life predictions of the proposed method are more concentrated on the corresponding kernel density estimation curves and lie closer to the true life, indicating that the proposed method yields more credible predictions.

In conclusion, the method extracts more accurate, richer and more complete multi-scale degradation features and uncertainty information from bearing condition monitoring data, effectively suppresses uncertainty and over-prediction errors, and achieves better generalization, robustness and remaining-useful-life prediction performance.

Claims

1. A method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model, characterized by comprising the following steps:

step 3: the training set samples $D$ are input into the multi-scale Bayesian convolutional Transformer model; model parameters $W$ are sampled by the Monte Carlo method from the approximate variational distribution $Q_\theta(W)$ of the true posterior distribution $P(W \mid D)$ of the model parameters $W$, and the predicted remaining useful life of the bearing, together with the corresponding regression loss and distribution loss, is computed and stored;

step 5: the average regression loss, the average distribution loss and the uncertainty loss are computed and added to give the overall loss value, and the parameter $\theta$ of the approximate variational distribution $Q_\theta(W)$ is updated by the back-propagation algorithm to minimize the overall loss;

1.3 the remaining useful life values are normalized to the interval $[0, 1]$ by the min-max normalization formula

$$Y_N = \frac{Y - Y_{\min}}{Y_{\max} - Y_{\min}},$$

where $Y$ is the sample data to be normalized, $Y_{\min}$ and $Y_{\max}$ are the minimum and maximum of the sequence data, and $Y_N$ is the final normalized sample data;
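The min-max normalization is a one-liner in practice; the sketch below adds a guard for a constant sequence, which would otherwise divide by zero. The function name is hypothetical.

```python
def min_max_normalize(values):
    """Y_N = (Y - Y_min) / (Y_max - Y_min), mapping the sequence onto [0, 1]."""
    y_min, y_max = min(values), max(values)
    span = y_max - y_min
    if span == 0:
        raise ValueError("a constant sequence cannot be min-max normalized")
    return [(v - y_min) / span for v in values]
```

Applied to RUL labels of 20 s, 10 s and 0 s, this yields 1.0, 0.5 and 0.0.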

2. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 1, wherein in step 2 the multi-scale Bayesian convolutional Transformer model comprises a Bayesian convolutional token embedding layer, a multi-scale feature extractor and a Bayesian regression predictor, whose specific structure and construction comprise:

the multi-scale feature extractor comprises a multi-scale Bayesian convolutional sparse self-attention module, a residual connection with layer normalization, a Bayesian temporal convolutional network and a fully connected layer; the input data $X_{\mathrm{pos}}$ is first fed into the multi-scale Bayesian convolutional sparse self-attention module, whose heads $\mathrm{Head}_i$ ($i = 1, 2, \ldots, 6$) are built from different Bayesian convolutional sparse self-attentions, so that degradation features and uncertainty information are extracted from the data at different time scales; the information extracted at the different time scales is then fused by concatenation and a Bayesian linear layer to give the output $P_1$, which is added to the original input $X_{\mathrm{pos}}$ and layer-normalized to give $\mathrm{Output}_1$; $\mathrm{Output}_1$ is then fed into the Bayesian temporal convolutional network, which further strengthens the extraction of local and temporal features and yields the output $P_2$; $P_2$ is passed through the fully connected layer to give the output $F$; finally, $F$ is added to $P_2$ and layer-normalized to give the final output $\mathrm{Output}_2$;

3. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 2, wherein the multi-scale Bayesian convolutional sparse self-attention module comprises six heads $\mathrm{Head}_i$ ($i = 1, 2, \ldots, 6$), each a Bayesian convolutional sparse self-attention with a different convolution kernel size, and a Bayesian linear layer; it extracts multi-scale degradation features and uncertainty information from the input data at different time scales; the degradation features and uncertainty information extracted by the six heads are concatenated and then fused by the Bayesian linear layer to give the output $P_1$, the whole process being:

$$P_1 = \mathrm{PReLU}\left(\mathrm{Concat}(\mathrm{Head}_1, \mathrm{Head}_2, \ldots, \mathrm{Head}_6)\right) W_2 + b_2$$

the Bayesian convolutional sparse self-attention comprises a Bayesian dilated causal convolutional network, a sparsity measure of the query $Q$, and Bayesian self-attention; first, the Bayesian dilated causal convolutional network maps the local context of the input data, segment by segment, into a query matrix $Q$, a key matrix $K$ and a value matrix $V$; then, with the Kullback-Leibler divergence between the attention probability distribution $p(k_j \mid q_i)$ of the $i$-th query and the uniform distribution $U(a, b)$ as the sparsity measure of $Q$, $u$ dominant queries are selected to form a new query matrix $\bar{Q}$; finally, the Bayesian self-attention is computed, where $q_i$, $k_i$, $v_i$ denote the $i$-th rows of $Q$, $K$, $V$, $\mathrm{PReLU}(\cdot)$ is the nonlinear activation function, and $W_1$ and $b_1$ are the weight and bias variables of the Bayesian linear layer; the Bayesian convolutional sparse self-attention integrates the local context information of the input data into the global attention computation, so that long-term and short-term degradation features and uncertainty information are extracted simultaneously, and it suppresses the influence of highly uncertain data on feature extraction.

4. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 2, wherein the Bayesian temporal convolutional network comprises two parallel network channels: one channel consists of two groups connected in series, each group comprising a Bayesian dilated causal convolutional network, a weight normalization layer, a ReLU activation layer and a dropout layer, and the other channel is a single-layer Bayesian dilated causal convolutional network; the outputs of the two channels are added to give the output $P_2$ of the Bayesian temporal convolutional network.

5. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 2, wherein the Bayesian dilated causal convolutional network combines three aspects: convolution, dilation and causality; convolution means that six convolution kernels slide over the input data in parallel and perform the convolution computation; dilation means that, during the convolution, the input data are sampled at intervals given by the dilation rate, so that only part of the data enters each convolution; causality means that, when the convolution is computed for the data at time $t$, only data at or before time $t$ are considered, preventing leakage of future data; the weight and bias parameters of this network are random variables whose values are determined by sampling from the approximate variational distribution $Q_\theta(W)$.

6. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 1, wherein in step 3 the regression loss is a piecewise weighted loss function that accounts for over-prediction errors, where $n$ is the number of samples, $y_i$ is the true remaining useful life of the $i$-th sample, $\hat{y}_i^{s}$ is the $s$-th predicted remaining useful life of the $i$-th sample, and the parameters $\gamma_1$ and $\gamma_2$ weight the over-prediction error and the under-prediction error, respectively; $\gamma_1$ reflects how strongly over-prediction errors are suppressed and is set to $\gamma_1 = a$ with $a \in \mathbb{R}$ and $a > 1$, and $\gamma_2 = 1$.

7. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 1, wherein in step 3 the distribution loss is used to optimize the Kullback-Leibler divergence $\mathrm{KL}(Q_\theta(W) \,\|\, P(W \mid D))$ between the approximate variational distribution $Q_\theta(W)$ of the model parameters $W$ and their true posterior distribution $P(W \mid D)$; an approximate solution of $\mathrm{KL}(Q_\theta(W) \,\|\, P(W \mid D))$ is taken as the distribution loss value, where $W_s$ denotes the model parameters of the $s$-th Monte Carlo sample from $Q_\theta(W)$ and $\theta$ is the parameter of the approximate variational distribution.

8. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 1, wherein in step 5 the uncertainty loss is the sum of the variance and covariance of the predicted remaining-useful-life distribution; to keep it active near 0, it is defined as an exponential function with base $e$, where $N$ denotes the number of samples, $N_s$ the number of Monte Carlo samples, $\hat{y}_i^{s}$ the $s$-th predicted value of the $i$-th sample, $\bar{\hat{y}}_i$ the average predicted value of the $i$-th sample, and $\Lambda_{sm}$ the covariance matrix of the true remaining useful life values and the predicted values over all samples, each prediction being the average of $N_s$ repeated predictions for that sample; finally, the means of the regression loss and the distribution loss and the uncertainty loss are added with different weights to give the overall loss value, where $\lambda$ is a weight parameter set to 0.01.

9. The method for predicting the remaining useful life of a bearing based on a multi-scale Bayesian convolutional Transformer model according to claim 1, wherein in step 9 the over-prediction rate MOP and the uncertainty estimate MCE are defined as