CN117473411A

CN117473411A - Bearing life prediction method based on improved Transformer model

Info

Publication number: CN117473411A
Application number: CN202310865817.4A
Authority: CN
Inventors: 向玲; 付晓梦婷; 胡爱军; 邴汉昆; 朱国鹏; 王凯伦
Original assignee: North China Electric Power University
Current assignee: North China Electric Power University
Priority date: 2023-07-14
Filing date: 2023-07-14
Publication date: 2024-01-30

Abstract

A bearing life prediction method based on an improved Transformer model, the method comprising the steps of: a. preprocessing data; b. building a rolling bearing life prediction model: features are extracted from the data by a dynamic convolution layer, and the model is constructed with an encoder structure that contains a multi-head ProbSparse self-attention module and applies a different optimization strategy to each head; c. training the bearing remaining life prediction model; d. predicting the remaining life of the bearing. The invention combines dynamic convolution, the Transformer architecture and ProbSparse self-attention to predict the remaining service life of rolling bearings. It effectively extracts life-related features from bearing vibration signals, generalizes to bearings under multiple working conditions while maintaining high prediction accuracy, and thereby helps ensure the safe and stable operation of mechanical equipment.

Description

Bearing life prediction method based on improved Transformer model

Technical Field

The invention relates to the technical field of bearing testing, and in particular to bearing life prediction based on an improved Transformer model.

Background

With the development of modern industry, mechanical equipment is becoming increasingly intelligent and precise, and the demand for its long-term safe and stable operation places ever higher requirements on mechanical parts. The rolling bearing is a key component that determines the health condition and remaining service life of rotating machinery and plays a critical role in safe operation: any form of bearing failure can cause the whole machine to operate abnormally. Accurately predicting the remaining service life of a bearing is therefore the key to avoiding failure of the bearing and of the system it belongs to, and advanced algorithms are required for this prediction in order to reduce equipment risk.

The invention patent with publication number CN 110232249B discloses a method for predicting the remaining life of a rolling bearing from its vibration signals by training a multi-scale convolutional neural network model. The method comprises the following steps: performing accelerated degradation experiments on several unused bearings to obtain full-life vibration signals; converting the bearing life into a health index with an inverse hyperbolic tangent function; establishing a multi-scale convolutional neural network model and training it with the obtained data; measuring, with an acceleration sensor, the vibration signal of the rolling bearing whose life is to be predicted; inputting that signal into the trained model to obtain the health index of the bearing; and converting the health index into the remaining life of the bearing. That patent aims to provide a method that predicts the remaining life of a rolling bearing efficiently and accurately under complex working conditions. However, rolling bearings used in industrial environments differ in type, and the degradation characteristics contained in the monitored data differ considerably between bearing types even under the same operating conditions. Moreover, for large volumes of acquired data, traditional machine learning algorithms have difficulty characterizing the complex nonlinear relationships in the data and cannot adaptively adjust feature weights, so the accuracy of bearing life prediction is low.

Therefore, effectively extracting bearing degradation characteristics and predicting rolling bearing life under different working conditions is particularly important.

Disclosure of Invention

In view of the shortcomings of the prior art, the invention provides a bearing life prediction method based on an improved Transformer model, which effectively extracts bearing degradation characteristics and improves the accuracy of rolling bearing life prediction under different working conditions.

The problem addressed by the invention is solved by the following technical solution:

a bearing life prediction method based on an improved Transformer model, the method comprising the steps of:

a. data preprocessing:

(1) collecting rolling bearing life vibration data obtained by monitoring a sensor, and selecting horizontal vibration signals in the rolling bearing life vibration data as a task set;

(2) performing fast Fourier transform on the horizontal time domain signals, converting the horizontal time domain signals into frequency domain data, and sequentially accumulating the frequency domain data to obtain accumulated amplitude characteristics;

(3) dividing a task set into a training set and a testing set;

b. building a rolling bearing life prediction model:

extracting features from the bearing data with a dynamic convolution layer; then using an encoder structure that contains a multi-head ProbSparse self-attention module and applies a different optimization strategy to each head, so as to establish the relationship between the extracted features and the health index; and combining these with the Transformer architecture to construct an improved Transformer model, which serves as the rolling bearing life prediction model;

c. training a bearing residual life prediction model:

extracting the cumulative amplitude features from the training set, inputting them into the improved Transformer model for training, and adjusting the network parameters according to the change in the loss function;

d. prediction of bearing remaining life:

extracting the cumulative amplitude features from the test set and inputting them into the fully trained rolling bearing life prediction model to obtain the predicted remaining life of the bearing.

In the above bearing life prediction method based on the improved Transformer model, the fast Fourier transform that converts the horizontal time-domain signal into frequency-domain data is calculated as follows:

$$X_N(k) = X_1(k) + W_N^{k}\,X_2(k), \qquad X_N\!\left(k + \tfrac{N}{2}\right) = X_1(k) - W_N^{k}\,X_2(k), \qquad k = 0, 1, \ldots, \tfrac{N}{2} - 1$$

where $X_N(k)$ is the discrete Fourier transform, $X_1(k)$ is the discrete Fourier transform of the even-indexed terms, $X_2(k)$ is the discrete Fourier transform of the odd-indexed terms, $W_N^{k}$ is the twiddle factor, and $N$ is the transform interval length of the discrete Fourier transform.

In the above bearing life prediction method based on the improved Transformer model, the process of extracting features from the bearing data with the dynamic convolution layer is as follows:

a dynamic perceptron integrates multiple linear functions $\tilde{W}_k^{T} x + \tilde{b}_k$ to update the parameters dynamically:

$$y = g\left(\tilde{W}^{T}(x)\,x + \tilde{b}(x)\right)$$

$$\tilde{W}(x) = \sum_{k=1}^{K} \pi_k(x)\,\tilde{W}_k, \qquad \tilde{b}(x) = \sum_{k=1}^{K} \pi_k(x)\,\tilde{b}_k, \qquad \text{s.t. } 0 \le \pi_k(x) \le 1,\ \sum_{k=1}^{K} \pi_k(x) = 1$$

where $x$ and $y$ denote the input and output, respectively; $g$ denotes a normalization function; $\tilde{W}^{T}(x)$ denotes the weight matrix corresponding to the input; $K$ denotes the number of integrated linear functions; $\pi_k(x)$ denotes the attention weight generated for the $k$-th integrated function; $\tilde{W}_k$ and $\tilde{b}_k$ denote the weight matrix and bias vector of the $k$-th integrated function; and $\tilde{W}(x)$ and $\tilde{b}(x)$ denote the weighted weight matrix and the weighted bias vector.

In the above bearing life prediction method based on the improved Transformer model, an encoder structure containing a multi-head ProbSparse self-attention module is used, and a different optimization strategy is applied to each head to avoid information loss. The workflow is as follows:

a. assigning a default value of $5\ln L$ to each query vector and key vector;

b. calculating the sparsity score $M(q_i, K)$, which represents the relationship between the query vector and the key vectors:

$$M(q_i, K) = \ln \sum_{j=1}^{L} e^{\frac{q_i k_j^{T}}{\sqrt{d}}} - \frac{1}{L} \sum_{j=1}^{L} \frac{q_i k_j^{T}}{\sqrt{d}}$$

where $q_i$ denotes the $i$-th query point, $k_j$ denotes the $j$-th key point, $L$ denotes the sequence length, and $d$ denotes the vector dimension;

c. selecting the most relevant query vectors and key vectors according to the sparsity score (the query points and key points with the highest sparsity scores are selected and combined to obtain the query vector and key vector), and calculating the attention weights and the output vector:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{\bar{Q} K^{T}}{\sqrt{d_k}}\right) V$$

where $\mathrm{Attention}(Q, K, V)$ denotes the attention function, whose result is the output vector and is passed on to the multi-head attention function below; $Q$, $K$ and $V$ are the query vector, key vector and value vector, respectively; $\bar{Q}$ denotes a sparse matrix of the same size as $Q$; $d_k$ is the key vector dimension; and softmax is the normalized exponential function;

d. calculating a plurality of attention coefficients and concatenating the results to capture relevant information in different subspaces:

the input vectors are first converted into three groups of vectors (the same input vectors are multiplied by different matrices to obtain different weight matrices); each group contains $h$ vectors of dimension $d_{model}/h$. The resulting vectors are then passed through multiple attention functions with the weighting matrices $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ (each group of matrices is substituted into the formulas below in turn) to obtain the multi-head attention result:

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q},\ K W_i^{K},\ V W_i^{V}\right), \qquad i = 1, \ldots, h$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O}$$

where $\mathrm{Attention}(\cdot)$ is the attention function, $\mathrm{MultiHead}(\cdot)$ is the multi-head attention function, $d_{model}$ denotes the embedding dimension, $\mathrm{head}_i$ denotes the $i$-th attention coefficient, and $W^{O}$ denotes a weight matrix.
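By way of illustration only, the head-splitting and concatenation of step d can be sketched in plain Python. The function names (`matmul`, `attention`, `multi_head`) and the toy projection matrices are hypothetical, and the sketch uses ordinary scaled dot-product attention inside each head rather than the ProbSparse variant, purely to keep the example short.

```python
import math

def matmul(A, B):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    out = []
    for q in Q:
        w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K])
        out.append([sum(wj * v[n] for wj, v in zip(w, V)) for n in range(len(V[0]))])
    return out

def multi_head(X, WQ, WK, WV, WO):
    """WQ, WK, WV hold h per-head matrices of shape (d_model, d_model / h); WO is (d_model, d_model)."""
    heads = [attention(matmul(X, wq), matmul(X, wk), matmul(X, wv))
             for wq, wk, wv in zip(WQ, WK, WV)]
    # Concatenate head_1 ... head_h row by row, then apply the output projection W^O.
    concat = [sum((head[i] for head in heads), []) for i in range(len(X))]
    return matmul(concat, WO)

# Toy setting (assumed values): d_model = 4, h = 2, so each head has dimension 2.
first_half = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
second_half = [[0.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
WQ = WK = WV = [first_half, second_half]
WO = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
X = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0], [2.0, 0.0, 2.0, 0.0]]
out = multi_head(X, WQ, WK, WV, WO)
```

With the configuration stated in this document (embedding dimension 256 and 8 heads), each head would work on vectors of dimension 32.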

In the above bearing life prediction method based on the improved Transformer model, in the rolling bearing life prediction model the encoder converts each element of the input sequence into a high-dimensional vector that represents the context of that element within the whole sequence. A multi-head self-attention operation is first applied to the input vector, and the result is then fed into a feedforward neural network; the two are connected in sequence by layer normalization and residual connections to produce the output vector. The feedforward neural network is calculated as follows:

$$\mathrm{FF}(x) = g\left(x W_1 + b_1\right) W_2 + b_2, \qquad g = \mathrm{GeLU}$$

where GeLU denotes the Gaussian error linear unit activation function, written $g$, and $W_1$, $W_2$, $b_1$, $b_2$ are the weights and biases of the two fully connected layers;

the overall computation of the encoder can be expressed as:

$$\hat{z} = \mathrm{LN}\left(x + \mathrm{MultiHead}(x, x, x)\right)$$

$$z = \mathrm{LN}\left(\hat{z} + \mathrm{FF}(\hat{z})\right)$$

where $\hat{z}$ and $z$ are the outputs of the multi-head self-attention module and the feedforward neural network module, respectively, and LN denotes the layer normalization function.

The rolling bearing life prediction model comprises a dynamic convolution layer, four encoders and a fully connected layer; the convolution kernel size is 3, the number of attention heads $h$ is 8, the embedding dimension $d_{model}$ is 256, the training batch size is 128, the learning rate is 0.001, and the number of training epochs is 50.

In the above bearing life prediction method based on the improved Transformer model, the cumulative amplitude feature is calculated as follows:

a. let the original vibration time-domain signal be $X = [X_1, X_2, \ldots, X_n]$, where $n$ is the total number of samples; each sample is $X_i = [x_i^{1}, x_i^{2}, \ldots, x_i^{m}]$, where $m$ is the length of each sample and $x_i^{j}$ is the $j$-th sample point of the $i$-th sample, with $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, m$;

b. after performing fast Fourier transform on the original vibration time domain signal, performing iterative summation on the amplitude value in each sample to obtain an accumulated amplitude value characteristic, wherein a calculation formula of a t-th sampling TF characteristic is as follows:
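One possible reading of this accumulation is sketched below, for illustration only. It assumes that the TF feature of the t-th sample is the running total, over samples 1 to t, of the summed spectral amplitudes of each sample; that interpretation, and the names `amplitude_spectrum` and `cumulative_amplitude`, are assumptions rather than the formula of the invention. A direct DFT is used instead of an FFT so the sketch needs only the standard library.

```python
import cmath
from itertools import accumulate

def amplitude_spectrum(sample):
    """Magnitudes |X(k)| of the DFT of one sample (direct O(m^2) form, for clarity)."""
    m = len(sample)
    return [abs(sum(sample[j] * cmath.exp(-2j * cmath.pi * j * k / m) for j in range(m)))
            for k in range(m)]

def cumulative_amplitude(samples):
    """Assumed reading: TF_t = sum over the first t samples of each sample's total amplitude."""
    per_sample = [sum(amplitude_spectrum(s)) for s in samples]
    return list(accumulate(per_sample))

# Three toy samples whose vibration amplitude grows over time.
samples = [[0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0], [0.0, 3.0, 0.0, -3.0]]
tf = cumulative_amplitude(samples)
```

Under this reading the feature is non-decreasing over the life of the bearing, which is the property that makes an accumulated quantity convenient as a degradation indicator.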

Advantageous Effects

The invention combines dynamic convolution, the Transformer architecture and ProbSparse self-attention to predict the remaining service life of rolling bearings. It effectively extracts life-related fault features from bearing vibration signals, has a certain generalization capability for bearings under multiple working conditions while maintaining high prediction accuracy on their data, and helps ensure the safe and stable operation of mechanical equipment.

Drawings

The invention is described in further detail below with reference to the accompanying drawings.

FIG. 1 is an overall flow chart of the present invention;

FIG. 2 is a diagram of a dynamic convolution architecture;

FIGS. 3(a)-3(c) are encoder structure diagrams;

FIG. 4 is a diagram of a model framework of the present invention;

FIGS. 5(a)-5(f) are graphs of the bearing prediction results for the various working conditions.

The symbols used herein are: $X$ is the original vibration time-domain signal, $X_i$ is each sample signal, TF is the cumulative amplitude feature, $y$ is the output of the dynamic convolution layer, $\mathrm{Attention}(\cdot)$ is the attention function, $Q$, $K$ and $V$ are vectors obtained from the input data by different linear transformations, $\mathrm{MultiHead}(\cdot)$ is the multi-head attention function, and $\mathrm{FF}(\cdot)$ is the output of the feedforward neural network.

Detailed Description

In view of the limitations of existing methods for predicting the remaining life of bearings, the invention provides a bearing life prediction method based on an improved Transformer model, which effectively extracts the degradation characteristics of the bearing, thereby predicting the remaining life of the rolling bearing accurately and ensuring normal operation of the equipment.

Referring to fig. 1, the present invention includes the steps of:

a. data preprocessing:

(2) performing fast Fourier transform on the horizontal time domain signals to convert the horizontal time domain signals into frequency domain data, and sequentially accumulating the frequency domain data to obtain accumulated amplitude characteristics;

(3) dividing a task set into a training set and a testing set;

b. building a rolling bearing life prediction model:

extracting features from the bearing data with a dynamic convolution layer; using an encoder structure that contains a multi-head ProbSparse self-attention module and applies a different optimization strategy to each head, so as to establish the relationship between the extracted features and the health index; and combining these with the Transformer architecture to construct an improved Transformer model, which serves as the rolling bearing life prediction model;

c. training a bearing residual life prediction model:

inputting the preprocessed training set into the improved Transformer model for training, and adjusting the network parameters according to the change in the loss function;

d. prediction of bearing remaining life:

inputting the preprocessed test set (see Table 2; the test set comprises bearings 1-3, 1-4, 1-5, 2-3, 2-4 and 2-5) into the fully trained rolling bearing life prediction model to obtain the predicted remaining life of the bearing, expressed as a quantized remaining-life index (the prediction is produced by a sigmoid output).

The cumulative amplitude feature is calculated as follows:

b. after the original vibration time domain signal is subjected to fast Fourier transform, iterative summation is sequentially carried out, and the accumulated amplitude characteristic, namely TF characteristic for short, is obtained, and the calculation formula of the TF characteristic of the t-th sampling is as follows:

The original time-domain data are converted into frequency-domain data as follows:

The spectrum is obtained by applying the discrete Fourier transform (DFT) to the time-domain signal and describes the frequency components of the original signal and their amplitudes. The DFT is calculated as follows:

$$X(k) = \sum_{i=0}^{N-1} x(i)\,W_N^{ik}, \qquad W_N = e^{-j\frac{2\pi}{N}}, \qquad k = 0, 1, \ldots, N-1$$

where $W_N$ is the twiddle factor, $x(i)$ is the original discrete time-domain signal sequence, $N$ is the transform interval length of the discrete Fourier transform, $X(k)$ is the relative amplitude, and $k$ is the sequence number.

Compared with the direct discrete Fourier transform, the fast Fourier transform reduces the number of operations and increases the computation speed. It is calculated as follows:

$$X(k) = X_1(k) + W_N^{k}\,X_2(k), \qquad X\!\left(k + \tfrac{N}{2}\right) = X_1(k) - W_N^{k}\,X_2(k), \qquad k = 0, 1, \ldots, \tfrac{N}{2} - 1$$

where $X_1(k)$ is the discrete Fourier transform of the even-indexed terms of $x(i)$, and $X_2(k)$ is the discrete Fourier transform of the odd-indexed terms.
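The even/odd decomposition can be illustrated with a minimal radix-2 sketch in plain Python, compared against the direct transform. The function names `dft` and `fft` and the test signal are illustrative choices, and the recursion assumes the input length is a power of two.

```python
import cmath

def dft(x):
    """Direct O(N^2) discrete Fourier transform: X(k) = sum_i x(i) * W_N^(i*k)."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]

def fft(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    x1 = fft(x[0::2])  # X_1(k): DFT of the even-indexed terms
    x2 = fft(x[1::2])  # X_2(k): DFT of the odd-indexed terms
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * x2[k]  # W_N^k * X_2(k)
        out[k] = x1[k] + t
        out[k + n // 2] = x1[k] - t
    return out

# Eight points of a sine wave that completes two cycles over the window.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
amplitudes = [abs(v) for v in fft(signal)]
```

For this signal the amplitude spectrum peaks at k = 2 and at its mirror k = 6, and both routines agree to within floating-point error; the recursive form needs on the order of N log N operations instead of N squared.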

The dynamic convolution network is a neural network which dynamically aggregates a plurality of parallel convolution kernels according to an attention mechanism and can adaptively adjust convolution parameters according to different input data, so that the characteristic expression capacity and the generalization capacity of a model are obviously enhanced on the premise of increasing a small amount of network calculation amount. The process of extracting the characteristics of the bearing data by using the dynamic convolution layer is as follows:

a. The parameters of a traditional convolution kernel do not change while the model runs, and its output is:

$$y = g\left(W^{T} x + b\right)$$

where $x$ and $y$ denote the input and output, and $g$, $W^{T}$ and $b$ denote the activation function, the transpose of the weight matrix, and the bias vector, respectively.

b. A dynamic convolution kernel integrates multiple linear functions $\tilde{W}_k^{T} x + \tilde{b}_k$ to update the parameters dynamically:

$$y = g\left(\tilde{W}^{T}(x)\,x + \tilde{b}(x)\right)$$

$$\tilde{W}(x) = \sum_{k=1}^{K} \pi_k(x)\,\tilde{W}_k, \qquad \tilde{b}(x) = \sum_{k=1}^{K} \pi_k(x)\,\tilde{b}_k, \qquad \text{s.t. } 0 \le \pi_k(x) \le 1,\ \sum_{k=1}^{K} \pi_k(x) = 1$$

where $K$ denotes the number of integrated linear functions, $\pi_k(x)$ denotes the attention weight generated for the $k$-th integrated function, $\tilde{W}_k$ and $\tilde{b}_k$ denote the weight matrix and bias vector of the $k$-th integrated function, and $\tilde{W}(x)$ and $\tilde{b}(x)$ denote the weighted weight matrix and the weighted bias vector.

Dynamic convolution has a structure similar to that of the dynamic perceptron and contains K dynamic convolution kernels; its computation flow is shown in FIG. 2. The input data first undergo global average pooling in the attention layer; the result then passes through a fully connected layer, a ReLU activation function and a softmax activation function to give K normalized attention weights, which are assigned in order to the corresponding convolution kernels. Finally, the weighted kernels are aggregated, and the output features are obtained after batch normalization and an activation layer.
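The flow described above can be sketched for a one-dimensional, single-channel input as follows. This is a simplified illustration: the names `kernel_attention` and `dynamic_conv1d` and all numeric values are hypothetical, and the final batch normalization and activation layer are omitted.

```python
import math

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def kernel_attention(x, fc_w, fc_b):
    """Global average pooling -> fully connected layer -> ReLU -> softmax, giving K weights pi_k(x)."""
    pooled = sum(x) / len(x)  # one input channel, so pooling yields a single scalar
    hidden = [max(0.0, w * pooled + b) for w, b in zip(fc_w, fc_b)]
    return softmax(hidden)

def dynamic_conv1d(x, kernels, biases, pi):
    """Aggregate the K kernels with weights pi, then apply one ordinary 'valid' convolution."""
    size = len(kernels[0])
    w = [sum(p * k[j] for p, k in zip(pi, kernels)) for j in range(size)]  # aggregated kernel
    b = sum(p * bk for p, bk in zip(pi, biases))                           # aggregated bias
    return [sum(w[j] * x[i + j] for j in range(size)) + b
            for i in range(len(x) - size + 1)]

# Toy example with K = 2 kernels of size 3 (values chosen only for illustration).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
kernels = [[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
biases = [0.0, 0.0]
pi = kernel_attention(x, fc_w=[1.0, -1.0], fc_b=[0.0, 0.0])
y = dynamic_conv1d(x, kernels, biases, pi)
```

Because the kernels are aggregated before the convolution is applied, only one convolution is computed per input, which is why the extra cost over a static kernel stays small.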

The invention uses multi-head ProbSparse self-attention in place of the ordinary multi-head self-attention in the encoder, and avoids information loss by applying a different optimization strategy to each head. The workflow is as follows:

a. assigning a default value of $5\ln L$ to each query vector and key vector, where $\ln(\cdot)$ denotes the natural logarithm. The query vector is the vector whose relevance is to be calculated: it represents a data point when that point acts as the query, and every data point has one query vector. The key vector represents a data point when that point is the object being compared, and every data point likewise has one key vector;

b. calculating the sparsity score $M(q_i, K)$, which represents the relationship between the query vector and the key vectors, as follows:

$$M(q_i, K) = \ln \sum_{j=1}^{L} e^{\frac{q_i k_j^{T}}{\sqrt{d}}} - \frac{1}{L} \sum_{j=1}^{L} \frac{q_i k_j^{T}}{\sqrt{d}}$$

where $q_i$ denotes the $i$-th query point, $k_j$ denotes the $j$-th key point, $L$ denotes the sequence length, and $d$ denotes the vector dimension;

c. selecting the most relevant key vectors and value vectors according to the sparsity score, and calculating the attention weights and the output vector. The calculation is as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{\bar{Q} K^{T}}{\sqrt{d_k}}\right) V$$

where $Q$, $K$ and $V$ are the query vector, key vector and value vector, respectively; $\bar{Q}$ denotes a sparse matrix of the same size as $Q$; $d_k$ is the key vector dimension; and softmax is the normalized exponential function;

d. multi-head attention is obtained by calculating a plurality of attention coefficients and concatenating the results, which captures relevant information in different subspaces. FIGS. 3(a) and 3(b) illustrate the workflow: the input vectors are first converted into three distinct groups of vectors, each group containing $h$ vectors of dimension $d_{model}/h$; the resulting vectors are then passed through multiple attention functions with the weighting matrices $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$. The multi-head attention is calculated as follows:

$$\mathrm{head}_i = \mathrm{Attention}\left(Q W_i^{Q},\ K W_i^{K},\ V W_i^{V}\right), \qquad i = 1, \ldots, h$$

$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}\left(\mathrm{head}_1, \ldots, \mathrm{head}_h\right) W^{O}$$

where $d_{model}$ denotes the embedding dimension, $\mathrm{head}_i$ denotes the $i$-th attention coefficient, and $W^{O}$ denotes a weight matrix.
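Steps a to c can be sketched for a single head as follows. Several details are assumptions borrowed from the commonly published ProbSparse formulation rather than statements of this document: the value 5 ln L is read as the number u of queries kept active, every query is scored against all keys (without key sampling), and queries that are not selected receive the mean of the value vectors. The function names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def sparsity_scores(Q, K):
    """M(q_i, K) = ln(sum_j exp(q_i.k_j / sqrt(d))) - mean_j(q_i.k_j / sqrt(d))."""
    d = len(Q[0])
    scores = []
    for q in Q:
        s = [dot(q, k) / math.sqrt(d) for k in K]
        m = max(s)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in s))
        scores.append(log_sum_exp - sum(s) / len(s))
    return scores

def probsparse_attention(Q, K, V, c=5):
    """Full attention for the u highest-scoring queries; mean(V) for the rest (assumed fallback)."""
    L = len(Q)
    u = min(L, max(1, math.ceil(c * math.log(L))))  # assumed meaning of the default value c * ln L
    M = sparsity_scores(Q, K)
    active = set(sorted(range(L), key=lambda i: M[i], reverse=True)[:u])
    d_k = len(K[0])
    mean_v = [sum(col) / len(V) for col in zip(*V)]
    out = []
    for i, q in enumerate(Q):
        if i in active:
            w = softmax([dot(q, k) / math.sqrt(d_k) for k in K])
            out.append([sum(wj * v[n] for wj, v in zip(w, V)) for n in range(len(V[0]))])
        else:
            out.append(list(mean_v))
    return out

# Toy example: queries 0 and 3 are uninformative (all-zero), queries 1 and 2 are selective.
Q = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0], [0.0, 2.0]]
scores = sparsity_scores(Q, K)
out = probsparse_attention(Q, K, V, c=1)  # c = 1 so that only u = 2 of the 4 queries stay active
```

A query that attends uniformly to all keys receives the minimum score ln L, so the selection keeps the queries whose attention distribution is furthest from uniform. For short sequences the default factor of 5 keeps every query active, which is why the example lowers it.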

In the improved Transformer model, the encoder converts each element of the input sequence into a high-dimensional vector that represents the context of that element within the whole sequence; the encoder structure is shown in FIG. 3(c). A multi-head self-attention operation is first applied to the input vector, and the result is then fed into a feedforward layer; the two are connected in sequence by layer normalization and residual connections to produce the output vector.

The feedforward layer is calculated as follows:

$$\mathrm{FF}(x) = \mathrm{GeLU}\left(x W_1 + b_1\right) W_2 + b_2$$

where GeLU denotes the activation function, and $W_1$, $W_2$, $b_1$, $b_2$ are the weights and biases of the two fully connected layers;

the overall computation of the encoder can be expressed as:

$$\hat{z} = \mathrm{LN}\left(x + \mathrm{MultiHead}(x, x, x)\right)$$

$$z = \mathrm{LN}\left(\hat{z} + \mathrm{FF}(\hat{z})\right)$$

where $\hat{z}$ and $z$ are the outputs of the multi-head self-attention module and the feedforward layer module, respectively, and LN denotes the layer normalization function.
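A minimal sketch of one encoder layer is given below, assuming that layer normalization is applied after each residual sum and omitting its learnable scale and shift. The attention sub-layer is passed in as a function so that any attention variant can be substituted; all names and the toy weights are hypothetical.

```python
import math

def gelu(v):
    """Gaussian error linear unit: v * Phi(v), with Phi the standard normal CDF."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))

def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance (no learnable parameters)."""
    mean = sum(vec) / len(vec)
    var = sum((a - mean) ** 2 for a in vec) / len(vec)
    return [(a - mean) / math.sqrt(var + eps) for a in vec]

def linear(vec, W, b):
    """vec @ W + b, with W stored as len(vec) rows of len(b) columns."""
    return [sum(vec[i] * W[i][j] for i in range(len(vec))) + b[j] for j in range(len(b))]

def feed_forward(vec, W1, b1, W2, b2):
    """FF(x) = GeLU(x W1 + b1) W2 + b2."""
    return linear([gelu(h) for h in linear(vec, W1, b1)], W2, b2)

def encoder_layer(X, attention, W1, b1, W2, b2):
    """Attention sub-layer and feedforward sub-layer, each followed by residual add + layer norm."""
    attn = attention(X)
    z_hat = [layer_norm([a + s for a, s in zip(x, row)]) for x, row in zip(X, attn)]
    return [layer_norm([a + f for a, f in zip(z, feed_forward(z, W1, b1, W2, b2))])
            for z in z_hat]

# Toy run: identity weights and a placeholder attention that returns its input unchanged.
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
zeros = [0.0] * 4
X = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
out = encoder_layer(X, lambda tokens: tokens, identity, zeros, identity, zeros)
```

The layer preserves the shape of its input, which is what allows several such encoders to be stacked one after another.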

The improved Transformer model comprises a dynamic convolution layer, four encoders and a fully connected layer; the convolution kernel size is 3, the number of attention heads $h$ is 8, the embedding dimension $d_{model}$ is 256, the training batch size is 128, the learning rate is 0.001, and the number of training epochs is 50.
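For reference, these settings can be gathered into a small configuration object. The class and field names below are hypothetical; only the numeric values come from the text, with the "cycle number" read as the number of training epochs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    kernel_size: int = 3          # dynamic convolution kernel size
    num_encoders: int = 4         # stacked encoder layers
    num_heads: int = 8            # multi-head attention heads h
    d_model: int = 256            # embedding dimension
    batch_size: int = 128         # training batch size
    learning_rate: float = 0.001  # optimizer learning rate
    epochs: int = 50              # training epochs

    @property
    def head_dim(self) -> int:
        """Per-head dimension d_model / h; the split must be exact."""
        if self.d_model % self.num_heads:
            raise ValueError("d_model must be divisible by num_heads")
        return self.d_model // self.num_heads

config = ModelConfig()
```

With these values each attention head operates on vectors of dimension 256 / 8 = 32.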

The invention predicts the remaining life of rolling bearings by combining dynamic convolution, the Transformer architecture and ProbSparse self-attention; the model framework is shown in FIG. 4. First, the dynamic convolution layer captures and enriches low-level feature information, and the embedding vector and positional encoding are added to obtain the embedded input. Next, the encoded features are fed into the stacked ProbSparse self-attention modules, and the encoders capture and learn high-level feature representations of the data. Finally, a regressor consisting of the fully connected layer maps the output features to the remaining-life labels, completing the bearing life prediction. The method extracts life-related fault features from bearing signals, has a certain generalization capability for bearings under multiple working conditions while maintaining high prediction accuracy on their data, and helps ensure the safe and stable operation of mechanical equipment.

The invention has the following advantages:

a. the invention introduces dynamic convolution so that the model combines the advantages of dynamic convolution and the attention mechanism and can adaptively discover and highlight feature information that benefits bearing life prediction;

b. the invention uses ProbSparse self-attention in place of conventional self-attention, which reduces computational complexity and improves the learning capacity of the network;

c. the invention uses the powerful feature extraction capability of the Transformer architecture to estimate the remaining service life of the bearing, and the proposed method is verified experimentally on the IEEE PHM 2012 challenge dataset. The experimental results show that the method achieves high prediction accuracy and adapts to bearing data under various working conditions.

The validity of the invention is verified by experimental analysis of the IEEE PHM 2012 challenge dataset.

The experimental platform used to acquire the dataset (PRONOSTIA) mainly comprises four modules: an acquisition module, a loading module, a power module and a measurement module. The dataset contains 6 run-to-failure training datasets and 11 monitoring datasets of test bearings under 3 working conditions; the details of each working condition are given in Table 1. Acceleration sensors measure the vibration signals of the bearing in the horizontal and vertical directions, and a temperature sensor monitors the bearing temperature. The sampling frequency is 25.6 kHz, the sampling interval is 10 s, and each acquisition lasts 0.1 s, i.e. 2560 data points per sample. Because the load is applied radially, the horizontal vibration signal contains more degradation information, so only the horizontal vibration signal is used, which also reduces noise interference.

Table 1 bearing condition table

TABLE 2 training, test set partitioning

The failure causes and degradation processes of the bearings in the dataset differ, so the training set should contain as many of the degradation features as possible to improve prediction accuracy. A total of 6 groups of experiments were therefore performed to verify the reliability of the method; the specific arrangement is shown in Table 2.

The data are preprocessed and the model is built and used for prediction according to the steps above. An Adam optimizer with a learning rate of 0.001 optimizes the weights of the whole neural network so as to minimize the MSE loss. The mean absolute error (MAE), the root mean square error (RMSE) and the average Score are used as evaluation metrics: the smaller the MAE and RMSE and the higher the Score, the better the prediction. The metrics are calculated as follows:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2}, \qquad \mathrm{Score} = \frac{1}{n} \sum_{i=1}^{n} A_i$$

where $n$ is the total number of samples, $y_i$ denotes the actual remaining life percentage of the $i$-th sample, $\hat{y}_i$ denotes the predicted remaining life of the $i$-th sample, $Er_i$ denotes the error between the actual remaining life and the predicted value, and $A_i$ denotes the score of the $i$-th sample.
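The three metrics can be sketched as follows. MAE and RMSE follow their standard definitions. The per-sample error and score are an assumption: the sketch uses the percentage error and the asymmetric scoring rule published for the IEEE PHM 2012 challenge, which penalizes overestimated remaining life more heavily than underestimated remaining life. The function names are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))

def phm2012_score(y_true, y_pred):
    """Average of A_i under the assumed PHM 2012 rule; every y_true value must be non-zero."""
    scores = []
    for y, p in zip(y_true, y_pred):
        er = 100.0 * (y - p) / y  # percentage error Er_i
        if er <= 0:
            a = math.exp(-math.log(0.5) * er / 5.0)   # overestimate: score halves every 5 %
        else:
            a = math.exp(math.log(0.5) * er / 20.0)   # underestimate: score halves every 20 %
        scores.append(a)
    return sum(scores) / len(scores)

# Toy example: one prediction 20 % too low and one 5 % too high.
y_true = [0.5, 0.5]
y_pred = [0.4, 0.525]
print(mae(y_true, y_pred), rmse(y_true, y_pred), phm2012_score(y_true, y_pred))
```

Under the assumed rule a perfect prediction scores 1, and in the toy example both samples score approximately 0.5 even though their absolute errors differ by a factor of four.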

Table 3 gives the experimental results of the proposed method on each group of tasks. The results show that some bearings, such as bearings 1-3 and 2-5, have smaller RMSE and MAE and higher scores, which indicates that their data are better suited to the network model and contain more degradation information, so the prediction is better. Other bearings, such as bearings 1-4 and 2-4, are predicted poorly, with significantly larger RMSE and MAE. This is because these two sets contain relatively few data samples, so the model cannot fully identify the degradation information of the bearing, resulting in lower prediction performance.

TABLE 3 prediction results

The prediction results for the remaining life of the bearings are shown in FIGS. 5(a) to 5(f). The graphs show that the deviation between the predicted and actual results is small, which indicates that the invention predicts the remaining life of most bearings with high accuracy and remains stable in both the early and the late stage of bearing degradation.

To verify the effect of introducing dynamic convolution and the ProbSparse self-attention mechanism into the Transformer, three additional networks with structures similar to the model of the invention were built for comparison experiments: (1) a model with both the dynamic convolution layer and ProbSparse self-attention removed (model 1); (2) a model with the dynamic convolution layer removed (model 2); (3) a model with ProbSparse self-attention removed (model 3). The prediction results of these models and of the proposed model are shown in Table 4.

Table 4 comparison of predicted results

As the table shows, models 2 and 3 both outperform model 1, which indicates that the dynamic convolution layer and the ProbSparse self-attention mechanism each improve the prediction capability of the Transformer network. When both modules are integrated into the Transformer, dynamic convolution refines the degradation features at the early stage and ProbSparse self-attention further deepens them at the later stage, which improves prediction accuracy: the Score of the proposed model is 12% and 3% higher than that of model 2 and model 3, respectively. The proposed improvements to the Transformer network are therefore shown to be effective for predicting the remaining useful life (RUL) of bearings.

Claims

1. A bearing life prediction method based on an improved Transformer model, the method comprising the steps of:

a. data preprocessing:

(3) dividing a task set into a training set and a testing set;

b. building a rolling bearing life prediction model:

c. training a bearing residual life prediction model:

d. prediction of bearing remaining life:

2. The bearing life prediction method based on an improved Transformer model according to claim 1, wherein the fast Fourier transform that converts the horizontal time-domain signal into frequency-domain data is calculated as follows:

3. The bearing life prediction method based on an improved Transformer model according to claim 2, wherein the process of extracting features from the bearing data with the dynamic convolution layer is as follows:

where $x$ and $y$ denote the input and output, respectively; $g$ denotes a normalization function; $\tilde{W}^{T}(x)$ denotes the weight matrix corresponding to the input; $K$ denotes the number of integrated linear functions; $\pi_k(x)$ denotes the attention weight generated for the $k$-th integrated function; $\tilde{W}_k$ and $\tilde{b}_k$ denote the weight matrix and bias vector of the $k$-th integrated function; and $\tilde{W}(x)$ and $\tilde{b}(x)$ denote the weighted weight matrix and the weighted bias vector.

4. The bearing life prediction method based on an improved Transformer model according to claim 3, wherein an encoder structure containing a multi-head ProbSparse self-attention module is used and a different optimization strategy is applied to each head to avoid information loss, the workflow being as follows:

a. assigning a default value of $5\ln L$ to each query vector and key vector;

c. selecting the most relevant query vectors and key vectors according to the sparsity score, the query points and key points with the highest sparsity scores being selected and combined to obtain the query vector and key vector, and calculating the attention weights and the output vector, the calculation being as follows:

where $\mathrm{Attention}(Q, K, V)$ denotes the attention function, whose result is the output vector and is passed on to the multi-head attention function below; $Q$, $K$ and $V$ are the query vector, key vector and value vector, respectively; $\bar{Q}$ denotes a sparse matrix of the same size as $Q$; $d_k$ is the key vector dimension; and softmax is the normalized exponential function;

the input vectors are first converted into three groups of vectors, the same input vectors being multiplied by different matrices to obtain different weight matrices, each group containing $h$ vectors of dimension $d_{model}/h$; the resulting vectors are then passed through multiple attention functions with the weighting matrices $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$, each group of matrices being substituted into the following formulas in turn to obtain the multi-head attention result, the multi-head attention being calculated as follows:

5. The bearing life prediction method based on an improved Transformer model according to claim 4, wherein, in the rolling bearing life prediction model, the encoder converts each element of the input sequence into a high-dimensional vector that represents the context of that element within the whole sequence; a multi-head self-attention operation is first applied to the input vector, and the result is then fed into a feedforward neural network, the two being connected in sequence by layer normalization and residual connections to produce the output vector, the feedforward neural network being calculated as follows:

where GeLU denotes the Gaussian error linear unit activation function, $g$ denotes the activation function, and $W_1$, $W_2$, $b_1$, $b_2$ are the weights and biases of the two fully connected layers;

the overall computational process of the encoder can be expressed as:

6. The bearing life prediction method based on an improved Transformer model according to claim 5, wherein the rolling bearing life prediction model comprises a dynamic convolution layer, four encoders and a fully connected layer, the convolution kernel size is 3, the number of attention heads $h$ is 8, the embedding dimension $d_{model}$ is 256, the training batch size is 128, the learning rate is 0.001, and the number of training epochs is 50.

7. The bearing life prediction method based on an improved Transformer model according to claim 6, wherein the cumulative amplitude feature is calculated as follows: