CN118190420A

CN118190420A - Rolling bearing fault diagnosis method based on personalized federated transfer learning

Info

Publication number: CN118190420A
Application number: CN202410347304.9A
Authority: CN
Inventors: 汪永超; 李世昌; 李锋; 徐超; 李翰儒
Original assignee: Industrial Technology Research Institute Of Yibin Sichuan University; Sichuan University
Current assignee: Industrial Technology Research Institute Of Yibin Sichuan University; Sichuan University
Priority date: 2024-03-26
Filing date: 2024-03-26
Publication date: 2024-06-14

Abstract

The invention discloses a rolling bearing fault diagnosis method based on personalized federated transfer learning, which comprises the following steps: S1, federated learning initialization training; S2, performing personalized adjustment on the local training and global aggregation of federated learning to obtain the local optimal parameters of the current round; S3, completing pre-training of the model; S4, transferring the pre-trained global model to a target task; S5, taking the loss function of model transfer as the total loss function of the personalized federated transfer learning model, and training until the total loss function converges; S6, classifying the target task samples to be tested by using the trained personalized federated transfer learning model. According to the invention, personalized federated learning aggregates the effective information of each source task sample to obtain a general model with good generalization performance, and the general model is transferred to the target task for domain adaptation, so that rolling bearing faults are diagnosed accurately and stably.

Description

Rolling bearing fault diagnosis method based on personalized federated transfer learning

Technical Field

The invention belongs to the technical field of mechanical equipment fault detection, and relates to a rolling bearing fault diagnosis method based on personalized federated transfer learning.

Background

In industrial applications, rolling bearings are one of the key components in widespread use, which play a critical role in ensuring smooth operation of mechanical devices. The main function of the rolling bearing is to support the rotating shaft, reduce friction in the moving process, and therefore improve the running efficiency and service life of equipment. However, due to its widespread use in industrial applications, bearing failure has become one of the main causes of mechanical equipment failure. Bearing failures are caused by a variety of factors including excessive loads, poor lubrication, contamination, fatigue, and material defects, which may cause damage to the internal structure of the bearing, such as damage to the rolling elements, inner and outer races, and cage, thereby causing noise, vibration, and eventual equipment failure. Therefore, timely detection and diagnosis of bearing failure is critical to ensure stability and reliability of industrial systems.

With the rapid development of artificial intelligence and sensor technology, many intelligent bearing fault diagnosis methods have been proposed in the past few years, bringing great improvements to the field. According to current research, the existing intelligent fault diagnosis methods can be roughly divided into three categories: machine learning methods, deep learning methods, and transfer learning methods. Early studies mainly used traditional machine learning methods such as support vector machines (SVMs), decision trees and random forests, which have proven their effectiveness in many practical applications. With the emergence of the Internet of Things, unprecedented volumes of data can be obtained easily, providing more fault-related information than before and improving the accuracy of fault diagnosis. However, conventional machine learning methods typically require manual feature extraction, a process that is not only time-consuming and dependent on expertise, but may also ignore some of the underlying information useful for fault diagnosis. To solve these problems, deep learning methods began to be applied to fault diagnosis. For example, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have demonstrated excellent performance in bearing fault diagnosis. However, although deep learning methods are remarkably effective for fault diagnosis, their performance may suffer when the data distribution is uneven and samples are scarce. Furthermore, deep learning methods typically require a large amount of labeled data for training, and acquiring such data in practical applications can be a challenge.

In order to solve these problems, some studies in recent years have begun to explore the application of transfer learning to bearing fault diagnosis. For example, Han et al. proposed a deep transfer network with joint distribution adaptation (JDA), aimed at minimizing both the marginal and the conditional distribution differences; Li et al. proposed a multi-layer domain adaptation (MLDA) method, which matches the shifts in marginal and conditional distributions between different working conditions by adding multi-kernel maximum mean discrepancy and pseudo-label learning in several adaptation layers, so that features insensitive to the working condition can be effectively extracted for bearing fault diagnosis; An et al. proposed a method combining domain alignment and discriminative feature learning, in which the joint training of a classification loss, a center-based discriminative loss and a correlation alignment loss enables the representation learned in the source domain to adapt to the target domain, effectively improving cross-domain test performance. These methods are all discrepancy-based domain adaptation methods: they reduce the distribution difference between the source domain and the target domain by various means, thereby improving the adaptability and generalization performance of the model. However, these approaches are often accompanied by increased computational complexity and difficult parameter tuning, which compromises the accuracy of fault diagnosis.

Federated learning is a machine learning method that has become popular in recent years. It is characterized by having multiple devices or servers cooperatively train a shared model without sharing their respective raw data; it was proposed by Google in 2016 and is mainly used to solve data privacy and security problems. Federated learning can be considered a subset or special case of distributed machine learning. Compared with traditional distributed learning, which requires transmitting large amounts of raw data, it only requires transmitting model parameters, thus greatly reducing communication costs. When federated learning is used to process different fault samples from the same equipment, data privacy does not need to be considered. Although a model with excellent generalization performance can then be obtained with high computational efficiency, problems such as distribution differences and unbalanced numbers of fault samples still strongly affect the precision and performance of the model, so that the accuracy of fault diagnosis is reduced.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a rolling bearing fault diagnosis method based on personalized federated transfer learning.

The aim of the invention is realized by the following technical scheme: the rolling bearing fault diagnosis method based on personalized federated transfer learning comprises the following steps:

S1, inputting source task samples of different fault types of a rolling bearing as the clients of federated learning; a central server sends initial global model parameters to each client, each client carries out local training after receiving the global model parameters to obtain the local optimal parameters of the current round and sends them back to the central server, and global aggregation is executed to obtain new global model parameters; this step is repeated r times to finish the federated learning initialization training;

S2, performing personalized adjustment on the local training and global aggregation of federated learning: introducing a Bayesian hierarchical model to optimize parameter estimation, and quantifying the uncertainty inside a client and between clients by calculating the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters; after receiving the global model parameters from the central server, a client calculates the initial value and the number of training steps of local training according to the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters, and constructs a local loss function by using the cross entropy function, thereby obtaining the local optimal parameters of the current round;

S3, the client sends the local optimal parameters of the current round back to the central server; the central server carries out global aggregation by using an aggregation rule that introduces the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters to obtain a new round of global model parameters, and the weighted average of the local loss functions of all clients is used as the global loss function; S2 and S3 are repeatedly executed until the training of the global model parameters converges, completing model pre-training;

S4, transferring the pre-trained global model to a target task, and fine-tuning the model: inserting a model patch into the model to adapt to the target task, inputting the target task samples to be tested into the fine-tuned model, updating the parameters of the model patch and the parameters of a new classification layer, and keeping the other pre-trained model parameters unchanged; finally, calculating the prediction probability of the model for each fault type, and taking the class with the highest prediction probability as the pseudo-label of the target task sample to be tested, thereby obtaining the loss function of the model transferred to the target task;

S5, as the parameters of the pre-trained model other than the classification layer are kept unchanged, taking the model transfer loss function as the total loss function of the personalized federated transfer learning model, and training until the total loss function converges, completing training;

S6, classifying the target task samples to be tested by using the trained personalized federated transfer learning model.

The beneficial effects of the invention are as follows:

(1) In the PFTL of the invention, source task samples with different distributions are used as the client inputs of federated learning to perform model pre-training, and the resulting global model is used as the pre-trained model, which achieves the effect of data integration and greatly improves the generalization capability of the model.

(2) A Bayesian hierarchical model is introduced in the federated learning training process to quantify the uncertainty inside each client and between clients, so that the model can be fully trained even when the source task sample distributions differ greatly, effective fault samples are few, and the numbers of samples of different faults are unbalanced, and the diagnosis precision of the model does not differ greatly across fault types.

(3) In order to further improve the diagnosis precision of the model, the model is fine-tuned: a model patch is inserted into the model, the other parameters of the pre-trained model are kept unchanged, and only the model patch parameters and the new classification layer parameters are updated. The fine-tuned model can therefore be trained quickly to convergence using the target task samples, completing adaptation to the target task and achieving higher diagnosis precision on the target task samples.

(4) With PFTL, the rolling bearing fault diagnosis method can use source task samples of known fault types, even when their distributions differ greatly and their numbers are unequal, to perform high-precision fault diagnosis on target task samples.

Drawings

FIG. 1 is a flow chart of an implementation of a rolling bearing fault diagnosis method based on PFTL;

FIG. 2 is the update procedure of client m in the t-th iteration round;

FIG. 3 is a pre-training model network architecture;

FIG. 4 is a diagram of the fine-tuned model network structure;

FIG. 5 is a model patch structure;

FIG. 6 is the Case Western Reserve University (CWRU) bearing experimental platform;

FIG. 7 is the mean accuracy of fault diagnosis when tested with a pre-trained model;

FIG. 8 is a graph of the change in accuracy and loss during training of model migration;

FIG. 9 is a confusion matrix of classification results;

FIG. 10 is a t-SNE diagram of model extraction features.

Detailed Description

In order to solve the problem of low diagnosis precision caused by large sample distribution differences, few effective fault samples and unbalanced numbers of different fault samples in rolling bearing fault diagnosis, the invention provides a rolling bearing fault diagnosis method based on Personalized Federated Transfer Learning (PFTL). In the PFTL provided by the invention, the various fault samples are used as the inputs of the clients of federated learning, and the local training and aggregation rules of federated learning are personalized, so that problems such as overfitting are avoided in the local training stage of each client and the generalization capability of the model is enhanced through the aggregation of the federated global model. Following the idea of transfer learning, the global model with excellent generalization performance is then transferred to the target task and further fine-tuned with the target task samples to complete domain adaptation, so that the model performs best on the target task and can be used for fault diagnosis of the target task samples. PFTL enables high-precision fault diagnosis in the face of large sample distribution differences, few effective fault samples and unbalanced numbers of samples of different fault types. The technical scheme of the invention is further described below with reference to the accompanying drawings.

As shown in FIG. 1, the rolling bearing fault diagnosis method based on personalized federated transfer learning of the invention comprises the following steps:

S1, inputting source task samples of different fault types of a rolling bearing as the clients of federated learning, wherein the fault types of the source task samples are known, i.e., they are labeled source task samples; the central server sends initial global model parameters to each client, each client carries out local training after receiving the global model parameters to obtain the local optimal parameters of the current round and sends them back to the central server, and global aggregation is carried out to obtain the new global model parameters of the round; this step is repeated r times, and the federated learning initialization training is finished through r rounds of iteration. Federated learning is a special form of distributed learning, in which each client performs local training and one fault type corresponds to one client, while the central server aggregates the parameters sent by the clients to obtain a global model. In effect, each client learns the information in its own fault type and then sends it to the central server for integration. The advantage is that the clients are independent of one another, so the data of each fault type can be handled well even if the data differ in distribution and quantity. The number of clients is determined by the actual study, with one client for each data distribution or fault type; for example, the Case Western Reserve University (CWRU) dataset used in the experimental section herein has 10 classes and therefore 10 clients.
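The communication pattern of the initialization rounds in S1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the least-squares `local_train` objective is a stand-in for the real classifier, the equal-weight average is assumed because the text does not state the aggregation rule used during initialization, and all function and variable names are hypothetical. The values r = 3 and learning rate 0.001 follow the parameter settings reported in the experimental section.

```python
import numpy as np

def local_train(theta, X, y, lr=0.001, steps=5):
    """Hypothetical local update: a few gradient steps on a least-squares loss."""
    theta = theta.copy()
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ theta - y) / len(y)
        theta -= lr * grad
    return theta

def init_training(theta0, clients, r=3):
    """Run r rounds: broadcast theta, train locally, aggregate at the server."""
    theta = theta0
    history = []  # local optimal parameters of every round, kept for later use
    for _ in range(r):
        local_params = [local_train(theta, X, y) for X, y in clients]
        theta = np.mean(local_params, axis=0)  # plain average: an assumption
        history.append(local_params)
    return theta, history

# Ten synthetic clients, mirroring "one client per fault type" (10 classes).
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(10)]
theta, history = init_training(np.zeros(4), clients, r=3)
```

Keeping the per-round local parameters in `history` reflects the later statement that the parameters after initialization training can, to some extent, reflect the uncertainty inside and between clients.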

S2, performing personalized adjustment on the local training and global aggregation of federated learning: introducing a Bayesian hierarchical model to optimize parameter estimation, and quantifying the uncertainty inside a client and between clients by calculating the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters;

The Bayesian hierarchical model is introduced to optimize parameter estimation as follows. The core of the personalized adjustment of federated learning is to optimize parameter estimation through a Bayesian hierarchical model. Specifically, a two-level Bayesian hierarchical model is adopted, comprising a top level and a bottom level. The top-level model focuses on the uncertainty between clients, aiming to explore the commonalities and differences of different clients. For example, when analyzing clients of different regions or types, the top-level model attempts to capture the overall trends and differences between the clients. In the invention, the top-level model is used to estimate the shared parameters sent to the clients by the central server of federated learning. The bottom-level model focuses on the uncertainty inside a single client, analyzing the distribution and characteristics of the data inside the client. This level deals with the specific parameters of individual clients, which may be influenced by the client-specific data distribution. By combining these two levels, the Bayesian hierarchical model can take into account both the globally shared information and the private features of a single client. In order to describe the uncertainty inside a client and between clients, the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters are used to quantify them respectively, and the calculation formulas are as follows:

where the two variance terms denote the local and the global empirical variance, respectively, and the two mean terms denote the mean of the local optimal parameters and the mean of the global shared model parameters θ^(i), respectively. The basic process of federated learning is that the central server transmits parameters to each client, the clients perform local training to obtain local optimal parameters and transmit them back to the central server, and the central server then performs aggregation to obtain the global model parameters of the next round. Because this iteration must run for many rounds until the global model loss function converges, t denotes the t-th iteration round.

After receiving the global model parameters of the central server, the client calculates initial values and training steps of local training according to the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters, and constructs a local loss function by using a cross entropy function, thereby obtaining the local optimal parameters of the round.

S3, the client sends the local optimal parameters of the current round back to the central server; the central server carries out global aggregation by using an aggregation rule that introduces the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters to obtain a new round of global model parameters, and the weighted average of the local loss functions of all clients is used as the global loss function; steps S2 and S3 are repeated until the training of the global model parameters converges (namely, the global loss function reaches its minimum value), completing the pre-training of the model.

These steps are implemented as follows: under the federated learning framework, the central server S and the clients m (m = 1, 2, …, M) perform iterative updates over T communication rounds; after the initialization training is completed through r rounds of communication, the pre-training is performed. In the t-th round of pre-training iteration, the specific update process of client m is shown in FIG. 2 and includes the following steps:

S31, the central server S sends the global shared model parameters θ^(t-1) to client m, and the local and global empirical variances are calculated using formula (1) and formula (3);

S32, calculating the initial value of local training and the number of training steps l_m:

S33, constructing a local loss function by using the cross entropy function:

where N_m is the number of samples of client m, and C is the number of fault classes; y_jc indicates whether the j-th sample belongs to class c (1 if so, otherwise 0); the probability term is the probability, predicted by the model under the parameters θ, that sample j belongs to class c;

According to the calculated initial value, the number of training steps l_m and the learning rate η, the local training loss is optimized by stochastic gradient descent to obtain the local optimal parameters;

S34, client m sends its local optimal parameters back to the central server; after receiving the parameters of all clients, the central server executes the global aggregation operation and updates the global model parameters θ^t as follows:

The weighted average of the local loss functions of all clients is then defined as the global loss function, expressed as follows:

where w_m is the weight of client m, which is determined mainly by the number of samples of the client.
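For orientation, the local and global loss functions described in S33 and S34 can be written in a standard form that agrees with the symbol definitions given in the text. This is a reconstruction under assumptions, not the patent's own typeset formulas: the names L_m and p_jc, the 1/N_m normalization, and the sample-proportional form of w_m are assumed here.

```latex
% Local cross-entropy loss of client m (assumed notation L_m, p_{jc})
L_m(\theta) = -\frac{1}{N_m} \sum_{j=1}^{N_m} \sum_{c=1}^{C} y_{jc} \, \log p_{jc}(\theta)

% Global loss as the weighted average of the local losses
L(\theta) = \sum_{m=1}^{M} w_m \, L_m(\theta),
\qquad
w_m = \frac{N_m}{\sum_{k=1}^{M} N_k} \quad \text{(one possible sample-count weighting)}
```

With one-hot labels y_jc, only the log-probability of the true class of each sample contributes to L_m, and the weights w_m sum to 1 so that L(θ) remains on the scale of a per-sample loss.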

Steps S31 to S34 are repeated until the global loss function L(θ) reaches its minimum value, which completes the pre-training of the model. Notably, because the uncertainties inside each client and between clients are fully considered in both local training and global aggregation, the model's diagnosis of specific fault types is strengthened while the generalization performance of the global model is improved.
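One pre-training round (local stochastic gradient descent on a cross-entropy loss, aggregation at the server, evaluation of the global loss) can be sketched as below. Several things are assumptions made only for illustration: a linear softmax classifier stands in for the convolutional network, the aggregation is a plain sample-count-weighted average rather than the variance-aware rule of the invention (whose formula is not reproduced in this text), the initial value is simply the received global parameters, and all names are hypothetical. The learning rate 0.0006 is the value reported in the parameter settings.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilise the exponentials
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(theta, X, Y):
    """Mean cross-entropy of a linear softmax classifier (stand-in model)."""
    p = softmax(X @ theta)
    return -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))

def local_sgd(theta_init, X, Y, lr=0.0006, steps=10, batch=8, seed=0):
    """Run `steps` mini-batch SGD updates starting from theta_init."""
    rng = np.random.default_rng(seed)
    theta = theta_init.copy()
    for _ in range(steps):
        idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
        p = softmax(X[idx] @ theta)
        theta -= lr * (X[idx].T @ (p - Y[idx]) / len(idx))
    return theta

def aggregate(local_params, n_samples):
    """Sample-count-weighted average (assumed stand-in aggregation rule)."""
    w = np.asarray(n_samples, dtype=float)
    w /= w.sum()
    theta = sum(wm * th for wm, th in zip(w, local_params))
    return theta, w

def global_loss(theta, clients, w):
    """Weighted average of the clients' local losses."""
    return sum(wm * cross_entropy(theta, X, Y) for wm, (X, Y) in zip(w, clients))

# Three synthetic clients with unbalanced sample counts.
rng = np.random.default_rng(1)
C, F = 3, 5
clients = []
for n in (30, 10, 60):
    X = rng.normal(size=(n, F))
    Y = np.eye(C)[rng.integers(0, C, size=n)]
    clients.append((X, Y))

theta_global = np.zeros((F, C))
local_params = [local_sgd(theta_global, X, Y) for X, Y in clients]
theta_new, w = aggregate(local_params, [len(X) for X, _ in clients])
loss = global_loss(theta_new, clients, w)
```

In the method described above, the number of local steps and the starting point would instead be derived from the empirical variances, so a client with noisier local estimates can be treated differently from one with stable estimates.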

S4, transferring the pre-trained global model to a target task, and fine-tuning the model: inserting a model patch into the model to adapt to the target task, then inputting the target task samples to be tested (target task samples without class labels) into the fine-tuned model, updating the parameters of the model patch and the parameters of a new classification layer, and keeping the other pre-trained model parameters unchanged; finally, calculating the prediction probability of the model for each fault type, and taking the class with the highest prediction probability as the pseudo-label of the target task sample to be tested, thereby obtaining the loss function of the model transferred to the target task;

Although pre-training the diagnosis model with federated learning improves its generalization capability by learning from diversified source task sample data, the global model's learning of private features is slightly lacking. Moreover, when the model is transferred to the target task and faces unlabeled target task sample data, domain adaptation must be considered because of the difference in data distribution. Thus, the diagnosis model needs further optimization and adjustment using the target task data. In the pre-training stage, a deep neural network is used for model training at the clients and the central server. Its network structure is shown in FIG. 3: it comprises four convolution layers, an average pooling layer, a fully connected layer and a classification layer connected in sequence, wherein the output of convolution layer 2 is connected with the input of convolution layer 1 and then used as the input of convolution layer 3, and the output of convolution layer 4 is connected with the input of convolution layer 3 and then used as the input of the average pooling layer;

where the first term denotes the pre-trained network, whose parameters are mainly those of the convolution layers and the fully connected layer, and the second term denotes the classification layer of the pre-trained model, which has its own classification layer parameters. In the pre-training process, the extraction of features from the samples of multiple source tasks has been fully learned through federated learning, so when transferring to the target task only the pre-trained network parameters need to be kept unchanged while the classification layer parameters are updated. At the same time, to achieve high parameter adaptability between the source tasks and the target task, the network framework is fine-tuned: model patches are inserted into the pre-trained model, one after convolution layer 2 and one after convolution layer 4. As shown in FIG. 4, the fine-tuned framework consists mainly of three parts: the pre-trained model without its classification layer, a new classification layer π(v) corresponding to the target task (parameterized by v), and the model patches inserted into the pre-trained convolution layers (with their own patch parameters).

To diagnose which fault type a sample belongs to, the corresponding features must be extracted. Generally, the features comprise low-level features and high-level features. Low-level features are the most basic and original features of a sample and may be similar to those of other samples; high-level features are a more abstract and discriminative representation built on the low-level features and are the key to fault diagnosis. However, both must take part in the diagnosis to achieve ideal diagnosis precision. Therefore, when samples are insufficiently trained because of large distribution differences, unbalanced quantities and the like, the invention first uses the advantages of federated learning to learn the representation of the low-level features of the samples of the various fault types and integrates it into the final global model, so that the basic information learned by the global model is complete and rich. The global model obtained by pre-training can easily extract the low-level features of the fault type samples, so only the ability to extract high-level features needs to be strengthened on top of it. Therefore, the model structure is kept unchanged and a model patch is inserted into the convolution layers to learn the representation of the high-level features of the samples; all pre-trained model parameters except the classification layer are then kept unchanged, and only the model patch parameters and the new classification layer parameters are updated until the model adapts to the target task. High-precision fault diagnosis can then be performed using the low-level features extracted by the convolution layers and the high-level features extracted by the model patch.

The model patch is a parameterized neural network for adapting the pre-trained model to the target task; its structure is shown in FIG. 5. The invention uses a simple residual network architecture to construct the model patch and a bottleneck architecture to minimize the number of parameters. Specifically, the model patch includes a down-projection layer, an activation function, an up-projection layer, and a shortcut connection; the down-projection layer projects the original k-dimensional channels to d dimensions (d < k, e.g., k = 8d) by a 1 × k × d convolution operation φ_down, while the up-projection layer projects them back to the original dimension by a 1 × d × k convolution operation φ_up. For an input to the model patch, its output can be expressed as:

where σ(·) is the activation function; the invention uses ReLU;

The function of the model patch is to modify the pre-trained network during fine-tuning, so that the network can learn feature representations specific to the target task while retaining the performance of the pre-trained model. It also reduces the number of parameters that must be trained to adapt to the target task, thereby reducing the dependence on target task data and avoiding overfitting on the target task.
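The bottleneck-plus-shortcut structure of the model patch can be illustrated with a small sketch. It assumes a one-dimensional feature map stored as a (k, length) array, treats the 1 × k × d and 1 × d × k convolutions as pointwise channel projections without bias, and initializes the up-projection to zero so the patch starts as an identity mapping; these choices, and all names, are assumptions for illustration and are not stated in the text.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def model_patch(x, w_down, w_up):
    """Bottleneck residual patch applied to a (k, length) feature map.

    w_down: (k, d) weights of the pointwise down-projection
    w_up:   (d, k) weights of the pointwise up-projection
    """
    h = relu(w_down.T @ x)   # (d, length): k channels squeezed to d
    return x + w_up.T @ h    # (k, length): projected back, plus shortcut

d = 2
k = 8 * d                                  # the k = 8d example from the text
rng = np.random.default_rng(0)
x = rng.normal(size=(k, 32))               # hypothetical feature map
w_down = rng.normal(scale=0.1, size=(k, d))
w_up = np.zeros((d, k))                    # zero init (assumption): identity at start

y = model_patch(x, w_down, w_up)
patch_params = w_down.size + w_up.size     # 2 * k * d trainable weights
full_pointwise_params = k * k              # a k-to-k pointwise layer, for comparison
```

With k = 8d the patch has 2kd = k²/4 weights, a quarter of what a full k-to-k pointwise layer would need, which is the parameter saving the bottleneck design aims for.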

In step S4, the loss function of the model transferred to the target task is obtained as follows: K target task samples without class labels are input into the model, the probability that sample j^n (n ∈ {1, …, K}) belongs to each class is calculated, and the class with the highest probability is taken as the pseudo-label q of the sample (q denotes the q-th fault class, q ∈ {1, …, C}); the probability that the target task sample belongs to the pseudo-label is then:

where the left-hand side is the probability, predicted by the model under the model patch parameters and v, that sample j^n belongs to class q, and the score term is the score of the q-th class in the raw output of the model under the same parameters.
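Since the text describes the probability as being computed from raw class scores, a softmax form is the natural reading. The expression below is a reconstruction under that assumption; the symbols p_q and z_c are placeholder names, and the dependence on the model patch parameters and v is left implicit.

```latex
% Assumed softmax form of the pseudo-label probability for sample j^n
p_q(j^n) = \frac{\exp\big(z_q(j^n)\big)}{\sum_{c=1}^{C} \exp\big(z_c(j^n)\big)},
\qquad
q = \arg\max_{c \in \{1,\dots,C\}} p_c(j^n)
```

Here z_c(j^n) stands for the score of class c in the raw model output for sample j^n, so p_q(j^n) is the largest of the C class probabilities by construction.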

S5, as the parameters of the pre-trained model other than the classification layer are kept unchanged, taking the model transfer loss function as the total loss function of the personalized federated transfer learning model, and training until the total loss function converges, completing training; the sum of the negative logarithms of the probabilities that all target task samples belong to their corresponding pseudo-labels is taken as the loss function for adapting the transferred model to the target task, expressed as follows:

The loss function is trained to convergence using stochastic gradient descent, so that the fine-tuned model adapts to the target task and the optimal model patch parameters and v^* are obtained. This completes the training of the personalized federated transfer learning model and yields a personalized model specific to the target task.
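The transfer loss described in S5 (the sum of negative log-probabilities of the pseudo-labels) can be computed as in the sketch below. It assumes the class probabilities come from a softmax over raw scores, and the function names and example scores are hypothetical.

```python
import math

def softmax(scores):
    """Convert one sample's raw class scores into probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pseudo_label_loss(batch_scores):
    """Sum of -log p(pseudo-label) over K unlabeled target samples."""
    loss = 0.0
    pseudo_labels = []
    for scores in batch_scores:
        probs = softmax(scores)
        q = max(range(len(probs)), key=probs.__getitem__)  # most probable class
        pseudo_labels.append(q)
        loss -= math.log(probs[q])
    return loss, pseudo_labels

# Two hypothetical target samples with three classes each.
scores = [[2.0, 0.5, -1.0],
          [0.1, 0.2, 3.0]]
loss, pseudo_labels = pseudo_label_loss(scores)
```

Minimizing this quantity pushes the model toward confident predictions on the unlabeled target samples; for a batch of K samples with completely uniform scores over C classes it equals K·log C, and it approaches 0 as the predictions become certain.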

S6, classifying the target task samples to be tested by using the trained personalized federated transfer learning model: the target task sample to be tested is input into the trained PFTL model, the probability that it belongs to each class is calculated, and the class with the highest probability is taken as its class label (fault type), thereby completing the whole fault diagnosis process of the rolling bearing.

In summary, after the pre-trained model is transferred to the target task, the model is fine-tuned by updating the model patch parameters and the new classification layer parameters v while keeping the other pre-trained model parameters unchanged. Only a small part of the parameters therefore has to be relearned to adapt to the target task, which reduces the dependence of the model on target task data, avoids overfitting, and speeds up training. As a result, the personalized model trained by PFTL for the target task can achieve high-precision fault diagnosis on the target task samples to be tested even when sample distributions differ greatly, effective fault samples are few, and the numbers of samples of different fault types are unbalanced.

The diagnostic effect of the present invention is further verified by experiments as follows.

(1) Experimental device: the rolling bearing data used in the experiments is the Case Western Reserve University (CWRU) bearing dataset. As shown in FIG. 6, the experimental apparatus comprises: a fan end bearing (6203-2RS JEM SKF deep groove ball bearing); a 1.5 kW motor; a drive end bearing (6205-2RS JEM SKF deep groove ball bearing); a torque sensor/encoder; and a dynamometer. Before the experiment, electrical discharge machining was used to introduce single-point damage with diameters of 0.1778, 0.3556 and 0.5334 mm into the outer ring, the inner ring and the rolling elements of the SKF bearings (note: the 0.7112 mm damage in the dataset was applied to NTN bearings). Because the outer ring is stationary, the position of the damage point relative to the bearing load zone directly affects the vibration response of the motor/bearing system; to quantify this influence, the outer ring damage points of the drive end bearing and the fan end bearing were placed at three different positions: 3 o'clock, 6 o'clock and 12 o'clock. The machined faulty bearings were reinstalled into the motor, and experimental data were recorded under motor loads of 0, 1, 2 and 3 horsepower. An acceleration sensor was mounted above the bearing housing at both the fan end and the drive end of the motor to collect the vibration acceleration signals of the faulty bearings. A 16-channel data recorder was used to collect the vibration signals at sampling frequencies of 12 kHz and 48 kHz, and the torque sensor/encoder was used to measure power and rotating speed, so that operating data for all working conditions and all faulty bearings were collected.

The dataset is divided into four categories according to sampling rate and the location of the faulty bearing: 48 kHz baseline (normal) data, 12 kHz drive-end faults, 48 kHz drive-end faults, and 12 kHz fan-end faults. Within each category, the data are divided into four working conditions according to motor load and motor speed: load 0 hp at 1797 rpm (condition 1), load 1 hp at 1772 rpm (condition 2), load 2 hp at 1750 rpm (condition 3), and load 3 hp at 1730 rpm (condition 4). Each working condition includes data for rolling element faults, inner ring faults and outer ring faults. The outer ring faults are further divided into three categories according to the fault position relative to the load zone: centered (6 o'clock position), orthogonal (3 o'clock position) and opposite (12 o'clock position). Each fault can be further divided into three types according to the fault size: 0.1778 mm, 0.3556 mm and 0.5334 mm. Because the outer ring fault data differ little between positions, the invention uses only the centered (6 o'clock position) data as the outer ring fault data. The fault types under each working condition can therefore be classified into 10 categories: normal, fault 1 (inner ring fault, fault size 0.1778 mm), fault 2 (inner ring fault, fault size 0.3556 mm), fault 3 (inner ring fault, fault size 0.5334 mm), fault 4 (outer ring fault, fault size 0.1778 mm), fault 5 (outer ring fault, fault size 0.3556 mm), fault 6 (outer ring fault, fault size 0.5334 mm), fault 7 (rolling element fault, fault size 0.1778 mm), fault 8 (rolling element fault, fault size 0.3556 mm), and fault 9 (rolling element fault, fault size 0.5334 mm).
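The 10-class labelling and the four working conditions listed above can be written down as a small lookup table. The following Python sketch is for illustration only; the names `build_label_map` and `WORKING_CONDITIONS`, and the choice of label 0 for the normal state, are assumptions and do not appear in the patent.

```python
# Fault sizes (mm) and fault locations, in the order used for fault 1 to fault 9.
FAULT_SIZES_MM = (0.1778, 0.3556, 0.5334)
FAULT_LOCATIONS = ("inner ring", "outer ring", "rolling element")

# Working condition -> (motor load in hp, motor speed in rpm).
WORKING_CONDITIONS = {1: (0, 1797), 2: (1, 1772), 3: (2, 1750), 4: (3, 1730)}


def build_label_map():
    """Map class index -> (fault location, fault size in mm).

    Index 0 is the normal state (assumed convention); indices 1 to 9 follow
    the order inner ring, outer ring, rolling element, each with three sizes.
    """
    labels = {0: ("normal", None)}
    index = 1
    for location in FAULT_LOCATIONS:
        for size in FAULT_SIZES_MM:
            labels[index] = (location, size)
            index += 1
    return labels


if __name__ == "__main__":
    for index, (location, size) in build_label_map().items():
        print(index, location, size)
```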

(2) PFTL parameter settings: the initial global model parameters are assigned by random initialization; the number of communication rounds for federal learning initialization training is r = 3, and the learning rate of the client local model during initialization training is η_o = 0.001, so that the model parameters obtained after initialization training reflect, to a certain extent, the uncertainty within each client and between clients; in the personalized adjustment phase of federal learning and the subsequent model migration phase, the learning rate is η = 0.0006. Once the PFTL network structure and parameters had been set, they remained unchanged in all of the experiments below.

(3) Experiments and comparative analysis: in the fault diagnosis experiment, the bearing data under working condition 1 are used as source task samples to diagnose faults in the bearing data under working condition 2 (the target task samples). Before the experiment, 200 samples of each fault type are randomly drawn from the bearing data of working condition 1 as source task samples, and 100 samples of each fault type are randomly drawn from the bearing data of working condition 2 as target task samples.

In the method of the invention, the source task samples of the 10 fault types are used as the client inputs of federal learning for pre-training, and the pre-trained model is then applied directly to diagnose the target task samples. The average diagnosis accuracy for each fault type after repeating the experiment 10 times is shown in fig. 7. As can be seen from fig. 7, the model obtained by pre-training alone already has a certain generalization ability: its average diagnosis accuracy on the target task samples, whose distribution is unknown, exceeds 60%. However, because it has not yet explored the high-level features of the target task, the model needs to be further adapted to the target task.

After pre-training, the model is adapted to the target task according to the flow of the invention. The loss and accuracy curves of this process are shown in fig. 8 (a) and (b), respectively. To adapt fully to the target task, the model was trained for a total of 30 epochs. As fig. 8 shows, the model gradually converges as the number of iterations increases: at epoch 15 the accuracy exceeds 95% and the loss is close to zero, and the subsequent curves fluctuate only slightly and level off. This indicates that the fine-tuned pre-trained model converges very quickly when adapting to the target task and has strong feature extraction and learning capability.

Next, the classification performance of the model was tested with unbalanced sample numbers. The target task samples were kept unchanged, and the source task samples were drawn again from the bearing data of working condition 1 in the proportion 5:4:3:2:1: 200 samples each for normal and fault 1, 160 samples each for fault 2 and fault 3, 120 samples each for fault 4 and fault 5, 80 samples each for fault 6 and fault 7, and 40 samples each for fault 8 and fault 9. The model was then trained following the complete flow described above and used to classify the fault types of the target task samples. The final result is shown as a confusion matrix in fig. 9. The accuracy is still affected by the number of source task samples, but the final average test accuracy reaches 96.7%, which shows strong classification ability. In addition, to show that the model effectively learns feature representations of the different fault types, the features extracted by the model were visualized with t-SNE, as shown in fig. 10: samples of the same type lie close together, and different types are clearly separated.
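The 5:4:3:2:1 source-sample split used in the imbalance test can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the helper names `imbalanced_counts` and `draw_source_samples`, the fixed random seed, and the representation of each class as a list of candidate sample indices are hypothetical and not taken from the patent.

```python
import random


def imbalanced_counts(base=200, ratio=(5, 4, 3, 2, 1), classes_per_level=2):
    """Per-class sample counts for classes 0..9 (normal, fault 1, ..., fault 9)."""
    counts = []
    for level in ratio:
        # Scale the base count by level / ratio[0], e.g. 200 * 4 / 5 = 160.
        counts.extend([base * level // ratio[0]] * classes_per_level)
    return counts


def draw_source_samples(pool_by_class, counts, seed=0):
    """Randomly draw counts[label] samples, without replacement, per class."""
    rng = random.Random(seed)
    return {label: rng.sample(pool, counts[label])
            for label, pool in pool_by_class.items()}


if __name__ == "__main__":
    counts = imbalanced_counts()
    # Hypothetical pools: 300 candidate sample indices per class.
    pools = {label: list(range(300)) for label in range(10)}
    drawn = draw_source_samples(pools, counts)
    print(counts, sum(len(v) for v in drawn.values()))
```

Under this split the source set holds 1200 samples in total, compared with 2000 in the balanced experiment (200 per class for 10 classes).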

Finally, the classification accuracy for each of the 10 fault types and the average accuracy of the method of the invention are compared with those of three other transfer learning methods: multi-layer domain adaptation (MLDA), enhanced transfer joint mapping (ETJM) and domain transfer multiple kernel learning (DTMKL). The three methods used the same number and ratio of unbalanced samples as the method of the invention. To reduce the error caused by randomness, the average of 20 experimental runs was taken as the final result for each method; the results are shown in table 1. As can be seen from table 1, the imbalance in the number of source task samples causes the per-class accuracy of every method to fluctuate across the 10 fault types, but PFTL is affected least, and PFTL achieves higher per-class accuracy and higher average accuracy than the other three methods.

Table 1. Comparison of the accuracy of the methods

Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the invention, and it should be understood that the scope of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.

Claims

1. The rolling bearing fault diagnosis method based on personalized federal transfer learning is characterized by comprising the following steps of:

2. The rolling bearing fault diagnosis method based on personalized federal transfer learning according to claim 1, wherein in step S2, a Bayesian hierarchical model is introduced to optimize parameter estimation as follows: a two-level Bayesian hierarchical model comprising a top level and a bottom level is adopted; the top-level model is used to estimate the shared parameters sent to the clients by the federal learning center, and the bottom-level model is used to analyze the distribution and characteristics of the data inside each client; to describe the uncertainty within each client and between clients, these uncertainties are quantified by the empirical variance of the local optimal parameters and the empirical variance of the global shared model parameters, respectively, which are calculated as follows:

In the formula, the two variance terms denote the local and global empirical variances, respectively, and the two mean terms denote the mean of the local optimal parameters and the mean of the global shared model parameters θ⁽ⁱ⁾, respectively.

3. The rolling bearing fault diagnosis method based on personalized federal transfer learning according to claim 2, wherein step S3 is implemented as follows: under the federal learning framework, the central server S and the clients m (m = 1, 2, …, M) are updated iteratively over T communication rounds; in the t-th iteration, client m is updated as follows:

S32, calculating the local training initial value and the number of training steps l_m:

S33, constructing a local loss function by using the cross entropy function:

where w_m is the weight of client m;

Steps S31 to S34 are repeated until the global loss function L(θ) reaches its minimum.

4. The rolling bearing fault diagnosis method based on personalized federal transfer learning according to claim 1, wherein in the pre-training stage, a deep neural network is used for model training at the clients and the central server; the network structure comprises four convolution layers, an average pooling layer, a fully connected layer and a classification layer connected in sequence, wherein the output of convolution layer 2 is connected with the input of convolution layer 1 to form the input of convolution layer 3, and the output of convolution layer 4 is connected with the input of convolution layer 3 to form the input of the average pooling layer; a model patch is added after convolution layer 2 and after convolution layer 4, respectively; each model patch comprises a down-projection layer, an activation function, an up-projection layer and a shortcut connection; the down-projection layer projects the original k-dimensional channels to d dimensions by a 1 × k × d convolution operation φ_down, and the up-projection layer projects them back to the original dimension by a 1 × d × k convolution operation φ_up.
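The model patch structure recited above (down-projection, activation, up-projection, shortcut connection) can be illustrated by the following minimal Python sketch. It is not the claimed implementation: the function names are hypothetical, ReLU is assumed as the activation function because the claim does not name one, and the feature map is represented as a plain list of k channels, each a list of values along the signal axis.

```python
def conv1x1(weight, x):
    """Kernel-size-1 convolution over channels.

    weight: out_channels x in_channels, x: in_channels x length.
    """
    length = len(x[0])
    return [[sum(w * x[c][t] for c, w in enumerate(row)) for t in range(length)]
            for row in weight]


def relu(x):
    """Element-wise ReLU (assumed activation)."""
    return [[max(0.0, v) for v in row] for row in x]


def model_patch(x, w_down, w_up):
    """Apply one model patch: x + phi_up(act(phi_down(x))).

    w_down has shape d x k (k -> d channels); w_up has shape k x d (d -> k).
    """
    hidden = relu(conv1x1(w_down, x))      # down-projection and activation
    restored = conv1x1(w_up, hidden)       # up-projection to k channels
    # Shortcut connection: add the patch output to the unchanged input.
    return [[a + b for a, b in zip(row_x, row_r)]
            for row_x, row_r in zip(x, restored)]


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # k = 3 channels, length 2
    w_down = [[1.0, 0.0, 0.0]]                 # d = 1
    w_up = [[1.0], [0.0], [0.0]]
    print(model_patch(x, w_down, w_up))
```

Because of the shortcut connection, a patch whose up-projection weights are all zero leaves the feature map unchanged, so the pre-trained behaviour is preserved until the patch is trained.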

5. The rolling bearing fault diagnosis method based on personalized federal transfer learning according to claim 1, wherein in step S4, the loss function for migrating the model to the target task is obtained as follows: K target task samples without class labels are input into the model, the probability that sample jⁿ belongs to each class is calculated, and the class with the highest probability is taken as the pseudo label q of the sample, wherein the probability that the target task sample belongs to its pseudo label is:

In the formula, the probability term is the probability, predicted by the model under the current parameters (including θ and v), that sample jⁿ belongs to class q; the score term is the score of the q-th class in the raw output of the model under the same parameters; and n ∈ {1, …, K}.

6. The rolling bearing fault diagnosis method based on personalized federal transfer learning according to claim 1, wherein in step S5, the sum of the negative logarithms of the probabilities that the target task samples belong to their corresponding pseudo labels is used as the loss function for adapting the migrated model to the target task, specifically expressed as follows:

The loss function is minimized by stochastic gradient descent and trained to convergence, so that the fine-tuned model adapts to the target task and the optimal parameters, including θ* and v*, are obtained; this completes the training of the personalized federal transfer learning model and yields a personalized model specific to the target task.
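The pseudo-label loss recited in claims 5 and 6 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed formula: the class probabilities are assumed to be a softmax over the raw class scores, and the function names `softmax` and `pseudo_label_loss` are hypothetical. For each unlabeled target sample the most probable class is taken as the pseudo label q, and the loss is the sum of the negative logarithms of the pseudo-label probabilities.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities (assumed softmax form)."""
    peak = max(scores)                       # subtract max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_loss(score_rows):
    """Return (loss, pseudo_labels) for raw scores of K unlabeled samples."""
    loss = 0.0
    pseudo_labels = []
    for scores in score_rows:
        probs = softmax(scores)
        # Pseudo label q: the class with the highest predicted probability.
        q = max(range(len(probs)), key=probs.__getitem__)
        pseudo_labels.append(q)
        loss += -math.log(probs[q])          # negative log-probability of q
    return loss, pseudo_labels


if __name__ == "__main__":
    scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]   # hypothetical model outputs
    print(pseudo_label_loss(scores))
```

Minimizing this loss pushes each prediction toward its own pseudo label, so confident predictions contribute little and uncertain ones contribute most.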