CN116910652A

CN116910652A - Equipment fault diagnosis method based on federated self-supervised learning

Info

Publication number: CN116910652A
Application number: CN202310893683.7A
Authority: CN
Inventors: 刘振宇; 郑皓文; 刘惠; 谭建荣
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2023-07-20
Filing date: 2023-07-20
Publication date: 2023-10-20

Abstract

The invention discloses an equipment fault diagnosis method based on federated self-supervised learning. The method comprises the following steps: first, a server initializes the weights of a feature extractor and sends them to each client; next, each client uses a sensor to acquire the signals generated while its local equipment operates and records them as local vibration data, from which an unlabeled data set and a labeled data set are obtained; then, each client trains its local feature extractor under a federated self-supervised learning framework to obtain a trained local feature extractor; each client then trains a classifier under a supervised learning framework to obtain its client classifier, and the feature extractor is connected to the client classifier to form a fault diagnosis model; finally, equipment diagnosis is performed using the fault diagnosis model of the client. The invention addresses the problem that fault data sets for rotating equipment are small, dispersed, and lacking labels, which makes a high-precision diagnosis model difficult to train.

Description

Equipment fault diagnosis method based on federated self-supervised learning

Technical Field

The invention belongs to the field of fault diagnosis of rotating equipment, relates to equipment fault diagnosis methods in the fields of machine learning, deep learning, and time series classification, and in particular relates to an equipment fault diagnosis method based on federated self-supervised learning.

Background

Rotating equipment, such as aero-engines and gas turbines, is widely used in modern industry and is becoming increasingly complex and sophisticated. Faults in rotating equipment, such as bearing damage or blade fracture, can cause serious accidents and huge economic losses. Accurately identifying the state of running equipment and intervening promptly when early symptoms of a fault appear is therefore of great significance for improving production efficiency and reducing losses from such accidents.

At present, fault diagnosis of rotating equipment mostly relies on data-driven methods, which mine the mapping from vibration signals to equipment states from a large amount of data. Common methods include support vector machines, decision trees, and neural networks. Deep neural networks are one of the main current research directions because they can extract features automatically. The method disclosed in the patent 'a method for diagnosing faults of cement production rotating equipment based on machine learning' uses a one-dimensional convolutional neural network and a fully connected neural network to extract vibration features, and then uses ensemble learning to obtain diagnosis results from a plurality of classifiers. The method disclosed in the patent 'a rotating equipment fault diagnosis method, system and readable storage medium based on a deep residual network' uses a deep residual network to extract fault features from vibration signals.

Although existing methods reach a fairly high level of diagnostic accuracy, they train models on large, fully labeled data sets, which are unavailable in many situations. Lack of data annotation is the most common limitation: vibration signals must be labeled by experts with domain knowledge, which is costly, so in practice only a small part of the data is labeled. Data security is another consideration. Different customers use the equipment under different conditions, so to ensure model robustness, data should be collected from as many customers as possible. However, customers may be reluctant to share their own data for training a better model, either out of commercial interest or out of concern about the risk of data leakage.

Disclosure of Invention

To solve the problems and meet the needs described in the background, the invention provides an equipment fault diagnosis method based on federated self-supervised learning, which can train a fault diagnosis model from a plurality of scattered, unshared, small data sets that lack labels, and use the model for online diagnosis. The method can train an effective fault diagnosis model even when the fault data sets are small, scattered, and lacking labels.

The specific technical scheme of the invention comprises the following steps:

S1: The server initializes the weights of the feature extractor and sends them to each client; each client uses these weights as the initial weights of its local feature extractor.

S2: Each client uses a sensor to acquire the signals generated while its local equipment operates and records them as local vibration data; the local vibration data are then preprocessed to obtain an unlabeled data set and a labeled data set.

S3: Under the federated self-supervised learning framework, each client trains its local feature extractor on the unlabeled data set to obtain a trained local feature extractor.

S4: Under the supervised learning framework, each client trains a classifier on the labeled data set to obtain its client classifier; in each client, the current feature extractor is connected to the client classifier to form a fault diagnosis model.

S5: The sensor data of the equipment to be diagnosed are preprocessed and then fed into the fault diagnosis model of the corresponding client to obtain the equipment diagnosis result.

In S1, the feature extractor is a convolutional neural network with residual connections.

In S2, each client performs the following steps:

S21: An accelerometer collects the signals generated while the local equipment operates, and the signals are recorded as local vibration data.

S22: A preset proportion of the local vibration data is selected at random and labeled according to the true state of the equipment, giving an initial labeled data set; the unselected local vibration data form an initial unlabeled data set.

S23: A sliding window divides every signal of the initial labeled data set and the initial unlabeled data set into a plurality of segments, giving a segmented labeled data set and a segmented unlabeled data set, respectively.

S24: The segmented labeled data set and the segmented unlabeled data set are each scaled numerically using the min-max method, giving the final unlabeled data set and the final labeled data set.
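Steps S23 and S24 can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used by the invention: the function names are hypothetical, and scaling each segment separately (instead of with data-set-wide extremes) is an assumption, since the text does not say which is used.

```python
from typing import List


def sliding_window(signal: List[float], window: int, step: int) -> List[List[float]]:
    """Cut one signal into segments of length `window`, advancing `step` samples each time."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [signal[start:start + window]
            for start in range(0, len(signal) - window + 1, step)]


def min_max_scale(segment: List[float]) -> List[float]:
    """Scale one segment to [0, 1] using its own minimum and maximum (assumed per-segment)."""
    lo, hi = min(segment), max(segment)
    if hi == lo:  # constant segment: avoid division by zero
        return [0.0 for _ in segment]
    return [(v - lo) / (hi - lo) for v in segment]


def preprocess(signal: List[float], window: int, step: int) -> List[List[float]]:
    """S23 followed by S24: segment the signal, then scale every segment."""
    return [min_max_scale(seg) for seg in sliding_window(signal, window, step)]
```

When `step` equals `window`, the segments do not overlap; a trailing remainder shorter than `window` is dropped.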

S3 specifically comprises the following steps:

S31: In each training round, each client trains its local feature extractor on the unlabeled data set under the self-supervised learning framework, obtains the local feature extractor weights after the current round, and uploads them to the server.

S32: The server aggregates the local feature extractor weights of all clients to obtain the global feature extractor weights and sends them to the local feature extractors of all clients.

S33: S31 and S32 are repeated, updating the global feature extractor weights over multiple rounds until the preset number of rounds is reached; the final global feature extractor weights are sent to the local feature extractors of all clients, so that every client obtains a trained local feature extractor.

In S31, each client performs the following steps:

S311: Each piece of unlabeled data in the unlabeled data set is augmented with two different data augmentation methods, giving a corresponding augmented sample pair.

S312: Under the self-supervised learning framework, the local feature extractor is trained for one round on the augmented sample pairs of the unlabeled data set; the local feature extractor weights after the current round are obtained and uploaded to the server.

In S311, the first data-augmented sample is obtained by first adding Gaussian noise to each piece of unlabeled data and then scaling the noisy data; the second data-augmented sample is obtained by smoothly warping the time-step intervals of each piece of unlabeled data and then applying noise.
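The two augmentations of S311 might look like the following sketch. The text names the operations but not their parameters, so everything concrete here is an assumption: the function names, the noise and scale standard deviations, and in particular the warp model (time-step spacing modulated by one sine period with random amplitude and phase, followed by linear interpolation).

```python
import math
import random
from typing import List, Optional


def jitter_and_scale(x: List[float], noise_std: float = 0.05, scale_std: float = 0.1,
                     rng: Optional[random.Random] = None) -> List[float]:
    """First view: add Gaussian noise, then multiply by one random scale factor."""
    rng = rng or random.Random()
    scale = rng.gauss(1.0, scale_std)  # a single factor for the whole signal
    return [(v + rng.gauss(0.0, noise_std)) * scale for v in x]


def warp_and_jitter(x: List[float], warp_strength: float = 0.2, noise_std: float = 0.05,
                    rng: Optional[random.Random] = None) -> List[float]:
    """Second view: smoothly warp the spacing of time steps, then add Gaussian noise."""
    if len(x) < 2:
        raise ValueError("need at least two samples")
    if not 0.0 <= warp_strength < 1.0:
        raise ValueError("warp_strength must be in [0, 1)")
    rng = rng or random.Random()
    n = len(x)
    amp = rng.uniform(0.0, warp_strength)
    phase = rng.uniform(0.0, 2.0 * math.pi)
    # Step sizes stay positive because amp < 1, so warped time never runs backwards.
    steps = [1.0 + amp * math.sin(2.0 * math.pi * i / n + phase) for i in range(n - 1)]
    positions = [0.0]
    for s in steps:
        positions.append(positions[-1] + s)
    stretch = (n - 1) / positions[-1]  # rescale so the warped axis ends at n - 1
    warped = []
    for p in positions:
        p *= stretch
        i = min(int(p), n - 2)
        frac = p - i
        value = (1.0 - frac) * x[i] + frac * x[i + 1]  # linear interpolation
        warped.append(value + rng.gauss(0.0, noise_std))
    return warped
```

Calling both functions on the same segment yields one augmented sample pair; both outputs keep the input length, so they can be fed to the same feature extractor.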

During training, the local feature extractor outputs a first feature matrix and a second feature matrix, one for each sample of an augmented pair. The feature extractor weights are optimized using a gradient descent algorithm. The loss function comprises a first loss function and a second loss function; the first loss function loss1 is calculated by the following formula:

where N is the batch size; α and β are the first and second hyperparameters, respectively; the formula also uses an indicator function; i and j are the first and second indices of samples within the batch; l_c is the context contrastive loss function and l_t is the temporal contrastive loss function. The first cropped feature matrix and the second cropped feature matrix are cut from the first and second feature matrices; s1 and s2 are the first and second start positions, and e1 and e2 are the first and second end positions, respectively; T is the length of the feature matrix along the time dimension. The remaining symbols denote the feature of sample i at time step t in the second cropped feature matrix, the feature of sample i at time step t in the first cropped feature matrix, the feature of sample i in the first cropped feature matrix, the feature of sample i in the second cropped feature matrix, and the feature of sample j in the second cropped feature matrix;

the second loss function loss2 is calculated by the following formula:

In this formula, the symbols denote, respectively: the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in round r; the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in the preceding round; and the features that each client extracts using the global feature extractor weights received from the server at the time of the r-th training round.

In S32, the server aggregates the local feature extractor weights of all clients using a weighted average, calculated as follows:

$$\theta^{G} = \sum_{k} \frac{|D^{k}|}{|D|}\,\theta^{k}$$

where k is the client index; $|D|$ and $|D^{k}|$ denote the total data volume of all clients and the data volume of the k-th client, respectively; and $\theta^{G}$ and $\theta^{k}$ denote the global feature extractor weights and the local feature extractor weights of the k-th client, respectively.
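The weighted average of S32 can be illustrated as follows. This is a sketch under stated assumptions: representing a model as a dictionary of flat float lists and the name `aggregate` are illustrative choices, not details given in the text.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical layout: parameter name -> flat values


def aggregate(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client weights, each weighted by its share |D^k| / |D| of the data."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one data-set size per client")
    total = sum(client_sizes)  # |D|
    if total <= 0:
        raise ValueError("total data volume must be positive")
    global_weights: Weights = {}
    for name, reference in client_weights[0].items():
        global_weights[name] = [
            sum(size / total * weights[name][idx]
                for weights, size in zip(client_weights, client_sizes))
            for idx in range(len(reference))
        ]
    return global_weights
```

A client holding three quarters of the data contributes three quarters of each aggregated parameter, so clients with more data pull the global feature extractor more strongly toward their local solution.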

In S4, each client performs the following steps:

S41: The labeled data set is input into the local feature extractor with the updated weights to obtain a feature matrix data set.

S42: The feature matrix data set is used as the input of a support vector machine, and the support vector machine is trained to obtain the client classifier.

In S5, the sensor data of the equipment to be diagnosed are segmented using a sliding window, normalized using the min-max method, and then input into the fault diagnosis model of the corresponding client.

In the method of the invention, federated learning is used to train, from scattered client data, a feature extractor that is effective for all clients; during training, self-supervised learning is used to learn useful knowledge from a large amount of unlabeled data; and supervised learning on a small amount of labeled data is used to improve the final performance of the classifier.

Compared with the prior art, the invention has the beneficial effects that:

1. Model training is performed locally at each client; client data need not be uploaded to the server, so clients need not worry about data leakage.

2. The method can learn from a large amount of unlabeled data and performs well under the label scarcity that is typical of current equipment fault diagnosis applications.

3. The invention uses contrastive self-supervised learning to train a robust model from small unlabeled data sets, and uses federated learning to aggregate a fault feature extractor with global knowledge from a plurality of clients without sharing the clients' local data. This solves the problem that fault data sets for rotating equipment are small, scattered, and lacking labels, which makes a high-precision diagnosis model difficult to train.

Drawings

FIG. 1 is a flow chart showing the steps of the present invention.

FIG. 2 is a schematic diagram of training and use of the model according to the present invention.

FIG. 3 is a schematic diagram of an experimental bench according to an embodiment of the invention.

Fig. 4 shows the data distributions of the three clients according to an embodiment of the present invention.

Fig. 5 is a confusion matrix of the first client fault diagnosis result in the embodiment of the present invention.

Fig. 6 is a confusion matrix of fault diagnosis results of the second client according to the embodiment of the present invention.

Fig. 7 is a confusion matrix of fault diagnosis results of a third client according to an embodiment of the present invention.

Detailed Description

The invention is further described below using the Case Western Reserve University (CWRU) bearing fault data set as the data for a specific example.

The test bed of the CWRU data set comprises a motor, a torque sensor, and a dynamometer. The motor shaft is supported by the bearing under test, and an accelerometer mounted on the motor samples the vibration signal at a sampling frequency of 12 kHz; a schematic of the test bed is shown in Fig. 3. The bearings have three types of faults: inner ring damage, outer ring damage, and rolling element damage. Each fault type is further subdivided into three severity levels. Including the normal state, the bearing data therefore cover ten states in total.

Fig. 1 and Fig. 2 illustrate the flow of the invention. Applied to the CWRU data set, the method specifically comprises the following steps:

In S1, the feature extractor is a convolutional neural network with residual connections (ResNet), a widely used deep learning model. The feature extractor structure is identical on the server and on the clients, and, to accommodate time series data, the ResNet convolution kernels are restricted to sliding only along the time dimension.
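The idea of a residual block whose kernels slide only along the time axis can be shown with a deliberately small sketch. It is a single-channel toy with hypothetical function names, written in plain Python; the feature extractor of the embodiment is a multi-channel ResNet whose layer counts, channel widths, and normalization layers are not described in the text and are not modeled here.

```python
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Slide an odd-length kernel along time with zero padding; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[t + j] for j in range(len(kernel)))
            for t in range(len(x))]


def relu(x: List[float]) -> List[float]:
    """Element-wise rectified linear unit."""
    return [max(0.0, v) for v in x]


def residual_block(x: List[float], kernel_a: List[float], kernel_b: List[float]) -> List[float]:
    """Compute ReLU(x + conv(ReLU(conv(x)))): two convolutions plus a skip connection."""
    hidden = relu(conv1d_same(x, kernel_a))
    hidden = conv1d_same(hidden, kernel_b)
    return relu([skip + h for skip, h in zip(x, hidden)])
```

With all-zero kernels the block reduces to `relu(x)`, which is the property that makes residual networks easy to optimize: each block only has to learn a correction to the identity path.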

In S2, each client performs the following steps:

S21: An accelerometer collects the signals generated while the local equipment operates, and the signals are recorded as local vibration data. Specifically, as shown in Fig. 3, an accelerometer is mounted on the motor near the bearing. The motor is started; at some later time the accelerometer is activated to sample a ten-second signal, and the bearing state is recorded.

S22: A preset proportion of the local vibration data is selected at random and labeled according to the true equipment state reflected by the data; the labeled data are divided in a 1:2 ratio into an initial labeled data set and an initial test set, and the unselected local vibration data form an initial unlabeled data set. In this implementation, the preset proportion is set to 30%.

S23: Using a sliding window with a length of 1024 and a step of 1024, all signals of the initial labeled data set, the initial unlabeled data set, and the initial test set are divided into non-overlapping segments, giving the segmented labeled data set, the segmented unlabeled data set, and the segmented test set, respectively. The CWRU data set contains vibration data of 161 bearings, and this embodiment uses three clients as an example. The vibration data of the 161 bearings are therefore segmented and then distributed among the local data sets of the three clients in a non-independent and identically distributed (non-IID) manner. The resulting data distribution is shown in Fig. 4; the data distributions of the three clients differ substantially. For each client, the ratio of the data volumes of the local labeled data set, unlabeled data set, and test set is about 1:7:2.

S24: The segmented labeled data set, the segmented unlabeled data set, and the segmented test set are each scaled to the interval [0, 1] using the min-max method, giving the final unlabeled data set, the final labeled data set, and the final test set.
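The 1:7:2 ratio follows directly from the 30% selection and the 1:2 division in S22: one third of 30% is 10% labeled, two thirds is 20% test, and the remaining 70% is unlabeled. The sketch below reproduces that arithmetic; the function name, the fixed seed, and splitting at the level of whole records are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_records(records: Sequence, selected_fraction: float = 0.30,
                  seed: int = 0) -> Tuple[List, List, List]:
    """Return (labeled, unlabeled, test) following the proportions described in S22."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_selected = round(len(shuffled) * selected_fraction)
    selected, unlabeled = shuffled[:n_selected], shuffled[n_selected:]
    n_labeled = round(n_selected / 3)  # 1:2 split of the selected part
    return selected[:n_labeled], unlabeled, selected[n_labeled:]


labeled, unlabeled, test = split_records(range(100))
print(len(labeled), len(unlabeled), len(test))  # 10 70 20
```

The three parts are disjoint and together cover every record, so no segment used for testing is ever seen during self-supervised or supervised training.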

S3 specifically comprises the following steps:

In S31, each client performs the following steps:

In S311, the first data-augmented sample is obtained by first adding Gaussian noise to each piece of unlabeled data and then scaling the noisy data; the second data-augmented sample is obtained by smoothly warping the time-step intervals of each piece of unlabeled data and then applying noise. A batch of the unlabeled data set is expressed as $x_{\text{train}} = \{x^{1}, x^{2}, \ldots, x^{N}\}$, where $x \in \mathbb{R}^{L}$ and N is the batch size. In this embodiment, all data in one batch share the same noise and scaling factor, which are sampled from a standard Gaussian distribution.

S312: Under the self-supervised learning framework, the local feature extractor is trained for one round on the augmented sample pairs of the unlabeled data set; the local feature extractor weights after the current round are obtained and uploaded to the server. In this embodiment, a ResNet extracts the feature representation of each augmented sample. The convolution kernel size of the ResNet is fixed at [3, 1], and the kernels are restricted to sliding only along the time dimension. The output dimension of the ResNet, i.e. the length of the feature representation, is set to 64.

During training, the local feature extractor f outputs a first feature matrix and a second feature matrix; T denotes the length of the feature matrix along the time dimension, i.e. the number of time steps. The feature extractor weights are optimized using a gradient descent algorithm; this embodiment uses the Adam optimizer with the learning rate set to 3e-4. The loss function comprises a first loss function and a second loss function; the first loss function loss1 is calculated by the following formula:

where N is the batch size; α and β are the first and second hyperparameters, which adjust the contribution weights of l_c and l_t; the indicator function takes the value 1 when the condition in brackets is met and 0 otherwise; and i and j are the first and second indices of samples within the batch. l_c is the context contrastive loss function, which aims to reduce the feature distance between the two samples of an augmented pair output by the feature extractor and to increase the feature distance between an augmented sample and all other samples. l_t is the temporal contrastive loss function, which further constrains the output of the feature extractor in terms of time-step similarity: the feature distance between the same time-step positions of an augmented sample pair is reduced as far as possible, while the feature distance between different time-step positions of the pair, and between different time steps within each sample, is enlarged as far as possible. l_c and l_t are contrastive loss terms designed after the NT-Xent loss function; they use the context correlation and the temporal correlation between augmented samples in the batch, respectively, to construct supervision information, helping the feature extractor benefit from the unlabeled data.

The first feature matrix and the second feature matrix are each cut into a shorter segment by random cropping. The first cropped feature matrix is the first feature matrix cropped from the first start position s1 to the first end position e1; the second cropped feature matrix is the second feature matrix cropped from the second start position s2 to the second end position e2. The two cropped feature matrices overlap in [s2:e1]. T is the length of the feature matrix along the time dimension, i.e. the number of time steps. The remaining symbols denote the feature of sample i at time step t in the second cropped feature matrix, the feature of sample i at time step t in the first cropped feature matrix, the feature of sample i in the first cropped feature matrix, the feature of sample i in the second cropped feature matrix, and the feature of sample j in the second cropped feature matrix;
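Because the formula itself is not reproduced in the text, the following is only a generic NT-Xent-style sketch of how a contrastive term of this kind can be computed, not the formula of the invention. The cosine similarity, the temperature `tau`, its default value, the function names, and the way the two terms are combined are all assumptions; only the roles of α and β and the idea of contrasting matching items against non-matching ones (the `j != i` condition playing the part of the indicator function) come from the description above.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity with a small constant to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def nt_xent(view1: List[Vector], view2: List[Vector], tau: float = 0.5) -> float:
    """Pull item i of view1 toward item i of view2 and push it away from items j != i."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        pos = math.exp(cosine(view1[i], view2[i]) / tau)
        neg = sum(math.exp(cosine(view1[i], view2[j]) / tau)
                  for j in range(n) if j != i)
        total -= math.log(pos / (pos + neg))
    return total / n


def combined_loss(ctx1: List[Vector], ctx2: List[Vector],
                  time1: List[Vector], time2: List[Vector],
                  alpha: float = 0.5, beta: float = 0.5, tau: float = 0.5) -> float:
    """Weighted sum of a context term (items are samples) and a temporal term (items are time steps)."""
    l_c = nt_xent(ctx1, ctx2, tau)
    l_t = nt_xent(time1, time2, tau)
    return alpha * l_c + beta * l_t
```

In this sketch `ctx1` and `ctx2` hold one vector per sample in the batch, while `time1` and `time2` hold one vector per time step of a single sample over the overlapping region of the two crops. When the two views agree item by item the loss is small; when they are mismatched it grows.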

The second loss function loss2 constrains the distance between the current feature extractor and the feature extractor of the previous training round, and is calculated by the following formula:

In this formula, the symbols denote, respectively: the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in round r; the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in the preceding round; and the features that each client extracts using the global feature extractor weights received from the server at the beginning of training round r.

In this embodiment, the cropped feature representation that is retained has a length of 32; all data in one batch share the same cropping start position, which is sampled from the uniform distribution U[0, 32].
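The cropping of the two feature matrices and the overlap region [s2:e1] can be sketched as follows. Several details are assumptions: the function name, drawing integer start positions with both endpoints included, ordering the two crops so that s1 ≤ s2, and requiring at least 64 time steps so that a crop starting at position 32 still fits.

```python
import random
from typing import List, Optional, Sequence, Tuple


def crop_pair(feat1: Sequence, feat2: Sequence, crop_len: int = 32, max_start: int = 32,
              rng: Optional[random.Random] = None) -> Tuple[List, List, List, List]:
    """Crop both time-major feature sequences and return the crops plus their overlapping parts."""
    if min(len(feat1), len(feat2)) < max_start + crop_len:
        raise ValueError("feature sequences are too short for the requested crop")
    rng = rng or random.Random()
    # Sort the two start positions so that the first crop never starts after the second.
    s1, s2 = sorted((rng.randint(0, max_start), rng.randint(0, max_start)))
    e1, e2 = s1 + crop_len, s2 + crop_len
    crop1, crop2 = list(feat1[s1:e1]), list(feat2[s2:e2])
    overlap1 = crop1[s2 - s1:]   # time steps s2 .. e1 as seen in the first crop
    overlap2 = crop2[:e1 - s2]   # the same time steps as seen in the second crop
    return crop1, crop2, overlap1, overlap2
```

The overlap has length `crop_len - (s2 - s1)`, so it is empty only in the extreme case where the two start positions are a full crop length apart. Position-wise comparisons, such as a temporal contrastive term, are meaningful only on `overlap1` and `overlap2`, because only there do both crops describe the same time steps.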

S32: The server aggregates the local feature extractor weights of all clients to obtain the global feature extractor weights and sends them to the local feature extractors of all clients, where they serve as the initial weights for each client's next training round.

In this embodiment, after all three clients have finished training the feature extractor and uploading the model weights, the server aggregates the model weights into new model weights and sends them to the clients. The number of training rounds is set to 40.
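The round structure of S31 to S33 with three clients and 40 rounds can be sketched as below. The local update is a hypothetical stand-in: each client pulls a single scalar weight toward a client-specific target, so that the orchestration can be run without a neural network. The names `local_train` and `run_federated`, the step size, the targets, and the data-set sizes are all illustrative assumptions.

```python
from typing import List


def local_train(weight: float, target: float, lr: float = 0.5) -> float:
    """Hypothetical stand-in for one round of local self-supervised training."""
    return weight + lr * (target - weight)


def run_federated(targets: List[float], sizes: List[int],
                  rounds: int = 40, init: float = 0.0) -> float:
    """Repeat local training (S31) and size-weighted aggregation (S32) for a fixed number of rounds (S33)."""
    total = sum(sizes)
    global_weight = init  # S1: the server initializes and distributes the weights
    for _ in range(rounds):
        # S31: every client starts from the current global weights and trains locally.
        local_weights = [local_train(global_weight, t) for t in targets]
        # S32: the server aggregates by data volume and redistributes the result.
        global_weight = sum(n / total * w for w, n in zip(local_weights, sizes))
    return global_weight  # S33: final weights sent to every client


final = run_federated(targets=[1.0, 2.0, 3.0], sizes=[1, 1, 2])
```

In this toy setting the global weight converges to the data-volume-weighted mean of the client targets, which illustrates how clients with different local data still end up sharing one feature extractor.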

In S4, each client performs the following steps:

S41: The labeled data set is input into the local feature extractor with the updated weights to obtain a feature matrix data set.

S42: The feature matrix data set is used as the input of a support vector machine, and the support vector machine is trained to obtain the client classifier.

In this embodiment, the test set obtained in step S24 serves as the preprocessed data to be diagnosed. The fault diagnosis model classifies the test set data to obtain the equipment diagnosis results.

The equipment diagnosis results are compared with the actual results, and accuracy (Acc) and the macro-averaged F1 score (MF1) are selected as evaluation indices to measure the performance of the model. The index results for this example are shown in Table 1.

Table 1. Model performance evaluation

The evaluation results show that the method can successfully diagnose the equipment with high diagnostic accuracy, and that the method is feasible and effective. Figs. 5, 6, and 7 show the confusion matrices of the diagnosis results of the three clients. It can be seen that the method achieves extremely high diagnostic accuracy despite the large differences in data distribution across the clients.

The above embodiment is an implementation of the invention on the Case Western Reserve University (CWRU) bearing fault data set, but the fault diagnosis method of the invention is not limited to bearings. Any similar scheme that collects equipment operating data through sensors and performs equipment fault diagnosis according to the principles and ideas of the invention should be regarded as falling within the protection scope of the invention.
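For reference, the two evaluation indices named above (Acc and MF1) can be computed from true and predicted labels as in the sketch below. The function names are illustrative, the label values in the example are made up and are not results from the embodiment, and assigning an F1 of 0 to a class with no true or predicted occurrences is an assumed convention.

```python
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of samples whose predicted state equals the true state."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up labels for illustration only.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
```

Because macro averaging gives every state the same weight regardless of how many samples it has, MF1 is the more informative of the two indices when the client data distributions are strongly unbalanced.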

Claims

1. An equipment fault diagnosis method based on federated self-supervised learning, characterized by comprising the following steps:

2. The equipment fault diagnosis method based on federated self-supervised learning according to claim 1, wherein in S1 the feature extractor is a convolutional neural network with residual connections.

3. The equipment fault diagnosis method based on federated self-supervised learning according to claim 1, wherein each client performs the following steps:

4. The equipment fault diagnosis method based on federated self-supervised learning according to claim 1, wherein S3 specifically comprises:

5. The equipment fault diagnosis method based on federated self-supervised learning according to claim 4, wherein each client performs the following steps:

6. The equipment fault diagnosis method based on federated self-supervised learning according to claim 5, wherein in S311 the first data-augmented sample is obtained by first adding Gaussian noise to each piece of unlabeled data and then scaling the noisy data; and the second data-augmented sample is obtained by smoothly warping the time-step intervals of each piece of unlabeled data and then applying noise.

7. The equipment fault diagnosis method based on federated self-supervised learning according to claim 4, wherein during training the local feature extractor outputs a first feature matrix and a second feature matrix; the feature extractor weights are optimized using a gradient descent algorithm; the loss function comprises a first loss function and a second loss function; and the first loss function loss1 is calculated by the following formula:

where N is the batch size; α and β are the first and second hyperparameters, respectively; the formula also uses an indicator function; i and j are the first and second indices of samples within the batch; l_c is the context contrastive loss function and l_t is the temporal contrastive loss function. The first cropped feature matrix and the second cropped feature matrix are cut from the first and second feature matrices; s1 and s2 are the first and second start positions, and e1 and e2 are the first and second end positions, respectively; T is the length of the feature matrix along the time dimension. The remaining symbols denote the feature of sample i at time step t in the second cropped feature matrix, the feature of sample i at time step t in the first cropped feature matrix, the feature of sample i in the first cropped feature matrix, the feature of sample i in the second cropped feature matrix, and the feature of sample j in the second cropped feature matrix;

the second loss function loss2 is calculated by the following formula:

In this formula, the symbols denote, respectively: the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in round r; the first feature matrix and the second feature matrix extracted by the local feature extractor of the k-th client in the preceding round; and the features that each client extracts using the global feature extractor weights received from the server at the time of the r-th training round.

8. The equipment fault diagnosis method based on federated self-supervised learning according to claim 4, wherein in S32 the server aggregates the local feature extractor weights of all clients using a weighted average, calculated as follows:

9. The equipment fault diagnosis method based on federated self-supervised learning according to claim 1, wherein each client performs the following steps:

10. The equipment fault diagnosis method based on federated self-supervised learning according to claim 1, wherein in S5 the sensor data of the equipment to be diagnosed are segmented using a sliding window, normalized using the min-max method, and then input into the fault diagnosis model of the corresponding client.