CN112924177B

CN112924177B - Rolling bearing fault diagnosis method for improved deep Q network

Info

Publication number: CN112924177B
Application number: CN202110360639.0A
Authority: CN
Inventors: 康守强; 刘哲; 王玉静; 王庆岩; 梁欣涛; 谢金宝; 兰朝凤
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2021-04-02
Filing date: 2021-04-02
Publication date: 2022-07-19
Anticipated expiration: 2041-04-02
Also published as: CN112924177A

Abstract

A rolling bearing fault diagnosis method based on an improved deep Q network relates to the technical field of rolling bearing fault diagnosis and addresses the problem that, in the prior art, diagnosis accuracy is too low when deep reinforcement learning is applied to rolling bearing fault diagnosis under unbalanced data distribution or variable load. The technical points of the invention are as follows: the distance from a sample to its center point in the k-means algorithm is taken as the bias of the return value, and, with the imbalance ratio as a reference, an individualized return function is constructed for the training set; meanwhile, deep features are extracted through an improved residual network (Resnet-18). The agent takes the new return function and the time-frequency images as input, executes a diagnosis action at each time step, and the return value is evaluated and fed back; finally, the agent learns a fault diagnosis strategy for unbalanced data. The method achieves excellent fault diagnosis results under unbalanced data and variable load and can be used in practice for diagnosing different rolling bearing faults.

Description

Rolling bearing fault diagnosis method based on an improved deep Q network

Technical Field

The invention relates to the technical field of rolling bearing fault diagnosis, and in particular to a rolling bearing fault diagnosis method based on an improved deep Q network.

Background

Rolling bearings are among the key components of rotating machinery and are widely used in industry^[1,2]; diagnosing their faults helps prevent equipment accidents^[3]. In actual operation a rolling bearing is in a normal state most of the time, so far more vibration signal samples are collected in the normal state than in fault states. A model trained on such data is biased toward the majority class, which makes minority-class samples difficult to identify. In recent years, fault diagnosis under unbalanced rolling bearing vibration data has attracted wide attention from researchers, and it has strong application value and practical significance for processing real industrial data.

Under the condition of unbalanced data distribution, two main solutions exist^[4]: from the data perspective, the distribution of the training samples is balanced by resampling the training samples; from the algorithm perspective, the distribution of the training data set is not changed, and a small number of samples are paid more attention by adjusting the classification algorithm.

Data-level methods include oversampling, undersampling, and mixed sampling. Oversampling balances the training set by adding minority-class samples; document [5] proposes a K-information-quantity neighborhood oversampling method that effectively alleviates the imbalance of bearing fault samples and achieves good results. Undersampling balances the training set by discarding part of the majority-class data; to balance big-data acquisition against diagnosis efficiency, document [6] proposes reducing the data volume by undersampling at the data acquisition stage and verifies the effectiveness of undersampling in rolling bearing fault diagnosis. Mixed sampling combines the two; document [7] introduces principal curves and granulation to model the overall distribution characteristics of the data so as to perform reliable oversampling and undersampling, and proposes an online sequential prediction method based on an extreme learning machine whose accuracy in an unbalanced fault diagnosis task reaches 95%-97%. These methods improve classification performance by changing the unbalanced distribution of the data.

Algorithm-level methods mainly include classification threshold adjustment, Boosting-based ensemble learning, and cost-sensitive learning. Threshold adjustment changes the classification decision for different output probabilities by adjusting the threshold of the classifier's decision boundary. Document [8] proposes a weighted softmax loss for unbalanced classification, and validation on three bearing data sets with different degrees of imbalance shows that it handles class imbalance effectively. Following the idea of ensemble learning, document [9] trains a plurality of base classifiers while ensuring that minority-class samples fully participate in training, and proposes a rolling bearing life stage identification method based on multi-classifier ensemble weighted balanced distribution adaptation; its average F-score reaches 0.73, and minority-class samples are effectively identified. Cost-sensitive learning, a commonly used technique, assigns a greater misclassification cost to misclassified minority-class instances. Document [10] combines random undersampling to balance the training samples with a weighted loss function to optimize the distribution of unbalanced data, and on the PHM2015 plant fault event data set its accuracy is 2%-3% higher than that of other reference methods.

Data imbalance is widespread in engineering practice, yet the fault diagnosis literature addressing imbalance in rolling bearing vibration data is still limited, which makes it a hot spot of current research. Meanwhile, both kinds of solution have limitations: data-level methods balance the data set but change the original data distribution, so the real situation of the data cannot be accurately captured; algorithm-level methods make the model learn the unbalanced samples on an equal footing, but they still learn from the sample features themselves. The data imbalance problem exists objectively, and when the overall feature space of the minority-class samples cannot be estimated, both data-level and algorithm-level methods fall short. In view of these problems, document [11] takes a different approach: the classification problem is modeled as a sequential decision process of an agent, and an unbalanced classification model based on a deep Q-network (DQN) is established using the exploration-exploitation mechanism of deep reinforcement learning. Deep reinforcement learning is a mature framework and, thanks to its distinctive feedback mechanism, is widely applied to classification problems. Document [12] proposes a feature selection method based on deep reinforcement learning that defines feature selection and classification as a sequential decision problem; each time the agent makes a decision, it determines from the features already selected whether to request another feature, and the effectiveness of the method is verified on several public data sets such as MNIST. Document [13] addresses noisy data by using deep reinforcement learning to screen high-quality sentence samples, realizing sample screening and relation classification for noisy text. Document [14] is the first to model the classification task as a sequential decision process in reinforcement learning; it provides a reinforcement-learning-based solution for classification tasks and achieves an accuracy of 87.4% on eight UCI medical disease data sets. Document [15] addresses the problem that conventional deep neural networks require manual parameter tuning and expert experience: it builds an end-to-end diagnosis model using the deep reinforcement learning algorithm DQN and verifies it on rolling bearing and hydraulic pump data sets, with accuracy between 90% and 94%.

The above documents mostly use reinforcement learning for feature selection; in essence, deep learning plays the main role and reinforcement learning an auxiliary one, and these approaches fall somewhat short when faced with unbalanced data sets. Document [11], by contrast, directly models the classification task as a sequential decision process in deep reinforcement learning and adapts the model to unbalanced data sets by assigning different return values. However, its return values are set only according to the imbalance ratio, so only the imbalance between classes is considered and the importance of individual samples within a class cannot be distinguished. Moreover, when the above documents apply deep reinforcement learning to fault diagnosis of rotating machinery, variable load and data imbalance are not discussed, and feature extraction and the return function still require in-depth study.

Disclosure of Invention

In view of the above problems, the present invention provides a rolling bearing fault diagnosis method based on an improved deep Q network, so as to solve the problem in the prior art that diagnosis accuracy is too low when deep reinforcement learning is applied to rolling bearing fault diagnosis under unbalanced data distribution or variable load.

A rolling bearing fault diagnosis method based on an improved deep Q network comprises the following steps:

step one, acquiring vibration training data and test data of a rolling bearing, wherein the training data comprise samples of different states of the rolling bearing;

step two, preprocessing the training data and the test data;

step three, quantifying the reward function in the deep Q network by K-means clustering so that each sample in the training data has its own reward value, thereby obtaining a new return function;

step four, taking the training data and the return function as inputs of a deep Q network model and performing deep reinforcement learning training to obtain an improved deep Q network model;

step five, inputting the test data into the improved deep Q network model to obtain a fault diagnosis result of the rolling bearing.

Further, in step one, the training data are collected under 4 load conditions, namely 0 hp, 1 hp, 2 hp, and 3 hp; the different states include a normal state, an outer ring fault state, an inner ring fault state, and a rolling element fault state.

Further, the preprocessing in step two includes data enhancement, after which a short-time Fourier transform is applied to the training data to obtain two-dimensional time-frequency images.
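As an illustration of how a short-time Fourier transform turns a one-dimensional vibration signal into a two-dimensional time-frequency array, a minimal sketch is given below. It is not the patented preprocessing: the Hann window, the frame length `n_fft`, the hop size `hop`, and the 1.5 kHz test tone are all assumptions chosen for the example, and the data enhancement step is not covered.

```python
import cmath
import math


def stft_magnitude(signal, n_fft=64, hop=32):
    """Return a list of frames, each holding n_fft // 2 + 1 spectral magnitudes."""
    # Hann window (an assumed choice) reduces leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(n_fft)]
        spectrum = []
        for k in range(n_fft // 2 + 1):
            # Direct DFT of one windowed frame; O(n^2), acceptable for a sketch.
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


fs = 12_000    # Hz; 12 kHz is one of the sampling frequencies named in the document
tone = 1_500   # Hz; hypothetical test tone, not a bearing fault frequency
x = [math.sin(2 * math.pi * tone * n / fs) for n in range(512)]

tf_image = stft_magnitude(x)   # rows = time frames, columns = frequency bins
peak_bin = max(range(len(tf_image[0])), key=lambda k: tf_image[0][k])
```

Each row of `tf_image` is one time frame and each column one frequency bin, so the array can be rendered as an image and fed to a convolutional network. In practice a library routine with an FFT would replace the direct DFT used here.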

Further, in step three, during K-means clustering, samples that are clustered incorrectly are removed and new samples are acquired until the clustering is correct.

Further, the specific process of the third step comprises:

step 3.1, taking the center points of the K-means clusters as the base points of the return values of the different categories, wherein the return value at a center point is calculated as:

R(s_t, a_t, y_t) =
  +1, if a_t = y_t and s_t ∈ D_N
  −1, if a_t ≠ y_t and s_t ∈ D_N
  +ρ, if a_t = y_t and s_t ∈ D_F
  −ρ, if a_t ≠ y_t and s_t ∈ D_F

wherein t denotes time; s_t denotes the sample at time t; a_t denotes the action at time t; y_t denotes the sample label at time t; 1/ρ ∈ [0,1]; ρ = D_N/(D_F/9) denotes the imbalance ratio; D_N is the majority class of normal samples; and D_F is the minority class of fault samples;

step 3.2, within each category, quantifying the return function using the Euclidean distance between each sample and the center point of the cluster to which it belongs, specifically comprising:

step 3.2.1, clustering the training data into 10 clusters C = {C₁, C₂, …, C₁₀} and randomly choosing k cluster centers {μ₁, μ₂, …, μ_k};

step 3.2.2, updating the cluster assigned to each sample in the training data by minimizing the loss function:

min_C Σ_{i=1}^{k} Σ_{x∈C_i} ‖x − μ_i‖²

wherein x denotes an initial sample; μ_i denotes the center point of cluster C_i; and k denotes the number of cluster categories;

step 3.2.3, updating each cluster center μ₁, μ₂, …, μ_k:

μ_i = (1/|C_i|) Σ_{x∈C_i} x

step 3.2.4, traversing all cluster assignments to find the optimal solution of the loss function until the cluster labels reach the convergence precision, and otherwise iterating steps 3.2.2 to 3.2.4;

step 3.2.5, finally determining the k cluster center points, namely the center points of the reward function values corresponding to the k classes, and normalizing the distance between each sample and its center point;

step 3.2.6, quantifying the reward function value by comparing the reward function values corresponding to the different categories and the distance of each sample within the same category from its center, according to a quantization formula in which Dis(s_t) denotes the normalized distance of each sample from its center point.

Further, in step three, principal component analysis (PCA) is first used to reduce the dimensionality of the training data before clustering.

Further, in step four, deep features are extracted by an improved Resnet-18 network, the improvement being that, instead of outputting all Q values through the fully connected layer, the network outputs only the Q value corresponding to the given state and action, so as to accelerate network training.

Further, the deep reinforcement learning training in step four comprises: storing the experience of each time step, namely the current bearing fault state s_t, the fault diagnosis action taken a_t, the immediate reward obtained r_t, and the next state s′, as a memory in a memory bank; during training, randomly sampling memories from the memory bank and minimizing the loss function of the deep Q network by gradient descent to update the network; wherein the memory is a fixed-length sequence.
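The storage and random sampling of experiences described above can be sketched as a small replay memory. This is only an illustration under stated assumptions: the class name `ReplayMemory`, its method names, and the reading of "fixed length" as a fixed capacity that discards the oldest entries are not taken from the source.

```python
import random
from collections import deque


class ReplayMemory:
    """Fixed-capacity store of (state, action, reward, next_state) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen silently drops the oldest entry when full
        # (assumed interpretation of the fixed-length memory).
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks the temporal
        # correlation between consecutive time steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3)
for t in range(5):
    # Hypothetical experiences: integer states, action 0, reward 1.0.
    memory.push(t, 0, 1.0, t + 1)

batch = memory.sample(2)
stored_states = [exp[0] for exp in memory.buffer]
```

After five pushes into a memory of capacity three, only the three most recent experiences remain available for sampling.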

Further, in step five, the final fault diagnosis of the rolling bearing is performed through a Softmax classifier.
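A Softmax classifier converts a vector of scores into class probabilities and selects the most probable class. The sketch below shows this final step in isolation; the function names and the example scores are hypothetical, and how the trained network produces the scores is outside the scope of the example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)                       # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def diagnose(scores):
    """Return (predicted class index, class probabilities)."""
    probs = softmax(scores)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs


# Hypothetical scores for three classes.
label, probs = diagnose([0.1, 2.0, -1.0])
```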

The beneficial technical effects of the invention are as follows:

The invention provides a method for constructing the return function in reinforcement learning by combining k-means with spatial distance, which quantifies the return value of each sample and thereby yields more accurate return values. An improved Resnet-18 network is used to optimize the original DQN model, making the model deeper and better at generalization and improving its stability and diagnosis accuracy. The improved DQN model handles well the unbalanced distribution of vibration data between normal and fault states: the G-mean_total score under unbalanced and variable-load conditions reaches about 0.982; when the data are extremely unbalanced, the accuracy reaches 97%-99% and in some cases 100%, which is 5%-8% higher than that of traditional unbalanced classification methods.

The model provided by the invention is stable under unbalanced and variable-load conditions. Its overall performance is superior to that of the DQN network before improvement and of a traditional CNN, and it is superior to the original data-based imbalance solutions. It also presents a different way of applying deep reinforcement learning and shows the potential of deep reinforcement learning as a mature learning framework.

Drawings

The invention may be better understood by referring to the following description in conjunction with the accompanying drawings, in which like reference numerals are used throughout the figures to indicate like or similar parts. The accompanying drawings, which are incorporated in and form a part of this specification, illustrate preferred embodiments of the present invention and, together with the detailed description, serve to further explain the principles and advantages of the invention.

FIG. 1 is a schematic diagram of the training flow of the DQN model;

FIG. 2 is a basic flow diagram of the improved Resnet-18 network of the present invention;

FIG. 3 is a schematic flow diagram of the method of the present invention;

FIG. 4 shows the preprocessing results of data in different states according to an embodiment of the present invention;

FIG. 5 compares the accuracy of the method of the present invention with that of three other methods (the DQN model, the traditional oversampling method SMOTE, and the CNN method) in the first embodiment of the present invention;

FIG. 6 compares the G-mean_total index of the method of the present invention with that of three other methods (the DQN model, SMOTE, and the CNN method) in the first embodiment of the present invention;

FIG. 7 is a graph of the training return at an imbalance ratio of 1 and a load of 0 according to an embodiment of the present invention;

FIG. 8 is a visualization of the DQN network features in the first embodiment of the present invention, wherein (a) shows the DQN network features before improvement and (b) shows the DQN network features after improvement;

FIG. 9 compares the accuracy of the method of the present invention with that of three other methods (the DQN model, the traditional oversampling method SMOTE, and the CNN method) in the second embodiment of the present invention;

FIG. 10 compares the G-mean_total index of the method of the present invention with that of three other methods (the DQN model, SMOTE, and the CNN method) in the second embodiment of the present invention;

FIG. 11 is a visualization of the improved DQN network features in the second embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present invention will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an actual implementation are described in the specification. In order to avoid obscuring the invention with unnecessary detail, only the device structures and/or processing steps that are germane to the solution according to the present invention are shown in the drawings, while other details that are not germane to the present invention are omitted.

The invention provides a rolling bearing fault diagnosis method based on an improved DQN, which models the fault diagnosis process as a sequential decision process in a DQN model. The model takes as input a two-dimensional image data set constructed from vibration signals and a return function designed according to the k-means algorithm, extracts deep features through Resnet-18, executes a diagnosis action at each time step, and evaluates and feeds back the return value, thereby realizing multi-state intelligent fault recognition for unbalanced rolling bearing vibration data.

First, the Deep Q Network (DQN) theory is explained.

The core idea of the deep Q network is that an agent perceives the state of the environment and, while interacting with the environment, pursues the goal of maximizing its return.

During decision making, the policy function π receives a state sample s_t and returns the action a_t (the fault type) for that state with a certain probability P:

π(a|s) = P(a_t = a | s_t = s) (1)

The goal of the agent is to identify the training set samples as correctly as possible. The agent receives a positive reward when a sample is correctly identified, and it achieves its goal by maximizing the cumulative reward G_t:

G_t = r_{t+1} + γ r_{t+2} + γ² r_{t+3} + … = Σ_{k=0}^{∞} γ^k r_{t+k+1} (2)

where γ is the discount factor.

A decision made at time t is rewarded only at time t+1. The factor γ determines the importance of future returns: when γ = 0, the agent considers only the immediate return; when γ = 1, the agent always focuses on long-term returns. The expected return V in state s is then:

V^π(s) = E[G_t | s_t = s] = E[r_{t+1} + γ V^π(s_{t+1}) | s_t = s] (3)

Equation (3), also known as the Bellman equation, expresses a recursive relationship between the current return and future returns. A Q function is also introduced, which gives the expected future reward of taking a given action in a given state:

Q^π(s_t, a_t) = E[r_{t+1} + γ Q^π(s_{t+1}, a_{t+1}) | s_t, a_t] (4)

where π is the policy function and the action a belongs to the action set A.
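The effect of γ on the discounted cumulative reward that the agent maximizes can be checked numerically. The sketch below is only an illustration: the function name `discounted_return` and the reward sequence are hypothetical.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], accumulated from the last reward backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g   # recursive form: G = r + gamma * (return that follows)
    return g


# Hypothetical rewards: correct, wrong, correct, correct.
rewards = [1.0, -1.0, 1.0, 1.0]

g_myopic = discounted_return(rewards, 0.0)     # only the immediate reward counts
g_balanced = discounted_return(rewards, 0.5)   # later rewards are progressively down-weighted
g_farsighted = discounted_return(rewards, 1.0) # all rewards count equally
```

With γ = 0 the return equals the first reward alone, while with γ = 1 it is the plain sum of all rewards, which matches the two limiting cases described in the text.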

Under policy π, V(s) is the expected long-term return over all possible actions, and the optimal policy corresponds to the optimal V and Q values.

In Q-learning, a table is used to store the Q values built from (s, a, r, s′) tuples. When the dimension of the state space is large, the Q-learning algorithm is combined with deep learning to form the deep Q-network (DQN) algorithm^[16]. By adding depth to Q-learning, the DQN model substantially improves accuracy and is widely applied in many fields. The DQN training flow is shown in FIG. 1.

In the DQN model, a deep neural network is used to approximate the action-value function Q; the network is then trained, and the network parameters θ are updated to minimize the loss function shown in formula (8):

L(θ_i) = E_{s,a,r,s′}[(y_i − Q(s, a | θ_i))²] (8)

where y_i is defined as:

y_i = r + γ max_{a′} Q(s′, a′ | θ_{i−1}) (9)

where s′ is the next state of s, and a′ is the action performed by the agent in state s′.

The derivative of the loss function L with respect to the parameters θ is:

∂L(θ_i)/∂θ_i = −2 E_{s,a,r,s′}[(y_i − Q(s, a | θ_i)) ∂Q(s, a | θ_i)/∂θ_i] (10)

By minimizing the loss function of formula (8) with the gradient of formula (10), the optimal Q function is obtained; the cumulative reward is thereby maximized, which yields the optimal classification policy π: S → A.
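To make the loss of formula (8) concrete, the sketch below evaluates it for a single transition. It is a simplification under stated assumptions: a small lookup table stands in for the neural network, the target is taken as the immediate reward plus γ times the largest next-state Q value (the usual DQN target), and all names and numbers are hypothetical.

```python
def td_target(reward, next_q_values, gamma):
    """Target value: immediate reward plus discounted best next-state Q value."""
    return reward + gamma * max(next_q_values)


def squared_td_loss(batch, q, q_target, gamma):
    """Mean squared difference between targets and current Q estimates.

    batch    : list of (state, action, reward, next_state) tuples
    q        : dict mapping state -> list of Q values (current parameters)
    q_target : dict mapping state -> list of Q values (parameters used for targets)
    """
    total = 0.0
    for s, a, r, s_next in batch:
        y = td_target(r, q_target[s_next], gamma)
        total += (y - q[s][a]) ** 2
    return total / len(batch)


# Tabular stand-in for the Q network (assumption for illustration only).
q_table = {"s0": [0.0, 0.5], "s1": [1.0, 0.2]}
batch = [("s0", 1, 1.0, "s1")]

y = td_target(1.0, q_table["s1"], gamma=0.9)           # 1.0 + 0.9 * 1.0
loss = squared_td_loss(batch, q_table, q_table, 0.9)   # (y - 0.5) ** 2
```

In a real DQN the table lookups are replaced by forward passes of the network, and the loss is differentiated with respect to the network parameters.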

To increase the depth of the network and reduce overfitting, the invention introduces the Resnet-18 network^[17] to build the DQN model: a residual network (Resnet-18) is used to fit the Q function, and part of the network is modified, with the specific settings shown in Table 1 below. Meanwhile, to simplify computation, the fully connected layer does not output all Q values but only the Q value corresponding to the given state and action, which accelerates network training. The basic flow block diagram of the modified network is shown in FIG. 2.

Table 1 parameter settings for residual networks

Next, the reward function quantified by weighted sample distance is described. It comprises two parts: designing a reward function according to the imbalance ratio of the training set, and quantifying the reward function by weighing the Euclidean distances of the samples.

1) Designing a reward function according to the imbalance ratio of the training set: minority-class samples are difficult to identify correctly in an unbalanced data set. To identify them better, the model should pay more attention to them, so the agent obtains a greater reward or penalty when it encounters a minority-class sample. The reward function is defined as shown in equation (11):

R(s_t, a_t, y_t) =
  +1, if a_t = y_t and s_t ∈ D_N
  −1, if a_t ≠ y_t and s_t ∈ D_N
  +ρ, if a_t = y_t and s_t ∈ D_F
  −ρ, if a_t ≠ y_t and s_t ∈ D_F      (11)

where 1/ρ ∈ [0,1]; ρ = D_N/(D_F/9) denotes the imbalance ratio; D_N is the majority class of normal samples; D_F is the minority class of fault samples; and y_t is the label of the sample in state s_t. When the agent correctly/incorrectly classifies a majority-class sample, the reward value is 1/−1; when the agent correctly/incorrectly classifies a minority-class sample, the reward value is ρ/−ρ.
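The reward rule just described (±1 for majority-class samples, ±ρ for minority-class samples) can be written as a short function. The sketch follows the text as written; the function names and the sample counts are hypothetical, and the label 9 for the normal class is taken from the action definition given later in the description.

```python
def imbalance_ratio(n_normal, n_fault, n_fault_classes=9):
    """rho = D_N / (D_F / 9): normal count over the average count per fault class."""
    return n_normal / (n_fault / n_fault_classes)


def base_reward(action, label, rho, normal_label=9):
    """+1/-1 for a normal (majority) sample, +rho/-rho for a fault (minority) sample."""
    magnitude = 1.0 if label == normal_label else rho
    return magnitude if action == label else -magnitude


# Hypothetical counts: 4500 normal samples and 450 fault samples over 9 fault classes.
rho = imbalance_ratio(4500, 450)

r_normal_hit = base_reward(action=9, label=9, rho=rho)   # majority, correct
r_normal_miss = base_reward(action=3, label=9, rho=rho)  # majority, wrong
r_fault_hit = base_reward(action=2, label=2, rho=rho)    # minority, correct
r_fault_miss = base_reward(action=9, label=2, rho=rho)   # minority, wrong
```

The larger magnitude for fault samples is what makes a misdiagnosed fault cost the agent more than a misdiagnosed normal sample.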

2) Quantifying the reward function by weighing the Euclidean distances of the samples: to identify minority-class samples better, the invention quantifies the reward function using K-means clustering so that each sample has its own reward value. K-means is an unsupervised clustering algorithm that uses distance as the measure of similarity between data objects (the smaller the distance, the greater the similarity) and finally divides the data into K clusters.

The center points of the K-means clusters are taken as the base points of the return values of the different classes, and the return value at a center point is given by formula (11). Within each category, the return function is quantified using the Euclidean distance between each sample and the center point of the cluster to which it belongs. The specific steps are as follows:

Because each image in the training set has a large dimension, principal component analysis (PCA)^[18] is first used to reduce the dimensionality of the data. The data set is then clustered into 10 clusters C = {C₁, C₂, …, C₁₀}: k cluster centers {μ₁, μ₂, …, μ_k} are chosen at random, and the cluster assigned to each sample is updated by minimizing the loss function:

min_C Σ_{i=1}^{k} Σ_{s_t∈C_i} ‖s_t − μ_i‖²

where s_t is a training set sample, μ_i is the center point of cluster C_i, and k is the number of clusters.

Each cluster center μ₁, μ₂, …, μ_k is then updated:

μ_i = (1/|C_i|) Σ_{s_t∈C_i} s_t

All cluster assignments are traversed to find the optimal solution until the cluster labels reach the convergence precision; otherwise, the steps of minimizing the loss function and updating each cluster center are repeated. Finally, the k cluster center points, namely the center points of the reward function values corresponding to the k classes, are determined, and the distance between each sample and its center point is normalized.
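The clustering and distance normalization described above can be sketched in plain Python. This is an illustration under stated assumptions, not the patented procedure: the PCA step is omitted, min-max scaling is assumed for the normalization (the text does not specify one), the toy data use two clusters instead of ten, and the final formula that maps distances to reward values is not reproduced.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: random initial centers, then assign/update until labels settle."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each sample joins its nearest center.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centers[i]))
                      for p in points]
        if new_labels == labels:   # labels unchanged: converged
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:            # keep the old center if a cluster is empty
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def normalized_distances(points, centers, labels):
    """Distance of each sample to its own center, min-max scaled to [0, 1] (assumed)."""
    d = [math.dist(p, centers[lab]) for p, lab in zip(points, labels)]
    lo, hi = min(d), max(d)
    return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in d]


# Toy 2-D data with two well-separated groups (hypothetical).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
dis = normalized_distances(points, centers, labels)
```

Each entry of `dis` plays the role of the normalized distance of a sample from its center point, which the method then uses to give samples of the same class different reward values.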

The reward function values are then quantified by comparing the reward function values corresponding to the different categories and the distance of each sample within the same category from its center.

The influence of the reward function designed by the invention on the fault diagnosis accuracy is analyzed as follows.

Suppose that majority-class samples and minority-class samples are denoted s⁺ and s⁻, respectively, and their target Q values are denoted y⁺ and y⁻. From equations (9) and (15), the target Q values can be expressed as:

y⁺ = (−1)^{1−B(a=y)} z(s, a) + γ max_{a′} Q(s′, a′) (16)

y⁻ = ρ (−1)^{1−B(a=y)} z(s, a) + γ max_{a′} Q(s′, a′) (17)

where B(x) is a decision function that returns the result of comparing the action with the label (1 if they agree and 0 otherwise).

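The loss function of the deep Q network is rewritten as the sum of a majority-class loss function L₊(θ_i) and a minority-class loss function L₋(θ_i), whose derivatives are:

∂L₊(θ_i)/∂θ_i = −2 Σ_{s∈D_N} (y⁺ − Q(s, a | θ_i)) ∂Q(s, a | θ_i)/∂θ_i (18)

∂L₋(θ_i)/∂θ_i = −2 Σ_{s∈D_F} (y⁻ − Q(s, a | θ_i)) ∂Q(s, a | θ_i)/∂θ_i (19)

Substituting formulas (16) and (17) into the derivatives of L₊(θ_i) and L₋(θ_i) and adding them gives:

∂L(θ_i)/∂θ_i = −2 Σ_{s∈D_N∪D_F} (γ max_{a′} Q(s′, a′) − Q(s, a | θ_i)) ∂Q(s, a | θ_i)/∂θ_i − 2 Σ_{s∈D_N} (−1)^{1−B(a=y)} z(s, a) ∂Q(s, a | θ_i)/∂θ_i − 2ρ Σ_{s∈D_F} (−1)^{1−B(a=y)} z(s, a) ∂Q(s, a | θ_i)/∂θ_i (20)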

In equation (20), the second term is associated with the majority class and the third term with the minority class. For an unbalanced data set (D_N > D_F), if ρ = 1, the immediate reward values of the samples in the two classes are the same, but the majority class has more samples, so the value of the second term is greater than that of the third term and the trained model will be biased toward the majority class. If ρ = D_N/D_F, ρ increases the immediate return of the minority class and thereby strengthens its influence on the network loss function. In addition, the invention fully considers both the inter-class and the intra-class distances of the majority and minority classes, so that each sample has its own reward value while the imbalance ratio is respected.

In summary, the flow of the fault diagnosis method for the unbalance of the vibration data of the rolling bearing according to the present invention is shown in fig. 3, and includes the following steps:

1) performing a short-time Fourier transform (STFT) on the data-enhanced training-set vibration signals of the multiple states of the rolling bearing (10 classes: normal, and different fault severities of the inner ring, the outer ring, and the rolling elements) to obtain two-dimensional time-frequency images, determining the center point of each class of samples by the k-means method, and assigning each sample a different return value using the Euclidean distance;

2) to screen the data generated in step 1), removing samples that are clustered incorrectly during k-means clustering and supplying new samples to the model until the clustering is correct;

3) to address the sparse reward problem common in DQN networks, storing the experience (s_t, a_t, r_t, s′) of each time step, namely the current bearing fault state s_t, the fault diagnosis action taken a_t, the immediate reward obtained r_t, and the next state s′, as a memory in the experience pool, the memory being a fixed-length sequence; during training, memories are randomly sampled from the experience pool and used to compute the model update;

4) modeling the rolling bearing fault diagnosis process as a sequential decision process in the DQN network: when the model receives the STFT image of a vibration signal, the agent extracts deep features through the improved residual network and then receives a specific reward value according to the improved return function. After a certain number of iterations, the agent learns how to diagnose each spectrogram; because the residual network can extract deep features of the images, the agent takes the correct recognition action on the test set despite differences in pixel distribution;

5) reinforcement learning is a process of trial and error, in which it is important both to use the knowledge already learned and to explore in a reasonable way; this trade-off is known as exploration-exploitation. Deep reinforcement learning inherits the adaptability of reinforcement learning to unknown environments, so the model generalizes well under various imbalance ratios. When model training is finished, in order to reduce the exploration randomness of the model and make it more stable, the trained model parameters are fixed, following the idea of transfer learning, and the final fault diagnosis of the rolling bearing is performed through a Softmax classifier.
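The exploration-exploitation trade-off mentioned in step 5) is commonly implemented with an ε-greedy rule. The source does not name the rule it uses, so the sketch below is an assumption offered for illustration; the function name and the Q values are hypothetical.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the best-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                     # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


# Hypothetical Q values for four diagnosis actions.
q_values = [0.2, 1.5, -0.3, 0.9]

greedy_action = epsilon_greedy(q_values, epsilon=0.0)  # never explores
random_action = epsilon_greedy(q_values, epsilon=1.0)  # always explores
```

Setting ε to zero after training removes the exploration randomness, which corresponds to fixing the trained model for the final diagnosis.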

Explanation of various parameters in deep reinforcement learning:

State (State) S: the state of the environment is determined by the training samples. At the start of training, the agent takes the first sample x₁ as the initial state s₁. At each time step t, the state s_t corresponds to the sample x_t;

Action (Action) A: the actions of the agent correspond to the labels of the training set. The agent takes an action to guess the class label of the current sample. For the 10-class problem, A = {0,1,2,3,4,5,6,7,8,9}, where 0-8 denote the fault classes, i.e., the minority classes, and 9 denotes the normal class, i.e., the majority class;

Reward (Reward) R: the reward is the feedback from the environment, by which the quality of the agent's action can be measured. To better guide the agent toward the optimal diagnosis strategy on an unbalanced data set, the absolute reward for minority-class samples is, on the whole, higher than that for majority-class samples. When the agent correctly or incorrectly identifies a minority-class sample, the environment feeds back a larger reward or penalty, respectively;

Transition probability (Transition probability) P: in this model, the transition P(s_t+1 | s_t, a_t) is deterministic. The agent moves from the current state s_t to the next state s_t+1 by following the order of the samples in the training set;

Discount factor (Discount factor) γ: γ ∈ [0,1] balances the immediate and future return values;

Episode (Episode): an episode in reinforcement learning is the process by which an agent goes from an initial state to a terminal state under a given policy. In this model, a certain number of samples are randomly drawn from the training set, and one episode runs from the agent's diagnosis of the first sample to the end of its diagnosis of the last sample;

Policy (Policy) π_θ: the policy π_θ is a mapping function π: S → A, where π_θ(s_t) denotes the action a_t executed by the agent in state s_t. In this model, the policy π_θ can be regarded as a classifier with parameters θ.
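The definitions above can be read as a small classification environment. The following is a minimal sketch under stated assumptions, not the patent's environment simulation algorithm: the class name `DiagnosisEnv`, the reward magnitudes `minority_reward` and `majority_reward`, and the default `episode_len` are hypothetical, and the K-means distance bias on the return value is left out.

```python
import random


class DiagnosisEnv:
    """Toy classification-as-RL environment (illustrative only)."""

    def __init__(self, samples, labels, normal_label=9,
                 minority_reward=1.0, majority_reward=0.1,
                 episode_len=4, seed=0):
        self.samples = samples
        self.labels = labels
        self.normal_label = normal_label
        self.minority_reward = minority_reward   # hypothetical magnitude
        self.majority_reward = majority_reward   # hypothetical magnitude
        self.episode_len = episode_len
        self.rng = random.Random(seed)

    def reset(self):
        # An episode is a random draw of samples from the training set.
        n = min(self.episode_len, len(self.samples))
        self.order = self.rng.sample(range(len(self.samples)), n)
        self.t = 0
        return self.samples[self.order[0]]       # initial state s_1 = x_1

    def step(self, action):
        label = self.labels[self.order[self.t]]
        # Minority (fault) samples carry the larger absolute reward.
        scale = (self.majority_reward if label == self.normal_label
                 else self.minority_reward)
        reward = scale if action == label else -scale
        self.t += 1
        done = self.t >= len(self.order)
        # Deterministic transition: the next state is the next sample.
        next_state = None if done else self.samples[self.order[self.t]]
        return next_state, reward, done
```

An agent would call `reset()` once per episode and then `step(action)` until `done` is true, storing each (state, action, reward, next state) tuple for training.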

The pseudo code for the environment simulation algorithm is as follows:

Specific Embodiment 1

The rolling bearing vibration data used by the invention are provided by the Bearing Data Center of Case Western Reserve University. The data were collected by acceleration sensors under 4 load conditions (0 hp, 1 hp, 2 hp and 3 hp) at sampling frequencies of 12 kHz and 48 kHz, and the motor speed varies between 1730 rpm and 1797 rpm depending on the load. The vibration signals cover 4 health states: normal (N), outer ring fault (OR), inner ring fault (IR) and rolling element fault (B), and each fault type has 3 defect diameters: 0.007 inch, 0.014 inch and 0.021 inch. The data set therefore contains 10 bearing operating states (1 normal state plus 3 fault types × 3 defect diameters). Taking 0 hp as an example, the STFT results for each state are shown in Fig. 4.

To simulate the unbalanced distribution of vibration data acquired in real situations, where one class may have many samples and another only a few, data sets are constructed with different imbalance ratios (IMRs) for the multi-state data imbalance experiments. The imbalance ratio is the ratio of the number of fault samples to the number of normal samples. The 10-state imbalance and variable-load experiments use the unbalanced data sets shown in Table 2.

Table 2 unbalanced experimental data set composition

The imbalance ratios in Table 2 are 1, 2/3, 1/2 and 1/10, and every training set contains 4950 samples. When IMR = 1, the normal-state samples and the fault-state samples of the rolling bearing each make up 50% of the total, so the ratio of normal to fault samples is 1:1, with the fault state divided into 9 classes; when IMR = 2/3, the number of fault samples is reduced and the number of normal samples is increased, giving a slightly unbalanced training set; when IMR = 1/2, the number of normal samples is 2 times the number of fault samples, giving a moderately unbalanced training set; when IMR = 1/10, the number of normal samples is 10 times the number of fault samples, giving an extremely unbalanced training set. For ease of comparison, the test set is the same for every imbalance ratio: 50 test samples for each of the 10 states, 500 test samples in total.
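As a quick check of the ratios described above, the normal/fault split of a training set can be derived from its size and its IMR. This is a minimal sketch that assumes IMR is exactly (fault samples)/(normal samples) over the whole training set; the helper name `split_counts` is hypothetical, and Table 2 remains the authoritative composition.

```python
from fractions import Fraction


def split_counts(total, imr):
    """Return (normal, fault) sample counts such that fault / normal == imr."""
    imr = Fraction(imr)
    normal = Fraction(total) / (1 + imr)
    fault = Fraction(total) - normal
    return normal, fault


for imr in (Fraction(1), Fraction(2, 3), Fraction(1, 2), Fraction(1, 10)):
    normal, fault = split_counts(4950, imr)
    print(f"IMR={imr}: normal={normal}, fault={fault}")
```

Under this assumption, a training set of 4950 samples at IMR = 1/10 splits into 4500 normal samples and 450 fault samples, i.e., 50 per fault class.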

The deep reinforcement learning hyper-parameters are set as follows: during interaction of the agent with the environment, the exploration probability ε is reduced from 1.0 to ε_min = 0.01 according to formula (21).

Here step is the current iteration number and total is the total number of iterations. The discount factor γ is 0.99, the episode size K is 512, the number of iterations is 2000, and the return value is computed by the method in formula (15).
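To illustrate how an exploration probability driven by step and total behaves, the sketch below anneals ε from 1.0 to ε_min = 0.01 and uses it for ε-greedy action selection. The linear form is an assumption chosen for illustration and may differ from formula (21); the function names `epsilon_at` and `choose_action` are hypothetical.

```python
import random


def epsilon_at(step, total, eps_start=1.0, eps_min=0.01):
    """Linearly anneal epsilon from eps_start to eps_min (assumed schedule)."""
    frac = min(max(step / total, 0.0), 1.0)
    return max(eps_min, eps_start - (eps_start - eps_min) * frac)


def choose_action(q_values, eps, rng=random):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


# Example: exploration probability at the start, middle and end of training.
schedule = [epsilon_at(s, 2000) for s in (0, 1000, 2000)]
```

Early in training almost every action is random, while near the end the agent mostly follows the learned Q values.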

To comprehensively verify the effectiveness and generalization capability of the proposed fault diagnosis model under data imbalance, and to allow convenient comparison with the diagnosis results in other literature, two experimental schemes are set up based on the TensorFlow and Keras deep learning frameworks, with a GPU used to accelerate computation.

In an unbalanced scenario, the rolling bearing fault state accounts for only a small percentage of the samples (say 1%); if the model predicts every sample as normal, the accuracy is 99% but the G-mean is 0. Accuracy alone therefore cannot fully evaluate the performance of the model on unbalanced data, whereas the G-mean takes both the true positive rate and the true negative rate into account and so can measure the quality of the model on unbalanced data; the larger its value, the better the diagnosis performance of the model under imbalance. The G-mean is calculated by formula (22).

Here TP is the number of true positives, FN the number of false negatives, TN the number of true negatives, and FP the number of false positives.
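The 1% example can be checked numerically. The sketch below uses the standard binary G-mean, the square root of the product of the true positive rate TP/(TP+FN) and the true negative rate TN/(TN+FP); treating the fault class as the positive class and the function name `g_mean` are assumptions made for illustration.

```python
import math


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the true positive rate and the true negative rate."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(tpr * tnr)


# 100 samples, 1 of them a fault (positive); the model predicts all normal.
tp, fn, tn, fp = 0, 1, 99, 0
accuracy = (tp + tn) / (tp + fn + tn + fp)
score = g_mean(tp, fn, tn, fp)
print(accuracy, score)
```

The accuracy is 0.99 while the G-mean is 0.0, because the single fault sample is missed and the true positive rate is therefore zero.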

Because the G-mean index can only evaluate unbalanced binary classification [19], an unbalanced multi-class index, G-mean_total, is proposed: any two classes C_i and C_j are selected from the classes, the G-mean(C_i, C_j) of the classification results for the samples of these two classes is calculated, and all such G-mean values are weighted and summed, as shown in formula (23).

Because both balanced and unbalanced data sets are involved, accuracy and G-mean_total are adopted as evaluation indexes.
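The pairwise construction can be sketched as follows. Two assumptions are made here that formula (23) may not share: the G-mean of a class pair is taken as the square root of the product of the two per-class recalls, and all pairs receive equal weight. The function names are hypothetical.

```python
import math
from itertools import combinations


def pairwise_g_mean(y_true, y_pred, ci, cj):
    """G-mean for the class pair (ci, cj), computed from per-class recall."""
    n_i = sum(1 for t in y_true if t == ci)
    n_j = sum(1 for t in y_true if t == cj)
    if n_i == 0 or n_j == 0:
        return 0.0
    hit_i = sum(1 for t, p in zip(y_true, y_pred) if t == ci and p == ci)
    hit_j = sum(1 for t, p in zip(y_true, y_pred) if t == cj and p == cj)
    return math.sqrt((hit_i / n_i) * (hit_j / n_j))


def g_mean_total(y_true, y_pred, classes):
    """Equal-weight average of the pairwise G-means (assumed weighting)."""
    pairs = list(combinations(classes, 2))
    return sum(pairwise_g_mean(y_true, y_pred, i, j) for i, j in pairs) / len(pairs)


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 2]   # one class-1 sample is misdiagnosed as class 0
value = g_mean_total(y_true, y_pred, classes=[0, 1, 2])
```

A single missed sample in one class lowers every pair that involves that class, which is why this kind of index reacts to minority-class errors that overall accuracy hides.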

To verify the diagnostic ability of the proposed model under data imbalance, the model is trained on the training sets of the 4 imbalance ratios (see Table 2) with the rolling bearing working under a single load, and tested on the test set described above, 500 test samples in total. The results are compared with three other methods (the DQN model before improvement, the traditional oversampling method SMOTE, and a CNN method) in Fig. 5, where load 0 on the abscissa means that the same load (load 0) is used for both training and testing, and likewise for loads 1, 2 and 3.

From the comparison of accuracy in Fig. 5, it can be seen that the improved DQN model of the present invention shows a great advantage on unbalanced data sets. When IMR = 1, the accuracy of the improved DQN model is 1% higher than that before improvement and 5%-8% higher than that of the traditional SMOTE method; when the imbalance increases to IMR = 2/3, the diagnosis accuracy of the improved model is above 99% overall and even reaches 100%, about 1% higher than before improvement and 6%-8% higher than the traditional method; even at the extreme imbalance of IMR = 1/10, the fault diagnosis accuracy is concentrated between 99% and 100%, which is superior to the SMOTE and CNN methods. In the unbalanced tests, the CNN method can only diagnose the majority-class data in the training set and performs poorly on unbalanced data sets.

Because the data are unbalanced, evaluating the classification performance of the model by accuracy alone is not comprehensive enough, so the G-mean_total indexes of the experimental results are also compared. To make the comparison clearer, G-mean_total is compared only for the data configurations in which the accuracy of the improved model is lower than 99%. As shown in Fig. 6, when the load is 0 and the IMR is 1, the G-mean_total of the improved model is higher than that of the model before improvement and of the SMOTE and CNN methods, further illustrating that the model of the present invention has excellent diagnostic capability on unbalanced data.

The accuracy in Fig. 5 and the G-mean_total in Fig. 6 together verify, from the experimental results, the validity of the model under unbalanced data. Consider the training return-value curve for load 0 and IMR = 1: if the improved DQN model of the invention is in an optimal state at the end of training, its return value should be concentrated around 512/2 × 9 + 512/2 = 2560, as shown in Fig. 7.

To demonstrate the effectiveness of the method more intuitively, taking load 0 and IMR = 1 as an example, the t-distributed stochastic neighbor embedding (t-SNE) algorithm [20] is used to reduce the features of the last fully connected layer of the DQN network before and after improvement to two dimensions and display them as scatter plots, as shown in Fig. 8: Fig. 8(a) visualizes the features of the DQN network before improvement, and Fig. 8(b) visualizes the features after improvement.

As can be observed in Fig. 8, the model before improvement misclassifies some samples and has unclear decision boundaries for some categories (marked with rectangles), so there is room for improvement; the improved model has clear decision boundaries and fewer misclassifications, which improves the accuracy of fault diagnosis.

Specific Embodiment 2

The experimental results of Specific Embodiment 1 show that the model of the invention handles the unbalanced distribution of vibration data well. To verify the generalization capability of the proposed model, it is further tested on data under both variable load and imbalance; the experimental results are shown in Fig. 9.

Fig. 9 shows the classification accuracy of the rolling bearing variable-load experiments at the four imbalance ratios. It can be seen that the method of the present invention achieves good classification accuracy whether the training set and the test set contain a single load or multiple loads. With the 01_23 data (training on loads 0 and 1, testing on loads 2 and 3), the classification accuracy is 99.4% when IMR = 1 and still reaches 99% when IMR = 1/2, far higher than that of the SMOTE method; with the 023_1 data, the classification accuracy is as high as 99.2% when IMR = 1. Even at the extreme imbalance of IMR = 1/10, the classification accuracy can reach 100%, which is better than the SMOTE method.

Fig. 10 shows the imbalance index G-mean_total. Taking the 01_23 data as an example, the G-mean_total values are 0.988 when IMR = 1 and 0.989 when IMR = 2/3, better than both the traditional method and the method before improvement.

To visually demonstrate the effectiveness of the proposed method, taking training on loads 0, 2 and 3 and testing on load 1 with IMR = 1/10 as an example, the t-SNE method is used to visualize the fault diagnosis of the model under unbalanced data and variable load, as shown in Fig. 11. As can be seen from Fig. 11, only a very few samples are not diagnosed correctly under unbalanced data and variable load, and the generalization capability is excellent.

Therefore, even when the data are extremely unbalanced and the load varies, the method provided by the invention adapts well to the changing environment and achieves higher accuracy than the method before improvement. The method can thus handle well both the unbalanced distribution of normal-state and fault-state vibration data and variable load.

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

This invention was funded by the National Natural Science Foundation of China (51805120), the Natural Science Foundation of Heilongjiang Province (LH2019E058) and the Special Fund for Basic Scientific Research of General Colleges and Universities of Heilongjiang Province (LGYC2018JC022).

The documents cited in the present invention are as follows:

[1] Zhang Shuqing, Yu Shi Yu, Yao Yu Yong, et al. Mechanical fault diagnosis method based on ICEEMD and AWOA-optimized ELM [J]. Chinese Journal of Scientific Instrument, 2019, 40(11): 172-180.

[2] Duckweed, Liu Jie. Fault identification research based on an improved deep model of generative adversarial networks under unbalanced data sets [J]. Journal of Electronic Measurement and Instrumentation, 2019, 33(03): 176-183.

[3] Gongwenfeng, Cheng, Zhang Meiling, et al. Intelligent diagnosis method for tiny faults of motor bearings based on deep learning [J]. Chinese Journal of Scientific Instrument, 2020, 41(01): 195-205.

[4] Dow, Guo Liang, Gao Hongli, et al. A method for imbalanced classification of mechanical fault data [J]. Chinese Journal of Scientific Instrument, 2019, 40(12): 205-213.

[5] Huanghaisong, Weijian an, Nianzhu Peng, et al. Rolling bearing fault diagnosis based on an unbalanced-sample feature oversampling algorithm and SVM [J]. Journal of Vibration and Shock, 2020, 39(10): 65-74+132.

[6]WANG H,KE Y,Luo G,et al.Compressed sensing of roller bearing fault based on multiple down-sampling strategy[J].Measurement Science and Technology,2015,27(2):025009.

[7]MAO W,HE L,YAN Y,et al.Online sequential prediction of bearings imbalanced fault diagnosis by extreme learning machine[J].Mechanical Systems and Signal Processing,2017(83):450-473.

[8]JIA F,LEI Y,LU N,et al.Deep normalized convolutional neural network for imbalanced fault classification of machinery and its understanding via visualization[J].Mechanical Systems and Signal Processing,2018,110:349-367.

[9] Chenrenxiang, Wu Hao year, Yang Li Xia, et al. Rolling bearing life stage identification based on multi-classifier ensemble weighted balanced distribution adaptation [J]. Chinese Journal of Scientific Instrument, 2019, 40(10): 66-73.

[10]WU Z,GUO Y,LIN W,et al.A weighted deep representation learning model for imbalanced fault diagnosis in cyber-physical systems[J].Sensors,2018,18(4):1096.

[11]LIN E,CHEN Q,QI X.Deep reinforcement learning for imbalanced classification[J].Applied Intelligence,2020:1-15.

[12]JANISCH J,PEVNY T,LISY V.Classification with costly features using deep reinforcement learning[C].Proceedings of the AAAI Conference on Artificial Intelligence.2019,33(1):3959-3966.

[13]FENG J,HUANG M,ZHAO L,et al.Reinforcement learning for relation classification from noisy data[C].Proceedings of the AAAI Conference on Artificial Intelligence.2018,32(1).

[14]WIERING M A,VAN HASSELT H,PIETERSMA A D,et al.Reinforcement learning algorithms for solving classification problems[C].2011IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning(ADPRL).IEEE,2011:91-96.

[15]DING Y,MA L,MA J,et al.Intelligent fault diagnosis for rotating machinery using deep Q-network based health state classification:A deep reinforcement learning approach[J].Advanced Engineering Informatics,2019(42):100977.

[16]FAN J,WANG Z,XIE Y,et al.A theoretical analysis of deep Q-learning[C].Learning for Dynamics and Control.PMLR,2020:486-489.

[17]WU S,ZHONG S,LIU Y.Deep residual learning for image steganalysis[J].Multimedia tools and applications 2018,77(9):10437-10453.

[18] Zulong, Song Chengyang, Zhouyou, Youyou, et al. Fault diagnosis of rolling bearing based on VMD multi-feature fusion and PSO-SVM [J]. Machine Design and Research, 2019, 35(06): 96-104.

[19]ESPINDOLA R P,EBECKEN N F F.On extending f-measure and g-mean metrics to multi-class problems[J].WIT Transactions on Information and Communication Technologies,2005(35).

[20]ZHENG J,JIANG Z,PAN H.Sigmoid-based refined composite multiscale fuzzy entropy and t-SNE based fault diagnosis approach for rolling bearing[J].Measurement,2018(129):332-342.

Claims

1. A fault diagnosis method for a rolling bearing of an improved deep Q network is characterized by comprising the following steps:

step two, preprocessing the training data and the test data;

step three, quantifying the reward function in the deep Q network by using K-means clustering so that each sample in the training data has its own reward value, thereby obtaining a new return function; the specific process comprises the following steps:

step three-one, taking the center point of each K-means cluster as the base point of the return values of the different categories, wherein the return value of the center point is calculated as follows:

wherein t denotes the time; s_t denotes the state at time t; a_t denotes the action at time t; y_t denotes the sample label at time t; 1/ρ ∈ [0,1], where ρ = D_N/(D_F/9) denotes the imbalance ratio, D_N is the number of majority-class normal samples, and D_F is the number of minority-class fault samples;

step three-two, within each category, quantifying the return function by the Euclidean distance between each sample and the center point of the cluster to which the sample belongs, specifically comprising:

step three-two-two, updating the cluster to which each sample in the training data belongs, wherein the loss function to be minimized is as follows:

step three-two-three, updating each cluster center μ₁,μ₂,…,μ_k:

step three-two-four, traversing all cluster assignments and finding the optimal solution that minimizes the loss function until the cluster labels reach the convergence precision; otherwise, iterating step three-two-two to step three-two-four;

step three-two-six, quantizing the reward function value from the reward function values corresponding to the different categories and the distance between each sample and the center within the same category, wherein the quantization formula is:

wherein Dis(s_t) denotes the normalized distance between each sample and the center point;

and step five, inputting the test data into an improved deep Q network model to obtain a fault diagnosis result of the rolling bearing.

2. The method for diagnosing the fault of the rolling bearing with the improved deep Q network according to claim 1, characterized in that in step one, the training data are collected under 4 load conditions, the load types comprising 0 hp, 1 hp, 2 hp and 3 hp; the different states comprise a normal state, an outer ring fault state, an inner ring fault state and a rolling element fault state.

3. The method for diagnosing the fault of the rolling bearing with the improved deep Q network as claimed in claim 2, wherein the preprocessing in the second step comprises data enhancement, and the training data is subjected to short-time Fourier transform to obtain a two-dimensional time-frequency domain image.

4. The method for diagnosing the rolling bearing fault of the improved deep Q network according to claim 3, wherein in the third step, in the process of clustering through the K-means, the samples with the clustering errors are removed, and new samples are obtained again until the clustering is correct.

5. The method for diagnosing the fault of the rolling bearing with the improved deep Q network as claimed in claim 1, wherein in the third step, the Principal Component Analysis (PCA) is firstly adopted to reduce the dimension of the training data before clustering.

6. The method for diagnosing the fault of the rolling bearing with the improved deep Q network according to claim 1, characterized in that in step four the deep features are extracted by an improved Resnet-18 network, the improvement being that the fully connected layer outputs only the Q value corresponding to the given state and action instead of outputting all the Q values, so as to accelerate network training.

7. The method for diagnosing the fault of the rolling bearing with the improved deep Q network according to claim 6, wherein the deep reinforcement learning training process in step four comprises the following steps: the experience of each time step, namely the current bearing fault state s_t, the fault diagnosis action taken a_t, the immediate reward obtained r_t and the next state s', is stored in a memory; during training, experiences are randomly sampled from the memory, and the deep Q network is updated by applying a gradient descent method to its loss function; wherein the memory is a fixed-length sequence.

8. The method for diagnosing the fault of the rolling bearing of the improved deep Q network is characterized in that in the fifth step, the final fault diagnosis of the rolling bearing is carried out through a Softmax classifier.