CN111475987B - SAE and ON-LSTM-based gear residual life prediction method - Google Patents


Info

Publication number
CN111475987B
Authority
CN
China
Prior art keywords
hidden layer
output
representing
layer
input
Prior art date
Legal status
Active
Application number
CN202010255897.8A
Other languages
Chinese (zh)
Other versions
CN111475987A (en)
Inventor
秦毅
阎昊冉
项盛
陈定粮
奚德君
Current Assignee
Chongqing University
Original Assignee
Chongqing University
Priority date
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202010255897.8A
Publication of CN111475987A
Application granted
Publication of CN111475987B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/044 — Recurrent networks, e.g. Hopfield networks
    • G06N3/045 — Combinations of networks
    • G06N3/08 — Learning methods
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00 — Aspects of pattern recognition specially adapted for signal processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Testing Of Devices, Machine Parts, Or Other Structures Thereof (AREA)

Abstract

The invention relates to a gear residual life prediction method based on SAE and ON-LSTM, belonging to the field of big data and intelligent manufacturing. The method uses SAE to extract gear health indexes and a novel ON-LSTM neural network to predict the residual life of gears; the hierarchical structures of the input data and the historical data are divided by a hierarchy divider in the ON-LSTM. The positions of the largest elements in the output vectors of the main forgetting gate and the main input gate are defined as hierarchical positions, so that the recurrent neural network is updated hierarchically. By combining SAE feature extraction with the ON-LSTM neural network, the invention greatly reduces the network calculation amount and the calculation time, and improves both the prediction speed and the prediction accuracy.

Description

SAE and ON-LSTM-based gear residual life prediction method
Technical Field
The invention belongs to the field of big data and intelligent manufacturing, and relates to a gear residual life prediction method based on SAE and ON-LSTM.
Background
Gears are among the most widely used parts in mechanical equipment. Their high transmission efficiency, compact structure, smooth transmission, large bearing capacity and long service life have kept them in durable, widespread use. Under complex operating conditions, however, gears are prone to failure, which can lead to machine breakdowns and even endanger personal safety. This is especially true for large or ultra-large equipment such as hydraulic generators, mine conveyors, helicopter power transmission systems and heavy machine tools. Predicting the service life of in-service gears makes it possible to schedule equipment maintenance effectively, ensure continuous and efficient production, improve production efficiency, reduce the accident rate and prevent sudden accidents; predicting gear failure in advance is therefore of great significance for engineering production.
Commonly used life prediction methods for mechanical equipment fall into three types: 1) model-based methods; 2) data-driven methods; 3) hybrids of the first two. A model-based approach builds a physical model that describes the component degradation process; it requires specific mechanical knowledge and is therefore less widely applicable. A data-driven method derives a predictive model from routinely collected monitoring data. It is based mainly on statistical and machine learning methods and aims to capture the behaviour of the system, so these methods offer a compromise between accuracy, complexity and applicability. A hybrid approach combines model-based and data-driven techniques: a model is established from physical knowledge of the monitored system, and its parameters are learned and updated by data-driven techniques. This combination makes the method accurate, but it still requires specific physical knowledge and is computationally expensive. Among data-driven methods, machine-learning-based methods can cope with an unknown degradation model, and the input of the built model is not limited to condition monitoring data but can be many different types of data. RNN-based residual life prediction can integrate the original learning samples with new learning patterns to realize sample retraining; it not only improves the accuracy of residual life prediction but also converges quickly and is stable, so it plays an important role in the fields of reliability evaluation and residual life prediction. However, when long-term dependent degradation data are processed, the conventional RNN suffers from vanishing or exploding gradients, which seriously affects the residual life prediction accuracy.
To solve this problem, long short-term memory (LSTM) networks were developed; however, even LSTM has limited capacity for long-term dependent degradation data. In gear residual life prediction, the measured gear degradation signal contains rich useful information, and the time-domain and frequency-domain characteristics of the gear vibration signal reflect, to some extent, the declining trend of the gear's state and health. To express the degradation process of the gear comprehensively and accurately, all of these characteristics should be computed. The conventional LSTM network, however, neither analyses the gear degradation data comprehensively nor mines the sequence information of the neural network, so its life prediction capability is poor. Therefore, to account for sequence information in the overall analysis of gear degradation data and thereby improve prediction accuracy, a new neural network is needed to replace the conventional LSTM for predicting the remaining life of gears.
Disclosure of Invention
In view of the above, the present invention aims to provide a gear residual life prediction method based on SAE and ON-LSTM. Addressing the unordered character of the neurons of the conventional LSTM neural network, the method mines the sequence information of the neurons by embedding a tree structure between them, and then updates different sequence information hierarchically, thereby reducing the computational load of the conventional LSTM neural network.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A gear residual life prediction method based on SAE and ON-LSTM specifically comprises the following steps:
S1: collecting gear vibration signals of duration T at intervals of Δt until the gear fails, the number of sampled gear vibration signal segments being n;
S2: processing the gear vibration signals with a sparse autoencoder (SAE) to obtain an n-dimensional characteristic vector V = (v_1, v_2, …, v_n)^T;
S3: selecting the feature vector V1 = (v1_1, v1_2, …, v1_{n1})^T formed by the first n1 sampling points as the training set;
S4: using least squares to minimize the objective function, and normalizing all elements of the vector V against the vector V1 through the formula v'_i = a·v_i + b, wherein a and b respectively represent the weight and bias of the mapping used to unify the elements of V with V1, their values being determined by the minimization of the objective function;
S5: normalizing the vector V1 to obtain the normalized vector W = (w_1, w_2, …, w_{n1})^T;
S6: reconstructing from W a matrix U of p + 1 rows, in which each column contains p + 1 consecutive elements of W, wherein p is the number of units of the neural network input layer;
S7: taking the first p rows of the matrix U as the input of the ordered-neurons long short-term memory (ON-LSTM) neural network and the last row as its output, and training the network;
S8: after training on the matrix U, taking the last p outputs as the network input of the ON-LSTM to obtain the output at the next moment;
S9: repeating step S8 a certain number of times, comparing the denormalized outputs with the actual characteristic values V' = (v'_{p+1}, v'_{p+2}, …, v'_n)^T, and, when a denormalized output exceeds the set threshold, multiplying the number of predicted sampling points by Δt + T, the sum of the interval time and the sampling duration of the gear vibration signal, to obtain the remaining service life of the gear.
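The windowing in steps S5 to S7 can be sketched as follows (a minimal sketch assuming the standard sliding-window arrangement implied by the "first p rows as input, last row as output" training scheme; the function name and toy data are illustrative):

```python
import numpy as np

def build_training_matrix(w, p):
    """Arrange the normalized health-index sequence w into the matrix U of
    step S6: each column holds p input values plus one output value (S7)."""
    n1 = len(w)
    # (p + 1) x (n1 - p) matrix; column j is w[j], w[j+1], ..., w[j+p]
    U = np.stack([w[j:j + p + 1] for j in range(n1 - p)], axis=1)
    X = U[:p, :].T   # first p rows of U -> network inputs
    y = U[p, :]      # last row of U    -> training targets
    return U, X, y

w = np.linspace(0.0, 1.0, 10)        # toy normalized health indices
U, X, y = build_training_matrix(w, p=3)
print(U.shape, X.shape, y.shape)     # (4, 7) (7, 3) (7,)
```

Each training pair is thus one window of p past health indices and the index that follows it, which is what step S8 later rolls forward recursively.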
Further, in step S2, the SAE sparse autoencoder has the following structure: let Y = [y_1 y_2 … y_n] be the original vibration signal over the whole time dimension, where y_t = [y_{t,1} y_{t,2} … y_{t,m}] is the input data at time t and m is the number of signal points acquired in each time period. The first network layer yields a 50-dimensional vector z_t; the second layer yields a 1-dimensional health index x_t; the third layer yields a 50-dimensional vector z'_t; and the last layer yields an m-dimensional vector y'_t:
z_t = σ(W_1·y_t + b_1)
x_t = σ(W_2·z_t + b_2)
z'_t = σ(W_3·x_t + b_3)
y'_t = σ(W_4·z'_t + b_4)
The least-squares criterion is used as the objective function to make the difference between y_t and y'_t as small as possible, and x_t serves as the health index at time t for the subsequent prediction network.
Here z_t denotes the output of the first network layer, x_t the output of the second layer, z'_t the output of the third layer and y'_t the output of the fourth layer; σ denotes the sigmoid activation function; W_1, W_2, W_3 and W_4 denote the weights of the first to fourth layers; and b_1, b_2, b_3 and b_4 denote the biases of the first to fourth layers.
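The four-layer encode/decode pass described above can be sketched as follows (a non-authoritative sketch: the layer widths 50 and 1 follow the text, while the random weights, function names and toy dimension m are assumptions for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sae_forward(y_t, params):
    """One pass through the four-layer SAE:
    m-dim signal -> 50-dim z_t -> 1-dim health index x_t -> 50-dim z'_t -> m-dim y'_t."""
    W1, b1, W2, b2, W3, b3, W4, b4 = params
    z_t  = sigmoid(W1 @ y_t + b1)    # first layer: 50-dim encoding
    x_t  = sigmoid(W2 @ z_t + b2)    # second layer: 1-dim health index
    zp_t = sigmoid(W3 @ x_t + b3)    # third layer: 50-dim decoding
    yp_t = sigmoid(W4 @ zp_t + b4)   # fourth layer: m-dim reconstruction
    return x_t, yp_t

m = 128                              # assumed number of points per segment
rng = np.random.default_rng(0)
params = (rng.standard_normal((50, m)), np.zeros(50),
          rng.standard_normal((1, 50)), np.zeros(1),
          rng.standard_normal((50, 1)), np.zeros(50),
          rng.standard_normal((m, 50)), np.zeros(m))
x_t, yp_t = sae_forward(rng.standard_normal(m), params)
print(x_t.shape, yp_t.shape)         # (1,) (128,)
```

Training would then minimize the least-squares reconstruction error between y_t and y'_t, as the text states; only the trained 1-dimensional x_t is passed on to the prediction network.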
Further, the ON-LSTM neural network has the following structure. Assume the input information over the whole time dimension is X = [x_1 x_2 … x_t], where x_t = [x_{t,1} x_{t,2} … x_{t,n}] is the input data at time t and h_{t-1} = [h_{t-1,1} h_{t-1,2} … h_{t-1,m}] is the recursive data of the neural network at time t-1. First, a hierarchy divider is defined: the hierarchical position L1 of the input information x_t is determined by the constructor F_1, and the hierarchical position L2 of the historical information h_{t-1} by the constructor F_2. With this division of hierarchical information, the structure of the ON-LSTM neural network is expressed as:
f_t = σ(W_f·x_t + U_f·h_{t-1} + b_f)
i_t = σ(W_i·x_t + U_i·h_{t-1} + b_i)
o_t = σ(W_o·x_t + U_o·h_{t-1} + b_o)
ĉ_t = tanh(W_c·x_t + U_c·h_{t-1} + b_c)
f̃_t = cumax(W_f̃·x_t + U_f̃·h_{t-1} + b_f̃)
ĩ_t = 1 - cumax(W_ĩ·x_t + U_ĩ·h_{t-1} + b_ĩ)
h_t = o_t ∘ tanh(c_t)
where f_t, i_t and o_t denote the outputs of the forget gate, the input gate and the output gate in the hidden layer; ĉ_t denotes the output of the hyperbolic tangent function in the hidden layer; f̃_t and ĩ_t denote the outputs of the main forgetting gate and the main input gate in the hidden layer; σ denotes the sigmoid activation function and tanh the hyperbolic tangent function; W_f, W_i, W_o and W_c denote the weights between the input layer and, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer, and W_f̃ and W_ĩ those between the input layer and the main forgetting gate and the main input gate; U_f, U_i, U_o and U_c denote the weights between, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer at the current moment and the hidden-layer output at the previous moment, and U_f̃ and U_ĩ the corresponding weights for the main forgetting gate and the main input gate; b_f, b_i, b_o and b_c denote the biases of the linear combinations in, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer, and b_f̃ and b_ĩ those in the main forgetting gate and the main input gate; ∘ denotes element-wise multiplication; c_{t-1} denotes the output of the cell memory unit at time t-1; cumax(·) = cumsum(softmax(·)), where cumsum is the cumulative-sum function; and ω_t = f̃_t ∘ ĩ_t denotes the middle (overlapping) level, with ĩ_t governing the low level and f̃_t the high level.
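The cumax activation used by the main gates is short enough to state directly (a sketch; the softmax shift is a standard numerical-stability detail not mentioned in the text):

```python
import numpy as np

def cumax(v):
    """cumax(v) = cumsum(softmax(v)): monotonically non-decreasing values
    in (0, 1] that encode a soft split point between memory levels."""
    e = np.exp(v - np.max(v))        # shifted for numerical stability
    return np.cumsum(e / e.sum())

logits = np.array([0.5, 2.0, -1.0, 0.3])
f_master = cumax(logits)             # main-forget-gate shape: rises toward 1
i_master = 1.0 - cumax(logits)       # main-input-gate shape: falls toward 0
print(np.round(f_master, 3))
```

Because the output is non-decreasing and ends at 1, the position where it jumps acts as the level boundary: entries after the jump of the main forget gate protect high-level memory, while entries before the jump of the main input gate admit low-level input.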
Further, in the structure of the ON-LSTM neural network, the hierarchical position L1 of the input information x_t is determined by the constructor F_1, and the hierarchical position L2 of the historical information h_{t-1} is determined by the constructor F_2, expressed as follows:
L1 = F_1(x_t, h_{t-1})
L2 = F_2(x_t, h_{t-1})
F_1 and F_2 are each defined to output the position of the maximum element of the vector obtained from a nonlinear combination of the input data and the historical data. During network training, x_{t+1,n} is set to the query vector q_m; in the prediction phase, x_{t,n} is set to q_m.
Further, in the structure of the ON-LSTM neural network, the network hierarchy is divided by ordering the cell units according to the obtained L1 and L2. When L2 ≥ L1, the historical data level is lower than the current input level, the overlapping part of the levels is ω_t = f̃_t ∘ ĩ_t, and the cell memory unit c_t is updated by the following rule:
c_t = ω_t ∘ (f_t ∘ c_{t-1} + i_t ∘ ĉ_t) + (f̃_t - ω_t) ∘ c_{t-1} + (ĩ_t - ω_t) ∘ ĉ_t
When L2 < L1, the current input level is lower than the historical level, and the cell memory unit c_t is updated by the following rule:
c_t = f̃_t ∘ c_{t-1} + ĩ_t ∘ ĉ_t
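The two update cases above can be sketched with the level-aware cell update of the published ON-LSTM formulation, which they specialize; this is a sketch under that assumption, with illustrative function and variable names:

```python
import numpy as np

def update_cell(c_prev, c_hat, f, i, f_master, i_master):
    """Level-aware cell update: omega is the overlap of the master gates.
    Inside the overlap the classic LSTM rule applies; outside it, high-level
    memory is kept (f_master) and low-level memory is overwritten (i_master)."""
    omega = f_master * i_master                 # overlapping middle level
    f_hat = f * omega + (f_master - omega)      # effective forget gate
    i_hat = i * omega + (i_master - omega)      # effective input gate
    return f_hat * c_prev + i_hat * c_hat

c = update_cell(np.array([1.0]), np.array([0.4]),
                np.array([0.5]), np.array([0.5]),
                np.array([0.8]), np.array([0.6]))
print(c)  # [0.704]
```

When the master gates do not overlap (omega = 0), the update collapses to the second rule above: old memory is kept where f_master is 1 and new content is written where i_master is 1.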
The invention has the following beneficial effects. Unlike traditional time- and frequency-domain characteristic values, the SAE adopted by the invention extracts the health index with a digital model. Addressing the unordered character of the inter-layer neurons of an ordinary neural network, the adopted ON-LSTM mines their sequence information by embedding a tree structure among the neurons and then updates different levels of information hierarchically: high-level information is not easily replaced, middle-level information follows the same update rule as the traditional long short-term memory (LSTM) network, and low-level information is more easily affected by the input data at the current moment. By exploiting sequence information that other neural networks ignore, the ON-LSTM outperforms the traditional LSTM and its variants in predicting the remaining service life of gears, and the prediction accuracy of the network is greatly improved.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out in the specification.
Drawings
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of an SAE sparse automatic encoder;
FIG. 2 is a block diagram of an ON-LSTM neural network;
FIG. 3 is a schematic diagram of a hidden layer of an ON-LSTM neural network;
FIG. 4 is a schematic diagram showing a hierarchical cell unit update process based on layer sequence information, wherein (a) is a first update mode and (b) is a second update mode;
FIG. 5 is a flow chart of the method for predicting the remaining life of a gear based on SAE and ON-LSTM according to the present invention;
FIG. 6 is a simulation diagram of failure threshold, training value, predicted value and actual value for 380 known sampling points;
FIG. 7 is a comparison of the predictive power of different models given 380 known fused eigenvalues.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and details of the present description may be modified or varied in various ways without departing from the spirit and scope of the present invention. It should be noted that the illustrations provided in the following embodiments merely illustrate the basic idea of the invention, and the following embodiments and the features in the embodiments may be combined with each other provided there is no conflict.
Referring to fig. 1 to 7, fig. 5 is a flowchart of the method for predicting the remaining life of a gear based on SAE and ON-LSTM. Addressing the unordered character of the inter-layer neurons of a general neural network, the ON-LSTM embeds a tree structure between neurons to mine their sequence information and then updates different levels of information hierarchically, so that high-level information is not easily replaced, middle-level information follows the same update rules as a traditional long short-term memory (LSTM) network, and low-level information is more easily affected by the input data at the current moment. By using sequence information that other neural networks ignore, the ON-LSTM has advantages over the conventional LSTM and its variants in gear life prediction. The prediction method based on the ON-LSTM comprises the following steps:
S1: collecting gear vibration signals of duration T at intervals of Δt until the gear fails, the number of sampled gear vibration signal segments being n;
S2: processing the gear vibration signals with SAE to obtain an n-dimensional characteristic vector V = (v_1, v_2, …, v_n)^T;
S3: selecting the feature vector V1 = (v1_1, v1_2, …, v1_{n1})^T formed by the first n1 sampling points as the training set;
S4: using least squares to minimize the objective function, and normalizing all elements of the vector V against the vector V1 through the formula v'_i = a·v_i + b, wherein a and b respectively represent the weight and bias of the mapping used to unify the elements of V with V1, their values being determined by the minimization of the objective function;
S5: normalizing the vector V1 to obtain the normalized vector W = (w_1, w_2, …, w_{n1})^T;
S6: reconstructing from W a matrix U of p + 1 rows, in which each column contains p + 1 consecutive elements of W, wherein p is the number of units of the neural network input layer;
S7: taking the first p rows of the matrix U as the input of the ordered-neurons long short-term memory (ON-LSTM) neural network and the last row as its output, and training the network;
S8: after training on the matrix U, taking the last p outputs as the network input of the ON-LSTM to obtain the output at the next moment;
S9: repeating step S8 a certain number of times, comparing the denormalized outputs with the actual characteristic values V' = (v'_{p+1}, v'_{p+2}, …, v'_n)^T, and, when a denormalized output exceeds the set threshold, multiplying the number of predicted sampling points by Δt + T, the sum of the interval time and the sampling duration of the gear vibration signal, to obtain the remaining service life of the gear.
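The conversion in step S9 from a threshold crossing to remaining life can be sketched as follows (hypothetical function name and toy numbers; the iteration over predicted points mirrors the repeated application of step S8):

```python
def remaining_life(predictions, threshold, delta_t, T):
    """Step S9 sketch: count predicted steps until the denormalized health
    index first exceeds the failure threshold, then convert steps to time."""
    for k, v in enumerate(predictions, start=1):
        if v > threshold:
            return k * (delta_t + T)   # predicted points x (interval + sample time)
    return None  # threshold not reached within the prediction horizon

# four predicted (denormalized) indices, threshold 1.0, Δt = 9, T = 1
print(remaining_life([0.2, 0.4, 0.7, 1.1], threshold=1.0, delta_t=9.0, T=1.0))  # 40.0
```

Each predicted sampling point represents one acquisition cycle of length Δt + T, so the index of the first threshold crossing scales directly to the remaining service time.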
As shown in fig. 1, the SAE sparse autoencoder has the following structure: let Y = [y_1 y_2 … y_n] be the original vibration signal over the whole time dimension, where y_t = [y_{t,1} y_{t,2} … y_{t,m}] is the input data at time t and m is the number of signal points acquired in each time period. The first network layer yields a 50-dimensional vector z_t; the second layer yields a 1-dimensional health index x_t; the third layer yields a 50-dimensional vector z'_t; and the last layer yields an m-dimensional vector y'_t:
z_t = σ(W_1·y_t + b_1)
x_t = σ(W_2·z_t + b_2)
z'_t = σ(W_3·x_t + b_3)
y'_t = σ(W_4·z'_t + b_4)
The least-squares criterion is used as the objective function to make the difference between y_t and y'_t as small as possible, and x_t serves as the health index at time t for the subsequent prediction network.
Here z_t denotes the output of the first network layer, x_t the output of the second layer, z'_t the output of the third layer and y'_t the output of the fourth layer; σ denotes the sigmoid activation function; W_1, W_2, W_3 and W_4 denote the weights of the first to fourth layers; and b_1, b_2, b_3 and b_4 denote the biases of the first to fourth layers.
As shown in fig. 2 to 4, the hierarchical structures of the input data and the historical data are divided by a learnable hierarchy divider in the ON-LSTM. The positions of the largest elements in the output vectors of the main forgetting gate and the main input gate are defined as hierarchical positions, and the division of the hierarchical structure is determined by these largest elements, so that a tree hierarchy is embedded into the recurrent neural network. This means that low-level information is more easily replaced by the input data at the next moment, high-level information is not easily replaced, and the update rules of middle-level information follow the original LSTM. The derivation of the ON-LSTM structure is as follows:
Assume the input information over the whole time dimension is X = [x_1 x_2 … x_t], where x_t = [x_{t,1} x_{t,2} … x_{t,n}] is the input data at time t and h_{t-1} = [h_{t-1,1} h_{t-1,2} … h_{t-1,m}] is the recursive data of the neural network at time t-1. First, a hierarchy divider is defined: the hierarchical position L1 of the input information x_t is determined by the constructor F_1, and the hierarchical position L2 of the historical information h_{t-1} is likewise determined by the constructor F_2.
L1 = F_1(x_t, h_{t-1}) (2)
L2 = F_2(x_t, h_{t-1}) (3)
F_1 and F_2 are each defined to output the position of the maximum element of the vector obtained from a nonlinear combination of the input data and the historical data. During network training, x_{t+1,n} is set to the query vector q_m; in the prediction phase, x_{t,n} is set to q_m.
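A constructor of this kind can be sketched as follows (the tanh nonlinearity and the parameter names W, U, b are assumptions for illustration; the text only specifies "the position of the maximum element of a nonlinear combination of the input data and the historical data"):

```python
import numpy as np

def level_position(x_t, h_prev, W, U, b):
    """Sketch of a constructor F: the hierarchical level is the index of the
    largest element of a nonlinear combination of input and historical data."""
    v = np.tanh(W @ x_t + U @ h_prev + b)   # assumed nonlinear combination
    return int(np.argmax(v))

# toy check: with identity W and zero history, the level tracks the largest input
L = level_position(np.array([0.0, 5.0, 1.0]), np.zeros(3),
                   np.eye(3), np.zeros((3, 3)), np.zeros(3))
print(L)  # 1
```

Two such constructors, with separate parameters, would yield L1 for the current input and L2 for the recursive history, which the update rules below then compare.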
In the structure of the ON-LSTM neural network, the network hierarchy is divided by ordering the cell units according to the required L1 and L2. When L2 ≥ L1, the historical data level is lower than the current input level, the overlapping part of the levels is ω_t = f̃_t ∘ ĩ_t, and the cell memory unit c_t is updated by the following rule:
c_t = ω_t ∘ (f_t ∘ c_{t-1} + i_t ∘ ĉ_t) + (f̃_t - ω_t) ∘ c_{t-1} + (ĩ_t - ω_t) ∘ ĉ_t
When L2 < L1, the current input level is lower than the historical level, and the cell memory unit c_t is updated by the following rule:
c_t = f̃_t ∘ c_{t-1} + ĩ_t ∘ ĉ_t
With the division of hierarchical information, a variant structure of the LSTM neural network is derived as follows:
f_t = σ(W_f·x_t + U_f·h_{t-1} + b_f)
i_t = σ(W_i·x_t + U_i·h_{t-1} + b_i)
o_t = σ(W_o·x_t + U_o·h_{t-1} + b_o)
ĉ_t = tanh(W_c·x_t + U_c·h_{t-1} + b_c)
f̃_t = cumax(W_f̃·x_t + U_f̃·h_{t-1} + b_f̃)
ĩ_t = 1 - cumax(W_ĩ·x_t + U_ĩ·h_{t-1} + b_ĩ)
h_t = o_t ∘ tanh(c_t)
where f_t, i_t and o_t denote the outputs of the forget gate, the input gate and the output gate in the hidden layer; ĉ_t denotes the output of the hyperbolic tangent function in the hidden layer; f̃_t and ĩ_t denote the outputs of the main forgetting gate and the main input gate in the hidden layer; σ denotes the sigmoid activation function and tanh the hyperbolic tangent function; W_f, W_i, W_o and W_c denote the weights between the input layer and, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer, and W_f̃ and W_ĩ those between the input layer and the main forgetting gate and the main input gate; U_f, U_i, U_o and U_c denote the weights between, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer at the current moment and the hidden-layer output at the previous moment, and U_f̃ and U_ĩ the corresponding weights for the main forgetting gate and the main input gate; b_f, b_i, b_o and b_c denote the biases of the linear combinations in, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer, and b_f̃ and b_ĩ those in the main forgetting gate and the main input gate; ∘ denotes element-wise multiplication; c_{t-1} denotes the output of the cell memory unit at time t-1; cumax(·) = cumsum(softmax(·)), where cumsum is the cumulative-sum function; and ω_t = f̃_t ∘ ĩ_t denotes the middle (overlapping) level, with ĩ_t governing the low level and f̃_t the high level.
Experimental verification:
The experiment adopts first-stage speed-increasing and second-stage speed-reducing transmission, so that the overall transmission ratio of the experimental gearbox is exactly 1:1. The experimental gears are made of 40Cr with machining precision grade 5, surface hardness 55 HRC and a module of 5. Specifically, the large gear has 31 teeth, the small gear has 25 teeth, and the first-stage transmission gear is 21 mm wide. In the experiment, the torque is 1400 N·m, the rotation speed of the large gear is 500 r/min, the lubricating oil supply of the experimental gearbox is 4 L/h, and the cooling temperature is 70 °C. All data were collected throughout the acquisition process. Because of the high torque, the first-stage large gear failed by tooth breakage after 814 minutes of operation.
With 380 fusion characteristic points known, the gear residual life is predicted; the results are shown in fig. 6. The ON-LSTM provided by the present invention has higher prediction accuracy than the conventional LSTM; see fig. 7, where MAE (mean absolute error), NRMSE (normalized root mean square error) and Score (a scoring function for life prediction performance) are the three evaluation criteria. As can be seen from fig. 7, thanks to the mining of layer sequence information, the ON-LSTM can fully grasp the health information contained in the input data, so a good optimum is easier to obtain and high-precision prediction of the remaining service life of the gear is easier to realize.
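The MAE and NRMSE criteria used in fig. 7 can be computed as below (a sketch; the range-based normalization in NRMSE is one common convention and is an assumption here, as the patent does not reproduce the formulas):

```python
import numpy as np

def mae(actual, pred):
    """Mean absolute error between actual and predicted life indicators."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(pred))))

def nrmse(actual, pred):
    """Root mean square error, normalized by the range of the actual values
    (assumed normalization; other conventions divide by the mean instead)."""
    a, p = np.asarray(actual, float), np.asarray(pred, float)
    rmse = np.sqrt(np.mean((a - p) ** 2))
    return float(rmse / (a.max() - a.min()))

actual = [1.0, 2.0, 3.0, 4.0]   # toy actual health indices
pred   = [1.1, 1.9, 3.2, 3.8]   # toy predicted health indices
print(round(mae(actual, pred), 3), round(nrmse(actual, pred), 4))
```

Lower values of both criteria indicate better agreement between the predicted and actual degradation curves.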
Finally, it is noted that the above embodiments are only for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present invention, which is intended to be covered by the claims of the present invention.

Claims (5)

1. A method for predicting the remaining life of a gear based on SAE and ON-LSTM, characterized by comprising the following steps:
S1: collecting gear vibration signal segments of duration T at intervals of Δt until the gear fails, wherein the number of sampled gear vibration signal segments is n;
S2: processing the gear vibration signals with an SAE to obtain an n-dimensional feature vector V, wherein SAE denotes a sparse auto-encoder;
s3: selecting a feature matrix V1 consisting of the n1 sampling points as a training matrix;
S4: using least squares to function the target The minimum value is then normalized to the vector V1 by the formula V' i=avi +b, where a, b represent the weight and bias of the function used in the unification of the element vector V1 in the vector V ,V1=(v11,v12,…,v1n1)T,V=(v1,v2,...,vn)T;
S5: normalizing the vector V1 to obtain a normalized vector w= (W 1,w2,...,wn1)T;
S6: reconstruction matrix Wherein p is the number of units of the input layer of the neural network;
S7: taking the first p rows of the matrix U as the input of the ordered-neuron long short-term memory (ON-LSTM) neural network and the last row as the output of the ON-LSTM neural network to train the network;
S8: after training on the matrix U, taking the last p outputs as the input of the ON-LSTM network to obtain the output at the next moment;
S9: repeating step S8 a certain number of times, comparing the denormalized outputs with the actual characteristic values V' = (v'p+1, v'p+2, …, v'n)^T, and, when a denormalized output exceeds a set threshold, multiplying the number of predicted sampling points by Δt + T, the sum of the interval time and the sampling duration of the gear vibration signal, to obtain the remaining service life of the gear.
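The data-handling steps of claim 1 can be sketched as plain array manipulations. This is a minimal sketch assuming min-max normalization for S5 and using an arbitrary callable as a stand-in for the trained ON-LSTM of S7-S9; the actual normalization scheme and network are as defined in the claims.

```python
import numpy as np

def build_training_matrix(v1, p):
    """S5/S6: normalize the training vector (min-max form assumed) and
    reconstruct the (p+1) x (n1-p) sliding-window matrix U, whose first p
    rows are network inputs and whose last row is the training target."""
    w = (v1 - v1.min()) / (v1.max() - v1.min())
    n1 = len(w)
    U = np.column_stack([w[k:k + p + 1] for k in range(n1 - p)])
    return w, U

def recursive_forecast(model, w, p, steps):
    """S8/S9: feed the last p values to the network, append the predicted
    next value, and repeat; `model` is any callable mapping a length-p
    window to the next value (stand-in for the trained ON-LSTM)."""
    hist = list(w)
    for _ in range(steps):
        hist.append(float(model(np.array(hist[-p:]))))
    return np.array(hist[len(w):])
```

The remaining life then follows by counting the predicted points up to the threshold crossing and multiplying by Δt + T.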
2. The method for predicting the remaining life of a gear based on SAE and ON-LSTM according to claim 1, wherein in step S2 the SAE is structured as follows: the original vibration signal over the whole time dimension is Y = [y1 y2 … yn], where y_t = [yt,1 yt,2 … yt,m] denotes the input data at time t and m denotes the number of signal points acquired in each time period; a 50-dimensional vector z_t is obtained through the first-layer network; a 1-dimensional health index x_t is obtained through the second-layer network; a 50-dimensional vector z'_t is then obtained through the third-layer network; an m-dimensional vector y'_t is obtained through the last-layer network; the least-squares method is used as the objective function to make the difference between y_t and y'_t small, and x_t is applied as the health index at time t to the subsequent prediction network:
z_t = σ(W1·y_t + b1)
x_t = σ(W2·z_t + b2)
z'_t = σ(W3·x_t + b3)
y'_t = σ(W4·z'_t + b4)
where z_t denotes the output of the first-layer network, x_t the output of the second-layer network, z'_t the output of the third-layer network, y'_t the output of the fourth-layer network, σ denotes the sigmoid activation function, W1, W2, W3, W4 denote the weights of the first, second, third and fourth layer networks, respectively, and b1, b2, b3, b4 denote the biases of the first, second, third and fourth layer networks, respectively.
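The four-layer forward pass of claim 2 can be sketched numerically. The m → 50 → 1 → 50 → m layer sizes follow the claim; the random weights in the usage test are purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sae_forward(y_t, W1, b1, W2, b2, W3, b3, W4, b4):
    """Forward pass of the m -> 50 -> 1 -> 50 -> m sparse auto-encoder;
    the 1-dimensional bottleneck x_t serves as the health index."""
    z_t  = sigmoid(W1 @ y_t + b1)    # first layer, 50-d
    x_t  = sigmoid(W2 @ z_t + b2)    # bottleneck health index, 1-d
    z2_t = sigmoid(W3 @ x_t + b3)    # third layer, 50-d
    y2_t = sigmoid(W4 @ z2_t + b4)   # reconstruction, m-d
    return x_t, y2_t

def reconstruction_loss(y_t, y2_t):
    """Least-squares objective from the claim: keep y_t close to y'_t."""
    return float(np.sum((y_t - y2_t) ** 2))
```

Training minimizes `reconstruction_loss` over all time steps; only the bottleneck x_t is passed on to the prediction network.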
3. The method for predicting the remaining life of a gear based on SAE and ON-LSTM according to claim 1, wherein the ON-LSTM neural network is structured as follows: the input information over the whole time dimension is X = [x1 x2 … xt], where x_t = [xt,1 xt,2 … xt,n] denotes the input data at time t and ht-1 = [ht-1,1 ht-1,2 … ht-1,m] denotes the recursive data of the neural network at time t-1; first, a hierarchy divider is defined: the hierarchy position L1 of the input information x_t is determined by a constructor F1, and the hierarchy position L2 of the history information ht-1 is determined by a constructor F2; with the division of hierarchical information, the structure of the ON-LSTM neural network is expressed as:
f_t = σ(Wf·x_t + Uf·ht-1 + bf)
i_t = σ(Wi·x_t + Ui·ht-1 + bi)
o_t = σ(Wo·x_t + Uo·ht-1 + bo)
ĉ_t = tanh(Wc·x_t + Uc·ht-1 + bc)
f̃_t = cumax(Wf̃·x_t + Uf̃·ht-1 + bf̃)
ĩ_t = 1 − cumax(Wĩ·x_t + Uĩ·ht-1 + bĩ)
ω_t = f̃_t ∘ ĩ_t
h_t = o_t ∘ tanh(c_t)
where f_t denotes the output of the forget gate in the hidden layer, i_t the output of the input gate, o_t the output of the output gate, ĉ_t the output of the hyperbolic tangent function in the hidden layer, f̃_t the output of the main forget gate, and ĩ_t the output of the main input gate; σ denotes the sigmoid activation function and tanh the hyperbolic tangent function; Wf, Wi, Wo and Wc denote the weights between the input layer and, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer; Wf̃ and Wĩ denote the weights between the input layer and the main forget gate and main input gate in the hidden layer; Uf, Ui, Uo and Uc denote the weights between, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer at the current moment and the hidden-layer output at the previous moment; Uf̃ and Uĩ denote the corresponding weights for the main forget gate and main input gate; bf, bi, bo and bc denote the biases of the linear combinations in, respectively, the forget gate, the input gate, the output gate and the hyperbolic tangent function in the hidden layer; bf̃ and bĩ denote the biases of the linear combinations in the main forget gate and main input gate; ∘ denotes the element-wise multiplication operator; ct-1 denotes the output of the cell memory unit at time t-1; the function cumax(·) = cumsum(softmax(·)), where cumsum is the cumulative-sum function; and ω_t denotes the middle (overlap) level, ĩ_t the low level, and f̃_t the high level.
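The gate equations of claim 3 can be sketched numerically; cumax(·) = cumsum(softmax(·)) produces the monotone main gates that split the state vector into hierarchy levels. The weight dictionary below is only an illustrative parameterization.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cumax(x):
    """cumax(x) = cumsum(softmax(x)): monotonically non-decreasing values
    in (0, 1] that divide the state vector into hierarchy levels."""
    return np.cumsum(softmax(x))

def on_lstm_cell(x_t, h_prev, c_prev, p):
    """One ON-LSTM step; p is a dict of weights W*, U* and biases b*."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    f = sig(p['Wf'] @ x_t + p['Uf'] @ h_prev + p['bf'])               # forget gate
    i = sig(p['Wi'] @ x_t + p['Ui'] @ h_prev + p['bi'])               # input gate
    o = sig(p['Wo'] @ x_t + p['Uo'] @ h_prev + p['bo'])               # output gate
    c_hat = np.tanh(p['Wc'] @ x_t + p['Uc'] @ h_prev + p['bc'])       # candidate state
    f_m = cumax(p['Wfm'] @ x_t + p['Ufm'] @ h_prev + p['bfm'])        # main forget gate
    i_m = 1.0 - cumax(p['Wim'] @ x_t + p['Uim'] @ h_prev + p['bim'])  # main input gate
    w = f_m * i_m                                                     # level overlap
    c = w * (f * c_prev + i * c_hat) + (f_m - w) * c_prev + (i_m - w) * c_hat
    h = o * np.tanh(c)
    return h, c
```

With zero weights the cell output stays at zero, which makes a convenient sanity check of the shapes and the gate algebra.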
4. The method for predicting the remaining life of a gear based on SAE and ON-LSTM according to claim 3, wherein in the structure of the ON-LSTM neural network the hierarchy position L1 of the input information x_t is determined by the constructor F1, and the hierarchy position L2 of the history information ht-1 is determined by the constructor F2, expressed as follows:
L1=F1(xt,ht-1)
L2=F2(xt,ht-1)
wherein F1 and F2 output the position of the maximum element of the vector obtained by a nonlinear combination of the input data and the historical data; during network training, xt+1,n is set as the query vector qm; in the prediction phase, xt,n is set as qm.
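The exact definition of F1 and F2 is not reproduced in the text, so the sketch below only illustrates the described behavior: the level is the position of the maximum element of a nonlinear combination of the input and the history. The combination weights W, U, b are hypothetical.

```python
import numpy as np

def level_position(x_t, h_prev, W, U, b):
    """Hierarchy-divider sketch (assumed form): argmax over a nonlinear
    combination of current input and recursive history."""
    v = np.tanh(W @ x_t + U @ h_prev + b)
    return int(np.argmax(v))
```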
5. The method for predicting the remaining life of a gear based on SAE and ON-LSTM according to claim 4, wherein in the structure of the ON-LSTM neural network the cell units are ordered by L1 and L2 to divide the neural-network hierarchy; when L2 ≥ L1, the historical data level is lower than the current input level, the overlapping part of the levels is ω_t = f̃_t ∘ ĩ_t, and the cell memory unit c_t is updated by the following rule:
c_t = ω_t ∘ (f_t ∘ ct-1 + i_t ∘ ĉ_t) + (f̃_t − ω_t) ∘ ct-1 + (ĩ_t − ω_t) ∘ ĉ_t
when L2 < L1, the current input level is lower than the historical level, and the cell memory unit c_t is updated by the following rule:
c_t = f̃_t ∘ ct-1 + ĩ_t ∘ ĉ_t
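The two level-dependent update rules of claim 5 can be written as one small function, a sketch using element-wise products with the overlap ω = f̃ ∘ ĩ as in the ON-LSTM formulation:

```python
import numpy as np

def update_cell(L1, L2, f, i, f_m, i_m, c_prev, c_hat):
    """Level-dependent cell update: when L2 >= L1 the levels overlap and
    the overlap band is mixed by the standard LSTM gates; when L2 < L1
    the main gates update the disjoint bands directly."""
    if L2 >= L1:
        w = f_m * i_m  # overlap of the input and history levels
        return w * (f * c_prev + i * c_hat) + (f_m - w) * c_prev + (i_m - w) * c_hat
    return f_m * c_prev + i_m * c_hat
```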
CN202010255897.8A 2020-04-02 2020-04-02 SAE and ON-LSTM-based gear residual life prediction method Active CN111475987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010255897.8A CN111475987B (en) 2020-04-02 2020-04-02 SAE and ON-LSTM-based gear residual life prediction method


Publications (2)

Publication Number Publication Date
CN111475987A CN111475987A (en) 2020-07-31
CN111475987B (en) 2024-05-24

Family

ID=71749576

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010255897.8A Active CN111475987B (en) 2020-04-02 2020-04-02 SAE and ON-LSTM-based gear residual life prediction method

Country Status (1)

Country Link
CN (1) CN111475987B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926505B (en) * 2021-03-24 2022-11-11 重庆大学 Rotating machine health index construction method based on DTC-VAE neural network
CN113642248B (en) * 2021-08-30 2023-11-07 平安国际融资租赁有限公司 Method and device for evaluating residual use time of positioning equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109343505A (en) * 2018-09-19 2019-02-15 Taiyuan University of Science and Technology Gear remaining useful life prediction method based on long short-term memory network
CN110516833A (en) * 2019-07-03 2019-11-29 Zhejiang University of Technology Road traffic state prediction method using a Bi-LSTM based on feature extraction
CN110941928A (en) * 2019-11-26 2020-03-31 Harbin University of Science and Technology Rolling bearing remaining life prediction method based on dropout-SAE and Bi-LSTM


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A deep learning approach for anomaly detection based on SAE and LSTM in mechanical equipment; Zhe Li et al.; The International Journal of Advanced Manufacturing Technology; vol. 103; 499-510 *
Research on life prediction of an external-meshing gear pump based on ANFIS; Guo Rui et al.; Chinese Journal of Scientific Instrument; vol. 41, no. 01; 223-232 *
Research on a gear remaining-life prediction method based on singular value decomposition and deep recurrent neural networks; Zhang Qingliang; China Masters' Theses Full-text Database (Engineering Science and Technology II); no. 04; C035-114 *
Research on remaining useful life prediction of aero-engines based on deep learning; Wen Hairu; Internal Combustion Engine & Parts; no. 03; 41-42 *


Similar Documents

Publication Publication Date Title
CN111274737A (en) Method and system for predicting remaining service life of mechanical equipment
CN107145634B (en) Multi-state dynamic reliability assessment method for shield cutter head and driving system
CN109766583A Aero-engine service life prediction method based on unlabeled, unbalanced data with uncertain initial values
CN111475987B (en) SAE and ON-LSTM-based gear residual life prediction method
CN111723527B (en) Method for predicting residual life of gear based on cocktail long-short-term memory neural network
CN110175425B (en) Prediction method of residual life of gear based on MMALSTM
CN108520320A Equipment life prediction method based on multiple long short-term memory networks and empirical Bayes
CN110210126B (en) LSTMPP-based gear residual life prediction method
CN113188794B (en) Gearbox fault diagnosis method and device based on improved PSO-BP neural network
CN114218872B (en) DBN-LSTM semi-supervised joint model-based residual service life prediction method
CN114692993B (en) Water conservancy facility deformation prediction method integrating ARIMA and BiLSTM
CN114282443B (en) Residual service life prediction method based on MLP-LSTM supervised joint model
CN111475986B (en) LSTM-AON-based gear residual life prediction method
CN112149953B (en) Electromechanical equipment operation safety assessment method based on multimode linkage and multistage cooperation
CN115758290A (en) Fan gearbox high-speed shaft temperature trend early warning method based on LSTM
Xue et al. An improved generic hybrid prognostic method for RUL prediction based on PF-LSTM learning
CN115329853B (en) Equipment parameter prediction and knowledge transfer method based on multi-source domain migration
CN114492642A (en) Mechanical fault online diagnosis method for multi-scale element depth residual shrinkage network
CN114757365A (en) High-speed railway roadbed settlement prediction and early warning method based on deep learning
CN113029619A (en) Underground scraper fault diagnosis method based on C4.5 decision tree algorithm
Liu et al. A novel health prognosis method for system based on improved degenerated Hidden Markov model
Du et al. Stacked convolutional LSTM models for prognosis of bearing performance degradation
CN116628444A (en) Water quality early warning method based on improved meta-learning
Wang et al. Similarity-based echo state network for remaining useful life prediction
CN114818500A (en) Method for predicting soil bin pressure based on LSTM algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant