CN114298200B - Abnormal data diagnosis method based on deep parallel time sequence relation network

Info

Publication number: CN114298200B
Application number: CN202111589040.0A
Authority: CN
Inventors: 凡时财; 杨淳; 邹见效; 徐红兵
Original assignee: Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Current assignee: Higher Research Institute Of University Of Electronic Science And Technology Shenzhen
Priority date: 2021-12-23
Filing date: 2021-12-23
Publication date: 2024-06-11
Anticipated expiration: 2041-12-23
Also published as: CN114298200A

Abstract

The invention discloses an abnormal data diagnosis method based on a deep parallel time sequence relation network (DPTRN). Feature data are collected under each abnormal working condition of an industrial production system and normalized to obtain a training data matrix, from which feature vector sequences are extracted. Each feature vector sequence is taken as the input and the corresponding abnormal working condition serial number as the output to form a training sample. A DPTRN model is constructed, comprising a relation module, a decoupling position vector calculation module, a relation weight calculation module, a historical information vector calculation module, a vector splicing module and a multi-layer perceptron, and is trained with the training samples. When abnormal data diagnosis is required for the industrial production system, the data matrix at the current moment is collected and input into the trained DPTRN model to obtain the abnormal data diagnosis result. The invention improves the processing speed of time sequence data while maintaining abnormal data detection performance.

Description

Abnormal data diagnosis method based on deep parallel time sequence relation network

Technical Field

The invention belongs to the technical field of industrial process abnormal data diagnosis, and particularly relates to an abnormal data diagnosis method based on a deep parallel time sequence relation network.

Background

As modern industrial technology advances, industrial systems are growing in scale and complexity. If abnormal data in an industrial process are not recognized and handled in time, economic losses can result and, in serious cases, personnel safety can be endangered. It is therefore necessary to monitor industrial processes with robust abnormal data diagnosis techniques.

Traditional model-based abnormal data diagnosis methods suffer from high complexity, poor maintainability and low robustness, and cannot keep up with ever-growing modern industrial systems; data-driven methods have therefore received more and more attention. A data-driven method analyzes the underlying patterns in historical data acquired from the industrial process and obtains a robust and accurate data model, so that abnormal data detection or fault diagnosis can be performed on new data.

Deep learning, the most popular data-driven approach in recent years, has produced many practical results in industrial fault detection and diagnosis. Compared with traditional machine learning methods, deep learning avoids much manual feature engineering, automatically learns latent high-dimensional representations of the data, and performs strongly on various evaluation metrics.

Time series data are data recorded in chronological order. Industrial processes generally evolve over time, and features taken from a single point in time do not adequately capture that evolution. Abnormal data diagnosis methods based on time series data make fuller use of historical information and learn how the industrial process changes over time, which gives them strong feature extraction and abnormal data diagnosis capability.

Deep learning usually processes time series data with a recurrent neural network (Recurrent Neural Network, RNN), a long short-term memory network (Long Short-Term Memory, LSTM) or a gated recurrent unit (Gated Recurrent Unit, GRU). However, these network structures extract features by processing the time series serially, which limits the data processing speed, so they cannot meet the requirement for rapid real-time diagnosis in industrial processes.

Disclosure of Invention

The invention aims to overcome the defects of the prior art by providing an abnormal data diagnosis method based on a deep parallel time sequence relation network. The method uses a multi-layer perceptron (Multilayer Perceptron, MLP) to combine the relation features between the feature data at all moments and, on this basis, proposes a deep parallel time sequence relation network (Deep Parallel Time Series Relationship Network, DPTRN) model. The model processes time sequence data in parallel, which greatly improves the processing speed of time sequence data while maintaining abnormal data detection performance.

In order to achieve the above object, the method for diagnosing abnormal data based on a deep parallel time sequence relation network of the present invention comprises the following specific steps:

S1: under D abnormal working conditions of an industrial production system, collecting working data of each abnormal working condition by a plurality of preset sensors, wherein the dimension of the feature vector at each sampling moment is M; the feature vector obtained at the t-th sampling moment under the d-th abnormal working condition is recorded as x_d(t), d = 1, 2, …, D, t = 1, 2, …, N_d, where N_d represents the number of sampling moments under the d-th abnormal working condition; taking each feature vector x_d(t) as a row vector and arranging the row vectors in ascending order of sampling time to obtain the original training data matrix X_d:

$$X_d = \begin{bmatrix} x_d(1) \\ x_d(2) \\ \vdots \\ x_d(N_d) \end{bmatrix}$$

S2: normalizing the original training data matrix X_d to obtain a normalized training data matrix $\tilde{X}_d$;

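S3: dividing the feature vectors $\tilde{x}_d(t)$ of the normalized training data matrix $\tilde{X}_d$ into Q_d feature vector sequences $Z_d^q$ according to a preset time sequence length K:

$$Z_d^q = \left[\tilde{x}_d((q-1)K+1),\ \tilde{x}_d((q-1)K+2),\ \dots,\ \tilde{x}_d(qK)\right]$$

where q = 1, 2, …, Q_d, $Q_d = \lfloor N_d / K \rfloor$ represents the number of feature vector sequences obtained by dividing the training data matrix $\tilde{X}_d$, and $\lfloor \cdot \rfloor$ represents rounding down;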

S4: taking each feature vector sequence $Z_d^q$ obtained in step S3 as the input and the corresponding abnormal working condition serial number d as the expected output to form a training sample;

S5: constructing a DPTRN model comprising a relation module, a decoupling position vector calculation module, a relation weight calculation module, a historical information vector calculation module, a vector splicing module and a multi-layer perceptron, wherein:

The relation module is used for extracting a preliminary relation weight vector from the input feature vector sequence and sending it to the relation weight calculation module, and the specific method is as follows:

the input feature vector sequence is recorded as Z = [z(1), z(2), …, z(K)], wherein z(k) represents the k-th M-dimensional feature vector in the feature vector sequence Z, k = 1, 2, …, K;

The relation module comprises K-1 relation units, and the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), k′ = 1, 2, …, K-1, thereby obtaining the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)]. Each relation unit comprises a vector splicing unit and a multi-layer perceptron unit, wherein:

the vector splicing unit is used for splicing the feature vector z(k′) and the feature vector z(K) to obtain a spliced feature vector C(k′) and sending it to the multi-layer perceptron unit, wherein the expression of the feature vector C(k′) is as follows:

$$C(k') = \mathrm{concat}\big(z(k'),\ z(K),\ z(K) - z(k'),\ z(K) + z(k')\big)$$

wherein concat() represents vector concatenation;

the multi-layer perceptron unit receives the spliced feature vector C(k′) and processes it to obtain the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K);

The decoupling position vector calculation module is used for extracting decoupling position vectors of the historical moment and the current moment from the input characteristic vector sequence Z and sending the decoupling position vectors to the relation weight calculation module; the decoupling position vector calculation module comprises a position coding module and a position vector decoupling module, wherein:

the position coding module is used for generating a corresponding M-dimensional position code PE(k) for each feature vector z(k) in the feature vector sequence Z and sending it to the position vector decoupling module;

The position vector decoupling module calculates, according to the position code PE(k) of each feature vector z(k), the decoupled position code dpe(k′) of the feature vector z(k′) at each of the previous K-1 historical moments, thereby obtaining the decoupled position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)] of the previous K-1 historical moments, and the calculation formula of the position code dpe(k′) is as follows:

dpe(k′) = [PE(k′)·Pos_query] ⊙ [PE(K)·Pos_key]

wherein ⊙ represents the inner product, Pos_query represents the position vector query matrix, and Pos_key represents the position vector key value matrix;

the relation weight calculation module is used for calculating the final relation weight vector RW = [rw(1), rw(2), …, rw(K-1)] from the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)] and the decoupling position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)], wherein each relation weight rw(k′) is obtained from rw_pre(k′) and dpe(k′), and then sending the relation weight vector RW to the historical information vector calculation module;

the history information vector calculation module is used for weighting the first K-1 feature vectors z(k′) in the feature vector sequence Z according to the relation weight vector RW to obtain weighted feature vectors $\hat{z}(k') = rw(k') \otimes z(k')$, where $\otimes$ represents the vector outer product, then performing sum pooling on the K-1 weighted feature vectors $\hat{z}(k')$ to obtain the historical information vector $HI = \sum_{k'=1}^{K-1} \hat{z}(k')$, and sending the obtained historical information vector HI to the vector splicing module;

the vector splicing module is used for splicing the historical information vector HI and the feature vector z(K) in the feature vector sequence Z, and sending the obtained spliced vector Con to the multi-layer perceptron;

the multi-layer perceptron is used for processing the spliced vector Con to obtain an abnormal working condition serial number corresponding to the input characteristic vector sequence;

S6: training the DPTRN model constructed in the step S5 by adopting the training sample obtained in the step S4 to obtain a trained DPTRN model;

S7: when abnormal data diagnosis needs to be carried out on the industrial production system, the same working data acquisition method as in step S1 is adopted to obtain the M-dimensional feature vectors x(T-k), k = 0, 1, …, K-1, at the current moment T and the previous K-1 moments, forming a data matrix X_T:

$$X_T = \begin{bmatrix} x(T-K+1) \\ \vdots \\ x(T-1) \\ x(T) \end{bmatrix}$$

The data matrix X_T is normalized by the same method as in step S2 to obtain a normalized data matrix $\tilde{X}_T$, and the data matrix $\tilde{X}_T$ is input into the DPTRN model trained in step S6 to obtain the abnormal data diagnosis result.

In summary, the abnormal data diagnosis method based on the deep parallel time sequence relation network collects and normalizes feature data under each abnormal working condition of an industrial production system to obtain a training data matrix, extracts feature vector sequences from it, and forms training samples with each feature vector sequence as the input and the corresponding abnormal working condition serial number as the output. A DPTRN model comprising a relation module, a decoupling position vector calculation module, a relation weight calculation module, a historical information vector calculation module, a vector splicing module and a multi-layer perceptron is constructed and trained with the training samples. When abnormal data diagnosis of the industrial production system is required, the data matrix at the current moment is collected and input into the trained DPTRN model to obtain the abnormal data diagnosis result.

The DPTRN model is constructed based on the multi-layer perceptron, and can capture the relation between each historical moment and the current moment in the time sequence data, so that the DPTRN model has the capability of processing data in parallel. Compared with the traditional method of extracting time sequence data features in a serial mode by using RNN, LSTM, GRU and other models, the invention greatly improves the data processing efficiency. In addition, by means of decoupling position coding, relation weight and other technologies, the abnormal data diagnosis capability of the invention is ensured.

Drawings

FIG. 1 is a flow chart of an embodiment of the anomaly data diagnostic method based on a deep parallel timing relationship network of the present invention;

FIG. 2 is a block diagram of a DPTRN model in the present invention;

FIG. 3 is a block diagram of a relational unit in the present invention;

Fig. 4 is a block diagram of a decoupled position vector calculation module according to the present invention.

Detailed Description

The following description of the embodiments of the invention is presented in conjunction with the accompanying drawings to give those skilled in the art a better understanding of the invention. It is to be expressly noted that, in the description below, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the present invention.

Examples

FIG. 1 is a flow chart of an embodiment of the anomaly data diagnosis method based on the deep parallel time sequence relation network of the present invention. As shown in fig. 1, the specific steps of the anomaly data diagnosis method based on the deep parallel time sequence relation network of the invention comprise:

S101: collecting training data:

Under D abnormal working conditions of the industrial production system, working data of each abnormal working condition are collected by a plurality of preset sensors, and the dimension of the feature vector at each sampling moment is M, i.e. M feature values are recorded at each sampling moment. The feature vector obtained at the t-th sampling moment under the d-th abnormal working condition is recorded as x_d(t), d = 1, 2, …, D, t = 1, 2, …, N_d, where N_d represents the number of sampling moments under the d-th abnormal working condition. Each feature vector x_d(t) is taken as a row vector, and the row vectors are arranged in ascending order of sampling time to obtain the original training data matrix X_d:

$$X_d = \begin{bmatrix} x_d(1) \\ x_d(2) \\ \vdots \\ x_d(N_d) \end{bmatrix}$$

S102: training data normalization:

To facilitate subsequent data processing, the original training data matrix X_d is normalized to obtain a normalized training data matrix $\tilde{X}_d$.

The normalization formula in this embodiment is:

$$\tilde{x}_d(t)(m) = \frac{x_d(t)(m) - \mathrm{mean}(X_d(m))}{\mathrm{std}(X_d(m))}$$

where x_d(t)(m) represents the m-th feature value in the feature vector x_d(t), m = 1, 2, …, M, $\tilde{x}_d(t)(m)$ represents the normalized value of x_d(t)(m), mean(X_d(m)) represents the mean of the m-th feature value over all feature vectors of the original data matrix X_d, and std(X_d(m)) represents the standard deviation of the m-th feature value over all feature vectors of the original data matrix X_d.

With this formula, each feature is transformed to have a mean of 0 and a variance of 1.
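The normalization can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `standardize`, the use of the population standard deviation (`ddof=0`), and the handling of constant features are choices made here, not details given in the text.

```python
import numpy as np

def standardize(X):
    """Column-wise z-score of an (N_d, M) matrix; also returns the statistics used."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)                      # population std (ddof=0), an assumption
    std = np.where(std == 0.0, 1.0, std)     # assumption: constant features map to 0
    return (X - mean) / std, mean, std

# Toy matrix: N_d = 4 sampling moments, M = 2 features.
X_d = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
X_norm, mean, std = standardize(X_d)
```

After the call, every column of `X_norm` has zero mean and unit variance, matching the property stated above.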

S103: time sequence segmentation:

To improve the accuracy of abnormal data diagnosis, the invention uses time sequence data as the input of the abnormal data diagnosis model, so the training data matrix $\tilde{X}_d$ needs to be divided into time sequence data. To avoid data leakage between samples, the invention intercepts the time sequences without overlap, and the specific method is as follows:

The feature vectors $\tilde{x}_d(t)$ of the training data matrix $\tilde{X}_d$ are divided into Q_d feature vector sequences $Z_d^q$ according to a preset time sequence length K:

$$Z_d^q = \left[\tilde{x}_d((q-1)K+1),\ \tilde{x}_d((q-1)K+2),\ \dots,\ \tilde{x}_d(qK)\right]$$

where q = 1, 2, …, Q_d, $Q_d = \lfloor N_d / K \rfloor$ represents the number of feature vector sequences obtained by dividing the training data matrix $\tilde{X}_d$, and $\lfloor \cdot \rfloor$ represents rounding down.
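A minimal sketch of the non-overlapping segmentation, assuming the normalized matrix is held as a NumPy array of shape (N_d, M). The helper name `split_sequences` is hypothetical; trailing rows that do not fill a complete window are dropped, which is what rounding down implies.

```python
import numpy as np

def split_sequences(X_norm, K):
    """Cut an (N_d, M) matrix into floor(N_d / K) non-overlapping windows of K rows."""
    Q = X_norm.shape[0] // K                 # number of complete windows
    return X_norm[:Q * K].reshape(Q, K, X_norm.shape[1])

# Toy matrix: N_d = 7 rows, M = 2 features, window length K = 3.
X_norm = np.arange(14, dtype=float).reshape(7, 2)
seqs = split_sequences(X_norm, K=3)          # the 7th row is discarded
```

Because consecutive windows share no rows, no sampling moment appears in two training samples.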

S104: obtaining a training sample:

Each feature vector sequence $Z_d^q$ obtained in step S103 is taken as the input and the corresponding abnormal working condition serial number d as the expected output, forming a training sample.
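As an illustration, the pairing of sequences with working condition serial numbers could look like the following. The function `build_samples` and its input layout (one array of windows per condition, in order d = 1, …, D) are assumptions made for this sketch.

```python
import numpy as np

def build_samples(windows_per_condition):
    """windows_per_condition[d-1] has shape (Q_d, K, M); returns inputs and labels d."""
    inputs = np.concatenate(windows_per_condition, axis=0)
    labels = np.concatenate(
        [np.full(len(w), d) for d, w in enumerate(windows_per_condition, start=1)]
    )
    return inputs, labels

# Two conditions with 3 and 2 sequences respectively (K = 4, M = 2).
windows = [np.zeros((3, 4, 2)), np.ones((2, 4, 2))]
Z, y = build_samples(windows)
```

Each row of `Z` is one feature vector sequence and the matching entry of `y` is its expected output.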

S105: constructing the DPTRN model:

In order to realize abnormal data diagnosis, a DPTRN model is constructed in the invention. FIG. 2 is a block diagram of the DPTRN model in the present invention. As shown in FIG. 2, the DPTRN model comprises a relation module, a decoupling position vector calculation module, a relation weight calculation module, a history information vector calculation module, a vector splicing module and a multi-layer perceptron; each module is described in detail below.

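The relation module is used for extracting a preliminary relation weight vector from the input feature vector sequence and sending it to the relation weight calculation module. The input feature vector sequence is recorded as Z = [z(1), z(2), …, z(K)], where z(k) represents the k-th M-dimensional feature vector in Z. The relation module comprises K-1 relation units, and the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), k′ = 1, 2, …, K-1, thereby obtaining the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)]. FIG. 3 is a block diagram of a relation unit in the present invention. As shown in FIG. 3, each relation unit comprises a vector splicing unit and a multi-layer perceptron unit, where: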

The vector splicing unit is used for splicing the feature vector z(k′) and the feature vector z(K) to obtain a spliced feature vector C(k′) and sending it to the multi-layer perceptron unit, wherein the expression of the feature vector C(k′) is as follows:

$$C(k') = \mathrm{concat}\big(z(k'),\ z(K),\ z(K) - z(k'),\ z(K) + z(k')\big)$$

where concat() represents vector concatenation.

Thus, in the invention, the feature vector z(K) of the current moment K and the feature vector z(k′) of the historical moment k′ are subtracted and added, and the results are spliced with the two original feature vectors, which improves the model's ability to mine relationships in the data.
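The splicing step can be sketched for all K-1 historical moments at once. The ordering of the four parts and the sign of the difference (current minus historical) are assumptions made for this illustration, and `relation_inputs` is a hypothetical helper name.

```python
import numpy as np

def relation_inputs(Z):
    """Z: (K, M) sequence. Returns (K-1, 4M): one spliced vector per historical moment."""
    hist = Z[:-1]                                  # z(1) .. z(K-1)
    cur = np.broadcast_to(Z[-1], hist.shape)       # z(K) repeated K-1 times
    # Assumed layout: [historical, current, difference, sum].
    return np.concatenate([hist, cur, cur - hist, cur + hist], axis=1)

Z = np.arange(6.0).reshape(3, 2)   # K = 3 moments, M = 2 features
C = relation_inputs(Z)
```

Since every row of `C` depends only on one historical vector and the current vector, all rows can be built in a single vectorized operation rather than step by step.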

The multi-layer perceptron unit receives the spliced feature vector C(k′) and processes it to obtain the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K). The MLP is a common neural network, and its specific principles and processes are not described here.

In this embodiment, to reduce the size of the DPTRN model and the difficulty of training it, the multi-layer perceptron units in the K-1 relation units share weight parameters, i.e. the same weights are used for all historical moments.

In addition, to highlight the importance of particular historical moments and to prevent important historical information from being smoothed out, the relation unit does not apply softmax normalization to the final RW_pre, which means that the preliminary relation weights rw_pre(k′) over the historical moments do not sum to 1.
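One way to picture the shared-weight relation units is a single small MLP applied to all K-1 spliced vectors in one matrix product. The two-layer structure, the ReLU activation, the hidden width `H` and the name `shared_mlp` are assumptions for this sketch; the text does not fix them.

```python
import numpy as np

def shared_mlp(C, W1, b1, W2, b2):
    """Apply one two-layer MLP to every row of C; returns (K-1,) preliminary weights."""
    hidden = np.maximum(C @ W1 + b1, 0.0)    # ReLU hidden layer (assumed activation)
    return (hidden @ W2 + b2).ravel()        # raw scores, no softmax normalization

rng = np.random.default_rng(0)
K, M, H = 5, 3, 8                            # H is a hypothetical hidden width
C = rng.normal(size=(K - 1, 4 * M))          # stand-in for the spliced vectors C(k')
W1, b1 = rng.normal(size=(4 * M, H)), np.zeros(H)
W2, b2 = rng.normal(size=(H, 1)), np.zeros(1)
rw_pre = shared_mlp(C, W1, b1, W2, b2)
```

Because the same parameters serve every historical moment, the batched call gives the same numbers as K-1 separate calls, which is what makes parallel evaluation possible.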

The decoupling position vector calculation module is used for extracting the decoupling position vectors of the historical moments and the current moment from the input feature vector sequence Z and sending them to the relation weight calculation module. The decoupling position vector is introduced so that the DPTRN model can take the relationships between time nodes into account when processing time sequence data in parallel. FIG. 4 is a block diagram of the decoupling position vector calculation module in the present invention. As shown in FIG. 4, the decoupling position vector calculation module comprises a position coding module and a position vector decoupling module, where:

The position coding module is used for generating a corresponding M-dimensional position code PE(k) for each feature vector z(k) in the feature vector sequence Z and sending it to the position vector decoupling module. In this embodiment, the position coding module uses the absolute position coding method of the BERT model.

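The position vector decoupling module calculates, according to the position code PE(k) of each feature vector z(k), the decoupled position code dpe(k′) of the feature vector z(k′) at each of the previous K-1 historical moments, thereby obtaining the decoupled position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)], and the calculation formula of dpe(k′) is as follows:

dpe(k′) = [PE(k′)·Pos_query] ⊙ [PE(K)·Pos_key]

wherein ⊙ represents the inner product, Pos_query represents a trainable position vector query matrix, and Pos_key represents a trainable position vector key value matrix.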

In conventional methods, the position code is usually added directly to the feature vector, which easily introduces noise. The invention therefore sets a position vector query matrix and a position vector key value matrix, which avoids the noise introduced by position coding while further learning the relationship between each historical moment and the current moment, improving network convergence efficiency and robustness.
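The decoupled position code can be illustrated as a query/key inner product computed for all historical moments together. In this sketch the position table `PE` is a random stand-in for learned absolute position embeddings, and the projection width `P` of the two matrices is a hypothetical value, since the text does not give their shapes.

```python
import numpy as np

def decoupled_position(PE, pos_query, pos_key):
    """PE: (K, M). Returns (K-1,) values dpe(k') = <PE(k') Pos_query, PE(K) Pos_key>."""
    q = PE[:-1] @ pos_query        # (K-1, P) projected historical position codes
    k = PE[-1] @ pos_key           # (P,) projected current position code
    return q @ k                   # inner product per historical moment

rng = np.random.default_rng(0)
K, M, P = 5, 4, 6                  # P is an assumed projection width
PE = rng.normal(size=(K, M))       # stand-in for a learned absolute position table
pos_query = rng.normal(size=(M, P))
pos_key = rng.normal(size=(M, P))
DPE = decoupled_position(PE, pos_query, pos_key)
```

Note that the position information never touches the feature vectors themselves; it only enters through these scalar scores.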

The relation weight calculation module is used for calculating the final relation weight vector RW = [rw(1), rw(2), …, rw(K-1)] from the preliminary relation weight vector RW_pre = [rw_pre(1), rw_pre(2), …, rw_pre(K-1)] and the decoupling position vector DPE = [dpe(1), dpe(2), …, dpe(K-1)], wherein each relation weight rw(k′) is obtained from rw_pre(k′) and dpe(k′). The relation weight vector RW is then sent to the history information vector calculation module.

The history information vector calculation module is used for weighting the first K-1 feature vectors z(k′) in the feature vector sequence Z according to the relation weight vector RW to obtain weighted feature vectors $\hat{z}(k') = rw(k') \otimes z(k')$, where $\otimes$ represents the vector outer product, then performing sum pooling on the K-1 weighted feature vectors $\hat{z}(k')$ to obtain the historical information vector $HI = \sum_{k'=1}^{K-1} \hat{z}(k')$, and sending the obtained historical information vector HI to the vector splicing module.
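A small worked sketch of the weighting and sum pooling, assuming each relation weight is a scalar that scales its historical feature vector. The helper name `history_vector` and the toy numbers are illustrative only.

```python
import numpy as np

def history_vector(Z, rw):
    """Z: (K, M) sequence, rw: (K-1,) relation weights. Returns the M-dim vector HI."""
    weighted = rw[:, None] * Z[:-1]    # scale each historical vector by its weight
    return weighted.sum(axis=0)        # sum pooling over the K-1 historical moments

Z = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # K = 3, M = 2
rw = np.array([0.5, 2.0])
HI = history_vector(Z, rw)             # 0.5*[1, 2] + 2.0*[3, 4] = [6.5, 9.0]
```

The current vector z(K) is not part of the sum; it is spliced with HI in the next module.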

The vector splicing module is used for splicing the historical information vector HI and the feature vector z(K) in the feature vector sequence Z, and sending the obtained spliced vector Con to the multi-layer perceptron.

The multi-layer perceptron is used for processing the spliced vector Con to obtain an abnormal working condition serial number corresponding to the input characteristic vector sequence.
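Putting the modules together, a forward pass might be sketched as below. Several details are assumptions made purely for illustration and are labeled in the code: the layout of the spliced vector, the elementwise product used to combine the preliminary weights with the decoupled position codes, the two-layer ReLU MLPs, the widths `H` and `P`, and the random position table.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer perceptron with a ReLU hidden layer (structure is assumed)."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

def init_params(K, M, H, P, D, rng):
    """Random parameters; H and P are hypothetical widths, D the number of conditions."""
    def layer(n_in, n_out):
        return rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out)
    return {
        "rel": layer(4 * M, H) + layer(H, 1),      # shared relation-unit MLP
        "cls": layer(2 * M, H) + layer(H, D),      # final classifier MLP
        "PE": rng.normal(size=(K, M)),             # stand-in position codes
        "pos_query": rng.normal(size=(M, P)),
        "pos_key": rng.normal(size=(M, P)),
    }

def dptrn_forward(Z, p):
    """Z: (K, M) normalized sequence. Returns (predicted serial number, logits)."""
    hist, cur = Z[:-1], Z[-1]
    cur_rep = np.broadcast_to(cur, hist.shape)
    # Assumed splice layout: [historical, current, difference, sum].
    C = np.concatenate([hist, cur_rep, cur_rep - hist, cur_rep + hist], axis=1)
    rw_pre = mlp(C, *p["rel"]).ravel()                         # (K-1,)
    dpe = (p["PE"][:-1] @ p["pos_query"]) @ (p["PE"][-1] @ p["pos_key"])
    rw = rw_pre * dpe          # ASSUMPTION: elementwise product combines the two
    HI = (rw[:, None] * hist).sum(axis=0)                      # sum pooling
    con = np.concatenate([HI, cur])                            # spliced vector Con
    logits = mlp(con, *p["cls"])
    return int(np.argmax(logits)) + 1, logits                  # serial numbers start at 1

rng = np.random.default_rng(0)
K, M, H, P, D = 6, 4, 16, 8, 3
params = init_params(K, M, H, P, D, rng)
label, logits = dptrn_forward(rng.normal(size=(K, M)), params)
```

Nothing in the pass loops over time steps: every historical moment is handled by the same matrix operations, in contrast to the step-by-step recurrence of RNN-style models.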

S106: training the DPTRN model:

And training the DPTRN model constructed in the step S105 by adopting the training sample obtained in the step S104 to obtain a trained DPTRN model.

To improve convergence speed and stability, this embodiment uses the Adam optimization strategy when training the DPTRN model; the strategy is computationally efficient, requires little memory and propagates gradients stably. In addition, to improve the robustness and generalization of the model, L2 regularization and Dropout are introduced during training to counter possible overfitting.
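For readers unfamiliar with these training ingredients, the sketch below shows a textbook Adam update with an L2 penalty folded into the gradient, plus inverted dropout. It is a generic illustration, not the embodiment's training code; the hyperparameter defaults are the commonly used Adam values, not values taken from the text.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout: zero each entry with probability `rate`, rescale the rest."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

def init_state(w):
    return {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}

def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, l2=0.0):
    """One Adam update; `l2` adds the gradient of (l2 / 2) * ||w||^2."""
    grad = grad + l2 * w
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

w = np.array([0.0])
w_next = adam_step(w, np.array([-6.0]), init_state(w))   # first step has size ~lr
dropped = dropout(np.ones(1000), 0.5, np.random.default_rng(0))
```

On the very first update the bias correction makes the step magnitude approximately equal to the learning rate regardless of the gradient scale, which is one reason Adam tends to behave stably early in training.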

S107: diagnosis of abnormal data:

When abnormal data diagnosis needs to be carried out on the industrial production system, the same working data acquisition method as in step S101 is adopted to obtain the M-dimensional feature vectors x(T-k), k = 0, 1, …, K-1, at the current moment T and the previous K-1 moments, forming a data matrix X_T:

$$X_T = \begin{bmatrix} x(T-K+1) \\ \vdots \\ x(T-1) \\ x(T) \end{bmatrix}$$

The data matrix X_T is normalized by the same method as in step S102 to obtain a normalized data matrix $\tilde{X}_T$, and the data matrix $\tilde{X}_T$ is input into the DPTRN model trained in step S106 to obtain the abnormal data diagnosis result.
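At diagnosis time the most recent K feature vectors have to be kept available. The sketch below uses a fixed-length buffer for that; the class name `SlidingWindow` is hypothetical. Read literally, the text applies the same z-score formula to the window itself, which is what the code does; reusing statistics saved from training is a common alternative but is not something the text states.

```python
from collections import deque
import numpy as np

class SlidingWindow:
    """Keeps the latest K feature vectors and returns the normalized matrix X_T."""

    def __init__(self, K):
        self.buf = deque(maxlen=K)

    def push(self, x):
        self.buf.append(np.asarray(x, dtype=float))
        if len(self.buf) < self.buf.maxlen:
            return None                              # not enough history yet
        X_T = np.stack(list(self.buf))               # rows x(T-K+1) .. x(T)
        std = X_T.std(axis=0)
        std = np.where(std == 0.0, 1.0, std)         # assumption: guard constant columns
        return (X_T - X_T.mean(axis=0)) / std

window = SlidingWindow(K=3)
first = window.push([1.0, 2.0])
second = window.push([2.0, 2.0])
ready = window.push([3.0, 2.0])                      # first full window
```

Once the buffer is full, every new sample yields a fresh matrix that can be passed to the trained model.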

To better illustrate the technical effects of the invention, the invention is verified experimentally using specific examples. In this embodiment, two datasets are used: the TE chemical process dataset and the KDDCUP99 dataset.

The TE (Tennessee Eastman) chemical process is a benchmark simulation based on a real chemical process. It comprises five main units: a reactor, a condenser, a compressor, a separator and a stripper. Because of its complex internal mechanisms, it has been widely used to verify abnormal data diagnosis methods. The whole TE chemical process comprises 22 continuous process measurement variables, 19 component measurement variables and 12 operation variables, and can simulate normal working conditions and 20 abnormal working conditions.

KDDCUP is an annual competition in machine learning and data mining organized by ACM SIGKDD, and KDDCUP99 is the dataset used in the 1999 competition. The dataset contains 9 weeks of network connection data collected from a simulated United States Air Force local area network and includes both normal traffic and attack types. The dataset has 41 features, 9 of which are discrete and the rest continuous. Because the data were acquired strictly in time order, the dataset is widely used in research on time series methods.

In the experiment, abnormal data diagnosis methods based on 4 models are used for comparison: MLP, LSTM+MLP, Bi_LSTM+MLP and 1DCNN+MLP. To ensure that the experimental results are affected only by the feature extraction part, the classification layers of these network structures use MLPs of the same structure. Table 1 gives the structural information of the 4 models in this experiment.

TABLE 1

Furthermore, on the basis of the DPTRN model of the present invention, two variants are set as comparison methods: DPTRN_a, which uses no position vector, and DPTRN_b, which uses a position vector without decoupling. The evaluation parameters in the experiment are single-sample training time, inference time, recall rate, accuracy and F1 value.
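For reference, the classification metrics can be computed as in the sketch below. Macro averaging over the working conditions is an assumption made here, since the text does not say how per-class values are aggregated; precision is included because the F1 value is derived from it. The function name `macro_scores` and the toy labels are illustrative only.

```python
import numpy as np

def macro_scores(y_true, y_pred, num_classes):
    """Macro-averaged recall, precision and F1 plus overall accuracy (labels 1..D)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls, precisions, f1s = [], [], []
    for c in range(1, num_classes + 1):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "recall": float(np.mean(recalls)),
        "precision": float(np.mean(precisions)),
        "f1": float(np.mean(f1s)),
        "accuracy": float(np.mean(y_true == y_pred)),
    }

scores = macro_scores([1, 1, 2, 2], [1, 2, 2, 2], num_classes=2)
```

Training and inference time per sample would be measured separately, for example by timing a batch and dividing by its size.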

Table 2 is a table of anomaly data diagnostic performance parameters for TE chemical process datasets for the present invention and 6 comparative methods.

TABLE 2

Table 3 is a table of the anomaly data diagnostic performance parameters for the KDDCUP data set for the present invention and for the 6 comparison methods.

TABLE 3

As shown in Tables 2 and 3, with a comparable number of parameters, the invention achieves a better abnormal data diagnosis effect while keeping training time and inference time low, owing to its parallel computation.

While the foregoing describes illustrative embodiments of the present invention to help those skilled in the art understand it, it should be understood that the present invention is not limited to the scope of these embodiments; various changes that fall within the spirit and scope of the present invention as defined by the appended claims are protected.

Claims

1. The abnormal data diagnosis method based on the deep parallel time sequence relation network is characterized by comprising the following steps of:

The relation module comprises K-1 relation units, and the k′-th relation unit is used for calculating the preliminary relation weight rw_pre(k′) between the feature vector z(k′) and the feature vector z(K), thereby obtaining the preliminary relation weight vector RW_pre. Each relation unit comprises a vector splicing unit and a multi-layer perceptron unit, wherein:

Wherein concat() represents vector concatenation;

dpe(k′)＝[PE(k′)·Pos_query]⊙[PE(K)·Pos_key]

2. The abnormal data diagnosis method according to claim 1, wherein the standardized calculation formula in step S2 is:

3. The abnormal data diagnosis method according to claim 1, wherein the multi-layer perceptron units in the K-1 relation units in step S5 share weight parameters.

4. The abnormal data diagnosis method according to claim 1, wherein the position encoding module in step S5 employs an absolute position encoding method in a BERT model.