CN113988268B

CN113988268B - Heterogeneous multi-source time sequence anomaly detection method based on unsupervised full attribute graph

Info

Publication number: CN113988268B
Application number: CN202111296531.6A
Authority: CN
Inventors: 陈景龙; 冯勇; 訾艳阳
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2021-11-03
Filing date: 2021-11-03
Publication date: 2024-04-05
Anticipated expiration: 2041-11-03
Also published as: CN113988268A

Abstract

The invention discloses a heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph. The method comprises: obtaining heterogeneous multi-source time sequence historical data recorded in the healthy operating state of a plurality of monitored devices, and preprocessing these data into heterogeneous multi-source time sequence historical samples; constructing a graph node feature matrix, generating an adjacency matrix from the multi-source time sequence data combined with prior knowledge, and converting the historical samples into graph structure samples that serve as a training set; constructing an unsupervised full attribute graph anomaly detection model and optimizing it on the training set; acquiring heterogeneous multi-source time sequence data of the equipment to be analyzed and applying preprocessing, graph structure sample conversion and data enhancement to obtain a test set; and performing anomaly inference on the test set with the optimized model and determining the anomaly detection result by comparison with a threshold. By embedding multi-source time sequence data into graph convolution operations, the invention provides an effective and general scheme for heterogeneous multi-source time sequence anomaly detection across different devices.

Description

Heterogeneous multi-source time sequence anomaly detection method based on unsupervised full attribute graph

Technical Field

The invention relates to the field of time sequence anomaly detection, and in particular to a heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph.

Background

Time sequence data are the main form in which mechanical equipment signals are expressed. They carry information about the operating health of the equipment, and time sequence analysis techniques make it possible to detect anomalies in equipment parts, components and systems. In anomaly detection for major equipment, several sensors of the same or different types are usually deployed to obtain multi-source time sequence data that reflect the operating state comprehensively; this improves the characterization capability of the monitoring data and the reliability of the detection result. However, the redundancy and strong nonlinearity of multi-source data features hinder effective fusion of multi-sensor time series features, make comprehensive evaluation of the equipment state difficult, and reduce the credibility and interpretability of multi-source time sequence anomaly detection results. Moreover, the multi-source time sequences of different monitored objects are often heterogeneous, which challenges the generality and portability of anomaly detection algorithms. An efficient and general heterogeneous multi-source time sequence anomaly detection method is therefore needed.

Traditional time sequence anomaly detection algorithms (such as isolation forest, one-class support vector machine, local outlier factor and k-nearest neighbors) are inefficient on multi-source time sequences, have difficulty exploiting multi-source information jointly, and yield unreliable detection results. In recent years, deep learning, with its ability to extract features automatically, has been applied to anomaly detection, and adaptive fusion algorithms for multi-source time sequence features have been developed, such as multi-channel convolutional neural networks, multi-feature long short-term memory networks and multi-channel autoencoders. However, the poor interpretability of deep learning makes multi-source fusion strategies difficult to design, and the fixed Euclidean input structure prevents such models from handling heterogeneous multi-source input flexibly, which limits their generality. It is therefore necessary to study an anomaly detection method that can effectively process heterogeneous multi-source time series data.

Disclosure of Invention

The invention aims to provide a heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph, so as to overcome the defects of the prior art.

In order to achieve the above purpose, the invention adopts the following technical scheme:

The heterogeneous multi-source time sequence anomaly detection method based on the unsupervised full attribute graph comprises the following steps:

step 1: obtain heterogeneous multi-source time sequence historical data recorded in the healthy operating state from a plurality of monitored devices, and preprocess these data to obtain heterogeneous multi-source time sequence historical samples;

step 2: convert the heterogeneous multi-source time sequence historical samples of step 1 into graph structure samples that serve as the training set;

step 3: construct an unsupervised full attribute graph anomaly detection model;

step 4: optimize the model constructed in step 3 with the training set of step 2;

step 5: acquire multi-source time sequence data of the equipment to be analyzed, preprocess them as in step 1, convert them into graph structure samples as in step 2, and apply data enhancement to obtain the test set;

step 6: perform anomaly inference on the test set of step 5 with the model optimized in step 4, and determine the anomaly detection result by comparison with a threshold.

Further, the preprocessing in step 1 comprises a sliding time window operation and a normalization operation. The sliding time window operation divides long multi-source time sequence historical data into multi-source time sequence historical samples of fixed length L, and windows are not allowed to overlap; the normalization operation is formulated as follows:

where $x'_i(j)$, $j = 1, \dots, S_i$, is the j-th source of the i-th heterogeneous multi-source time sequence historical sample before normalization, $S_i$ is the number of sources of $x'_i$, $x_i$ is the i-th heterogeneous multi-source time sequence historical sample after normalization, and $\|\cdot\|_2$ is the 2-norm.

Further, in step 2 the heterogeneous multi-source time sequence historical sample is converted into a graph structure sample by constructing a graph node feature matrix and an adjacency matrix; the graph structure sample is expressed as $G = (x^V, A)$, where $x^V$ is the graph node feature matrix and $A$ is the adjacency matrix;

the i-th graph structure sample $G_i$ has $S_i$ nodes, one for each source of the heterogeneous multi-source time sequence sample. Its graph node feature matrix $x_i^V \in \mathbb{R}^{S_i \times L}$ is identical to the preprocessed multi-source time sequence historical sample $x_i$, and its adjacency matrix is $A_i \in \mathbb{R}^{S_i \times S_i}$. The adjacency matrix is constructed as follows:

first, for the structurally consistent samples in the training set, the distance between any two sources is calculated as follows:

where $d_i(j_1, j_2)$ is the distance between the $j_1$-th source $x_i(j_1)$ and the $j_2$-th source $x_i(j_2)$ of the i-th multi-source time sequence historical sample;

second, a distance matrix $D$ is generated as follows:

where $D(j_1, j_2)$ is the value in row $j_1$, column $j_2$ of $D$, and $N^{tr}$ is the number of structurally consistent samples;

then, all distances in $D$ are sorted in descending order, $D_{(1)} \ge D_{(2)} \ge \cdots$, and the initial adjacency matrix $A'$ is calculated as follows:

where $\tau_D$ is the distance threshold, taken as the J-th largest distance, i.e. $\tau_D = D_{(J)}$;

finally, using prior knowledge of the monitored equipment, $A'$ is adjusted according to expert experience, strengthening the connections between graph nodes that correspond to related sources and weakening those between graph nodes that correspond to unrelated sources, to obtain the adjacency matrix $A$ of the graph structure sample.

Further, in step 3 the main body of the unsupervised full attribute graph anomaly detection model is a variational graph autoencoder comprising a graph encoder and a graph decoder;

the graph encoder comprises three graph convolution layers, an activation function and a reparameterization function, and the encoding process is formulated as follows:

$$H_E = f_{relu}\left(\mathrm{GC}_E^{(1)}(x^V, A)\right), \quad \mu = \mathrm{GC}_E^{(2)}(H_E, A), \quad \sigma = \mathrm{GC}_E^{(3)}(H_E, A)$$

$$z = f_{rep}(\mu, \sigma)$$

where $H_E$ is the output of the first graph convolution layer of the graph encoder, $\mu$ and $\sigma$ are respectively the mean and variance of the hidden-space features of the graph node feature matrix $x^V$, $z$ is the reparameterized hidden-space feature, $\mathrm{GC}_E^{(1)}$, $\mathrm{GC}_E^{(2)}$ and $\mathrm{GC}_E^{(3)}$ are the three graph convolution layers of the encoder, $f_{relu}$ is the ReLU activation function, and $f_{rep}$ is the reparameterization function, formulated as:

$$f_{rep}(\mu, \sigma) = \mu + \sigma \odot \varepsilon$$

where $\varepsilon \sim N(0, I)$ is a Gaussian random variable and $\odot$ denotes the Hadamard product;
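The reparameterization step can be illustrated with the short Python sketch below; the helper name is hypothetical and this is not the patent's own code. Writing $z = \mu + \sigma \odot \varepsilon$ moves the randomness into $\varepsilon$, so $z$ remains a differentiable function of $\mu$ and $\sigma$ during training. For $z$ to have variance $\sigma^2$, $\sigma$ must enter this formula as a standard deviation, and the sketch treats it that way.

```python
import random
from typing import List


def reparameterize(mu: List[float], sigma: List[float], rng: random.Random) -> List[float]:
    """Return z = mu + sigma * eps element-wise (Hadamard product), with eps ~ N(0, I).

    `sigma` is treated as a per-dimension standard deviation; apply this to each
    node's latent row to reparameterize a whole hidden-space matrix.
    """
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


# Usage: with a fixed seed the draw is reproducible; sigma = 0 returns mu exactly.
rng = random.Random(0)
z = reparameterize([0.5, -1.0], [0.1, 0.2], rng)
```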

the graph decoder comprises two graph convolution layers and an activation function, and the decoding process is formulated as follows:

where $\hat{x}^V$ is the reconstructed graph node feature matrix, and $\mathrm{GC}_D^{(1)}$ and $\mathrm{GC}_D^{(2)}$ are the two graph convolution layers of the decoder.

Further, each graph convolution layer in the graph encoder and the graph decoder comprises two operations: message aggregation and state update;

the message aggregation process is as follows:

where $m_v^{(l)}$ is the hidden state of node $v$ in the l-th graph convolution layer, $\mathcal{N}(v)$ is the set of neighbor nodes of $v$, $|\mathcal{N}(v)|$ is the number of neighbors of $v$, $h_w^{(l-1)}$ is the output of the (l-1)-th graph convolution layer at node $w$, and $W_1^{(l)}$ is a trainable weight matrix;

the state update process is as follows:

where $f_{tanh}$ is the Tanh activation function, $W_2^{(l)}$ is a trainable weight matrix, $h_v^{(l)}$ is the updated state of node $v$, and $h_v^{(l-1)}$ is the output of the (l-1)-th graph convolution layer at node $v$.
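The two operations can be sketched in pure Python as below. Writing $W_1^{(l)}$ and $W_2^{(l)}$ for the two trainable weight matrices and $m_v^{(l)}$ for the aggregated hidden state, and because the aggregation and update formulas are not reproduced in this text, the sketch assumes a mean aggregator, $m_v^{(l)} = \frac{1}{|\mathcal{N}(v)|}\sum_{w \in \mathcal{N}(v)} W_1^{(l)} h_w^{(l-1)}$, and a Tanh update on the concatenation of the previous state and the message, $h_v^{(l)} = f_{tanh}(W_2^{(l)} [h_v^{(l-1)}; m_v^{(l)}])$. The concatenation form, the zero message for isolated nodes and the function names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Dense matrix-vector product."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def graph_conv_layer(h_prev: Matrix, adj: List[List[int]],
                     w_msg: Matrix, w_upd: Matrix) -> Matrix:
    """One graph convolution layer: mean message aggregation, then Tanh state update.

    h_prev: node states from layer l-1, one row per node (dimension d_in).
    w_msg:  W_1, shape (d_msg, d_in), applied to each neighbor state.
    w_upd:  W_2, shape (d_out, d_in + d_msg), applied to [h_v ; m_v] (assumed form).
    """
    n = len(h_prev)
    h_next = []
    for v in range(n):
        neighbors = [w for w in range(n) if adj[v][w]]
        # Message aggregation: average of W_1 h_w over neighbors (zero vector if isolated).
        m_v = [0.0] * len(w_msg)
        for w in neighbors:
            msg = matvec(w_msg, h_prev[w])
            m_v = [acc + part / len(neighbors) for acc, part in zip(m_v, msg)]
        # State update: Tanh of W_2 applied to the concatenated previous state and message.
        h_next.append([math.tanh(t) for t in matvec(w_upd, h_prev[v] + m_v)])
    return h_next


# Usage: two connected nodes with 1-D states; W_2 keeps only the message part.
h1 = graph_conv_layer([[1.0], [3.0]], [[0, 1], [1, 0]], w_msg=[[1.0]], w_upd=[[0.0, 1.0]])
```

Using the neighbor count as a normalizer keeps the message scale independent of node degree, which matters when heterogeneous samples have different numbers of sources.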

Further, the model optimization objective in step 4 is to minimize the following training loss:

where $\mathbb{E}$ denotes the mathematical expectation, $\mathrm{KL}[q(z \mid x^V, A) \,\|\, p(z)]$ is the KL divergence between the posterior distribution $q(z \mid x^V, A)$ and the prior distribution $p(z)$, $\log p(A \mid z)$ is the log-likelihood of the adjacency matrix reconstruction, and $\log p(x^V \mid z)$ is the log-likelihood of the graph node feature matrix reconstruction.
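The three terms of this objective can be computed as in the Python sketch below. The likelihood models are assumptions, since they are not fixed in this text: a unit-variance Gaussian decoder for the node features (so the negative log-likelihood reduces to a squared error up to a factor and a constant), independent Bernoulli adjacency entries (binary cross-entropy), a standard normal prior $p(z) = N(0, I)$, and $\sigma$ treated as a standard deviation. The KL term uses the closed form $\frac{1}{2}(\mu^2 + \sigma^2 - 1 - \log\sigma^2)$ per dimension. Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def kl_to_standard_normal(mu: Matrix, sigma: Matrix) -> float:
    """KL[N(mu, diag(sigma^2)) || N(0, I)], summed over nodes and latent dimensions."""
    return sum(
        0.5 * (m * m + s * s - 1.0 - 2.0 * math.log(s))
        for mu_row, s_row in zip(mu, sigma)
        for m, s in zip(mu_row, s_row)
    )


def feature_recon_loss(x: Matrix, x_hat: Matrix) -> float:
    """Squared error; under a unit-variance Gaussian decoder, -log p(x^V | z) = 0.5 * this + const."""
    return sum((a - b) ** 2 for ra, rb in zip(x, x_hat) for a, b in zip(ra, rb))


def adjacency_recon_loss(adj: List[List[int]], a_hat: Matrix, eps: float = 1e-7) -> float:
    """-log p(A | z), assuming independent Bernoulli edges (binary cross-entropy)."""
    loss = 0.0
    for ra, rh in zip(adj, a_hat):
        for a, p in zip(ra, rh):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            loss -= a * math.log(p) + (1 - a) * math.log(1.0 - p)
    return loss


def training_loss(x, x_hat, adj, a_hat, mu, sigma) -> float:
    """Single-sample estimate: reconstruction losses plus the KL regularizer."""
    return (feature_recon_loss(x, x_hat)
            + adjacency_recon_loss(adj, a_hat)
            + kl_to_standard_normal(mu, sigma))
```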

Further, the data enhancement applied to the graph structure samples in step 5 consists of horizontal flipping, vertical flipping and center flipping of the components of each graph node feature vector, formulated respectively as:

where $x_i^V(j, k)$ is the k-th component of the j-th node feature vector in the node feature matrix $x_i^V$, and $f_h$, $f_v$ and $f_c$ denote the horizontal flipping, vertical flipping and center flipping operations on a graph structure sample;

the data enhancement applied to graph structure sample $G_i$ is expressed as:
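Because the flip formulas are not reproduced in this text, the Python sketch below encodes one plausible reading, which is an assumption: with $x_i^V(j,k)$ the k-th component of node j's feature vector, a horizontal flip reverses each node's time axis, $x_i^V(j,k) \to x_i^V(j, L+1-k)$; a vertical flip negates amplitudes, $x_i^V(j,k) \to -x_i^V(j,k)$; and a center flip combines both. The adjacency matrix is left unchanged because flipping acts only on node features. The function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
GraphSample = Tuple[Matrix, List[List[int]]]  # (node feature matrix x^V, adjacency A)


def flip_h(x: Matrix) -> Matrix:
    """Assumed horizontal flip: reverse the time axis of every node feature vector."""
    return [row[::-1] for row in x]


def flip_v(x: Matrix) -> Matrix:
    """Assumed vertical flip: negate every component."""
    return [[-v for v in row] for row in x]


def flip_c(x: Matrix) -> Matrix:
    """Assumed center flip: point reflection, i.e. horizontal and vertical flip together."""
    return flip_v(flip_h(x))


def augment(sample: GraphSample) -> List[GraphSample]:
    """Original sample followed by its f_h, f_v and f_c versions; A is shared unchanged."""
    x, adj = sample
    return [(x, adj), (flip_h(x), adj), (flip_v(x), adj), (flip_c(x), adj)]


# Usage: one node with L = 3 points.
views = augment(([[1.0, 2.0, 3.0]], [[0]]))
```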

Further, in step 6 anomaly inference is performed on the test set to obtain the anomaly state evaluation score of each test sample, expressed as follows:

where $s_{AD}(G_i)$ is the anomaly score of graph structure sample $G_i$, $\mathrm{KL}(\mu_i, \sigma_i)$ is the KL loss computed from the mean and variance of the hidden-space representation of the i-th sample, $\hat{x}_i^V$ is the reconstructed node feature matrix of the i-th sample, and $\hat{A}_i$ is the reconstructed adjacency matrix of the i-th sample, expressed as

$$\hat{A}_i = \sigma_s\left(z_i z_i^{\top}\right)$$

where $z_i$ is the reparameterized hidden-space representation of the i-th graph structure sample, $z_i^{\top}$ is the transpose of $z_i$, and $\sigma_s$ is the sigmoid activation function.
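The reconstructed adjacency matrix and a per-sample anomaly score can be sketched in Python as follows. The inner-product decoder $\hat{A}_i = \sigma_s(z_i z_i^{\top})$ follows the definitions above; the score itself is not reproduced in this text, so the sketch assumes an unweighted sum of the KL loss and the squared Frobenius reconstruction errors of $x_i^V$ and $A_i$. The weighting and the helper names are assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(t: float) -> float:
    """Numerically stable logistic function sigma_s."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def inner_product_decoder(z: Matrix) -> Matrix:
    """A_hat = sigmoid(z z^T): entry (a, b) is the sigmoid of the dot product of latent rows a and b."""
    return [[sigmoid(sum(p * q for p, q in zip(za, zb))) for zb in z] for za in z]


def sq_frobenius(a: Matrix, b: Matrix) -> float:
    """Squared Frobenius norm of a - b."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def anomaly_score(kl: float, x: Matrix, x_hat: Matrix, adj: Matrix, z: Matrix) -> float:
    """Assumed s_AD: KL loss + feature reconstruction error + adjacency reconstruction error."""
    return kl + sq_frobenius(x, x_hat) + sq_frobenius(adj, inner_product_decoder(z))


# Usage: two nodes with orthogonal latent rows, so the off-diagonal entries of A_hat are 0.5.
s = anomaly_score(0.0, [[1.0]], [[1.0]], [[0, 1], [1, 0]], [[1.0, 0.0], [0.0, 1.0]])
```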

Further, the anomaly detection result of step 6 is expressed as:

where $\tau$ is the threshold of the anomaly state evaluation score, and $s_{AD}(f_h(G_i))$, $s_{AD}(f_v(G_i))$ and $s_{AD}(f_c(G_i))$ are the anomaly scores of graph structure sample $G_i$ after horizontal, vertical and center flipping, respectively.
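The thresholding step can be sketched as below. The exact decision formula is not reproduced in this text, so how the four scores of $G_i$, $f_h(G_i)$, $f_v(G_i)$ and $f_c(G_i)$ are combined before comparison with $\tau$ is an assumption; the hypothetical helper defaults to their mean and accepts any other combining function, such as Python's built-in `max`.

```python
from statistics import mean
from typing import Callable, Sequence


def is_anomalous(scores: Sequence[float], tau: float,
                 combine: Callable[[Sequence[float]], float] = mean) -> bool:
    """Flag a test sample when the combined score of its original and flipped
    versions exceeds the threshold tau (the combination rule is an assumption)."""
    return combine(scores) > tau


# Usage: scores for G_i, f_h(G_i), f_v(G_i), f_c(G_i).
flag = is_anomalous([1.0, 2.0, 3.0, 4.0], tau=2.0)  # mean 2.5 > 2.0
```

Averaging over the flipped views smooths out a single view's score, while `max` makes the detector more sensitive to any one view.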

Compared with the prior art, the invention has the following beneficial technical effects:

1) Heterogeneous multi-source time sequence data are processed with a graph neural network: each source corresponds to a graph node, the adjacency matrix is built with the help of prior knowledge, and message passing in the graph network layers exploits the connections between nodes to fuse the features of the multi-source time sequence data effectively;

2) The invention applies the graph network within an autoencoder and optimizes the model with the node feature matrix reconstruction loss, the adjacency matrix reconstruction loss and the KL divergence, forcing the model to extract the factors common to normal samples so that abnormal samples can be identified effectively;

3) The heterogeneous multi-source time sequence anomaly detection method based on the unsupervised full attribute graph can process both isomorphic and heterogeneous multi-source time sequence data, which greatly improves its generality: multi-source time sequence data of different structures can be chosen freely in the training and testing stages without redesigning the model.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of an adjacency matrix construction process in accordance with the method of the present invention;

FIG. 3 is a schematic diagram of an anomaly detection model constructed by the method of the present invention;

FIG. 4 is a statistical chart of the anomaly detection results of an embodiment of the method of the present invention, where (a) is the distribution of results with 10 input sources and (b) is the distribution of results with 8 input sources.

Detailed Description

The present application is described in further detail below in conjunction with the drawings and embodiments so that those skilled in the art can better understand the invention. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. For convenience of description, only the parts related to the invention are shown in the drawings. It should also be noted that, where there is no conflict, the embodiments of the present application and the features in them may be combined with each other. The present application is described in detail below with reference to the drawings and embodiments.

As shown in FIG. 1, the invention provides a heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph, comprising the following steps:

step 1: heterogeneous multi-source time sequence historical data under a healthy running state are obtained from a plurality of monitored devices, and a heterogeneous multi-source time sequence historical sample is obtained through preprocessing.

As FIG. 1 shows, the multi-source time sequence historical data can come from the same device, giving isomorphic multi-source time sequence data, or from different devices, giving heterogeneous multi-source time sequence data; all of these are historical data and can be used for training.

Preprocessing of the heterogeneous multi-source time sequence historical data comprises a sliding time window operation and a normalization operation. The sliding time window operation divides long multi-source time sequence historical data into multi-source time sequence historical samples of fixed length L; windows are not allowed to overlap, so that the information in different samples is independent. The normalization operation can be formulated as follows:

where $x'_i(j)$ is the j-th source of the i-th heterogeneous multi-source time sequence historical sample before normalization, $S_i$ is the number of sources of $x'_i$, $x_i$ is the i-th heterogeneous multi-source time sequence historical sample after normalization, and $\|\cdot\|_2$ is the 2-norm.

The purpose is to avoid biased model inference caused by large differences in magnitude between sensors of different modalities: dividing a long time sequence into short samples and then normalizing them reduces the influence of sample amplitude.
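The preprocessing described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, a trailing remainder shorter than L is assumed to be discarded, and, because the normalization formula is not reproduced in this text, each source window is assumed to be divided by its own 2-norm.

```python
import math
from typing import List

Series = List[float]
Sample = List[Series]  # one window per source, all of length L


def sliding_windows(sources: List[Series], length: int) -> List[Sample]:
    """Cut aligned long series into non-overlapping windows of `length` points.

    Hypothetical helper: the stride equals the window length, so windows never
    overlap; a trailing remainder shorter than `length` is dropped (assumption).
    """
    n_points = min(len(s) for s in sources)
    return [
        [s[start:start + length] for s in sources]
        for start in range(0, n_points - length + 1, length)
    ]


def normalize_sample(sample: Sample) -> Sample:
    """Assumed normalization: divide each source window by its own 2-norm."""
    normalized = []
    for window in sample:
        norm = math.sqrt(sum(v * v for v in window))
        # Leave an all-zero window unchanged instead of dividing by zero.
        normalized.append([v / norm for v in window] if norm > 0 else list(window))
    return normalized


# Usage: two sources of 10 points each, L = 4 -> two samples; points 8-9 are dropped.
raw = [[float(t) for t in range(10)], [1.0] * 10]
samples = [normalize_sample(s) for s in sliding_windows(raw, 4)]
```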

Step 2: convert the heterogeneous multi-source time sequence historical samples of step 1 into graph structure samples that serve as the training set.

Graph structure sample conversion comprises two parts: constructing the graph node feature matrix and constructing the adjacency matrix. A graph structure sample is represented as $G = (x^V, A)$, where $x^V$ is the graph node feature matrix and $A$ is the adjacency matrix. The i-th sample $G_i$ has $S_i$ nodes, one for each source of the heterogeneous multi-source time sequence sample; its graph node feature matrix can be expressed as $x_i^V \in \mathbb{R}^{S_i \times L}$, identical to the preprocessed multi-source time sequence historical sample $x_i$, and its adjacency matrix can be expressed as $A_i \in \mathbb{R}^{S_i \times S_i}$, which contains the relationship between every pair of nodes of $G_i$. The construction is illustrated in FIG. 2 and proceeds as follows:

First, for all isomorphic multi-source time sequence historical samples, the distance between any two sources is calculated as follows:

where $d_i(j_1, j_2)$ is the distance between the $j_1$-th source $x_i(j_1)$ and the $j_2$-th source $x_i(j_2)$ of the i-th multi-source time sequence historical sample.

Second, a distance matrix $D$ is generated as follows:

where $D(j_1, j_2)$ is the value in row $j_1$, column $j_2$ of $D$, and $N^{tr}$ is the number of structurally consistent samples in the training set.

Then, all distances in $D$ are sorted in descending order, $D_{(1)} \ge D_{(2)} \ge \cdots$, and the initial adjacency matrix $A'$ is calculated as follows:

where $\tau_D$ is the distance threshold, taken as the J-th largest distance, i.e. $\tau_D = D_{(J)}$.

Finally, using prior knowledge of the monitored equipment, $A'$ is adjusted according to expert experience, strengthening the connections between graph nodes that correspond to related sources and weakening those between graph nodes that correspond to unrelated sources, to obtain the adjacency matrix $A$ of the graph structure sample. As shown in FIG. 2, connections between node 1 and node 2 and between node 4 and node 5 are added based on expert experience, because the structure of the monitored device indicates that these nodes should be linked.
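A minimal Python sketch of this adjacency construction follows. The distance and threshold formulas are not reproduced in this text, so the code assumes (i) Euclidean distance between two source windows, (ii) $D$ as the mean of the per-sample distances over the $N^{tr}$ samples, and (iii) an edge in $A'$ whenever $D(j_1, j_2) < \tau_D$, with $\tau_D$ the J-th largest distance among distinct source pairs. The direction of the comparison in (iii) is an assumption, the helper names are hypothetical, and the expert adjustment is modelled as explicit lists of edges to add or remove.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples: List[List[List[float]]]) -> Matrix:
    """D(j1, j2): assumed mean Euclidean distance over N_tr structurally consistent samples."""
    n_tr, n_src = len(samples), len(samples[0])
    d = [[0.0] * n_src for _ in range(n_src)]
    for sample in samples:
        for j1 in range(n_src):
            for j2 in range(n_src):
                d[j1][j2] += euclidean(sample[j1], sample[j2]) / n_tr
    return d


def initial_adjacency(d: Matrix, j: int) -> List[List[int]]:
    """A': link pairs whose distance is below tau_D = J-th largest pair distance (assumed rule)."""
    n = len(d)
    ranked = sorted((d[a][b] for a in range(n) for b in range(a + 1, n)), reverse=True)
    tau_d = ranked[j - 1]
    return [[1 if a != b and d[a][b] < tau_d else 0 for b in range(n)] for a in range(n)]


def apply_expert_knowledge(adj: List[List[int]],
                           add: List[Tuple[int, int]],
                           remove: List[Tuple[int, int]]) -> List[List[int]]:
    """Symmetrically add edges between related sources and remove edges between unrelated ones."""
    a = [row[:] for row in adj]
    for (p, q), value in [(e, 1) for e in add] + [(e, 0) for e in remove]:
        a[p][q] = a[q][p] = value
    return a


# Usage: three sources, one training sample; distances are d01 = 1, d12 = 9, d02 = 10.
d = distance_matrix([[[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]])
a_init = initial_adjacency(d, 1)  # tau_D = 10 -> edges (0, 1) and (1, 2)
a_final = apply_expert_knowledge(a_init, add=[(0, 2)], remove=[(1, 2)])
```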

Step 3: construct the unsupervised full attribute graph anomaly detection model.

As shown in FIG. 3, the main body of the unsupervised full attribute graph anomaly detection model is a variational graph autoencoder comprising a graph encoder and a graph decoder. Building on the variational autoencoder and the graph neural network, we propose a full attribute graph autoencoder structure for anomaly detection in heterogeneous multi-source time sequences.

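The graph encoder comprises three graph convolution layers, an activation function and a reparameterization function, and the encoding process is:

$$H_E = f_{relu}\left(\mathrm{GC}_E^{(1)}(x^V, A)\right), \quad \mu = \mathrm{GC}_E^{(2)}(H_E, A), \quad \sigma = \mathrm{GC}_E^{(3)}(H_E, A)$$

$$z = f_{rep}(\mu, \sigma),$$

where $H_E$ is the output of the first graph convolution layer of the graph encoder, $\mu$ and $\sigma$ are respectively the mean and variance of the hidden-space representation of the graph node feature matrix $x^V$, $z$ is the reparameterized hidden-space representation, $\mathrm{GC}_E^{(1)}$, $\mathrm{GC}_E^{(2)}$ and $\mathrm{GC}_E^{(3)}$ are the three graph convolution layers of the encoder, $f_{relu}$ is the ReLU activation function, and $f_{rep}$ is the reparameterization function, formulated as:

$$f_{rep}(\mu, \sigma) = \mu + \sigma \odot \varepsilon,$$

where $\varepsilon \sim N(0, I)$ is a Gaussian random variable and $\odot$ denotes the Hadamard product.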

In the above modules, the graph convolution layer is the main building block and comprises two operations: message aggregation and state update. The message aggregation process is as follows:

where $m_v^{(l)}$ is the hidden state of node $v$ in the l-th graph convolution layer, $\mathcal{N}(v)$ is the set of neighbor nodes of $v$, $|\mathcal{N}(v)|$ is the number of neighbors of $v$, $h_w^{(l-1)}$ is the output of the (l-1)-th graph convolution layer at node $w$, and $W_1^{(l)}$ is a trainable weight matrix.

The state update process is as follows:

Step 4: optimize the model constructed in step 3 with the training set from step 2.

The model optimization objective consists of three parts, namely the graph node feature reconstruction loss, the adjacency matrix reconstruction loss and the distribution error of the hidden-space representation, and can be formulated as follows:

where $\mathbb{E}$ denotes the mathematical expectation, $\mathrm{KL}[q(z \mid x^V, A) \,\|\, p(z)]$ is the KL divergence between the posterior distribution $q(z \mid x^V, A)$ and the prior distribution $p(z)$, $\log p(A \mid z)$ is the log-likelihood of the adjacency matrix reconstruction, and $\log p(x^V \mid z)$ is the log-likelihood of the graph node feature matrix reconstruction.

Step 5: acquire multi-source time sequence data of the equipment to be analyzed, preprocess them as in step 1, convert them into graph structure samples as in step 2, and apply data enhancement to obtain the test set.

The data enhancement applied to the graph structure samples relies on the continuity of time sequences: flipping a graph structure sample does not change its characteristics, and presenting each sample to the model in several orientations improves the model's robustness and makes it insensitive to signal amplitude. Specifically, horizontal, vertical and center flipping are applied to the feature vector of each graph node, formulated respectively as:

where $x_i^V(j, k)$ is the k-th component of the j-th node feature vector in the node feature matrix $x_i^V$, and $f_h$, $f_v$ and $f_c$ denote the horizontal flipping, vertical flipping and center flipping operations on a graph structure sample.

The data enhancement applied to graph structure sample $G_i$ can be expressed as:

Step 6: perform anomaly inference on the test set with the model optimized in step 4 to obtain the anomaly state evaluation score of each test sample, expressed as follows:

where $s_{AD}(G_i)$ is the anomaly score of graph structure sample $G_i$, $\mathrm{KL}(\mu_i, \sigma_i)$ is the KL loss computed from the mean and variance of the hidden-space representation of the i-th sample, $\hat{x}_i^V$ is the reconstructed node feature matrix of the i-th sample, and $\hat{A}_i$ is the reconstructed adjacency matrix of the i-th sample, which can be expressed as

$$\hat{A}_i = \sigma_s\left(z_i z_i^{\top}\right),$$

where $z_i$ is the reparameterized hidden-space representation of the i-th graph structure sample and $\sigma_s$ is the sigmoid activation function.

The final anomaly detection result can be expressed as:

The invention is described in further detail below with reference to a specific embodiment:

In this embodiment, three anomaly detection scenarios are considered: 1) isomorphic multi-source data from the same device; 2) isomorphic multi-source data from different devices; 3) heterogeneous multi-source data from different devices. Scenario 1) uses 500 historical samples and 2000 test samples from the same mechanical device, with multi-source time sequence data acquired by 10 sensors; scenario 2) uses 2000 historical samples and 3000 test samples from 7 mechanical devices, with data acquired by 8 sensors; scenario 3) uses 2500 historical samples and 5000 test samples from 8 mechanical devices, with data acquired by 10 or 8 sensors. The samples are preprocessed and converted into graph structure samples with the proposed method, and training and testing are carried out separately in the three scenarios, giving the anomaly detection results in Table 1. The proposed method achieves accurate anomaly detection in all three scenarios: the false alarm rate does not exceed 5%, the anomaly detection rate exceeds 99%, and the AUC is 1.0, indicating high sensitivity to abnormal data. Moreover, the results of scenario 3) show that the method generalizes to heterogeneous multi-source time sequence anomaly detection across different devices. FIG. 4 (a) and (b) show the distributions of the anomaly detection results for the two types of multi-source time sequences in scenario 3), demonstrating the effectiveness of the proposed method for anomaly detection on heterogeneous multi-source time sequence data.

Table 1 Heterogeneous multi-source time sequence anomaly detection results

While illustrative embodiments of the present invention have been described above to help those skilled in the art understand the invention, it should be understood that the invention is not limited to the scope of these embodiments; various changes that fall within the spirit and scope of the invention as defined by the appended claims are protected.

Claims

1. A heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph, characterized by comprising the following steps:

the preprocessing comprises a sliding time window operation and a normalization operation, wherein the sliding time window operation divides long multi-source time sequence historical data into multi-source time sequence historical samples of fixed length L, and windows are not allowed to overlap; the normalization operation is formulated as follows:

where $x'_i(j)$, $j = 1, \dots, S_i$, is the j-th source of the i-th heterogeneous multi-source time sequence historical sample before normalization, $S_i$ is the number of sources of $x'_i$, $x_i$ is the i-th heterogeneous multi-source time sequence historical sample after normalization, and $\|\cdot\|_2$ is the 2-norm;

the heterogeneous multi-source time sequence historical sample is converted into a graph structure sample comprising a graph node feature matrix and an adjacency matrix, expressed as $G = (x^V, A)$, where $x^V$ is the graph node feature matrix and $A$ is the adjacency matrix;

second, a distance matrix $D$ is generated as follows:

finally, using prior knowledge of the monitored equipment, $A'$ is adjusted according to expert experience, strengthening the connections between graph nodes that correspond to related sources and weakening those between graph nodes that correspond to unrelated sources, to obtain the adjacency matrix $A$ of the graph structure sample;

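the main body of the unsupervised full attribute graph anomaly detection model is a variational graph autoencoder comprising a graph encoder and a graph decoder;

the graph encoder comprises three graph convolution layers, an activation function and a reparameterization function, and the encoding process is formulated as follows:

$$H_E = f_{relu}\left(\mathrm{GC}_E^{(1)}(x^V, A)\right), \quad \mu = \mathrm{GC}_E^{(2)}(H_E, A), \quad \sigma = \mathrm{GC}_E^{(3)}(H_E, A)$$

$$z = f_{rep}(\mu, \sigma)$$

where $H_E$ is the output of the first graph convolution layer of the graph encoder, $\mu$ and $\sigma$ are respectively the mean and variance of the hidden-space features of the graph node feature matrix $x^V$, $z$ is the reparameterized hidden-space feature, $\mathrm{GC}_E^{(1)}$, $\mathrm{GC}_E^{(2)}$ and $\mathrm{GC}_E^{(3)}$ are the three graph convolution layers of the encoder, $f_{relu}$ is the ReLU activation function, and $f_{rep}$ is the reparameterization function, formulated as:

$$f_{rep}(\mu, \sigma) = \mu + \sigma \odot \varepsilon$$

where $\varepsilon \sim N(0, I)$ is a Gaussian random variable and $\odot$ denotes the Hadamard product;

the graph decoder reconstructs the graph node feature matrix, where $\hat{x}^V$ denotes the reconstructed graph node feature matrix, and $\mathrm{GC}_D^{(1)}$ and $\mathrm{GC}_D^{(2)}$ are the two graph convolution layers of the decoder;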

the graph convolution layers in the graph encoder and the graph decoder comprise two operations: message aggregation and state update;

the message aggregation process is as follows:

where $m_v^{(l)}$ is the hidden state of node $v$ in the l-th graph convolution layer, $\mathcal{N}(v)$ is the set of neighbor nodes of $v$, $|\mathcal{N}(v)|$ is the number of neighbors of $v$, $h_w^{(l-1)}$ is the output of the (l-1)-th graph convolution layer at node $w$, and $W_1^{(l)}$ is a trainable weight matrix;

the state update process is as follows:

where $f_{tanh}$ is the Tanh activation function, $W_2^{(l)}$ is a trainable weight matrix, $h_v^{(l)}$ is the updated state of node $v$, and $h_v^{(l-1)}$ is the output of the (l-1)-th graph convolution layer at node $v$;

the model optimization objective is to minimize the following training loss:

where $\mathbb{E}$ denotes the mathematical expectation, $\mathrm{KL}[q(z \mid x^V, A) \,\|\, p(z)]$ is the KL divergence between the posterior distribution $q(z \mid x^V, A)$ and the prior distribution $p(z)$, $\log p(A \mid z)$ is the log-likelihood of the adjacency matrix reconstruction, and $\log p(x^V \mid z)$ is the log-likelihood of the graph node feature matrix reconstruction;

2. The heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph according to claim 1, wherein the data enhancement applied to the graph structure samples in step 5 specifically comprises horizontal flipping, vertical flipping and center flipping of the components of each graph node feature vector, formulated respectively as:

the data enhancement applied to graph structure sample $G_i$ is expressed as:

3. The heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph according to claim 2, wherein in step 6 anomaly inference is performed on the test set to obtain the anomaly state evaluation score of each test sample, expressed as follows:

where the reconstructed adjacency matrix is $\hat{A}_i = \sigma_s\left(z_i z_i^{\top}\right)$, $z_i$ is the reparameterized hidden-space representation of the i-th graph structure sample, $z_i^{\top}$ is the transpose of $z_i$, and $\sigma_s$ is the sigmoid activation function.

4. The heterogeneous multi-source time sequence anomaly detection method based on an unsupervised full attribute graph according to claim 2, wherein the anomaly detection result in step 6 is expressed as: