CN117725543A

CN117725543A - Multi-element time sequence anomaly prediction method, electronic equipment and storage medium

Info

Publication number: CN117725543A
Application number: CN202410179705.8A
Authority: CN
Inventors: 李静; 刘畅; 王静; 丁建立
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2024-02-18
Filing date: 2024-02-18
Publication date: 2024-03-19
Anticipated expiration: 2044-02-18
Also published as: CN117725543B

Abstract

The invention relates to the field of computer technology applications, and in particular to a multivariate time series anomaly prediction method, an electronic device and a storage medium. The method comprises: inputting the multivariate time series X of the monitored server that currently needs to be predicted into the data embedding module of a multivariate time series anomaly prediction model to obtain a data embedding result; inputting the data embedding result into the encoder of the model to obtain an encoding result; inputting the encoding result into the decoder of the model to obtain a decoding result; and determining, based at least on X and the decoding result, whether the monitored server will become abnormal within the w timestamps after the prediction time and which indicators are abnormal. The invention can accurately predict whether the monitored server will become abnormal within the w timestamps after the prediction time, as well as the specific abnormal indicators.

Description

Multi-element time sequence anomaly prediction method, electronic equipment and storage medium

Technical Field

The present invention relates to the field of computer technology, and in particular to a multivariate time series anomaly prediction method, an electronic device and a storage medium.

Background

With the rapid development of computing technology, complex systems (e.g., social networks, cloud computing, water quality monitoring) have become increasingly complex and sensitive. To ensure the reliability of these systems, performance counters and/or sensors are widely used to closely monitor the state of running objects (e.g., servers, services). The monitoring data are collected at equal time intervals and form a multivariate time series (MTS), in which each monitoring indicator forms a univariate time series. When an error or failure occurs in the system, such as network overload, an application error or a hardware failure, the monitoring data become abnormal (e.g., spikes, or steep rises or falls). System failures can lead to service outages, data loss and significant economic loss.

Current work focuses mainly on identifying abnormal behavior in MTS (i.e., anomaly detection), which helps operators find and recover from a fault after it has occurred. However, by the time an anomaly is detected, the fault has already occurred and the reliability of the system has already been reduced. In contrast, predicting an anomaly before it actually occurs can prompt operators to act in advance. In addition, locating a set of the most abnormal indicators that explain a predicted anomaly can help operators resolve the underlying problem more efficiently. Thus, with adequate predictive performance, MTS anomaly prediction and interpretation can significantly improve the reliability of complex systems.

Disclosure of Invention

To address the above technical problems, the invention adopts the following technical solution:

An embodiment of the invention provides a multivariate time series anomaly prediction method comprising the following steps:

S100, inputting the multivariate time series $X=(X_{t-w+1},X_{t-w+2},\ldots,X_i,\ldots,X_t)$ of the monitored server to be predicted into the data embedding module of the multivariate time series anomaly prediction model, which performs position embedding and spatial embedding on the data in X, to obtain a data embedding result; where $X_i$ is the multivariate observation at timestamp i in the window ending at the prediction time t, $X_i=\{x_{i1},x_{i2},\ldots,x_{is},\ldots,x_{in}\}$, i ranges from $t-w+1$ to t, t is the prediction time and w is the prediction window size; $x_{is}$ is the value of the s-th monitoring indicator in $X_i$, s ranges from 1 to n, and n is the number of monitoring indicators of the monitored server.

S200, inputting the data embedding result into the encoder of the multivariate time series anomaly prediction model to obtain an encoding result. The encoder comprises first to third encoding modules with the same structure, connected in sequence; each encoding module comprises at least a multi-head attention module, a horizontal graph attention module and a multi-scale feedforward network module connected in sequence; the first encoding module is also connected to the data embedding module.

S300, inputting the encoding result into the decoder of the multivariate time series anomaly prediction model to obtain a decoding result. The decoder comprises at least a first linear layer, an inter-dimension relation learning module and a second linear layer connected in sequence, where the first linear layer is connected to the third encoding module; the decoding result comprises a first result produced by the first linear layer, an inter-dimension dependency matrix produced by the inter-dimension relation learning module, and a second result produced by the second linear layer.

S400, determining, based at least on X, the inter-dimension dependency matrix and the second result, whether the monitored server will become abnormal within the w timestamps after the prediction time t, and which indicators are abnormal.

The multivariate time series anomaly prediction model is trained on sample data of the monitored server.

The invention has at least the following beneficial effects:

By jointly learning the temporal dependencies and the inter-dimension dependencies in the multivariate time series with a Transformer and graph attention network framework, the method makes the prediction results more accurate.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a multivariate time series anomaly prediction method according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.

For multivariate time series (MTS), the inventors of the present invention noted that three factors make the prediction and interpretation of MTS anomalies difficult:

(1) Lack of sufficient labeled anomaly samples. During long-term stable operation of complex systems, failure events are relatively rare, and abnormal samples in the monitoring stream are usually buried among large amounts of normal data. Manually identifying and labeling anomalies is impractical and expensive.

(2) Complex MTS patterns in modern systems. As the number of monitoring indicators grows, the multivariate time series has become a high-dimensional time series with complex patterns. The prediction model therefore has to identify not only the abnormal changes of each monitoring indicator over the long term, but also the relationships among the indicators.

(3) Subtle changes before abnormal behavior. A fault (or anomaly) usually does not appear suddenly but gradually affects the monitoring indicators, as with a memory leak. To identify the warning signals of an impending anomaly accurately and in time, subtle changes in the MTS that are not readily detectable must be captured.

In view of the above problems, the present invention provides a multivariate time series anomaly prediction scheme that aims to predict whether the monitored server will become abnormal in the upcoming period and to find a set of the most relevant indicators that explain the upcoming anomaly. The scheme learns both the temporal and the inter-dimension dependencies in the MTS through a Transformer and a graph attention network, and adopts a multi-task adversarial training strategy to amplify the difference between normal and abnormal data.

Further, as shown in Fig. 1, the multivariate time series anomaly prediction method provided by the embodiment of the invention may include the following steps:

S100, inputting the multivariate time series $X=(X_{t-w+1},X_{t-w+2},\ldots,X_i,\ldots,X_t)$ of the monitored server that currently needs to be predicted into the data embedding module of the multivariate time series anomaly prediction model, which performs position embedding and spatial embedding on the data in X, to obtain a data embedding result; where $X_i$ is the multivariate observation at timestamp i in the window ending at the prediction time t, $X_i=\{x_{i1},x_{i2},\ldots,x_{is},\ldots,x_{in}\}$, i ranges from $t-w+1$ to t, t is the prediction time and w is the prediction window size; $x_{is}$ is the value of the s-th monitoring indicator in $X_i$, s ranges from 1 to n, and n is the number of monitoring indicators of the monitored server.

In the embodiment of the invention, a monitoring indicator of the monitored server may be a parameter representing the performance of the monitored server; the indicators reveal what is happening inside the server, and together they characterize its state. The monitoring indicators may be chosen according to the actual situation; for example, they may include bandwidth usage, CPU usage, network IO, disk usage and disk write volume, and the invention does not specifically limit them. The monitoring indicators may be obtained by existing means, for example by sampling them with a sampler built into the monitored server.

In the embodiment of the present invention, the interval between any two adjacent timestamps is the same; the specific interval may be set according to actual needs, for example 5 minutes or 1 hour.

A multivariate time series has two kinds of dependency: temporal dependency, which characterizes how the data of each indicator are correlated over time, and spatial dependency, which characterizes the correlation between the data of different indicators at the same time. An unexpected point or subsequence that violates these dependencies is usually regarded as an anomaly.

The invention predicts MTS anomalies using a sliding window. For each timestamp t, the monitoring data collected in the w timestamps ending at t form one sample, i.e., $X=(X_{t-w+1},X_{t-w+2},\ldots,X_i,\ldots,X_t)$.
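The sliding-window construction of a sample can be sketched in Python as follows. This is a minimal illustration, assuming the monitoring data are held as a list of rows (one row of n indicator values per timestamp, 0-indexed); the function names are hypothetical and not part of the original method.

```python
from typing import List, Sequence

Row = List[float]  # values of the n monitoring indicators at one timestamp


def sliding_window_sample(series: Sequence[Row], t: int, w: int) -> List[Row]:
    """Return X = (X_{t-w+1}, ..., X_t): the w rows ending at index t (inclusive)."""
    if w <= 0:
        raise ValueError("window size w must be positive")
    if t - w + 1 < 0 or t >= len(series):
        raise IndexError("not enough history for a full window ending at t")
    return [list(series[i]) for i in range(t - w + 1, t + 1)]


def all_samples(series: Sequence[Row], w: int) -> List[List[Row]]:
    """Slide the window forward one timestamp at a time over the whole series."""
    return [sliding_window_sample(series, t, w) for t in range(w - 1, len(series))]
```

For a series of 5 timestamps and w = 3, `all_samples` yields 3 samples, the first covering timestamps 0 to 2 and the last covering timestamps 2 to 4.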

Further, in the embodiment of the present invention, to learn both the temporal dependency and the inter-indicator dependency in the monitoring data, the invention embeds two kinds of information: (a) a position (time) embedding along the time axis; and (b) a spatial embedding of the monitoring data across indicators. In addition, to reduce the influence of outliers that may be mixed into the training data, the invention introduces an additional causal convolution to smooth the raw input data. Specifically, the data embedding result is obtained by the embedding layer and may satisfy:

$\mathrm{Embedding}(X)=\mathrm{Cov}(\mathrm{PE}(X)+\mathrm{SE}(X)+\mathrm{Cacov}(X))$.

where Embedding(X) is the data embedding result of X; PE(X) adds the temporal (sequence-position) information of each monitoring indicator along the time dimension of X; SE(X) adds the inter-indicator information within each timestamp, with $\mathrm{SE}(X)=W_{se}\times\mathrm{CosSim}(X)$, where $W_{se}$ is a learnable weight matrix of size n×w and CosSim(X) is the matrix of cosine similarities between the different monitoring indicators in X, each element being the cosine similarity between two monitoring indicators. Cacov(X) denotes a causal convolution over X that smooths the input data, improving the ability of the graph Transformer to capture the normal patterns in the MTS. Cov() denotes a one-dimensional convolution that maps the embedded data into a high-dimensional space of dimension d ($d \gg n$), which enables the graph Transformer to learn local patterns that help capture the underlying structure of the MTS data.
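The two non-learned ingredients of the embedding, the indicator-wise cosine similarity matrix CosSim(X) and a causal convolution, can be sketched as below. This is a plain-Python sketch under stated assumptions: X is a w×n list of rows, the causal kernel is a fixed caller-supplied list (in the model it would be learned), and the learnable parts (PE, $W_{se}$ and Cov) are omitted.

```python
import math
from typing import List

Matrix = List[List[float]]


def cos_sim_matrix(X: Matrix) -> Matrix:
    """n x n cosine similarity between the indicator columns of a w x n window X."""
    n = len(X[0])
    cols = [[row[s] for row in X] for s in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    sim = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            denom = norms[a] * norms[b]
            # An all-zero indicator has no defined cosine; treat it as 0 here.
            dot = sum(x * y for x, y in zip(cols[a], cols[b]))
            sim[a][b] = dot / denom if denom > 0 else 0.0
    return sim


def causal_conv1d(x: List[float], kernel: List[float]) -> List[float]:
    """Causal convolution: output i only sees x[i], x[i-1], ... (zero left padding)."""
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            if i - k >= 0:
                acc += weight * x[i - k]
        out.append(acc)
    return out
```

With the kernel `[0.5, 0.5]` the causal convolution acts as a two-point moving average, which illustrates the smoothing role described above; each output value depends only on the current and earlier timestamps.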

S200, inputting the data embedding result into the encoder of the multivariate time series anomaly prediction model to obtain an encoding result. The encoder comprises first to third encoding modules with the same structure, connected in sequence; each encoding module comprises at least three sub-layers connected in sequence, namely a multi-head attention module, a horizontal graph attention module and a multi-scale feedforward network module. The first encoding module is also connected to the data embedding module: its input is the output of the data embedding module, its output is the input of the second encoding module, and the output of the second encoding module is the input of the third encoding module.

In the embodiment of the invention, residual connections and layer normalization are applied to each sub-layer of every encoding module. To facilitate the residual connections, the output dimension of each sub-layer is set to the same number of dimensions as the embedding layer.

In the embodiment of the present invention, the multi-head attention module may use 8 heads to capture richer temporal dependencies from multiple perspectives. As those skilled in the art know, the multi-head attention module may operate as in the prior art: the input is split into multiple heads, self-attention is computed in each head, and the head outputs are concatenated to produce the final output, as follows:

$\mathrm{MultiHeadAtt}(Q,K,V)=\mathrm{Concat}(H_1,H_2,\ldots,H_8)$.

where MultiHeadAtt(Q, K, V) is the output of the multi-head attention module; $H_p$ is the output of the p-th attention head, p = 1, ..., 8, with $H_p=\mathrm{Softmax}(Q_pK_p^T/\sqrt{d})\,V_p$, where d is the feature dimension; Q, K and V are the query, key and value matrices, i.e., different linear transformations of the input of the multi-head attention module obtained by convolving it with different convolution kernels; $Q_p$, $K_p$ and $V_p$ are the corresponding transformations for the p-th head; Concat() denotes concatenation; and Softmax() is the activation function that performs the normalization.

In the embodiment of the invention, the $1/\sqrt{d}$ factor scales the attention scores to reduce their variance, which helps keep training stable.
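A single attention head and the concatenation of heads can be illustrated with the sketch below. It assumes $Q_p$, $K_p$ and $V_p$ are already available as small matrices (in the model they come from convolutions of the input), takes d as the key dimension, and uses plain Python lists instead of a tensor library; it shows standard scaled dot-product attention rather than reproducing the patent's own implementation.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(r) for r in zip(*A)]


def softmax_rows(A: Matrix) -> Matrix:
    """Row-wise softmax; subtracting the row max keeps exp() numerically stable."""
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def attention_head(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """H_p = Softmax(Q_p K_p^T / sqrt(d)) V_p, where d is the key dimension."""
    d = len(K[0])
    scores = [[v / math.sqrt(d) for v in row] for row in matmul(Q, transpose(K))]
    return matmul(softmax_rows(scores), V)


def concat_heads(heads: List[Matrix]) -> Matrix:
    """Concat(H_1, ..., H_h): join the head outputs along the feature axis."""
    out = []
    for i in range(len(heads[0])):
        row: List[float] = []
        for H in heads:
            row.extend(H[i])
        out.append(row)
    return out
```

When all scores are equal, every row of the softmax is uniform and each output row is simply the average of the rows of V; the $1/\sqrt{d}$ factor only changes how peaked the weights become for unequal scores.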

To address the over-simplified weight propagation caused by the dot product in the attention mechanism, the invention adds a new propagation mechanism for learning temporal dependence: graph attention is applied to the input time series according to a horizontal relation matrix. The output $\mathrm{HGAT}(F_h)$ of the horizontal graph attention module may satisfy:

$\mathrm{HGAT}(F_h)=\mathrm{GAT}(V,\mathrm{Softmax}(QK^T))$;

where Q, K and V are the query, key and value matrices obtained by convolving the input $F_h$ of the horizontal graph attention module with different convolution kernels, and $K^T$ is the transpose of K; Softmax() is the activation function, and $\mathrm{Softmax}(QK^T)$ normalizes $QK^T$ so that each row forms a discrete distribution. GAT() performs graph attention: $\mathrm{GAT}(V,\mathrm{Softmax}(QK^T))$ applies the graph attention mechanism to V according to $\mathrm{Softmax}(QK^T)$ to obtain the output of this sub-layer, which includes a multi-head dimension transformation from d to d/2 and back to d.
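The following sketch shows how value vectors can be propagated over a graph whose edge weights come from $\mathrm{Softmax}(QK^T)$. It is a deliberately simplified stand-in, not a full graph attention (GAT) layer. It assumes the relation matrix is used directly as edge weights and performs a single weighted aggregation followed by an ELU nonlinearity. It omits the learned attention coefficients and the multi-head d → d/2 → d transformation, and all names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax_rows(A: Matrix) -> Matrix:
    out = []
    for row in A:
        m = max(row)  # stability shift
        e = [math.exp(v - m) for v in row]
        s = sum(e)
        out.append([v / s for v in e])
    return out


def elu(x: float, alpha: float = 1.0) -> float:
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def horizontal_graph_propagation(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Simplified stand-in for GAT(V, Softmax(Q K^T)) over the w timestamps.

    Each timestamp is a node; A[i][j] = Softmax(Q K^T)[i][j] is used as the weight
    of the edge j -> i, and node i aggregates the value vectors of all nodes.
    """
    scores = [[sum(q * k for q, k in zip(qi, kj)) for kj in K] for qi in Q]
    A = softmax_rows(scores)  # horizontal relation matrix; each row sums to 1
    dim = len(V[0])
    return [
        [elu(sum(A[i][j] * V[j][c] for j in range(len(V)))) for c in range(dim)]
        for i in range(len(Q))
    ]
```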

In the embodiment of the invention, to better extract features from the time series, the feedforward network is modified so that it extracts features at multiple scales. The invention uses three convolutions with different kernel sizes, each followed by activation functions; the output $\mathrm{MFFN}(F_m)$ of the multi-scale feedforward network satisfies:

$\mathrm{MFFN}(F_m)=\mathrm{Concat}(S_1,S_2,S_3)\times W^o$, where the j-th extraction result is $S_j=\mathrm{sigmoid}(\mathrm{Conv}_j(F_m))+\tanh(\mathrm{Conv}_j(F_m))$, j = 1, 2, 3. Here Concat() denotes concatenation; $W^o$ is the projection parameter of the fully connected (Linear) layer; sigmoid() and tanh() are activation functions with different activation distributions; and $\mathrm{Conv}_j(F_m)$ denotes convolving the input $F_m$ of the multi-scale feedforward network with the j-th convolution kernel.
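The multi-scale feedforward formula can be sketched for a single-channel input as follows. The kernel sizes, kernel weights and $W^o$ values are hypothetical inputs supplied by the caller, since the source does not fix them. Kernels are assumed to have odd length so that zero "same" padding keeps the sequence length. As in deep-learning libraries, the convolution is computed as a cross-correlation.

```python
import math
from typing import List


def conv1d_same(x: List[float], kernel: List[float]) -> List[float]:
    """Zero-padded 1-D convolution (cross-correlation) for an odd-length kernel."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(x):
                acc += weight * x[j]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mffn(x: List[float], kernels: List[List[float]], W_o: List[List[float]]) -> List[List[float]]:
    """MFFN(F_m) = Concat(S_1, ..., S_J) x W^o for a single-channel sequence x."""
    branches = []
    for kernel in kernels:
        c = conv1d_same(x, kernel)  # Conv_j(F_m), shared by both activations
        branches.append([sigmoid(v) + math.tanh(v) for v in c])  # S_j
    # One feature row [S_1[i], S_2[i], S_3[i]] per timestamp i.
    concat = [list(feats) for feats in zip(*branches)]
    # Project each row with W^o (shape: number of branches x output width).
    return [[sum(f * w for f, w in zip(row, col)) for col in zip(*W_o)] for row in concat]
```

Using kernels of lengths 1, 3 and 5, for example, gives each time step three views of its neighbourhood at different widths before the linear projection mixes them.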

In the embodiment of the invention, the inter-dimension relation learning module represents the relationships between dimensions as a directed graph, and its output $\mathrm{DRLM}(F_v)$ satisfies:

$\mathrm{DRLM}(F_v)=\mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$;

where $\mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$ is the inter-dimension dependency matrix, of size n×n, which captures the dependencies learned among the different indicators; Softmax() is an activation function; TCN() denotes a temporal convolutional network; and Q is the query matrix obtained by convolving the input $F_v$ of the inter-dimension relation learning module.
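The inter-dimension dependency matrix $\mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$ can be sketched as below. As an assumption, the temporal convolutional network is replaced by a single dilated causal convolution applied to each indicator's column of Q; a real TCN stacks several such layers with residual connections. The kernel and dilation are hypothetical caller-supplied values.

```python
import math
from typing import List

Matrix = List[List[float]]


def dilated_causal_conv(x: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Output i only sees x[i], x[i - dilation], x[i - 2*dilation], ... (zero padding)."""
    out = []
    for i in range(len(x)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i - k * dilation
            if j >= 0:
                acc += weight * x[j]
        out.append(acc)
    return out


def drlm(Q: Matrix, kernel: List[float], dilation: int = 1) -> Matrix:
    """Softmax(CosSim(TCN(Q))): n x n inter-dimension dependency matrix for a w x n Q."""
    n = len(Q[0])
    # Stand-in for TCN(): one dilated causal convolution per indicator column.
    feats = [dilated_causal_conv([row[s] for row in Q], kernel, dilation) for s in range(n)]
    norms = [math.sqrt(sum(v * v for v in f)) for f in feats]
    sim = [
        [
            sum(a * b for a, b in zip(feats[p], feats[q])) / (norms[p] * norms[q])
            if norms[p] * norms[q] > 0 else 0.0
            for q in range(n)
        ]
        for p in range(n)
    ]
    # Row-wise softmax turns each row into a probability distribution.
    out = []
    for row in sim:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        total = sum(e)
        out.append([v / total for v in e])
    return out
```

Cosine similarity is symmetric, so in this sketch any asymmetry in the resulting directed graph comes only from the row-wise softmax normalization.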

It is known to those skilled in the art that the structure of the feedforward neural network may be an existing network structure.

S400, determining, based at least on X, the inter-dimension dependency matrix and the second result, whether the monitored server will become abnormal within the w timestamps after the prediction time t, and which indicators are abnormal. Further, S400 may specifically include:

S401, obtaining the anomaly score of X, $S(X)=\lVert X-X_2^0\rVert\cdot\mathrm{KL}(M,M^0)$, where $X_2^0$ is the second result; $\lVert X-X_2^0\rVert$ denotes the mean squared error between X and $X_2^0$; · denotes multiplication; KL() is the bidirectional Kullback-Leibler divergence; M is the matrix of cosine similarities between the different monitoring indicators in X; and $M^0$ is the inter-dimension dependency matrix.

In the embodiment of the invention, only the second result $X_2^0$ is used in the anomaly score of X, and $X_2^0$ is not required to be completely close to the input. In addition, the score includes the error of the inter-dimension dependencies, so early warning signals caused by contextual anomalies and by anomalies in the inter-dimension dependencies can be captured effectively.

S402, if S(X) > S0, determining that the monitored server will become abnormal within the w timestamps after the prediction time t, and executing S403; otherwise, determining that the monitored server will not become abnormal within the w timestamps after the prediction time t. S0 is a preset anomaly score threshold.
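Steps S401 and S402 can be sketched as follows. The sketch assumes that ‖·‖ is the mean squared error, as stated in S401. It takes the bidirectional KL divergence to be the sum over rows of KL(P‖Q) + KL(Q‖P), with each row of M and $M^0$ already normalized into a probability distribution. The source does not spell out this normalization, and the small epsilon is an implementation choice that avoids log(0).

```python
import math
from typing import List

Matrix = List[List[float]]


def mse(A: Matrix, B: Matrix) -> float:
    """Mean squared error between two equally shaped matrices."""
    diffs = [(a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def kl(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions; eps avoids log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def bidirectional_kl(P: Matrix, Q: Matrix) -> float:
    """Sum over rows of KL(P_r || Q_r) + KL(Q_r || P_r); rows must be distributions."""
    return sum(kl(p, q) + kl(q, p) for p, q in zip(P, Q))


def anomaly_score(X: Matrix, X2_hat: Matrix, M: Matrix, M0: Matrix) -> float:
    """S(X) = ||X - X2^0|| * KL(M, M^0), with ||.|| taken as the MSE."""
    return mse(X, X2_hat) * bidirectional_kl(M, M0)


def will_be_abnormal(score: float, s0: float) -> bool:
    """S402: an anomaly is predicted within the next w timestamps iff S(X) > S0."""
    return score > s0
```

Because the two factors are multiplied, a window scores high only when both the reconstruction error and the dependency mismatch are non-negligible.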

S403, obtaining the anomaly score of the s-th monitoring indicator, $IS_s=k_1\times\lVert Y_s-Y_s^0\rVert+k_2\times\mathrm{KL}(M,M^0)$, where $Y_s=(x_{(t-w+1)s},\ldots,x_{is},\ldots,x_{ts})$ is the vector of the s-th monitoring indicator in X; $Y_s^0$ is the part of the second result corresponding to $Y_s$; and $k_1$ and $k_2$ are a first and a second preset weight, which may be empirical values; in one exemplary embodiment, $k_1=0.001$ and $k_2=1$.

S404, sorting $IS_1$ to $IS_n$ in descending order and taking the monitoring indicators corresponding to the top K scores as the abnormal indicators.

In the embodiment of the present invention, K may be set based on actual needs, for example, k=3.
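Steps S403 and S404 can be sketched as below, with $k_1 = 0.001$, $k_2 = 1$ and K = 3 as default values taken from the embodiment. The per-indicator error is again assumed to be a mean squared error, and the KL value is passed in as a precomputed number. As the formula is written, the KL term is the same for every indicator s, so the ranking in S404 is determined by the reconstruction term alone.

```python
from typing import List

Matrix = List[List[float]]


def indicator_scores(X: Matrix, X2_hat: Matrix, kl_value: float,
                     k1: float = 0.001, k2: float = 1.0) -> List[float]:
    """IS_s = k1 * ||Y_s - Y_s^0|| + k2 * KL(M, M^0) for each indicator (0-based s)."""
    n = len(X[0])
    scores = []
    for s in range(n):
        y = [row[s] for row in X]        # Y_s: the s-th indicator over the window
        y0 = [row[s] for row in X2_hat]  # Y_s^0: the matching part of the second result
        err = sum((a - b) ** 2 for a, b in zip(y, y0)) / len(y)
        scores.append(k1 * err + k2 * kl_value)
    return scores


def top_k_indicators(scores: List[float], k: int = 3) -> List[int]:
    """S404: indices of the K largest scores, in descending order of score."""
    return sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k]
```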

In the embodiment of the invention, the multivariate time series anomaly prediction model is trained on sample data of the monitored server and may comprise an embedding layer, an encoder and a decoder, whose structures are described above. This structure helps the model learn latent dependencies from deep multi-scale features. Because the Transformer has no recurrence, applying it directly to a time series ignores the positional and structural information of the original series, which can reduce its ability to capture sequential information and dependencies in MTS data. The data embedding module is therefore added to alleviate this problem.

The multivariate time series anomaly prediction model may be trained through the following steps:

S1, acquiring a sample set; the sample data in the sample set may be historical multivariate time series of the monitored object.

S2, inputting the current batch of sample data into the current multivariate time series anomaly prediction model for training to obtain a prediction result. Each batch of sample data has length w.

Those skilled in the art will appreciate that the specific implementation of S2 may refer to the specific implementations of S100 to S300 described above. The prediction result may be a decoding result.

S3, computing the current loss value of the current multivariate time series anomaly prediction model from the prediction result of the current batch and the corresponding ground truth, and checking whether it meets the preset training end condition; if so, executing S5; otherwise, executing S4.

In the embodiment of the invention, to predict anomalies more comprehensively and effectively, the reconstruction error is not used directly; instead, a two-stage multi-task adversarial loss is used as the optimization objective. In the first stage, $X_1^0$ and $X_2^0$ are driven to approximate the original input X, and the relation matrix is driven close to the original matrix. In the second stage, the final output is optimized to move away from the input and the difference between the relation matrices is enlarged, so that the relation matrix is freed from the constraint of the original matrix and the difference between normal and abnormal samples is amplified. That is, the model is trained with a two-stage loss, where the first-stage loss is $L_1=\lVert X-X_1^0\rVert+\alpha\times\mathrm{KL}(M,M^0)+\beta\times\lVert X-X_2^0\rVert$ and the second-stage loss is $L_2=\lVert X-X_1^0\rVert-\frac{\alpha}{2}\times\mathrm{KL}(M,M^0)-\frac{\beta}{2}\times\lVert X-X_2^0\rVert$. Here $X_1^0$ is the first result and $X_2^0$ is the second result; ‖·‖ denotes the reconstruction error; KL() is the bidirectional Kullback-Leibler divergence, which measures how well the inter-dimension dependencies learned by the model match the initial relationship; M is the matrix of cosine similarities between the different monitoring indicators in X; $M^0$ is the inter-dimension dependency matrix; and α and β are hyperparameters that weight the different losses. When …, … < 0, the optimization objective is to enlarge the difference between normal and abnormal samples, so that this difference reaches 10% to 30%.
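The two stage objectives can be written as plain functions of their components, as sketched below. The reconstruction errors are assumed to be mean squared errors, and the KL value is assumed to be computed as in S401. The sketch only evaluates $L_1$ and $L_2$. It does not show how the two stages are scheduled or which parameters each stage updates, because the text does not specify this.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def reconstruction_error(X: Matrix, X_hat: Matrix) -> float:
    """||X - X_hat||, taken here as the mean squared error."""
    diffs = [(a - b) ** 2 for ra, rb in zip(X, X_hat) for a, b in zip(ra, rb)]
    return sum(diffs) / len(diffs)


def stage_losses(rec1: float, rec2: float, kl_value: float,
                 alpha: float, beta: float) -> Tuple[float, float]:
    """Return (L1, L2) given rec1 = ||X - X1^0||, rec2 = ||X - X2^0||, kl_value = KL(M, M^0)."""
    # Stage 1: pull both reconstructions towards X and M^0 towards M.
    L1 = rec1 + alpha * kl_value + beta * rec2
    # Stage 2: keep X1^0 close to X, but reward a larger KL term and a larger error of X2^0.
    L2 = rec1 - (alpha / 2) * kl_value - (beta / 2) * rec2
    return L1, L2
```

The negative signs in $L_2$ are what make the second stage adversarial: minimizing $L_2$ pushes the dependency mismatch and the second reconstruction error up, and this widens the gap that the anomaly score in S401 relies on.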

In the embodiment of the present invention, the preset training end condition may be set according to actual needs; for example, training may end when L1 or L2 no longer changes over a set period, such as three consecutive training rounds.
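The example end condition (L1 or L2 unchanged over three consecutive rounds) can be checked with a small helper such as the one below. The tolerance is an assumption, and so is the reading of "unchanged": here it means the last three round-to-round changes all fall within that tolerance. The helper names are hypothetical.

```python
from typing import Sequence


def loss_has_plateaued(history: Sequence[float], rounds: int = 3, tol: float = 1e-8) -> bool:
    """True if each of the last `rounds` round-to-round changes is within `tol`."""
    if len(history) < rounds + 1:
        return False
    recent = list(history)[-(rounds + 1):]
    return all(abs(b - a) <= tol for a, b in zip(recent, recent[1:]))


def training_should_end(l1_history: Sequence[float], l2_history: Sequence[float]) -> bool:
    """End training when either L1 or L2 has stopped changing."""
    return loss_has_plateaued(l1_history) or loss_has_plateaued(l2_history)
```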

S4, updating the parameters of the current multivariate time series anomaly prediction model based on the current loss value, taking the next batch of sample data as the current batch, and returning to S2.

S5, taking the current multivariate time series anomaly prediction model as the target multivariate time series anomaly prediction model.

In one embodiment of the present invention, S0 may be an empirical value.

In another embodiment of the invention, S0 may be derived from a test set. Specifically, the test set is first processed according to steps S100 to S400 to obtain the anomaly score at every prediction time. The scores are then sorted in descending order, and the smallest score within the top 5% is taken as S0.
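Deriving S0 from the test-set scores can be sketched as below: the scores are sorted in descending order and the smallest score within the top 5% is returned. Rounding the 5% count up and always keeping at least one score are assumptions that matter only for small test sets.

```python
from typing import Sequence


def threshold_from_scores(scores: Sequence[float], top_percent: int = 5) -> float:
    """S0 = smallest score among the top `top_percent`% of scores (sorted high to low)."""
    if not scores:
        raise ValueError("need at least one test-set score")
    ranked = sorted(scores, reverse=True)
    # Integer ceiling of len * top_percent / 100, keeping at least one score.
    count = max(1, (len(ranked) * top_percent + 99) // 100)
    return ranked[count - 1]
```

With 100 scores 0, 1, ..., 99, the top 5% are 99 down to 95, so S0 = 95. Under the strict comparison S(X) > S0 in S402, four of those scores would be flagged.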

In summary, the multivariate time series anomaly prediction model of the invention uses a Transformer variant, the horizontal graph attention module and the inter-dimension relation learning module to model temporal dependencies and inter-dimension relationships simultaneously for anomaly prediction. It adopts a multi-task adversarial training strategy to enlarge the difference between normal and abnormal time points, which improves the model's performance in anomaly prediction, anomaly detection and anomaly interpretation.

According to embodiments of the present invention, the present invention also provides an electronic device, a readable storage medium and a computer program product.

In an exemplary embodiment, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in the above embodiments.

In an exemplary embodiment, the readable storage medium may be a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the above embodiment.

In an exemplary embodiment, the computer program product comprises a computer program which, when executed by a processor, implements the method according to the above embodiments.

Electronic devices are intended to represent various forms of user terminals, various forms of digital computers, such as desktop computers, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

In one exemplary embodiment, the electronic device may include a computing unit that may perform various suitable actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) or a computer program loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device may also be stored. The computing unit and the RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.

Further, a plurality of components in the electronic device are connected to the I/O interface, including: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The computing unit may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, digital signal processors (DSPs), and any suitable processors, controllers, microcontrollers, and the like. The computing unit performs the various methods and processes described above, such as the multivariate time series anomaly prediction method. For example, in some embodiments, the multivariate time series anomaly prediction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. When the computer program is loaded into the RAM and executed by the computing unit, one or more steps of the multivariate time series anomaly prediction method described above may be performed. Alternatively, in other embodiments, the computing unit may be configured to perform the multivariate time series anomaly prediction method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. That processor may be a special purpose or general purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

Program code for carrying out the methods of the present invention may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.

In the context of the present invention, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.

It should be understood that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution disclosed in the present invention can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method for multivariate time series anomaly prediction, the method comprising the steps of:

S100: inputting the multivariate time series $X = (X_{t-w+1}, X_{t-w+2}, \ldots, X_i, \ldots, X_t)$ of the monitored server currently to be predicted into a data embedding module of a multivariate time series anomaly prediction model, which performs position embedding and spatial embedding on the data in $X$ to obtain a corresponding data embedding result; wherein $X_i$ is the multivariate time series data corresponding to the $i$-th timestamp before the prediction time $t$, $X_i = \{x_{i1}, x_{i2}, \ldots, x_{is}, \ldots, x_{in}\}$, $i$ ranges from $t-w+1$ to $t$, $t$ is the prediction time, and $w$ is the prediction window size; $x_{is}$ is the value of the $s$-th monitoring index in $X_i$, $s$ ranges from 1 to $n$, and $n$ is the number of monitoring indexes of the monitored server;
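To make the indexing in S100 concrete, the following Python sketch slices the window $X$ out of a longer history and extracts the per-index vector $Y_s$ that claim 6 later uses. It assumes a hypothetical row-major layout in which `history[k]` holds the $n$ index values at timestamp $k$; the helper names `make_window` and `index_vector` are illustrative only and do not come from the patent.

```python
from typing import List

Matrix = List[List[float]]

# Hypothetical helpers for illustration; not part of the patented method.


def make_window(history: Matrix, t: int, w: int) -> Matrix:
    """Return X = (X_{t-w+1}, ..., X_t): the w rows ending at timestamp t (inclusive).

    Assumption: history[k] holds the n monitoring-index values at timestamp k (0-based).
    """
    if w <= 0 or t - w + 1 < 0 or t >= len(history):
        raise ValueError("window (t - w + 1 .. t) is outside the available history")
    return [list(row) for row in history[t - w + 1 : t + 1]]


def index_vector(X: Matrix, s: int) -> List[float]:
    """Return Y_s = (x_{(t-w+1)s}, ..., x_{ts}); s is 0-based here, 1-based in the claims."""
    return [row[s] for row in X]


# Usage: two monitoring indexes sampled at timestamps 0..9.
history = [[float(k), 10.0 * k] for k in range(10)]
X = make_window(history, t=6, w=4)  # rows for timestamps 3, 4, 5, 6
Y_1 = index_vector(X, 1)
```

Because $i$ runs from $t-w+1$ to $t$ inclusive, the window always contains exactly $w$ rows, each holding the $n$ index values of one timestamp.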

S200: inputting the data embedding result into an encoder of the multivariate time series anomaly prediction model to obtain a corresponding encoding result; the encoder comprises first to third encoding modules that are connected in sequence and have the same structure, each encoding module comprising at least a multi-head attention mechanism module, a horizontal graph attention module and a multi-scale feedforward network module connected in sequence; the first encoding module is further connected to the data embedding module;

S300: inputting the encoding result into a decoder of the multivariate time series anomaly prediction model to obtain a corresponding decoding result; the decoder comprises at least a first linear layer, an inter-dimensional relationship learning module and a second linear layer connected in sequence, the first linear layer being connected to the third encoding module; the decoding result comprises a first result produced by the first linear layer, an inter-dimensional dependency matrix produced by the inter-dimensional relationship learning module, and a second result produced by the second linear layer;

S400: determining, based at least on $X$, the inter-dimensional dependency matrix and the second result, whether the monitored server will become abnormal within the $w$ timestamps after the prediction time $t$ and, if so, which indexes are abnormal.

2. The method of claim 1, wherein the data embedding result satisfies the following condition:

$$\mathrm{Embedding}(X) = \mathrm{Cov}\big(\mathrm{PE}(X) + \mathrm{SE}(X) + \mathrm{Cacov}(X)\big);$$

wherein $\mathrm{Embedding}(X)$ is the data embedding result of $X$; $\mathrm{PE}(X)$ adds the timing or sequence information of each monitoring index along the time dimension of $X$; $\mathrm{SE}(X)$ adds the information between monitoring indexes at each timestamp, with $\mathrm{SE}(X) = W_{se} \times \mathrm{CosSim}(X)$, where $W_{se}$ is a weight matrix of size $n \times w$ and $\mathrm{CosSim}(X)$ is the matrix formed by the cosine similarities between the different monitoring indexes in $X$; $\mathrm{Cacov}(X)$ denotes a causal convolution applied to $X$; and $\mathrm{Cov}(\cdot)$ denotes a one-dimensional convolution.
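A minimal pure-Python sketch of the embedding in claim 2 follows. The claim does not fix the form of PE, the kernel sizes, or how the $n \times w$ matrix $W_{se}$ is combined with the $n \times n$ matrix $\mathrm{CosSim}(X)$ so that the result matches the $w \times n$ shape of $X$. This sketch therefore assumes a Transformer-style sinusoidal PE, one shared scalar kernel per convolution, and computes SE as $(\mathrm{CosSim}(X)\,W_{se})^{T}$; all function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Illustrative sketch only; PE form, kernel shapes and the SE product order are assumptions.


def cos_sim_matrix(X: Matrix) -> Matrix:
    """n x n cosine similarity between the monitoring-index columns of X (w x n)."""
    n = len(X[0])
    cols = [[row[s] for row in X] for s in range(n)]
    norms = [math.sqrt(sum(v * v for v in c)) or 1.0 for c in cols]  # guard all-zero columns
    return [[sum(a * b for a, b in zip(cols[p], cols[q])) / (norms[p] * norms[q])
             for q in range(n)] for p in range(n)]


def positional_encoding(w: int, n: int) -> Matrix:
    """Assumed sinusoidal PE: row i = time position, column s = monitoring index."""
    pe = []
    for i in range(w):
        row = []
        for s in range(n):
            angle = i / 10000 ** (2 * (s // 2) / n)
            row.append(math.sin(angle) if s % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def spatial_embedding(X: Matrix, W_se: Matrix) -> Matrix:
    """SE(X) from W_se (n x w) and CosSim(X) (n x n).

    Assumption: computed as (CosSim(X) @ W_se)^T so that the result is w x n like X.
    """
    C = cos_sim_matrix(X)
    n, w = len(W_se), len(W_se[0])
    prod = [[sum(C[p][k] * W_se[k][j] for k in range(n)) for j in range(w)] for p in range(n)]
    return [[prod[p][i] for p in range(n)] for i in range(w)]


def conv_time(X: Matrix, kernel: List[float], causal: bool) -> Matrix:
    """1-D convolution along time with the same scalar kernel for every index, zero padding.

    causal=True : out[i] depends only on X[i], X[i-1], ... (no future values), as in Cacov.
    causal=False: kernel centred on i ("same" padding), used here for Cov.
    """
    w, n, K = len(X), len(X[0]), len(kernel)
    offset = 0 if causal else K // 2
    out = [[0.0] * n for _ in range(w)]
    for i in range(w):
        for k in range(K):
            src = i - k + offset
            if 0 <= src < w:
                for s in range(n):
                    out[i][s] += kernel[k] * X[src][s]
    return out


def embedding(X: Matrix, W_se: Matrix, causal_kernel: List[float],
              conv_kernel: List[float]) -> Matrix:
    """Embedding(X) = Cov(PE(X) + SE(X) + Cacov(X)), returned as a w x n matrix."""
    w, n = len(X), len(X[0])
    pe = positional_encoding(w, n)
    se = spatial_embedding(X, W_se)
    cc = conv_time(X, causal_kernel, causal=True)
    summed = [[pe[i][s] + se[i][s] + cc[i][s] for s in range(n)] for i in range(w)]
    return conv_time(summed, conv_kernel, causal=False)
```

The causal branch is what keeps future timestamps from leaking into the embedding of earlier ones; the final centred convolution only smooths the already-summed features.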

3. The method according to claim 1, wherein the output $\mathrm{HGAT}(F_h)$ of the horizontal graph attention module satisfies the following condition:

$$\mathrm{HGAT}(F_h) = \mathrm{GAT}\big(V, \mathrm{Softmax}(QK^{T})\big);$$

wherein $Q$, $K$ and $V$ are the query, key and value matrices, respectively, obtained by convolving the input $F_h$ of the horizontal graph attention module with different convolution kernels; $K^{T}$ is the transpose of $K$; $\mathrm{Softmax}(\cdot)$ is the activation function; and $\mathrm{GAT}(\cdot)$ denotes applying the graph attention mechanism.
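The following sketch illustrates claim 3. The claim does not define how GAT consumes the attention map, so this sketch assumes kernel-size-1 convolutions (i.e., per-row linear projections `Fh @ W`) for $Q$, $K$ and $V$, and a single-head GAT in which the usual LeakyReLU edge scores are weighted by the soft adjacency $\mathrm{Softmax}(QK^{T})$ before normalization. Parameter names such as `a_src` and `a_dst` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]

# Illustrative sketch only; the GAT variant and the projection form are assumptions.


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A: Matrix) -> Matrix:
    return [list(col) for col in zip(*A)]


def softmax_rows(A: Matrix) -> Matrix:
    """Row-wise softmax (row max subtracted for numerical stability)."""
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(x - m) for x in row]
        z = sum(e)
        out.append([x / z for x in e])
    return out


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat(V: Matrix, A: Matrix, a_src: List[float], a_dst: List[float]) -> Matrix:
    """Assumed single-head GAT: nodes are the rows of V; A acts as a soft adjacency.

    Edge weight (i, j) = A[i][j] * exp(LeakyReLU(a_src . v_i + a_dst . v_j)), normalised
    over j, so each output row is a convex combination of the rows of V.
    """
    src = [sum(p * q for p, q in zip(a_src, v)) for v in V]
    dst = [sum(p * q for p, q in zip(a_dst, v)) for v in V]
    out = []
    for i in range(len(V)):
        wts = [A[i][j] * math.exp(leaky_relu(src[i] + dst[j])) for j in range(len(V))]
        z = sum(wts)
        out.append([sum(wts[j] * V[j][d] for j in range(len(V))) / z
                    for d in range(len(V[0]))])
    return out


def hgat(Fh: Matrix, Wq: Matrix, Wk: Matrix, Wv: Matrix,
         a_src: List[float], a_dst: List[float]) -> Matrix:
    """HGAT(Fh) = GAT(V, Softmax(Q K^T)); Q, K, V from kernel-size-1 convolutions (Fh @ W)."""
    Q, K, V = matmul(Fh, Wq), matmul(Fh, Wk), matmul(Fh, Wv)
    A = softmax_rows(matmul(Q, transpose(K)))
    return gat(V, A, a_src, a_dst)
```

Since every edge weight is positive, each output row stays inside the range spanned by the rows of $V$, which is a quick sanity check for any implementation of this step.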

4. The method according to claim 1, wherein the output $\mathrm{MFFN}(F_m)$ of the multi-scale feedforward network module satisfies the following condition:

$$\mathrm{MFFN}(F_m) = \mathrm{Concat}(S_1, S_2, S_3) \times W^{o}, \qquad S_j = \mathrm{sigmoid}\big(\mathrm{Conv}_j(F_m)\big) + \tanh\big(\mathrm{Conv}_j(F_m)\big), \quad j = 1, 2, 3;$$

wherein $S_j$ is the $j$-th extraction result, $\mathrm{Concat}(\cdot)$ denotes concatenation, $W^{o}$ is the projection parameter, $\mathrm{sigmoid}(\cdot)$ and $\tanh(\cdot)$ are activation functions, and $\mathrm{Conv}_j(F_m)$ denotes convolving the input $F_m$ of the multi-scale feedforward network module with the $j$-th convolution kernel.
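Claim 4 can be sketched as below. Kernel sizes 1, 3 and 5 for the three branches, zero "same" padding along the time axis, and the weight layout `W[k][c][o]` (tap, input channel, output channel) are assumptions; the branch activation follows the claim text literally (sigmoid plus tanh, not a product). Function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]
Kernel = List[Matrix]  # K taps; each tap is a d_in x d_out weight matrix

# Illustrative sketch only; kernel sizes and weight layout are assumptions.


def conv1d_same(F: Matrix, W: Kernel) -> Matrix:
    """1-D convolution over time (rows of F) with zero 'same' padding, mixing channels."""
    T, K, d_out = len(F), len(W), len(W[0][0])
    half = K // 2
    out = [[0.0] * d_out for _ in range(T)]
    for t in range(T):
        for k in range(K):
            src = t + k - half
            if 0 <= src < T:
                for c, x in enumerate(F[src]):
                    for o in range(d_out):
                        out[t][o] += x * W[k][c][o]
    return out


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def mffn(Fm: Matrix, kernels: List[Kernel], W_o: Matrix) -> Matrix:
    """MFFN(Fm) = Concat(S_1, S_2, S_3) x W^o, S_j = sigmoid(Conv_j(Fm)) + tanh(Conv_j(Fm))."""
    branches = []
    for W in kernels:  # one branch per kernel (assumed sizes 1, 3, 5)
        c = conv1d_same(Fm, W)
        branches.append([[sigmoid(v) + math.tanh(v) for v in row] for row in c])
    # Concatenate the branch outputs along the feature axis, then project with W^o.
    concat = [sum((b[t] for b in branches), []) for t in range(len(Fm))]
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*W_o)] for row in concat]
```

Using different kernel sizes per branch is what makes the block "multi-scale": each branch sees a different temporal receptive field before the outputs are concatenated.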

5. The method according to claim 1, wherein the output $\mathrm{DRLM}(F_v)$ of the inter-dimensional relationship learning module satisfies the following condition:

$$\mathrm{DRLM}(F_v) = \mathrm{Softmax}\Big(\mathrm{CosSim}\big(\mathrm{TCN}(Q)\big)\Big);$$

wherein $\mathrm{Softmax}(\mathrm{CosSim}(\mathrm{TCN}(Q)))$ is the inter-dimensional dependency matrix; $\mathrm{Softmax}(\cdot)$ is the activation function; $\mathrm{CosSim}(\cdot)$ computes the cosine similarity between different rows; $\mathrm{TCN}(\cdot)$ denotes applying a temporal convolutional network; and $Q$ is the query matrix obtained by convolving the input $F_v$ of the inter-dimensional relationship learning module with convolution kernels.
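A sketch of claim 5 under stated assumptions: $F_v$ is a $T \times n$ matrix, $Q$ comes from a kernel-size-1 convolution ($F_v W_q$) and is transposed so that each row is one dimension, and the TCN is a minimal two-layer stack of causal dilated convolutions with ReLU and a residual connection, all sharing one kernel. The claim does not specify the TCN architecture; these choices and all names are illustrative.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

# Illustrative sketch only; the TCN depth, dilations and projection are assumptions.


def causal_dilated(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Causal dilated 1-D conv: y[t] = sum_k kernel[k] * x[t - k*dilation] (zero padding)."""
    return [sum(kernel[k] * x[t - k * dilation]
                for k in range(len(kernel)) if t - k * dilation >= 0)
            for t in range(len(x))]


def tcn(rows: Matrix, kernel: List[float], dilations: Sequence[int] = (1, 2)) -> Matrix:
    """Assumed minimal TCN per row: stacked causal dilated convs + ReLU, plus a residual."""
    out = []
    for x in rows:
        h = list(x)
        for d in dilations:
            h = [max(0.0, v) for v in causal_dilated(h, kernel, d)]
        out.append([a + b for a, b in zip(h, x)])
    return out


def cos_sim_rows(A: Matrix) -> Matrix:
    """Cosine similarity between every pair of rows of A."""
    norms = [math.sqrt(sum(v * v for v in r)) or 1.0 for r in A]
    return [[sum(p * q for p, q in zip(A[i], A[j])) / (norms[i] * norms[j])
             for j in range(len(A))] for i in range(len(A))]


def softmax_rows(A: Matrix) -> Matrix:
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def drlm(Fv: Matrix, Wq: Matrix, kernel: List[float]) -> Matrix:
    """DRLM(Fv) = Softmax(CosSim(TCN(Q))); Fv is T x n, result is the n x n matrix M^0."""
    Q = [[sum(a * b for a, b in zip(row, col)) for col in zip(*Wq)] for row in Fv]  # T x n
    Q_rows = [list(col) for col in zip(*Q)]  # one row per dimension (monitoring index)
    return softmax_rows(cos_sim_rows(tcn(Q_rows, kernel)))
```

Each row of the result is a probability distribution over the $n$ dimensions, which is what allows claims 6 and 7 to compare it with $M$ through a KL divergence.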

6. The method of claim 5, wherein S400 specifically comprises:

S401: obtaining the anomaly determination value corresponding to $X$, $S(X) = \lVert X - X_2^{0} \rVert \cdot \mathrm{KL}(M, M^{0})$; wherein $X_2^{0}$ is the second result, $\lVert \cdot \rVert$ denotes the absolute value function, $\cdot$ denotes the dot product, $\mathrm{KL}(\cdot)$ denotes the bidirectional Kullback-Leibler divergence function, $M$ is the matrix formed by the cosine similarities between the different monitoring indexes in $X$, and $M^{0}$ is the inter-dimensional dependency matrix;

S402: if $S(X) > S_0$, determining that the monitored server will become abnormal within the $w$ timestamps after the prediction time $t$, and executing S403; otherwise, determining that the monitored server will not become abnormal within the $w$ timestamps after the prediction time $t$; wherein $S_0$ is a preset anomaly determination threshold;

S403: obtaining the anomaly determination value of the $s$-th monitoring index, $IS_s = k_1 \times \lVert Y_s - Y^{0}_s \rVert + k_2 \times \mathrm{KL}(M, M^{0})$; wherein $Y_s$ is the index vector corresponding to the $s$-th monitoring index in $X$, $Y_s = (x_{(t-w+1)s}, \ldots, x_{is}, \ldots, x_{ts})$; $Y^{0}_s$ is the part of the second result corresponding to $Y_s$; and $k_1$ and $k_2$ are a first preset weight and a second preset weight, respectively.
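Steps S401 to S403 can be sketched as follows. The claim leaves several details open, so the sketch assumes that $\lVert\cdot\rVert$ is the sum of absolute differences; that $M$ is turned into row distributions with a softmax before comparison, because raw cosine similarities can be negative; that the bidirectional KL is $\mathrm{KL}(P\Vert Q) + \mathrm{KL}(Q\Vert P)$ summed over rows; and that the "dot product" of two scalars reduces to ordinary multiplication. How the abnormal indexes are finally selected from $IS_s$ is not stated in the claim, so the sketch only returns the scores. All names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]

# Illustrative sketch only; norm, KL form and softmax of M are assumptions.


def softmax_rows(A: Matrix) -> Matrix:
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def sym_kl(M: Matrix, M0: Matrix, eps: float = 1e-12) -> float:
    """Assumed bidirectional KL: sum over rows of KL(P||Q) + KL(Q||P).

    P is the row-softmax of the cosine-similarity matrix M (raw entries may be negative);
    Q = M0 is the inter-dimensional dependency matrix, already row-softmaxed.
    """
    total = 0.0
    for p_row, q_row in zip(softmax_rows(M), M0):
        for p, q in zip(p_row, q_row):
            p, q = p + eps, q + eps
            total += (p - q) * math.log(p / q)  # equals p*log(p/q) + q*log(q/p)
    return total


def abs_err(a: List[float], b: List[float]) -> float:
    """Assumed reading of ||.||: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def anomaly_scores(X: Matrix, X2: Matrix, M: Matrix, M0: Matrix, S0: float,
                   k1: float, k2: float) -> Tuple[float, Optional[Dict[int, float]]]:
    """S401-S403: return S(X) and, only if S(X) > S0, the per-index scores IS_s."""
    kl = sym_kl(M, M0)
    flat_x = [v for row in X for v in row]
    flat_x2 = [v for row in X2 for v in row]
    score = abs_err(flat_x, flat_x2) * kl  # S(X) = ||X - X2^0|| . KL(M, M^0)
    if score <= S0:
        return score, None  # S402: no anomaly predicted in the next w timestamps
    n = len(X[0])
    per_index = {s: k1 * abs_err([r[s] for r in X], [r[s] for r in X2]) + k2 * kl
                 for s in range(n)}  # S403; s is 0-based here
    return score, per_index
```

Following the claim text literally, the KL term in $IS_s$ is the same for every index, so in this sketch the ordering of the per-index scores is driven by the per-index reconstruction error $\lVert Y_s - Y^{0}_s \rVert$.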

7. The method according to claim 1, wherein the multivariate time series anomaly prediction model is trained with a two-stage loss function, the first-stage loss function being $L_1 = \lVert X - X_1^{0} \rVert + \alpha \times \mathrm{KL}(M, M^{0}) + \beta \times \lVert X - X_2^{0} \rVert$ and the second-stage loss function being $L_2 = \lVert X - X_1^{0} \rVert - \frac{\alpha}{2} \times \mathrm{KL}(M, M^{0}) - \frac{\beta}{2} \times \lVert X - X_2^{0} \rVert$;

wherein $X_1^{0}$ is the first result, $X_2^{0}$ is the second result, $\lVert \cdot \rVert$ denotes the absolute value function, $\cdot$ denotes the dot product, $\mathrm{KL}(\cdot)$ denotes the bidirectional Kullback-Leibler divergence function, $M$ is the matrix formed by the cosine similarities between the different monitoring indexes in $X$, $M^{0}$ is the inter-dimensional dependency matrix, and $\alpha$ and $\beta$ are hyperparameters.
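The two losses of claim 7 can be evaluated under the same assumptions as in the S401 to S403 discussion (sum of absolute differences for $\lVert\cdot\rVert$, row-softmax of $M$, symmetric KL summed over rows). This sketch only computes the loss values for one window; the claim does not say how the two stages are scheduled, and a real training loop would compute these terms inside an automatic-differentiation framework. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

# Illustrative sketch only; norm and KL definitions are assumptions.


def softmax_rows(A: Matrix) -> Matrix:
    out = []
    for row in A:
        m = max(row)
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out


def sym_kl(M: Matrix, M0: Matrix, eps: float = 1e-12) -> float:
    """Assumed bidirectional KL between row-softmax(M) and M0, summed over rows."""
    total = 0.0
    for p_row, q_row in zip(softmax_rows(M), M0):
        for p, q in zip(p_row, q_row):
            p, q = p + eps, q + eps
            total += (p - q) * math.log(p / q)
    return total


def abs_err(A: Matrix, B: Matrix) -> float:
    """Assumed reading of ||.||: sum of absolute element-wise differences."""
    return sum(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def two_stage_losses(X: Matrix, X1: Matrix, X2: Matrix, M: Matrix, M0: Matrix,
                     alpha: float, beta: float) -> Tuple[float, float]:
    """Return (L1, L2) from claim 7; X1 = first result, X2 = second result."""
    rec1 = abs_err(X, X1)
    rec2 = abs_err(X, X2)
    kl = sym_kl(M, M0)
    loss1 = rec1 + alpha * kl + beta * rec2
    loss2 = rec1 - alpha / 2 * kl - beta / 2 * rec2
    return loss1, loss2
```

A useful consistency check follows directly from the two formulas: $L_1 + 2L_2 = 3\lVert X - X_1^{0} \rVert$, because the KL and second-result terms cancel.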

8. An electronic device comprising a processor and a memory;

the processor is adapted to perform the steps of the method according to any of claims 1 to 7 by invoking a program or instruction stored in the memory.

9. A non-transitory computer-readable storage medium storing a program or instructions that cause a computer to perform the steps of the method of any one of claims 1 to 7.