WO2024000852A1 - Data processing method, device, equipment and storage medium


Info

Publication number
WO2024000852A1
Authority
WO
WIPO (PCT)
Prior art keywords
layer
data set
prediction
training data
detection model
Prior art date
Application number
PCT/CN2022/120467
Other languages
English (en)
French (fr)
Inventor
梁永富
熊刚
江旻
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2024000852A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Definitions

  • This application relates to the field of data processing technology, and in particular, but not exclusively, to data processing methods, devices, equipment, and storage media.
  • Metric time series operation and maintenance indicators include business indicators, such as transaction volume, success rate, and interface time consumption, and system service indicators, such as central processing unit (CPU), memory (MEM), disk (DISK), and input/output (IO) metrics.
  • The interactions between indicators are complex, and the health status of systems and services is often determined by a series of indicators together. Mining a single metric indicator can provide information at one level, but determining system anomalies only through single-indicator anomalies often results in too many false alarms. Multi-indicator operation and maintenance time series detection can provide a more comprehensive understanding of the entire running system and its services.
  • In related technologies, multi-dimensional indicator anomaly detection often directly uses deep generative models such as the Generative Adversarial Network (GAN), takes the original multi-indicator data as the overall input of the model, constructs encoder and decoder computations based on probability density estimation and generated samples, and uses the reconstruction error to identify anomalies.
  • The related algorithms are based on the assumption that the original input obeys a certain distribution and take the original data as a whole as input. It is difficult to directly model high-dimensional random vectors; distribution approximations and conditional-independence assumptions must be added to simplify the model, but this weakens the representation of the original data and ultimately leads to insufficient robustness and versatility of the model. On the other hand, the related technologies do not attend to the internal relationships of the data, so the accuracy of anomaly detection is low.
  • This application provides a data processing method, device, equipment, and storage medium with high anomaly detection accuracy and good versatility.
  • The method includes: obtaining a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1;
  • the pre-training data set is divided longitudinally through the pre-processing layer of the anomaly detection model to obtain a first training data set;
  • the first training data set is an n×m×k matrix;
  • k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator across n×m time dimensions;
  • the pre-training data set is horizontally divided through the pre-processing layer to obtain a second training data set;
  • the second training data set is an m×k×n matrix;
  • n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods;
  • An anomaly detection model is trained based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a time period; the detection parameters are used to determine whether the detection data is abnormal.
  • This application provides a data processing device, which includes:
  • Obtaining unit configured to obtain a pre-training data set;
  • the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1;
  • the first pre-processing unit is configured to longitudinally segment the pre-training data set through the pre-processing layer of the anomaly detection model to obtain a first training data set;
  • the first training data set is an n×m×k matrix;
  • k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator across n×m time dimensions;
  • the second pre-processing unit is configured to horizontally segment the pre-training data set through the pre-processing layer to obtain a second training data set;
  • the second training data set is an m×k×n matrix;
  • the n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of k indicators corresponding to one moment in m time periods;
  • a training unit configured to train an anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a time period; The detection parameters are used to determine whether the detection data is abnormal.
  • This application also provides an electronic device, including: a memory and a processor.
  • the memory stores a computer program that can be run on the processor.
  • When the processor executes the program, the above data processing method is implemented.
  • This application also provides a storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the above data processing method is implemented.
  • The data processing method, device, equipment, and storage medium provided by this application include: obtaining a pre-training data set, an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments, with n, k, and m each greater than 1; dividing the pre-training data set longitudinally through the preprocessing layer of the anomaly detection model to obtain a first training data set, an n×m×k matrix, where k represents the number of first samples included in the first training data set and one first sample is used to represent the values of one indicator across n×m time dimensions; dividing the pre-training data set horizontally through the preprocessing layer to obtain a second training data set, an m×k×n matrix, where n represents the number of second samples included in the second training data set and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods; and training an anomaly detection model based on the first training data set and the second training data set to obtain a target detection model. The target detection model is used to determine detection parameters of detection data within a time period; the detection parameters are used to determine whether the detection data is abnormal.
  • In this way, the pre-training data is divided both vertically and horizontally to obtain the first training data set and the second training data set, and the anomaly detection model is then trained on both together to obtain the target detection model.
  • One sample of the first training data set corresponds to one indicator, so training based on the first training data set can attend to the relationships between indicators; one sample of the second training data set corresponds to one moment, so training based on the second training data set can attend to the relationships between moments. The resulting target detection model therefore has higher accuracy when detecting anomalies.
  • On the other hand, the application scenarios of the model are not limited, and there are no restrictions on the type of indicators, the number of indicators, the number of samples, and so on, so the target detection model has good versatility.
  • Figure 1 is an optional structural schematic diagram of a data processing system provided by an embodiment of the present application.
  • Figure 2 is an optional flow diagram of the data processing method provided by an embodiment of the present application.
  • Figure 3 is an optional flow diagram of the data processing method provided by an embodiment of the present application.
  • Figure 4 is an optional flow diagram of the data processing method provided by an embodiment of the present application.
  • Figure 5 is an optional flow diagram of the data processing method provided by an embodiment of the present application.
  • Figure 6 is an optional flow diagram of the data processing method provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of an optional framework structure of the data processing process provided by an embodiment of the present application.
  • Figure 8 is an optional schematic diagram of the original data provided by an embodiment of the present application.
  • Figure 9 is an optional schematic diagram of normalized data provided by an embodiment of the present application.
  • Figure 10 is an optional schematic diagram of the normalized and intercepted data provided by an embodiment of the present application.
  • Figure 11 is an optional schematic diagram of data after vertical segmentation provided by an embodiment of the present application.
  • Figure 12 is an optional schematic diagram of data after horizontal segmentation provided by an embodiment of the present application.
  • Figure 13 is an optional flow chart for determining the attention coefficient provided by an embodiment of the present application.
  • Figure 14 is an optional schematic diagram of the output features of the graph attention layer between indicators provided by an embodiment of the present application.
  • Figure 15 is an optional schematic diagram of the output features of the graph attention layer for different moments provided by an embodiment of the present application.
  • Figure 16 is an optional flow diagram of splicing provided by an embodiment of the present application.
  • Figure 17 is an optional schematic diagram of the principles of the GRU layer provided by an embodiment of the present application.
  • Figure 18 is an optional structural schematic diagram of the GRU layer provided by an embodiment of the present application.
  • Figure 19 is an optional structural schematic diagram of determining loss provided by an embodiment of the present application.
  • Figure 20 is an optional structural schematic diagram of the detection process provided by an embodiment of the present application.
  • Figure 21 is an optional structural schematic diagram of a data processing device provided by an embodiment of the present application.
  • Figure 22 is an optional structural schematic diagram of an electronic device provided by an embodiment of the present application.
  • The terms "first", "second", and "third" are used only to distinguish different objects and do not denote a specific ordering or sequence. It will be understood that, where permitted, "first", "second", and "third" may be interchanged, so that the embodiments of the application described here can be implemented in an order other than that illustrated or described.
  • Embodiments of the present application may provide data processing methods and devices, equipment and storage media.
  • The data processing method can be implemented by a data processing device, and each functional entity in the data processing device can be implemented collaboratively by the hardware resources of electronic equipment, such as computing resources (e.g., processors) and communication resources (e.g., those supporting communications over optical cable, cellular networks, and the like).
  • the data processing method provided by the embodiment of the present application is applied to a data processing system, and the data processing system includes a first device.
  • The first device is configured to: obtain a pre-training data set, an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments, with n, k, and m each greater than 1; divide the pre-training data set longitudinally through the preprocessing layer of the anomaly detection model to obtain a first training data set, an n×m×k matrix, where k represents the number of first samples included in the first training data set and one first sample is used to represent the values of one indicator across n×m time dimensions; and divide the pre-training data set horizontally through the preprocessing layer to obtain a second training data set, an m×k×n matrix, where n represents the number of second samples included in the second training data set and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods.
  • an anomaly detection model is trained based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine the detection data within a time period Detection parameters; the detection parameters are used to determine whether the detection data is abnormal.
  • the data processing system may also include a second device.
  • the second device is used to collect historical data, obtain a historical data set, and send the historical data set to the first device, so that the first device obtains a pre-training data set based on the historical data set.
  • first device and the second device may be integrated on the same electronic device, or may be independently deployed on different electronic devices.
  • the structure of the data processing system can be shown in Figure 1 , including: a first device 10 and a second device 20 . Among them, data can be transmitted between the first device 10 and the second device 20 .
  • The first device 10 is configured to: obtain a pre-training data set, an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments, with n, k, and m each greater than 1; divide the pre-training data set vertically through the preprocessing layer of the anomaly detection model to obtain a first training data set, an n×m×k matrix, where k represents the number of first samples included in the first training data set and one first sample is used to represent the values of one indicator across n×m time dimensions; and divide the pre-training data set horizontally through the preprocessing layer to obtain a second training data set, an m×k×n matrix, where n represents the number of second samples included in the second training data set and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods.
  • An anomaly detection model is trained based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a time period; the detection parameters are used to determine whether the detection data is abnormal.
  • the first device 10 may be a server, a computer, or other electronic device with relevant data processing capabilities.
  • The second device 20 is used to collect historical data, obtain a historical data set, and send the historical data set to the first device, so that the first device can obtain a pre-training data set based on the historical data set.
  • The second device 20 may include: a mobile terminal device (such as a mobile phone or a tablet computer), a non-mobile terminal device (such as a desktop computer or a server), or other electronic devices with relevant data processing capabilities.
  • embodiments of the present application provide a data processing method, which is applied to a data processing device; wherein the data processing device can be deployed in the first device 10 in Figure 1 .
  • the data processing process provided by the embodiments of the present application is described with the electronic device as the execution subject.
  • Figure 2 illustrates a schematic flow chart of an optional data processing method.
  • the data processing method may include but is not limited to S201 to S204 shown in Figure 2.
  • the electronic device obtains the pre-training data set.
  • S201 can be implemented as follows: the electronic device obtains a historical data set, normalizes and intercepts the historical data set, and obtains a pre-training data set.
  • For example, the MAX-MIN normalization method can be used to map the values of the historical data to [0, 1].
  • the embodiment of the present application does not limit the specific process of interception, and it can be configured according to actual needs.
  • For example, a sliding window of length n = 90 and a sliding step d = 50 can be used to intercept the original data to obtain the pre-training data set.
  • The pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1.
  • For example, the pre-training data set is a 90×7×27 matrix; this matrix represents intercepted data of 27 periods, where one period corresponds to a duration of 90 minutes and the data includes 7 indicators.
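  • As an illustration of this preprocessing, the following is a minimal NumPy sketch; the function name, the synthetic input, and the resulting period count are hypothetical, while the window length n = 90 and step d = 50 follow the example above.

```python
import numpy as np

def preprocess(history: np.ndarray, n: int = 90, d: int = 50) -> np.ndarray:
    """history: (T, k) raw indicator series; returns an (n, k, m) matrix."""
    # MAX-MIN normalization maps every indicator column into [0, 1].
    mn, mx = history.min(axis=0), history.max(axis=0)
    normed = (history - mn) / (mx - mn + 1e-12)
    # Slide a window of length n with step d over the T moments to cut out m periods.
    starts = range(0, history.shape[0] - n + 1, d)
    windows = [normed[s:s + n] for s in starts]   # each window: (n, k)
    return np.stack(windows, axis=-1)             # (n, k, m)

pre = preprocess(np.random.rand(1440, 7))         # shape (90, 7, 28) here
```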
  • the electronic device performs longitudinal segmentation on the pre-training data set through the pre-processing layer of the anomaly detection model to obtain the first training data set.
  • The first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator across n×m time dimensions.
  • S202 can be implemented as follows: the electronic device performs longitudinal segmentation of the pre-training data set through the preprocessing layer of the anomaly detection model. During segmentation, the indicator type is used as the segmentation point, the data is split between every two indicators, and the data of multiple periods is then fused, thus obtaining the first training data set.
  • one first sample corresponds to one indicator.
  • the electronic device performs horizontal segmentation on the pre-training data set through the pre-processing layer to obtain a second training data set.
  • The second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods.
  • S203 can be implemented as follows: the electronic device performs horizontal segmentation of the pre-training data set through the preprocessing layer of the anomaly detection model. During segmentation, the moments are used as segmentation points, the data is split between every two moments, and the data of multiple periods is then fused, thereby obtaining the second training data set.
  • one second sample corresponds to one moment.
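  • Under the matrix conventions above, the two segmentations amount to axis permutations of the n×k×m pre-training matrix; a minimal NumPy sketch (the array contents are synthetic):

```python
import numpy as np

pre = np.random.rand(90, 7, 27)        # (n, k, m): moments x indicators x periods

# Longitudinal segmentation: one first sample per indicator, each of shape (n, m);
# stacking the k samples gives the (n, m, k) first training data set.
first_set = pre.transpose(0, 2, 1)     # (n, m, k)

# Horizontal segmentation: one second sample per moment, each covering the k
# indicators over the m periods; stacking gives the (m, k, n) second set.
second_set = pre.transpose(2, 1, 0)    # (m, k, n)

assert first_set.shape == (90, 27, 7) and second_set.shape == (27, 7, 90)
```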
  • the electronic device trains an anomaly detection model based on the first training data set and the second training data set to obtain a target detection model.
  • During training, the electronic device inputs the k first samples of the first training data set and the n second samples of the second training data set into the anomaly detection model.
  • The anomaly detection model outputs the corresponding detection parameters, and the electronic device adjusts the parameters in the anomaly detection model based on the output detection parameters. When the parameters in the anomaly detection model meet the requirements, the target detection model is obtained.
  • the embodiments of this application are not limited to specific training methods and can be configured according to actual needs.
  • For example, back-propagation training can be performed based on the loss function.
  • the target detection model is used to determine the detection parameters of the detection data within a time period; the detection parameters are used to determine whether the detection data is abnormal.
  • detection parameters may include one or more of the following: predicted values and reconstruction probabilities.
  • The data processing solution provided by the embodiment of the present application includes: obtaining a pre-training data set, an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments, with n, k, and m each greater than 1; dividing the pre-training data set longitudinally through the preprocessing layer of the anomaly detection model to obtain a first training data set, an n×m×k matrix, where k represents the number of first samples included in the first training data set and one first sample is used to represent the values of one indicator across n×m time dimensions; dividing the pre-training data set horizontally through the preprocessing layer to obtain a second training data set, an m×k×n matrix, where n represents the number of second samples included in the second training data set and one second sample is used to represent the values of the k indicators corresponding to one moment in m time periods; and training an anomaly detection model based on the first training data set and the second training data set to obtain a target detection model, which is used to determine detection parameters of detection data within a time period, the detection parameters being used to determine whether the detection data is abnormal.
  • In this way, the pre-training data is divided both vertically and horizontally to obtain the first training data set and the second training data set, and the anomaly detection model is then trained on both together to obtain the target detection model.
  • One sample of the first training data set corresponds to one indicator, so training based on the first training data set can attend to the relationships between indicators; one sample of the second training data set corresponds to one moment, so training based on the second training data set can attend to the relationships between moments. The resulting target detection model therefore has higher accuracy when detecting anomalies. On the other hand, the application scenarios of the model are not limited, and there are no restrictions on the type of indicators, the number of indicators, the number of samples, and so on, so the target detection model has good versatility.
  • This process may include but is not limited to the following methods 1 to 3.
  • In method 1, the anomaly detection model includes, in sequence: a preprocessing layer, a first Graph Attention Network (GAT) layer, a second GAT layer, and a prediction layer; correspondingly, the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set.
  • In method 2, the anomaly detection model includes, in sequence: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, and a prediction layer; correspondingly, the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, the splicing layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set.
  • In method 3, the anomaly detection model includes, in sequence: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, a gated recurrent unit (GRU) layer, and a prediction layer; correspondingly, the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, the splicing layer, the GRU layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set.
  • For method 1, the anomaly detection model includes, in sequence: a preprocessing layer, a first GAT layer, a second GAT layer, and a prediction layer; correspondingly, the process in which the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set is explained below. As shown in Figure 3, the process may include but is not limited to S2041 to S2045.
  • The electronic device inputs the k first samples of the first training data set into the first GAT layer, and obtains k×n first features through the processing of the first GAT layer.
  • A first feature is used to characterize the relationship between the values of the indicator corresponding to the first feature and the values of the other k−1 indicators. Simply put, the first feature can reflect the relationships between indicators.
  • S2041 may be implemented as follows: the electronic device inputs the k first samples of the first training data set into the first GAT layer; through the processing of the first GAT layer, n first features are generated for one first sample, and the k first samples are traversed, thus obtaining k×n first features.
  • the dimension of the first feature is m.
  • The electronic device inputs the n second samples of the second training data set into the second GAT layer, and obtains n×k second features through the processing of the second GAT layer.
  • A second feature is used to characterize the relationship between the values at the moment corresponding to the second feature and the values at the other n−1 moments. Simply put, the second feature can reflect the relationships between moments.
  • S2042 may be implemented as follows: the electronic device inputs the n second samples of the second training data set into the second GAT layer; through the processing of the second GAT layer, k second features are generated for one second sample, and the n second samples are traversed to obtain n×k second features.
  • the dimension of the second feature is m.
  • The electronic device splices at least the k×n first features and the n×k second features to obtain a prediction data set.
  • The prediction data set is an n×s×m matrix; s is greater than or equal to k.
  • the embodiments of this application do not limit the specific processing process for obtaining the prediction data set, and can be configured according to actual needs.
  • the process of obtaining the prediction data set may include: splicing process.
  • the process of obtaining the prediction data set may include: splicing process and GRU layer filtering process.
  • the electronic device inputs the prediction data set to the prediction layer, and obtains the first prediction result through processing by the prediction layer.
  • the embodiments of this application do not limit the specific content of the prediction layer and the specific content of the first prediction result, and they can be configured according to actual needs.
  • the prediction layer may include a fully connected layer; the corresponding first prediction result includes a prediction value.
  • the prediction layer may include a variational auto-encoder (VAE) layer; the corresponding first prediction result includes reconstruction probability.
  • VAE variational auto-encoder
  • the prediction layer may include a fully connected layer and a VAE layer; the corresponding first prediction result includes a prediction value and a reconstruction probability.
  • the electronic device adjusts parameters in the anomaly detection model based on the first prediction result to obtain a target detection model.
  • S2045 may be implemented as: the electronic device adjusts relevant parameters in the preprocessing layer, the first graph neural network GAT layer, the second GAT layer and the prediction layer in the anomaly detection model based on the first prediction result, thereby obtaining the target detection model.
  • In this way, the first features (relationships between indicators) and the second features (relationships between moments) can be extracted through the first GAT layer and the second GAT layer; therefore, the target detection model obtained based on the first features and the second features has relatively high accuracy.
  • The process of S2041, in which the electronic device inputs the k first samples of the first training data set into the first GAT layer and obtains k×n first features through the processing of the first GAT layer, is explained below.
  • This process may include but is not limited to S20411 and S20412 described below.
  • the electronic device treats each row of the first sample as a first node and obtains n first nodes.
  • As described above, a first sample is used to characterize the values of one indicator across n×m time dimensions.
  • That is, each row of a first sample represents the values of one indicator at one moment across the different periods.
  • the electronic device treats each row as a first node, thereby obtaining n first nodes.
  • the electronic device performs a first process on each of the n first nodes to obtain a first feature corresponding to the first node.
  • n first features can be obtained.
  • a first node is taken as an example to describe the first processing process.
  • the first processing may include, but is not limited to, A to C below.
  • the electronic device determines the similarity coefficients between the first node and the n first nodes, and obtains the n similarity coefficients.
  • the embodiment of the present application does not limit the specific method for determining the similarity coefficient, and it can be configured according to actual needs.
  • the electronic device converts the n similarity coefficients into n attention coefficients.
  • The similarity coefficients need to be normalized before the weighted summation; that is, the similarity coefficients need to be converted into attention coefficients.
  • the softmax normalized exponential function can be used for conversion.
  • the electronic device determines the first feature corresponding to the first node based on the data corresponding to the n nodes and the n attention coefficients.
  • For example, the electronic device multiplies the data corresponding to each of the n nodes by its attention coefficient and uses the sum of the products as the first feature.
  • the processing method based on the attention coefficient between nodes is simple and accurate.
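  • A minimal sketch of steps A to C for one first node, assuming a simple dot-product similarity and softmax normalization (the application does not fix the similarity function, so both choices are illustrative):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())
    return e / e.sum()

def first_feature(nodes: np.ndarray, i: int) -> np.ndarray:
    """nodes: (n, m) matrix, one m-dimensional vector per first node."""
    sims = nodes @ nodes[i]     # A: n similarity coefficients for node i
    alphas = softmax(sims)      # B: n attention coefficients
    return alphas @ nodes       # C: weighted sum -> m-dimensional first feature

sample = np.random.rand(90, 27)         # one first sample: n nodes of dimension m
feat = first_feature(sample, 0)         # first feature of node 0, shape (m,)
```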
  • the process of S2044 in which the electronic device inputs the prediction data set to the prediction layer and obtains the first prediction result through processing by the prediction layer is explained.
  • the process may include but is not limited to S20441 to S20443.
  • the prediction layer includes a fully connected layer and a variational autoencoder VAE layer, and the prediction data set includes m prediction samples.
  • the electronic device inputs each of the m prediction samples into the fully connected layer, and obtains m sets of predicted values through processing by the fully connected layer.
  • One set of prediction values is obtained for each prediction sample after processing by the fully connected layer; a set of prediction values includes the predicted values of the k indicators at the next moment, where the next moment is the moment immediately following the period corresponding to the prediction sample.
  • That is, the fully connected layer is used, for each prediction sample, to predict the values of the k indicators at the next moment of the period based on the data of the k indicators within that period.
  • S20441 can be implemented as follows: the electronic device inputs each of the m prediction samples into the fully connected layer; the fully connected layer processes one sample to obtain one set of prediction values, and the m prediction samples are traversed to obtain m sets of prediction values.
  • the electronic device inputs each of the m prediction samples to the VAE layer, and obtains m sets of reconstruction probabilities through processing by the VAE layer.
  • a set of reconstruction probabilities is obtained after processing by the VAE layer; a set of reconstruction probabilities includes: the reconstruction probabilities of the k indicators at the next moment.
  • The VAE layer is used, for each prediction sample, to predict the reconstruction probabilities of the k indicators at the next moment of the period based on the data of the k indicators within that period.
  • S20442 can be implemented as follows: the electronic device inputs each of the m prediction samples into the VAE layer; the VAE layer processes one sample to obtain one set of reconstruction probabilities, and the m prediction samples are traversed to obtain m sets of reconstruction probabilities.
  • the electronic device determines that the first prediction result includes: the m set of prediction values and the m set of reconstruction probabilities.
  • In this way, the prediction layer includes a fully connected layer and a VAE layer, which determine the prediction results along two dimensions, giving high accuracy.
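  • To make the two branches concrete, the following is a hedged PyTorch sketch of a prediction layer with a fully connected head (predicted values of the k indicators) and a VAE head (whose decoder output is scored as a reconstruction probability); the class name, layer sizes, and latent dimension are illustrative assumptions, not values from the application.

```python
import torch
import torch.nn as nn

class PredictionLayer(nn.Module):
    def __init__(self, in_dim: int, k: int, latent: int = 8):
        super().__init__()
        # Fully connected branch: next-moment values of the k indicators.
        self.fc = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, k))
        # VAE branch: encoder outputs mean and log-variance of the latent code.
        self.enc = nn.Linear(in_dim, 2 * latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, k))

    def forward(self, h: torch.Tensor):
        pred = self.fc(h)                                        # predicted values
        mu, logvar = self.enc(h).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()     # reparameterization
        recon = self.dec(z)                # used to evaluate reconstruction probability
        return pred, recon, mu, logvar

layer = PredictionLayer(in_dim=32, k=7)
pred, recon, mu, logvar = layer(torch.rand(27, 32))   # one feature batch of m periods
```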
  • This process may include but is not limited to S20451 to S20453 described below.
  • the prediction layer includes a fully connected layer and a variational autoencoder VAE layer, and the first prediction result includes the m sets of prediction values and the m sets of reconstruction probabilities.
  • the electronic device determines the first loss function corresponding to the fully connected layer and the second loss function corresponding to the VAE layer; the first loss function is different from the second loss function.
  • the embodiments of this application do not limit the specific function descriptions of the first loss function and the second loss function, and they can be configured according to actual needs.
  • the first loss function may include:
  • LOSS_prediction represents the loss function corresponding to the fully connected layer; x_{n,i} represents the actual value of the i-th indicator variable at the n-th moment; x̂_{n,i} represents the predicted value of the i-th indicator variable at the n-th moment.
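  • The formula itself is not reproduced in this text. Based on the variable definitions above, a plausible reconstruction (an assumption, not the application's confirmed formula) is the root of the summed squared prediction errors over the k indicators:

$$\mathrm{LOSS}_{\mathrm{prediction}} = \sqrt{\sum_{i=1}^{k} \left( x_{n,i} - \hat{x}_{n,i} \right)^{2}}$$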
  • the second loss function may include:
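  • The second formula is likewise not reproduced here. For a VAE layer, the conventional choice (again an assumption rather than the application's exact formula) is the negative evidence lower bound, i.e., a reconstruction term plus a KL-divergence term:

$$\mathrm{LOSS}_{\mathrm{VAE}} = -\,\mathbb{E}_{q_{\phi}(z \mid x)}\left[\log p_{\theta}(x \mid z)\right] + D_{\mathrm{KL}}\!\left(q_{\phi}(z \mid x) \,\|\, p(z)\right)$$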
  • the electronic device determines m target losses based on at least the m sets of prediction values, the m sets of reconstruction probabilities, the first loss function, and the second loss function.
  • a target loss is determined for a set of predicted values and the reconstruction probability corresponding to the set of predicted values.
  • a set of prediction values and a set of reconstruction probabilities correspond to the same period.
  • The electronic device substitutes each set of prediction values and the corresponding set of reconstruction probabilities into the first loss function and the second loss function to calculate a first loss value and a second loss value; the first loss value and the second loss value are then summed to obtain a target loss. The electronic device traverses the m sets of prediction values and the m sets of reconstruction probabilities, thereby obtaining m target losses.
  • the electronic device adjusts parameters in the anomaly detection model based on the m target losses to obtain the target detection model.
  • S20453 can be implemented as follows: the electronic device adjusts the parameters of the prediction layer in the anomaly detection model based on m target losses, and obtains the target detection model when the target losses meet the requirements.
  • Alternatively, S20453 can be implemented as follows: the electronic device gradually adjusts the parameters of each layer in the anomaly detection model based on the m target losses, and obtains the target detection model when the target losses meet the requirements.
  • In this way, the mutual influence between the fully connected layer and the VAE layer is taken into account, and the resulting target detection model is more accurate.
  • the anomaly detection model includes: preprocessing layer, first GAT layer, second GAT layer and prediction layer, which has the characteristics of simple implementation and high processing efficiency;
  • In method 2, the anomaly detection model includes: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, and a prediction layer; correspondingly, the process in which the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, the splicing layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set is explained below.
  • Compared with method 1, only the process of S2043, in which the electronic device splices at least the k×n first features and the n×k second features to obtain the prediction data set, differs.
  • the process may include but is not limited to A1 or A2.
  • The electronic device inputs the k×n first features, the n×k second features, and the pre-training data set into the splicing layer, and obtains spliced data through the processing of the splicing layer.
  • the prediction data set is the spliced data; s is equal to 3 times k.
  • Alternatively, the electronic device inputs the k×n first features and the n×k second features into the splicing layer, and obtains spliced data through the processing of the splicing layer.
  • the prediction data set is spliced data; s is equal to 2 times k.
  • In method 2, the anomaly detection model includes: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, and a prediction layer; by adding the processing of the splicing layer, the preprocessed data can be further combined, so the resulting target model has higher detection accuracy.
  • In method 3, the anomaly detection model includes: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, a gated recurrent unit (Gated Recurrent Unit, GRU) layer, and a prediction layer; correspondingly, the electronic device trains the preprocessing layer, the first GAT layer, the second GAT layer, the splicing layer, the GRU layer, and the prediction layer in the anomaly detection model based on the first training data set and the second training data set.
  • For method 3, the electronic device splices at least the k×n first features and the n×k second features to obtain a prediction data set.
  • the process may include but is not limited to B1 and B2.
  • The electronic device inputs the k×n first features, the n×k second features, and the pre-training data set into the splicing layer, and obtains spliced data through the processing of the splicing layer.
  • The spliced data is an n×3k×m matrix.
  • the electronic device inputs the spliced data into the GRU layer, and uses the GRU layer to filter the interference of indicator dimensions in the spliced data to obtain the prediction data set.
  • s is less than 3 times k.
  • In this way, the embodiment of the present application filters the interference of the indicator dimension in the spliced data through the GRU layer, thereby obtaining the prediction data set.
  • s is generally greater than or equal to k and less than 3k.
  • In method 3, the anomaly detection model includes: a preprocessing layer, a first GAT layer, a second GAT layer, a splicing layer, a GRU layer, and a prediction layer; the processing of the splicing layer improves detection accuracy, and the GRU layer can further increase processing speed.
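  • The following is a hedged PyTorch sketch of B1 and B2: the two feature groups and the pre-training data are concatenated along the indicator axis into an n×3k×m tensor, and a GRU run over the n moments filters the indicator dimension down to s features. The shapes follow the text, while the synthetic feature tensors and the value s = 14 are illustrative.

```python
import torch
import torch.nn as nn

n, k, m, s = 90, 7, 27, 14              # s chosen between k and 3k, per the text

feat1 = torch.rand(n, k, m)             # first GAT features, arranged per moment
feat2 = torch.rand(n, k, m)             # second GAT features, arranged per moment
pre = torch.rand(n, k, m)               # pre-training data

spliced = torch.cat([feat1, feat2, pre], dim=1)    # B1: (n, 3k, m)

# B2: run a GRU over the n moments of each period to filter the 3k indicator
# dimensions down to s, yielding the (n, s, m) prediction data set.
gru = nn.GRU(input_size=3 * k, hidden_size=s, batch_first=True)
seq = spliced.permute(2, 0, 1)          # (m, n, 3k): periods as the batch axis
out, _ = gru(seq)                       # (m, n, s)
prediction_set = out.permute(1, 2, 0)   # (n, s, m)
```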
  • the data processing method provided by the embodiment of the present application can also detect the detection data through the target detection model to determine whether an abnormality occurs.
  • the data processing method may also include but is not limited to the following S205 to S208.
  • the electronic device obtains detection data of k indicators within the first period.
  • the first period is any period.
  • the electronic device inputs the detection data into the target detection model, and obtains the detection parameter values of the k indicators at the second moment through processing by the target detection model.
  • The second moment is the moment immediately following the first period.
  • the electronic device determines the total score corresponding to the detection data based on the detection parameter values of the k indicators at the second moment.
  • the embodiment of the present application does not limit the specific method of determining the total score corresponding to the detection data based on the detection value, and it can be configured according to actual conditions.
  • When the total score is greater than or equal to the score threshold, the electronic device determines that the detection data is abnormal.
  • the embodiment of this application does not limit the value of the score threshold, and it can be configured according to actual needs.
  • the score threshold may be 1.
  • When the total score is less than the score threshold, the detection data is determined to be normal.
  • In this way, a target detection model for a given scenario can be obtained, thereby achieving anomaly detection for that scenario.
  • the process may include determining a total score corresponding to the detection data through a first formula
  • the first formula includes:
  • Score represents the total score corresponding to the detection data;
  • x_i represents the actual value of the i-th indicator at the second moment;
  • x′_i represents the predicted value of the i-th indicator at the second moment;
  • p′_i represents the reconstruction probability value of the i-th indicator at the second moment;
  • γ represents a preset coefficient.
  • For example, γ may be 0.8.
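  • The first formula itself is not reproduced in this text. A plausible form consistent with the variables defined above (an assumption; prediction-plus-reconstruction scoring schemes of this kind commonly take this shape) combines each indicator's squared prediction error with its reconstruction probability, weighted by γ:

$$\mathrm{Score} = \sum_{i=1}^{k} \frac{\left( x_{i} - x'_{i} \right)^{2} + \gamma \left( 1 - p'_{i} \right)}{1 + \gamma}$$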
  • Determining the total score corresponding to the detection data in this way has a high accuracy rate.
  • the data processing method provided by the embodiment of this application can also perform abnormal location. As shown in Figure 6, this process may include but is not limited to the following S209 and S210.
  • the electronic device determines the abnormal scores of the k indicators at the second moment.
  • For example, the electronic device determines that the abnormal scores of the k indicators at the second moment include: the score of the first indicator at the second moment is 0.046884; the score of the second indicator is 0.409688; the score of the third indicator is 0.449229; the score of the fourth indicator is 0.021445; the score of the fifth indicator is 0.013142; the score of the sixth indicator is 0.437159; and the score of the seventh indicator is 0.051018.
  • the electronic device performs abnormal positioning based on the abnormal scores of the k indicators at the second moment.
  • the electronic device determines that the second indicator is abnormal, the third indicator is abnormal, and the sixth indicator is abnormal.
  • The electronic device can also perform more detailed anomaly analysis based on the abnormal second indicator, third indicator, and sixth indicator.
  • abnormality positioning can be performed intuitively, the abnormality positioning process is simple to implement, and the accuracy of abnormality positioning is also high.
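  • As a sketch of this localization step, using the example scores above and assuming a per-indicator threshold of 0.4 (the application does not state the threshold used in the example):

```python
scores = [0.046884, 0.409688, 0.449229, 0.021445, 0.013142, 0.437159, 0.051018]
threshold = 0.4                     # assumed per-indicator threshold
abnormal = [i + 1 for i, s in enumerate(scores) if s >= threshold]
print(abnormal)                     # [2, 3, 6]: the second, third, and sixth indicators
```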
  • GCN: Graph Convolutional Network.
  • GAT can allocate weights based on the importance of nodes and assign a different weight to each neighbor, avoiding the situation in which GCN treats all neighbor nodes equally during convolution.
  • GAT can be used to obtain the neighborhood characteristics of each node and assign different weights to different nodes in the neighborhood; in this way, there is no need for computationally expensive matrix operations, and the specific structure of the graph object does not need to be known in advance, giving strong robustness and applicability.
  • each operation and maintenance time series indicator is regarded as a separate feature to construct nodes in the graph object.
  • GAT is used to model the correlation between different features and the time dependence within each time series, capturing the indicator characteristics and temporal relationships of multi-dimensional time series.
  • The Gated Recurrent Unit is a variant of the Long Short-Term Memory (LSTM) network that retains only the update gate and the reset gate, addressing LSTM's long training time, large number of parameters, and high computational complexity.
  • Its gate functions are used to mine the timing change patterns of relatively long intervals and delays in the time series, extract dependency information in the time dimension, and mitigate the problems recurrent neural networks (RNN) encounter during training.
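  • For reference, the standard GRU gate equations (a well-known formulation, not reproduced from the application) are:

$$z_t = \sigma(W_z x_t + U_z h_{t-1}), \qquad r_t = \sigma(W_r x_t + U_r h_{t-1})$$

$$\tilde{h}_t = \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1})\right), \qquad h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$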
  • In operation and maintenance scenarios, time series indicators are output in accordance with industry standards.
  • Various indicators of different businesses, applications, systems, and clusters need to be monitored, specifically including business indicators (transaction volume, success rate, interface time consumption, etc.) and system service indicators (CPU, MEM, DISK, IO, etc.).
  • The interactions between indicators are complex, and the health status of systems and services is often determined by a series of indicators together. Mining a certain indicator can provide information at a single level, but determining system anomalies only through single-indicator anomalies often results in too many false alarms.
  • Multi-indicator operation and maintenance timing detection can explore the interaction between various components of the system.
  • The topological relationship information between indicators and the monitoring data of the indicators themselves are used as input to form system-level representation information, which provides a more comprehensive understanding of the entire running system and its services.
  • The deep generative model is based on the assumption that the original input obeys a certain distribution and takes the original data as a whole as input. It is difficult to directly model high-dimensional random vectors; related algorithms need to add distribution approximation and conditional-independence assumptions to simplify the model, which weakens the representation of the original data and ultimately leads to insufficient robustness and versatility of the model.
  • the deep generative model only outputs the reconstruction error to identify system abnormalities.
  • The algorithm is weak in analysis, and it is difficult to perform anomaly correlation analysis and localization based on the original model output.
  • Deep generative models use the ability of deep neural networks to approximate arbitrary functions to model the complex distribution of encoders and decoders.
  • the model structure is complex and the algorithm complexity is high.
  • For anomaly detection with large data scale and fast data flow, detection takes a long time, and it is difficult to meet the needs of real-time anomaly detection.
  • In the embodiment of the present application, operation and maintenance indicators are used as the nodes of a graph object for anomaly detection, and anomaly detection is converted into graph object mining.
  • The multi-dimensional operation and maintenance indicator data is separated horizontally (different operation and maintenance indicators at a single moment) and vertically (a single indicator's operation and maintenance sequence at different moments), and the internal characteristics of the indicator data and the dependency relationships between time series are mined based on GAT. Separation processing can preserve the original high-dimensional representation from the perspectives of time series dependency and indicator type.
  • GAT does not need to know in advance the a priori relationships and anomaly structures between the input indicators; it learns the time dependence and anomaly correlation between indicators, and can dynamically identify anomalies in different scenarios and adapt to scenario changes.
  • In this way, this method not only retains the high-dimensional characteristics of the original data but also adapts to different anomaly types. In related technologies, the original data is often input as a whole without preprocessing such as data separation, so the high-dimensional feature representation is poor; at the same time, input distribution assumptions are required, giving the model poor versatility. Therefore, the embodiment of the present application can effectively solve the problems of weakened high-dimensional data representation and poor model versatility in existing solutions.
  • Analyzing the split mining of multi-dimensional time series in related technologies: one anomaly detection method is to horizontally separate multi-dimensional operation and maintenance time series indicators, convert them into multiple single-dimensional time series, and apply a corresponding anomaly detection algorithm according to the characteristics of each dimension's time series indicator.
  • According to different scenarios and monitoring indicator types, different anomaly detection rules and algorithms are constructed based on the experience of historical operation and maintenance experts. Development and post-maintenance costs are high, and a deep understanding of the business is required. Moreover, this approach ignores the cross-relationships between indicators and the overall characteristics of anomaly detection, and cannot vertically analyze the correlation between the indicators of each dimension. That is, split mining can mine the abnormal status of each indicator but cannot achieve cross-analysis; the monitoring rules are one-sided and the false alarm rate is high.
  • Another anomaly detection method in related technologies is to use deep generative models such as generative adversarial networks (GAN) alone, take multi-dimensional time series indicators as the overall input of the generative model, reconstruct its output, and determine from the reconstruction probability or reconstruction error whether the input is abnormal data.
  • Holistic mining based on multi-dimensional time series indicators determines the system status only through the reconstruction output and does not explicitly mine the relationships between different time series; that is, holistic mining can mine the overall abnormal state of multi-dimensional indicators, but fault localization for the abnormal state cannot provide the abnormal contribution of each indicator.
  • In this way, the anomaly detection methods in related technologies cannot obtain the direct potential interrelationships of the time series, and it is difficult to analyze the impact of each indicator corresponding to the anomaly, which is not conducive to subsequent fault localization and problem repair.
  • the process may include processing by an offline module 71 and processing by a real-time module 72 .
  • the processing process (equivalent to the training process) of the offline module 71 may include but is not limited to the following S711 to S719.
  • GRU layer mines long-term timing dependency characteristics
  • VAE-based reconstruction module (equivalent to VAE layer) processing
  • The offline module 71 is used to: train on historical multi-dimensional operation and maintenance indicator data, which, after data preprocessing and data separation, is input into two parallel graph attention (GAT) layers to capture the feature relationships between multi-dimensional indicators and the timestamp relationships within a single indicator's time series.
  • the processing process (equivalent to the detection process) of the real-time module 72 may include but is not limited to the following S721 to S725.
  • the multi-dimensional anomaly detection model (equivalent to the target detection model) is loaded;
  • The real-time module 72 is used to: collect real-time operation and maintenance indicators, load the multi-dimensional anomaly detection model for detection, calculate anomaly detection scores to determine system health, and feed results back to operation and maintenance personnel to verify the accuracy of alarms. Operation and maintenance personnel confirm the abnormal status based on the alarm content and locate the cause of the abnormality through the anomaly score of each feature.
  • This process can include but is not limited to:
  • Step 1 Data preprocessing
  • Step 2 Construct a graph attention network based on GAT
  • Step 3 Mining long-term time series data based on GRU
  • Step 4 Construction of joint anomaly detection model based on prediction and reconstruction.
  • The process of data preprocessing in step 1 is explained below.
  • the data preprocessing part mainly includes: data normalization and data interception.
  • data normalization can reduce the amount of model calculations
  • index data interception can convert the normalized multi-dimensional data into the data form required for unsupervised model training.
  • the specific processing flow may include but is not limited to the following steps 11 and 12.
  • The embodiment of the present application uses the maximum-minimum (MAX-MIN) normalization method to process the original data; after normalization, the original indicator values are mapped into [0, 1].
  • The specific processing can refer to the following Formula 1: x′ = (x − min) / (max − min); where x′ is the output after data normalization, x is the original sequence value, max is the maximum value of the original sequence, and min is the minimum value of the original sequence.
  • Step 12 Interception of indicator data.
  • Multi-dimensional operation and maintenance indicator anomaly detection requires intercepting the original sequence and converting the astronomically large raw data into a multi-sample, multi-dimensional indicator matrix that conforms to the input form of the unsupervised learning model.
  • The normalized data uses a sliding window of length n = 90 with a sliding step of 50 to generate fixed-length sequences as the detection model input, which can provide better feedback on system anomalies.
  • The main task of this application can be summarized as: the multi-dimensional indicator anomaly monitoring module takes an input sequence X ∈ R^(n×k×m) and generates a binary output vector y ∈ {0, 1}.
  • k is the dimension of the operation and maintenance indicators themselves, that is, the number of indicator types; in this example k is 7.
  • m is the number of recombined samples, i.e., the number of training set samples.
  • The operation and maintenance indicator data is collected at the minute level, that is, the original sequence length in a single day is 1440, and the number of recombined samples is m = (1440 − n) / d = (1440 − 90) / 50 = 27.
  • The final preprocessed matrix is R^(90×7×27); as shown in Figure 10, the intercepted data input to the GAT graph attention layer is the high-dimensional data matrix X ∈ R^(90×7×27). A preprocessing sketch is given below.
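A minimal preprocessing sketch in Python, assuming minute-level data for one day and the window parameters above (n = 90, d = 50); the function names are illustrative, not from the source:

```python
import numpy as np

def normalize_max_min(x: np.ndarray) -> np.ndarray:
    """MAX-MIN normalization per indicator column, mapping values into [0, 1] (Formula 1)."""
    x_min = x.min(axis=0, keepdims=True)
    x_max = x.max(axis=0, keepdims=True)
    return (x - x_min) / (x_max - x_min + 1e-12)  # epsilon guards constant columns

def intercept(x: np.ndarray, n: int = 90, d: int = 50) -> np.ndarray:
    """Cut a (T, k) sequence into m = (T - n) // d windows of length n with stride d,
    returning an (n, k, m) matrix as described in the text."""
    T, k = x.shape
    m = (T - n) // d                       # 27 for T = 1440, matching the text
    return np.stack([x[i * d:i * d + n] for i in range(m)], axis=-1)

raw = np.random.rand(1440, 7)              # one day of minute-level data, 7 indicators
pre = intercept(normalize_max_min(raw))    # shape (90, 7, 27)
```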
  • This embodiment of the present application uses the various indicators as graph object nodes, builds a graph attention layer based on GAT to learn the structural characteristics between indicator nodes and the characteristics of each node itself, and introduces an attention mechanism to assign different weights to adjacent nodes, reducing the attention paid to non-key indicators. Through the weighted summation of the features of each GAT node, the importance of indicator correlations is distinguished.
  • step 2 can include:
  • Step 21 Split the input data matrix
  • Step 22 General construction of graph attention layer.
  • This embodiment of the present application builds a graph attention network based on GAT, which requires horizontal and vertical separation of the multi-dimensional indicator time series after data preprocessing.
  • When separating the data, it can be processed from two dimensions, time-series dependency and indicator type, thereby retaining the representational form of the original high-dimensional data.
  • the separated data can be directly used for modeling the graph attention layer.
  • Horizontal separation divides the preprocessed data into the sequences of different operation and maintenance indicators at a single moment; vertical separation divides the preprocessed data into the operation and maintenance sequences of a single indicator at different moments.
  • The horizontally separated sequences of different indicators at a single moment can be used as graph object nodes to build the graph attention layer oriented to temporal dependencies within indicators.
  • The vertically separated sequences of a single indicator at different moments can be used as graph object nodes to build the graph attention layer oriented to feature relationships between indicators.
  • the graph attention layer oriented to feature relationships between indicators and the graph attention layer oriented to temporal dependencies within indicators are both constructed based on the general GAT layer processing. By controlling different inputs, the relationships between time series features and temporal dependencies are dynamically learned respectively.
  • GAT introduces the attention mechanism into the graph neural network; when updating the embedding feature of a node in the graph object to complete dimensionality reduction, every vertex i on the graph participates in the attention operation.
  • GAT can model the node relationships in the graph. For a graph object with k node features {v_1, v_2, v_3, ..., v_k}, v_i is the feature vector of node i itself.
  • The output feature h_i of each node i of the GAT layer can be expressed by Formula 2: h_i = σ(∑_j a_ij · v_j), where a_ij is the attention coefficient between node i and node j, and σ is the activation function, such as the sigmoid function.
  • the process may include but is not limited to the first to third descriptions below.
  • the core principles include: learning appropriate W (model-shared learning weight) through training data to effectively capture the correlation between nodes i, j.
  • the calculation process of the similarity coefficient e ij mainly includes but is not limited to the following 1) to 3).
  • F represents the dimension of the input node, that is, the input dimension after data separation.
  • For the graph attention layer oriented to temporal dependencies within indicators, the F dimension is 27 × 7; for the graph attention layer oriented to feature relationships between indicators, the F dimension is 90 × 27.
  • This embodiment of the present application can use the linalg.eig method of the commonly used Python matrix-processing library numpy to calculate the feature vector of each node.
  • the numpy library is suitable for processing large amounts of data and high-dimensional indicators, simplifying the calculation process and providing real-time model detection capabilities.
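A small illustration of the numpy call named above; the matrix here is a stand-in for whatever per-node matrix the implementation decomposes (the text does not specify it), so the usage is an assumption:

```python
import numpy as np

A = np.random.rand(7, 7)
A = (A + A.T) / 2                    # symmetrize for a well-behaved example
eigvals, eigvecs = np.linalg.eig(A)  # eigvecs[:, i] is the eigenvector of node i
```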
  • The (·‖·) operation concatenates the transformed features; the dimension after concatenation is 1 × 2F′.
  • The a(.) function represents a single-layer feedforward neural network whose parameters are the attention kernel aᵀ of dimension 2F′ × 1, with LeakyReLU generally used as the nonlinear activation function. Its main function is to map the concatenated features to a real number, giving the final node similarity coefficient e_ij = LeakyReLU(aᵀ [v_i W ‖ v_j W]).
  • This method uses LeakyReLU as the activation function when calculating the attention coefficients mainly to prevent GAT from discarding a node's own information while normalizing over its neighbor nodes before aggregating neighbor information.
  • The normalized attention coefficients and their corresponding features are weighted to serve as the final output feature h′_i of each node, so that each node i obtains a new feature h′_i that integrates its neighborhood.
  • The specific calculation is given by Formula 4: h′_i = σ(∑_j a_ij · v_j), where a_ij is the attention coefficient between indicator node i and node j.
  • v_j is the own feature of node j. Specifically, for the graph attention layer oriented to features between indicators, v_j is the specific value of a different type of operation and maintenance indicator at a certain moment; for the graph attention layer oriented to temporal dependencies within an indicator, v_j is the specific value of a certain operation and maintenance indicator at a different moment. A minimal sketch of this layer is given below.
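A minimal single-head sketch of the GAT step just described (shared weight W for data enhancement, concatenation, attention kernel a, LeakyReLU scoring, softmax normalization, weighted summation per Formula 4), written in PyTorch; shapes and names are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def gat_layer(V: torch.Tensor, W: torch.Tensor, a: torch.Tensor) -> torch.Tensor:
    """V: (N, F) node features; W: (F, F2) shared learning weight; a: (2*F2,) attention kernel."""
    H = V @ W                                  # data-enhanced features, (N, F2)
    N = H.size(0)
    hi = H.unsqueeze(1).expand(N, N, -1)       # h_i replicated along j
    hj = H.unsqueeze(0).expand(N, N, -1)       # h_j replicated along i
    e = F.leaky_relu(torch.cat([hi, hj], dim=-1) @ a)  # similarity coefficients e_ij, (N, N)
    alpha = torch.softmax(e, dim=-1)           # normalized attention coefficients a_ij
    return torch.sigmoid(alpha @ V)            # h'_i = sigma(sum_j a_ij * v_j), Formula 4

# Example: 7 indicator nodes with F = 27 input and F2 = 27 output dimensions
out = gat_layer(torch.rand(7, 27), torch.rand(27, 27), torch.rand(54))
```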
  • This embodiment splits the original sequence data matrix vertically and treats the multi-dimensional time series as a complete graph object.
  • Each node represents a certain indicator feature and each edge represents the relationship between two corresponding features.
  • the input dimension of the graph attention layer oriented to the characteristics between indicators is k ⁇ n
  • the relationship can be measured by learning the attention score a ij through the graph attention mechanism, and obtaining the graph attention layer output h′ i through weighted summation.
  • this embodiment horizontally splits the original sequence data matrix, and treats the operation and maintenance indicator characteristics at each moment in the sliding window as a complete graph object.
  • the timestamp is used as a node, and the node x t is the feature vector of all operation and maintenance indicators at time t.
  • Each edge represents the relationship between indicator values at different times.
  • the temporal dependence of the time series is mined through the GAT attention layer.
  • the input dimension of the graph attention layer oriented to the temporal dependence within the indicator is n ⁇ k.
  • The mutual relationships between nodes (between moments) can be measured by learning the attention scores a_ij through the graph attention mechanism, and the graph attention layer output h′_i is obtained through weighted summation.
  • This embodiment of the present application introduces a multi-head attention extension so that the overall model splices the output of the feature-oriented graph attention layer, the output of the timing-oriented graph attention layer, and the preprocessed data into an n × 3k × m matrix; that is, each row represents a 3k-dimensional feature vector for each timestamp, as in the sketch below.
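A concatenation sketch under the dimensions used in this document (n = 90, k = 7, m = 27); the random tensors stand in for the two GAT outputs and the preprocessed input:

```python
import torch

h_feature = torch.rand(90, 7, 27)   # output of the feature-oriented graph attention layer
h_time    = torch.rand(90, 7, 27)   # output of the timing-oriented graph attention layer
x_pre     = torch.rand(90, 7, 27)   # data after data preprocessing

fused = torch.cat([h_feature, h_time, x_pre], dim=1)   # (90, 21, 27) = n x 3k x m
```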
  • the first point is to specify the mining mode based on data splitting.
  • Deep generative models are based on holistic anomaly detection and do not explicitly specify which metrics to learn and the underlying interrelationships over time.
  • GAN converts its encoder optimization problem into minimizing the KL divergence between the generated sample distribution and the real data distribution by approximately simulating the real data distribution.
  • GAN needs to assume that the real data conforms to a certain distribution, and generates certain pattern samples through joint learning of the encoder and decoder.
  • This embodiment of the present application utilizes the preprocessing of data splitting and, based on the two prior operation and maintenance modes of inter-indicator relationships and intra-indicator time series, explicitly converts nonlinear mining into structure mining based on graph attention.
  • The representation of the indicator features and time-series dependencies that operation and maintenance anomaly detection focuses on is therefore more significant and accurate.
  • the second point is to fuse information based on multi-head attention.
  • this embodiment of the present application is based on a multi-head attention mechanism that fuses information from different sources as input to the GRU.
  • Introducing original data can make up for the weakening of long-term time series patterns caused by data splitting and reduce the loss of original high-dimensional data representation.
  • The process of mining long-term time-series data based on GRU (equivalent to the GRU layer) in step 3 is explained below.
  • the input dimension is n ⁇ 3k ⁇ m, which contains redundant short-term interference information.
  • this embodiment of the present application uses GRU to capture the interdependence between long-term series and extract dependency information in the time dimension.
  • the GRU network has a simple structure. It mines medium and long-term time series data and inputs it into the anomaly detection module to avoid glitches and other short-term sequences from interfering with the anomaly detection module.
  • Step 3 may specifically include but is not limited to the following steps 31 and 32.
  • Step 31 Construct the GRU network structure.
  • the input at time t is the network input value x t at the current time
  • the GRU layer output hidden state at the previous time is h t-1
  • the candidate memory content hidden layer at the current time is h′ t .
  • the gate control state is obtained through the hidden state h t-1 of the previous moment and the network input value x t of the current moment, and the gate function is used to complete the transmission, reset and update of information.
  • r is the gate that controls information reset
  • z is the gate that controls information update.
  • the update gate controls the extent to which the state information of the previous moment is retained to the current state, to avoid bringing in all historical information and causing the gradient to disappear.
  • the reset gate controls which information of the previous state needs to be forgotten and which information needs to be written to the current candidate hidden layer h′ t .
  • Step 32 GRU network model construction.
  • This embodiment of the present application builds a GRU medium- and long-period network model based on the commonly used neural network framework PyTorch, calling the torch.nn.gru method to accept the three-dimensional sequence tensor expanded by multi-head attention.
  • The GRU network construction process during model training may include, but is not limited to: GRU network layer construction, GRU network layer input and GRU network layer output.
  • The GRU layer output output_dim is a three-dimensional time series [seq_len, batch_size, output_size] including medium- and long-period dependencies, i.e., [90, 27, 10] in this example.
  • The process of building the joint anomaly detection model based on prediction and reconstruction in step 4 is explained below.
  • the long-term time series features mined by GRU can well characterize the original multi-dimensional indicator sequence and are input into the anomaly detection module.
  • Prediction-based anomaly detection is often used to extract time-granular operation and maintenance indicator features for future-time prediction and anomaly detection, while reconstruction-based anomaly detection is good at capturing the overall data distribution of time series. Therefore, this embodiment of the present application optimizes for multi-scenario adaptation.
  • The medium- and long-period temporal dependencies mined by the GRU are input to the two modules in parallel.
  • The two are combined into an overall anomaly detection model for joint optimization.
  • the overall structure of the anomaly detection module is as follows. Training in the anomaly detection module is similar to black-box training.
  • the training and anomaly identification processing flow of the anomaly detection module may include but is not limited to: processing of the anomaly detection module based on prediction and processing of the anomaly detection module based on reconstruction.
  • the original multi-dimensional indicator data is processed based on the graph attention layer and GRU.
  • this embodiment of the present application uses a fully connected layer with a simple structure and low training overhead.
  • the indicator value at the next moment is predicted.
  • The prediction loss function is based on the mean square error, with the Euclidean distance used as the similarity measure; it can be written as LOSS_prediction = √(∑_i (x_{n,i} − x̂_{n,i})²).
  • The loss function is optimized in the same way as a fully connected network.
  • LOSS_prediction represents the prediction loss function of the anomaly detection module.
  • x_{n,i} represents the actual value of the i-th indicator variable at the n-th moment.
  • x̂_{n,i} represents the predicted value of the i-th indicator variable at the n-th moment. A sketch of this prediction head and its loss follows.
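A minimal sketch of the fully connected prediction head and the Euclidean-distance loss described above, in PyTorch; the hidden size of 10 follows the GRU output dimension used elsewhere in this document, other names are illustrative:

```python
import torch
import torch.nn as nn

k = 7                          # number of indicators
fc = nn.Linear(10, k)          # fully connected layer: GRU hidden state -> next-moment values

def prediction_loss(x_next: torch.Tensor, x_hat: torch.Tensor) -> torch.Tensor:
    """Euclidean distance (root of summed squared error) between actual and predicted values."""
    return torch.sqrt(torch.sum((x_next - x_hat) ** 2))

h_last = torch.rand(27, 10)    # last GRU hidden state for each of the 27 samples
x_hat = fc(h_last)             # predicted values of the k indicators, shape (27, 7)
loss = prediction_loss(torch.rand(27, 7), x_hat)
```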
  • the reconstruction-based anomaly detection module mainly learns the potential features of sequence data.
  • This embodiment of the present application builds a reconstruction anomaly detection module based on the variational autoencoder VAE improved by the autoencoder.
  • VAE uses neural networks to model the encoder (Encoder) and decoder (Decoder) respectively.
  • The encoder maps the input sequence x into a low-dimensional multivariate Gaussian distribution whose dimensions are mutually independent, learns the hidden feature z, and generates a latent space layer.
  • The decoder reconstructs the probability distribution p_θ(x|z) from the hidden feature z.
  • The determination process of the VAE target loss function can include: the VAE goal is to make the reconstructed distribution p_θ(x) of the decoding module approximate the original sequence distribution; the final target loss function is shown in Formula 6 (rendered as an image in the original).
  • The VAE objective function optimization process is based on common neural network approximate fitting and Monte Carlo estimation of the variational evidence lower bound (ELBO).
  • To optimize the objective function, VAE sets the hidden variable z to a simple prior distribution: p_θ(z) is set to the standard normal distribution N(0, I), and the posterior distribution is approximated by a normal distribution with mean μ and variance σ²I, fitted with a neural network.
  • L_2 (the Kullback-Leibler term between the approximate posterior and the prior) can then be obtained in the standard closed form of Formula 9: L_2 = ½ ∑_i (μ_i² + σ_i² − log σ_i² − 1). A sketch of the resulting loss follows.
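A minimal VAE loss sketch consistent with the description above (squared reconstruction error plus the closed-form KL term between N(μ, σ²I) and N(0, I)), in PyTorch; this is an assumed standard form, not the patent's exact Formula 6/9:

```python
import torch

def vae_loss(x: torch.Tensor, x_recon: torch.Tensor,
             mu: torch.Tensor, log_var: torch.Tensor) -> torch.Tensor:
    """Negative ELBO: reconstruction term plus KL(N(mu, sigma^2 I) || N(0, I))."""
    recon = torch.sum((x - x_recon) ** 2)
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl
```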
  • The real-time anomaly detection module of this embodiment of the present application loads the asynchronously trained joint anomaly detection model, inputs real-time data, and outputs, for a certain moment, the predicted values {x′_i | i = 1, 2, ..., k} from the prediction module and the reconstruction probabilities {p′_i | i = 1, 2, ..., k} from the reconstruction module.
  • The prediction module predicts the actual value of the next timestamp in a deterministic way and is sensitive to the randomness of the time series; the reconstruction module captures the global data distribution in the form of random variables and ignores noise that destroys the periodicity of the time series.
  • This process may include, but is not limited to, calculation and determination of abnormal scores, abnormality verification and location analysis.
  • This embodiment of the present application comprehensively considers the abnormal performance of each indicator feature, combines the outputs of the two anomaly modules, calculates the anomaly score s_i of each indicator feature, and sums them as the final anomaly discrimination score Score; if the final anomaly discrimination score is greater than a certain threshold, the data is determined to be abnormal.
  • the calculation of the abnormality judgment score can refer to the following formula 10.
  • Score represents the final anomaly discrimination score
  • γ is a hyperparameter mainly used to combine the prediction-based and reconstruction-based scores; its value is obtained by grid-searching for the optimal parameter on the training set.
  • Exemplarily, the anomaly detection model trained offline and asynchronously is loaded, the predicted value and reconstruction probability value of each indicator at time T1 are output, the anomaly score s_i of each indicator is calculated through Formula 10, and the scores are summed to obtain the final anomaly discrimination score.
  • Score = 1.428525. Since 1.428525 is greater than the threshold 1, it is determined that the system is abnormal at time T1. A hedged sketch of this scoring step is given below.
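A scoring sketch consistent with the variable definitions above. Formula 10 is rendered as an image in the source, so the per-indicator combination below (squared prediction error and reconstruction probability balanced by γ) is an assumption, not the patent's exact expression:

```python
import numpy as np

def anomaly_score(x, x_pred, p_recon, gamma: float = 0.8) -> float:
    """Assumed form: s_i = ((x'_i - x_i)^2 + gamma * (1 - p'_i)) / (1 + gamma),
    summed over the k indicators into the final discrimination score."""
    x, x_pred, p_recon = map(np.asarray, (x, x_pred, p_recon))
    s = ((x_pred - x) ** 2 + gamma * (1 - p_recon)) / (1 + gamma)
    return float(s.sum())

score = anomaly_score([0.2] * 7, [0.5] * 7, [0.4] * 7)
is_abnormal = score >= 1.0    # threshold 1, as in the worked example above
```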
  • this embodiment of the present application mainly consists of three core parts: the graph attention module, the GRU module, and the joint anomaly detection module.
  • After the multi-dimensional operation and maintenance indicator data is preprocessed by data interception and separation, the graph attention module performs mining learning of inter-indicator features and intra-indicator time series, followed by joint anomaly detection using prediction and reconstruction.
  • this embodiment of the present application introduces an attention mechanism to build a graph neural network based on GAT, and splits the original multi-dimensional index sequence horizontally and vertically to form nodes in the graph object.
  • This embodiment can retain the original high-dimensional data form and directly build the graph attention layer based on GAT.
  • the attention mechanism is used to assign weights to different indicators, and the model dynamically learns temporal dependencies and potential abnormal correlations in feature types.
  • This embodiment of the present application uses GRU to mine the temporal change rules, such as relatively long intervals and delays, in the output of the graph neural network, avoiding the impact of indicator glitches on the overall anomaly detection model.
  • Characteristics of GRU such as its simple structure are used to accelerate the convergence of the model, simplify the model structure, and improve the real-time processing efficiency of anomaly detection.
  • This embodiment of the present application performs joint anomaly detection based on prediction and VAE reconstruction, which is suitable for operation and maintenance indicators of various types and scenarios, and the joint determination of anomalies improves the robustness of the detection model.
  • By providing per-indicator anomaly scores, it makes up for the lack of anomaly correlation analysis in existing multi-dimensional detection solutions.
  • The data processing device 210 includes: an obtaining unit 2101, a first pre-processing unit 2102, a second pre-processing unit 2103 and a training unit 2104, wherein:
  • Obtaining unit 2101 is configured to obtain a pre-training data set;
  • The pre-training data set is an n × k × m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1.
  • The first pre-processing unit 2102 is configured to longitudinally segment the pre-training data set through the pre-processing layer of the anomaly detection model to obtain a first training data set; the first training data set is an n × m × k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n × m time dimensions.
  • the second preprocessing unit 2103 is configured to horizontally segment the pre-training data set through the pre-processing layer to obtain a second training data set;
  • The second training data set is an m × k × n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods.
  • The training unit 2104 is configured to train an anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
  • the anomaly detection model also includes: a first graph neural network GAT layer, a second GAT layer and a prediction layer.
  • the training unit 2104 is specifically configured as:
  • The k first samples in the first training data set are input to the first GAT layer, and k × n first features are obtained through the processing of the first GAT layer; one first feature is used to characterize the relationship between the value of the indicator corresponding to that first feature and the values of the other k − 1 indicators.
  • The n second samples in the second training data set are input to the second GAT layer, and n × k second features are obtained through the processing of the second GAT layer; one second feature is used to characterize the relationship between the value of the moment corresponding to that second feature and the values of the other n − 1 moments.
  • At least the k × n first features and the n × k second features are spliced to obtain a prediction data set; the prediction data set is an n × s × m matrix; s is greater than or equal to k.
  • the parameters in the anomaly detection model are adjusted based on the first prediction result to obtain a target detection model.
  • the training unit 2104 is further configured to:
  • the first processing includes: determining the similarity coefficients between the first node and the n first nodes respectively, and obtaining the n similarity coefficients;
  • the first feature corresponding to the first node is determined based on the data corresponding to the n nodes and the n attention coefficients.
  • the prediction layer includes a fully connected layer and a variational autoencoder VAE layer
  • the prediction data set includes m prediction samples
  • the training unit 2104 is also configured as:
  • Each of the m prediction samples is input to the fully connected layer, and m sets of predicted values are obtained through processing by the fully connected layer;
  • A set of prediction values is obtained after processing by the fully connected layer; a set of prediction values includes the predicted values of the k indicators at the next moment, where the next moment is the moment following the period corresponding to the prediction sample.
  • Each of the m prediction samples is input to the VAE layer, and m sets of reconstruction probabilities are obtained through processing by the VAE layer;
  • a set of reconstruction probabilities is obtained after processing by the VAE layer; a set of reconstruction probabilities includes: the reconstruction probabilities of the k indicators at the next moment;
  • Determining the first prediction result includes: the m sets of prediction values and the m sets of reconstruction probabilities.
  • The training unit 2104 is also configured to:
  • determine m target losses; wherein one target loss is determined for each set of predicted values and the reconstruction probabilities corresponding to that set of predicted values;
  • the parameters in the anomaly detection model are adjusted based on the m target losses to obtain the target detection model.
  • the anomaly detection model further includes a splicing layer
  • the training unit 2104 is further configured to:
  • The k × n first features, the n × k second features and the pre-training data set are input to the splicing layer, and spliced data is obtained through the processing of the splicing layer; wherein the prediction data set is the spliced data, and s is equal to 3 times k.
  • the anomaly detection model also includes: a splicing layer and a gated recurrent unit GRU layer, and the training unit 2104 is also configured as:
  • the spliced data is input to the GRU layer, and the interference of indicator dimensions in the spliced data is filtered through the GRU layer to obtain the prediction data set; the s is less than 3 times the k.
  • The data processing device 210 may further include a prediction unit configured to, after the anomaly detection model is trained based on the first training data set and the second training data set to obtain the target detection model, implement:
  • the first period is any period
  • The detection data is input into the target detection model, and the detection parameter values of the k indicators at the second moment are obtained through the processing of the target detection model; the second moment is the moment immediately following the first period.
  • When the total score is greater than or equal to the score threshold, it is determined that the detection data is abnormal.
  • the prediction unit is further configured to:
  • The first formula (its full expression is rendered as an image in the original) combines the following quantities:
  • Score represents the total score corresponding to the detection data.
  • x_i represents the actual value of the i-th indicator at the second moment.
  • x′_i represents the predicted value of the i-th indicator at the second moment.
  • p′_i represents the reconstruction probability value of the i-th indicator at the second moment.
  • γ represents the preset coefficient.
  • the data processing device 210 may further include a positioning unit configured as:
  • abnormality positioning is performed.
  • The data processing device and each unit it includes can be implemented by a processor in an electronic device, or, of course, by a specific logic circuit. During implementation, the processor can be a central processing unit (CPU, Central Processing Unit), a microprocessor (MPU, Micro Processor Unit), a digital signal processor (DSP, Digital Signal Processor), a field-programmable gate array (FPGA, Field-Programmable Gate Array), etc.
  • The technical solutions of the embodiments of the present application, in essence or in the part contributing beyond the related art, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes a number of instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (Read Only Memory, ROM), magnetic disk or optical disk and other media that can store program code.
  • embodiments of the present application are not limited to any specific combination of hardware and software.
  • Embodiments of the present application provide an electronic device, including a memory and a processor.
  • The memory stores a computer program that can be run on the processor, and the processor implements the above data processing method when executing the program.
  • the electronic device 220 may be the above-mentioned electronic device.
  • the electronic device 220 includes: a processor 2201, at least one communication bus 2202, a user interface 2203, at least one external communication interface 2204 and a memory 2205.
  • the communication bus 2202 is configured to implement connection communication between these components.
  • the user interface 2203 may include a display screen, and the external communication interface 2204 may include a standard wired interface and a wireless interface.
  • The memory 2205 is configured to store instructions and applications executable by the processor 2201, and can also cache data to be processed or already processed by the processor 2201 and the modules in the electronic device (for example, image data, audio data, voice communication data and video communication data); it can be implemented through flash memory (FLASH) or random access memory (Random Access Memory, RAM).
  • Embodiments of the present application provide a storage medium, that is, a computer-readable storage medium, on which a computer program is stored; when the computer program is executed by a processor, the above data processing method is implemented.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division.
  • The coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be electrical, mechanical, or in other forms.
  • the units described above as separate components may or may not be physically separated; the components shown as units may or may not be physical units; they may be located in one place or distributed to multiple network units; Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • All functional units in the embodiments of the present application can be integrated into one processing unit, or each unit can serve as a separate unit, or two or more units can be integrated into one unit; the above integrated unit can be implemented in the form of hardware or in the form of hardware plus software functional units.
  • the aforementioned program can be stored in a computer-readable storage medium.
  • When executed, the program performs the steps of the above method embodiments; the aforementioned storage media include: mobile storage devices, read-only memory (Read Only Memory, ROM), magnetic disks or optical disks and other various media that can store program code.
  • the integrated units mentioned above in this application are implemented in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • The computer software product is stored in a storage medium and includes a number of instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: mobile storage devices, ROMs, magnetic disks or optical disks and other media that can store program codes.


Abstract

A data processing method, apparatus, device and storage medium. The method includes: obtaining a pre-training data set (S201), the pre-training data set being an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; longitudinally segmenting the pre-training data set through the pre-processing layer of an anomaly detection model to obtain a first training data set (S202); horizontally segmenting the pre-training data set through the pre-processing layer to obtain a second training data set (S203); and training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model (S204). The method achieves high accuracy and good versatility in anomaly detection.

Description

Data processing method, apparatus, device and storage medium
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210760323.5 filed on June 29, 2022, the entire content of which is incorporated herein by reference.
Technical field
This application relates to the field of data processing technologies, and relates to, but is not limited to, a data processing method, apparatus, device and storage medium.
Background
With the rapid development of computer technology, more and more technologies are applied in the financial field, and the traditional financial industry is gradually transforming into financial technology (Fintech); however, due to the security and real-time requirements of the financial industry, higher demands are also placed on these technologies.
There are many kinds of time-series operation and maintenance indicators (Metrics) output according to industry standards, which monitor the various indicators of different businesses, applications, systems and clusters, including business indicators, such as transaction volume, success rate and interface time consumption, and system service indicators, such as central processing unit (CPU), memory (MEM), disk (DISK) and input/output (IO) errors. The interactions between indicators are complex, and the health status of systems and services is often determined jointly by a series of indicators. Mining a single Metric can provide information at a single level, but determining system anomalies only through single-indicator anomalies often causes excessive false alarms. Multi-indicator operation and maintenance time-series detection provides a more comprehensive view of the entire running system and its services.
In the related art, multi-dimensional indicator anomaly detection often directly uses deep generative models such as generative adversarial networks (Generative Adversarial Network, GAN), takes the original multi-indicator data as the overall model input, builds an encoder and a decoder based on probability density estimation and generated samples to calculate the reconstruction error, and discriminates anomalies by the reconstruction error.
In this way, on the one hand, the related algorithms are based on the assumption that the original input obeys a certain distribution and take the entire original data as input, making high-dimensional random vectors difficult to model directly; distribution approximation and conditional independence must be added to simplify the model, which weakens the representation of the original data and ultimately leads to insufficient model robustness and versatility. On the other hand, the related art does not attend to the relationships inside the data, so the accuracy of anomaly detection is low.
Summary
This application provides a data processing method, apparatus, device and storage medium, with high accuracy and good versatility in anomaly detection.
The technical solution of this application is implemented as follows:
This application provides a data processing method, the method including: obtaining a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1;
longitudinally segmenting the pre-training data set through the pre-processing layer of an anomaly detection model to obtain a first training data set; the first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions;
horizontally segmenting the pre-training data set through the pre-processing layer to obtain a second training data set; the second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods;
training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
This application provides a data processing apparatus, the apparatus including:
an obtaining unit configured to obtain a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1;
a first pre-processing unit configured to longitudinally segment the pre-training data set through the pre-processing layer of an anomaly detection model to obtain a first training data set; the first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions;
a second pre-processing unit configured to horizontally segment the pre-training data set through the pre-processing layer to obtain a second training data set; the second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods;
a training unit configured to train the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
This application further provides an electronic device, including a memory and a processor; the memory stores a computer program runnable on the processor, and the processor implements the above data processing method when executing the program.
This application further provides a storage medium on which a computer program is stored; when the computer program is executed by a processor, the above data processing method is implemented.
The data processing method, apparatus, device and storage medium provided by this application include: obtaining a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1; longitudinally segmenting the pre-training data set through the pre-processing layer of the anomaly detection model to obtain a first training data set; the first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions; horizontally segmenting the pre-training data set through the pre-processing layer to obtain a second training data set; the second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods; training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
For the solution of this application, after the pre-training data set is obtained, the pre-training data is segmented longitudinally and horizontally to obtain the first training data set and the second training data set, and the anomaly detection model is then trained with the first and second training data sets to obtain the target detection model. In this way, on the one hand, since one sample in the first training data set corresponds to one indicator after longitudinal segmentation, training based on the first training data set can attend to the relationships between indicators, and since one sample in the second training data set corresponds to one moment after horizontal segmentation, training based on the second training data set can attend to the relationships between moments; the resulting target detection model therefore achieves higher accuracy in anomaly detection. On the other hand, the application scenario of the whole model is not restricted, and the type of indicators, the number of indicators, the number of samples, etc. are not limited, so the target detection model has good versatility.
Brief description of the drawings
FIG. 1 is an optional schematic structural diagram of a data processing system provided by an embodiment of this application;
FIG. 2 is an optional schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 3 is an optional schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 4 is an optional schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 5 is an optional schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 6 is an optional schematic flowchart of a data processing method provided by an embodiment of this application;
FIG. 7 is an optional schematic framework diagram of a data processing process provided by an embodiment of this application;
FIG. 8 is an optional schematic diagram of original data provided by an embodiment of this application;
FIG. 9 is an optional schematic diagram of normalized data provided by an embodiment of this application;
FIG. 10 is an optional schematic diagram of normalized and intercepted data provided by an embodiment of this application;
FIG. 11 is an optional schematic diagram of data after longitudinal segmentation provided by an embodiment of this application;
FIG. 12 is an optional schematic diagram of data after horizontal segmentation provided by an embodiment of this application;
FIG. 13 is an optional schematic flowchart of determining attention coefficients provided by an embodiment of this application;
FIG. 14 is an optional schematic diagram of the output features of the graph attention layer oriented to inter-indicator relationships provided by an embodiment of this application;
FIG. 15 is an optional schematic diagram of the output features of the graph attention layer oriented to different moments provided by an embodiment of this application;
FIG. 16 is an optional schematic flowchart of splicing provided by an embodiment of this application;
FIG. 17 is an optional schematic principle diagram of a GRU layer provided by an embodiment of this application;
FIG. 18 is an optional schematic structural diagram of a GRU layer provided by an embodiment of this application;
FIG. 19 is an optional schematic structural diagram of determining a loss provided by an embodiment of this application;
FIG. 20 is an optional schematic structural diagram of a detection process provided by an embodiment of this application;
FIG. 21 is an optional schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 22 is an optional schematic structural diagram of an electronic device provided by an embodiment of this application.
Detailed description
To make the purposes, technical solutions and advantages of the embodiments of this application clearer, the specific technical solutions of this application are described in further detail below with reference to the accompanying drawings of the embodiments. The following embodiments are used to illustrate this application but not to limit its scope.
In the following description, reference to "some embodiments" describes a subset of all possible embodiments; it can be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and they may be combined with each other without conflict.
In the following description, the terms "first/second/third" are only used to distinguish different objects and do not represent a specific ordering of the objects or a limitation on their sequence. It can be understood that, where permitted, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of this application described here can be implemented in an order other than that illustrated or described here.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of this application. The terms used herein are only for the purpose of describing the embodiments of this application and are not intended to limit this application.
The embodiments of this application may provide a data processing method, apparatus, device and storage medium. In practical applications, the data processing method may be implemented by a data processing apparatus, and the functional entities in the data processing apparatus may be implemented cooperatively by the hardware resources of an electronic device, such as computing resources (e.g., a processor) and communication resources (e.g., for supporting communication in various manners such as optical cable and cellular).
The data processing method provided by the embodiments of this application is applied to a data processing system, and the data processing system includes a first device.
The first device is used to perform: obtaining a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1; longitudinally segmenting the pre-training data set through the pre-processing layer of an anomaly detection model to obtain a first training data set; the first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions; horizontally segmenting the pre-training data set through the pre-processing layer to obtain a second training data set; the second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods; training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
Optionally, the data processing system may further include a second device. The second device is used to collect historical data to obtain a historical data set and send the historical data set to the first device, so that the first device obtains the pre-training data set from the historical data set.
It can be understood that the process of obtaining the pre-training data set from the historical data set may also be implemented on the second device side.
It should be noted that the first device and the second device may be integrated on the same electronic device or deployed independently on different electronic devices.
As an example, the structure of the data processing system may be as shown in FIG. 1, including a first device 10 and a second device 20, between which data can be transmitted.
Here, the first device 10 is used to perform the above-described steps: obtaining the pre-training data set as an n×k×m matrix; longitudinally segmenting it through the pre-processing layer of the anomaly detection model to obtain the first training data set (an n×m×k matrix in which one first sample represents the values of one indicator in n×m time dimensions); horizontally segmenting it through the pre-processing layer to obtain the second training data set (an m×k×n matrix in which one second sample represents the values of the k indicators corresponding to one moment in m periods);
and training the anomaly detection model based on the first training data set and the second training data set to obtain the target detection model, which is used to determine the detection parameters of detection data within a period, the detection parameters being used to determine whether the detection data is abnormal.
The first device 10 may be a server, a computer or another electronic device with relevant data processing capabilities.
The second electronic device 20 is used to collect historical data to obtain a historical data set and send the historical data set to the first device, so that the first device obtains the pre-training data set from the historical data set.
The second electronic device 20 may include a mobile terminal device (e.g., a mobile phone, a tablet computer, etc.) or a non-mobile terminal device (e.g., a desktop computer, a server, etc.) with relevant data processing capabilities.
Below, with reference to the schematic diagram of the data processing system shown in FIG. 1, the embodiments of the data processing method, apparatus, device and storage medium provided by the embodiments of this application are described.
In a first aspect, an embodiment of this application provides a data processing method applied to a data processing apparatus, where the data processing apparatus may be deployed on the first device 10 in FIG. 1. Below, the data processing process provided by this embodiment of this application is described with the electronic device as the executing subject.
FIG. 2 illustrates an optional schematic flowchart of a data processing method; with reference to FIG. 2, the data processing method may include, but is not limited to, S201 to S204 shown in FIG. 2.
S201: The electronic device obtains a pre-training data set.
S201 may be implemented as: the electronic device obtains a historical data set and obtains the pre-training data set after normalizing and intercepting the historical data set.
From the perspective of the indicator types in the obtained historical data set, each type has a different dimensional scale and different corresponding maximum and minimum values, so the data in the historical data set needs to be normalized.
The embodiment of this application does not limit the specific normalization manner, which may be chosen according to actual needs.
Exemplarily, the MAX-MIN normalization method may be used to map the values of the historical data into [0, 1].
As for the interception of the data, the embodiment of this application does not limit the specific interception process either, which may be configured according to actual needs.
Exemplarily, a sliding window n with a length of 90 and a sliding step d with a length of 50 may be used to intercept the original data to obtain the pre-training data set.
The pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1.
Exemplarily, the pre-training data set is a 90×7×27 matrix; this matrix may represent that data of 27 periods have been intercepted, the duration corresponding to one period is 90 minutes, and the data indicators include 7 indicators.
S202: The electronic device longitudinally segments the pre-training data set through the pre-processing layer of the anomaly detection model to obtain a first training data set.
The first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions.
S202 may be implemented as: the electronic device longitudinally segments the pre-training data set through the pre-processing layer of the anomaly detection model, taking the indicator types as segmentation points and segmenting between two indicators, and then fuses the data of the multiple periods to obtain the first training data set.
It can be seen that in the first training data set, one first sample corresponds to one indicator.
S203: The electronic device horizontally segments the pre-training data set through the pre-processing layer to obtain a second training data set.
The second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods.
S203 may be implemented as: the electronic device horizontally segments the pre-training data set through the pre-processing layer of the anomaly detection model, taking different moments as segmentation points and segmenting between two moments, and then fuses the data of the multiple periods to obtain the second training data set.
It can be seen that in the second training data set, one second sample corresponds to one moment.
S204: The electronic device trains the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model.
S204 may be implemented as: the electronic device inputs the k first samples in the first training data set and the n second samples in the second training data set into the anomaly detection model, the anomaly detection model outputs the corresponding detection parameters, and the electronic device adjusts the parameters in the anomaly detection model based on the output detection parameters; the target detection model is obtained when the parameters in the anomaly detection model meet the requirements.
The embodiment of this application does not limit the specific training manner, which may be configured according to actual needs; for example, back-propagation training may be performed based on a loss function.
The target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
The embodiment of this application does not limit the specific types of the detection parameters, which may be configured according to actual needs. Exemplary detection parameters may include one or more of the following: predicted values and reconstruction probabilities.
The data processing solution provided by the embodiment of this application includes: obtaining a pre-training data set; the pre-training data set is an n×k×m matrix, where m represents the number of periods corresponding to the pre-training data set, k represents the number of indicators corresponding to the training data set, and one period includes n moments; n is greater than 1, k is greater than 1, and m is greater than 1; longitudinally segmenting the pre-training data set through the pre-processing layer of the anomaly detection model to obtain a first training data set; the first training data set is an n×m×k matrix; k represents the number of first samples included in the first training data set, and one first sample is used to represent the values of one indicator in n×m time dimensions; horizontally segmenting the pre-training data set through the pre-processing layer to obtain a second training data set; the second training data set is an m×k×n matrix; n represents the number of second samples included in the second training data set, and one second sample is used to represent the values of the k indicators corresponding to one moment in m periods; training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model; the target detection model is used to determine detection parameters of detection data within a period; the detection parameters are used to determine whether the detection data is abnormal.
For the solution of this application, after the pre-training data set is obtained, the pre-training data is segmented longitudinally and horizontally to obtain the first training data set and the second training data set, and the anomaly detection model is then trained with the first and second training data sets to obtain the target detection model. In this way, on the one hand, since one sample in the first training data set corresponds to one indicator after longitudinal segmentation, training based on the first training data set can attend to the relationships between indicators, and since one sample in the second training data set corresponds to one moment after horizontal segmentation, training based on the second training data set can attend to the relationships between moments; the resulting target detection model therefore achieves higher accuracy in anomaly detection. On the other hand, the application scenario of the whole model is not restricted, and the type of indicators, the number of indicators, the number of samples, etc. are not limited, so the target detection model has good versatility.
Below, the process in which the electronic device trains the anomaly detection model based on the first training data set and the second training data set in S204 to obtain the target detection model is described. This process may include, but is not limited to, the following Manner 1 to Manner 3.
Manner 1: The anomaly detection model sequentially includes: a pre-processing layer, a first graph attention network (Graph Attention Network, GAT) layer, a second GAT layer and a prediction layer; correspondingly, the electronic device trains the pre-processing layer, the first GAT layer, the second GAT layer and the prediction layer of the anomaly detection model based on the first training data set and the second training data set.
Manner 2: The anomaly detection model sequentially includes: a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer and a prediction layer; correspondingly, the electronic device trains the pre-processing layer, the first GAT layer, the second GAT layer, the splicing layer and the prediction layer of the anomaly detection model based on the first training data set and the second training data set.
Manner 3: The anomaly detection model sequentially includes: a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer, a gated recurrent unit GRU layer and a prediction layer; correspondingly, the electronic device trains the pre-processing layer, the first GAT layer, the second GAT layer, the splicing layer, the gated recurrent unit GRU layer and the prediction layer of the anomaly detection model based on the first training data set and the second training data set.
Below, Manner 1 is described, in which the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer and a prediction layer, and the electronic device correspondingly trains these layers based on the first training data set and the second training data set. As shown in FIG. 3, this process may include, but is not limited to, S2041 to S2045.
S2041: The electronic device inputs the k first samples in the first training data set into the first GAT layer, and k×n first features are obtained through the processing of the first GAT layer.
One first feature is used to characterize the relationship between the value of the indicator corresponding to that first feature and the values of the other k−1 indicators. Simply put, the first features can reflect the relationships between indicators.
S2041 may be implemented as: the electronic device inputs the k first samples in the first training data set into the first GAT layer, generates n first features for one first sample through the processing of the first GAT layer, and traverses the k first samples to obtain the k×n first features.
The dimension of a first feature is m.
S2042: The electronic device inputs the n second samples in the second training data set into the second GAT layer, and n×k second features are obtained through the processing of the second GAT layer.
One second feature is used to characterize the relationship between the value at the moment corresponding to that second feature and the values at the other n−1 moments. Simply put, the second features can reflect the relationships between moments.
S2042 may be implemented as: the electronic device inputs the n second samples in the second training data set into the second GAT layer, generates k second features for one second sample through the processing of the second GAT layer, and traverses the n second samples to obtain the n×k second features.
The dimension of a second feature is m.
S2043: The electronic device splices at least the k×n first features and the n×k second features to obtain a prediction data set.
The prediction data set is an n×s×m matrix; s is greater than or equal to k.
The embodiment of this application does not limit the specific processing for obtaining the prediction data set, which may be configured according to actual needs.
In one possible implementation, the processing for obtaining the prediction data set may include: splicing processing.
In another possible implementation, the processing for obtaining the prediction data set may include: splicing processing and filtering processing by a GRU layer.
S2044: The electronic device inputs the prediction data set into the prediction layer, and a first prediction result is obtained through the processing of the prediction layer.
The embodiment of this application does not limit the specific content of the prediction layer or of the first prediction result, which may be configured according to actual needs.
In one possible implementation, the prediction layer may include a fully connected layer; the corresponding first prediction result includes predicted values.
In another possible implementation, the prediction layer may include a variational auto-encoder (Variational auto-encoder, VAE) layer; the corresponding first prediction result includes reconstruction probabilities.
In yet another possible implementation, the prediction layer may include a fully connected layer and a VAE layer; the corresponding first prediction result includes predicted values and reconstruction probabilities.
S2045: The electronic device adjusts the parameters in the anomaly detection model based on the first prediction result to obtain the target detection model.
S2045 may be implemented as: the electronic device adjusts the relevant parameters of the pre-processing layer, the first GAT layer, the second GAT layer and the prediction layer in the anomaly detection model based on the first prediction result, thereby obtaining the target detection model.
Through the first GAT layer and the second GAT layer, the first features (relationships between indicators) and the second features (relationships between moments) can be extracted, so the target detection model obtained from the first features and the second features has high accuracy.
Below, taking one first sample as an example, the process in which the electronic device inputs the k first samples in the first training data set into the first GAT layer in S2041 and obtains the k×n first features through the processing of the first GAT layer is described. This process may include, but is not limited to, the following S20411 and S20412.
S20411: The electronic device takes each row of the first sample as a first node to obtain n first nodes.
Exemplarily, a first sample is used to represent the values of one indicator in n×m time dimensions. Correspondingly, each row of the first sample represents the values of one indicator at one moment in different periods.
Since the dimension of the first sample is n×m, the electronic device takes each row as a first node, thereby obtaining n first nodes.
S20412: For each of the n first nodes, the electronic device performs first processing to obtain the first feature corresponding to that first node.
It can be understood that n first features can be obtained for the n first nodes.
In S20412, the first processing is described taking one first node as an example.
The first processing may include, but is not limited to, the following A to C.
A: The electronic device determines the similarity coefficients between the first node and each of the n first nodes, obtaining n similarity coefficients.
The embodiment of this application does not limit the specific method of determining the similarity coefficients, which may be configured according to actual needs.
B: The electronic device converts the n similarity coefficients into n attention coefficients.
Because the feature dimensions of the nodes are inconsistent, in order to make the similarity coefficients easy to compare between different nodes, the similarity coefficients need to be normalized before the attention-weighted summation, that is, the similarity coefficients are converted into attention coefficients.
The embodiment of this application does not limit the specific conversion manner, which may be configured according to actual needs.
Exemplarily, the softmax normalized exponential function may be used for the conversion.
C: The electronic device determines the first feature corresponding to the first node based on the data corresponding to the n nodes and the n attention coefficients.
Exemplarily, the electronic device multiplies the data corresponding to the n nodes by the attention coefficient corresponding to each datum, and takes the sum of the products as the first feature.
It can be understood that the processing of the k first samples is similar; for the specific implementation, reference may be made to the above steps, which will not be repeated here.
It can be understood that the processing of the n second samples is also similar; for the specific implementation, reference may be made to the above steps, which will not be repeated here.
When obtaining the first feature corresponding to a node, the processing based on the attention coefficients between nodes is simple to implement and accurate.
Below, the process in which the electronic device inputs the prediction data set into the prediction layer in S2044 and obtains the first prediction result through the processing of the prediction layer is described. As shown in FIG. 4, this process may include, but is not limited to, S20441 to S20443.
Here, the prediction layer includes a fully connected layer and a variational auto-encoder VAE layer, and the prediction data set includes m prediction samples.
S20441: The electronic device inputs each of the m prediction samples into the fully connected layer, and m sets of predicted values are obtained through the processing of the fully connected layer.
For each prediction sample, one set of predicted values is obtained after the processing of the fully connected layer; one set of predicted values includes the predicted values of the k indicators at the next moment, where the next moment is the moment following the period corresponding to the prediction sample.
The fully connected layer is used, for each prediction sample, to predict the values of the k indicators at the moment following the sample's period from the data of the k indicators within that period.
S20441 may be implemented as: the electronic device inputs each of the m prediction samples into the fully connected layer; the fully connected layer obtains one set of predicted values after processing one sample, and traverses the m prediction samples to obtain the m sets of predicted values.
S20442: The electronic device inputs each of the m prediction samples into the VAE layer, and m sets of reconstruction probabilities are obtained through the processing of the VAE layer.
For each prediction sample, one set of reconstruction probabilities is obtained after the processing of the VAE layer; one set of reconstruction probabilities includes the reconstruction probabilities of the k indicators at the next moment.
The VAE layer is used, for each prediction sample, to predict the reconstruction probabilities of the k indicators at the moment following the sample's period from the data of the k indicators within that period.
S20442 may be implemented as: the electronic device inputs each of the m prediction samples into the VAE layer; the VAE layer obtains one set of reconstruction probabilities after processing one sample, and traverses the m prediction samples to obtain the m sets of reconstruction probabilities.
S20443: The electronic device determines that the first prediction result includes the m sets of predicted values and the m sets of reconstruction probabilities.
In this way, the prediction layer includes a fully connected layer and a VAE layer, and the prediction result is determined from two dimensions, which provides high accuracy.
Below, the process in which the electronic device adjusts the parameters in the anomaly detection model based on the first prediction result in S2045 to obtain the target detection model is described. This process may include, but is not limited to, the following S20451 to S20453.
Here, the prediction layer includes a fully connected layer and a variational auto-encoder VAE layer, and the first prediction result includes the m sets of predicted values and the m sets of reconstruction probabilities.
S20451: The electronic device determines a first loss function corresponding to the fully connected layer and a second loss function corresponding to the VAE layer; the first loss function is different from the second loss function.
The embodiment of this application does not limit the specific descriptions of the first loss function and the second loss function, which may be configured according to actual needs.
Exemplarily, the first loss function may include: LOSS_prediction = √(∑_i (x_{n,i} − x̂_{n,i})²) (the original renders this formula as an image; the form above follows the variable definitions).
In the first loss function, LOSS_prediction represents the loss function corresponding to the fully connected layer; x_{n,i} represents the actual value of the i-th indicator variable at the n-th moment; x̂_{n,i} represents the predicted value of the i-th indicator variable at the n-th moment.
Exemplarily, the second loss function may include the VAE objective (rendered as an image in the original), i.e., the variational evidence lower bound loss.
S20452: The electronic device determines m target losses based at least on the m sets of predicted values, the m sets of reconstruction probabilities, the first loss function and the second loss function.
Here, one target loss is determined for one set of predicted values and the reconstruction probabilities corresponding to that set of predicted values.
A set of predicted values and a set of reconstruction probabilities correspond to each other through the same period.
S20452 may be implemented as: for the m sets of predicted values and the m sets of reconstruction probabilities, the electronic device substitutes each set of predicted values and its corresponding set of reconstruction probabilities into the first loss function and the second loss function to calculate a first loss value and a second loss value respectively, and then sums the first loss value and the second loss value to obtain one target loss; the electronic device traverses the m sets of predicted values and the m sets of reconstruction probabilities, thereby obtaining the m target losses.
S20453: The electronic device adjusts the parameters in the anomaly detection model based on the m target losses to obtain the target detection model.
In one possible implementation, S20453 may be implemented as: the electronic device adjusts the parameters of the prediction layer in the anomaly detection model based on the m target losses, and obtains the target detection model when the target losses meet the requirements.
In another possible implementation, S20453 may be implemented as: the electronic device gradually adjusts the parameters of each layer in the anomaly detection model based on the m target losses, and obtains the target detection model when the target losses meet the requirements.
By calculating the target loss as the combination of the first loss and the second loss, the influence between the fully connected layer and the VAE layer can be taken into account, so the resulting target detection model is more accurate; a schematic sketch follows.
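A schematic of the joint optimization just described, assuming per-group first (prediction) and second (VAE) loss values are already computed; the placeholder tensors are illustrative only:

```python
import torch

# Placeholder per-group loss values standing in for the outputs of the
# first (prediction) and second (VAE) loss functions over m = 27 groups.
pred_losses = [torch.rand(1, requires_grad=True) for _ in range(27)]
vae_losses = [torch.rand(1, requires_grad=True) for _ in range(27)]

# One target loss per group: first loss + second loss; summed for joint back-propagation.
target_losses = [p + v for p, v in zip(pred_losses, vae_losses)]
torch.stack(target_losses).sum().backward()
```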
For Manner 1, the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer and a prediction layer; it is simple to implement and has high processing efficiency.
Below, Manner 2 is described, in which the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer and a prediction layer, and the electronic device correspondingly trains these layers based on the first training data set and the second training data set.
Different from Manner 1 is the process in which the electronic device splices at least the k×n first features and the n×k second features in S2043 to obtain the prediction data set. In Manner 2, this process may include, but is not limited to, A1 or A2.
A1: The electronic device inputs the k×n first features, the n×k second features and the pre-training data set into the splicing layer, and spliced data is obtained through the processing of the splicing layer.
Here, the prediction data set is the spliced data; s is equal to 3 times k.
A2: The electronic device inputs the k×n first features and the n×k second features into the splicing layer, and spliced data is obtained through the processing of the splicing layer.
Here, the prediction data set is the spliced data; s is equal to 2 times k.
It can be understood that other splicing manners are also possible, for example, splicing the pre-training data with the k×n first features, or splicing the pre-training data with the n×k second features; for the specific implementation process, reference may be made to the above description of A1 or A2, which will not be repeated here.
For Manner 2, the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer and a prediction layer; by adding the processing of the splicing layer, the preprocessed data can be further incorporated, and the resulting target model has higher accuracy in detection.
Below, Manner 3 is described, in which the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer, a gated recurrent unit (Gated Recurrent Unit, GRU) layer and a prediction layer, and the electronic device correspondingly trains these layers based on the first training data set and the second training data set.
Different from Manner 1 is the process in which the electronic device splices at least the k×n first features and the n×k second features in S2043 to obtain the prediction data set. In Manner 3, this process may include, but is not limited to, B1 and B2.
B1: The electronic device inputs the k×n first features, the n×k second features and the pre-training data set into the splicing layer, and spliced data is obtained through the processing of the splicing layer.
Here, the spliced data is an n×3k×m matrix.
B2: The electronic device inputs the spliced data into the GRU layer, and the interference of the indicator dimensions in the spliced data is filtered through the GRU layer to obtain the prediction data set.
Here, s is less than 3 times k.
Since the spliced data has a large dimension in the indicator direction, the amount of data to process would be large without filtering; therefore, this embodiment of the application filters the interference of the indicator dimensions in the spliced data through the GRU layer to obtain the prediction data. To maintain the integrity of the indicator dimensions in the prediction data, s is generally made greater than or equal to k and less than 3k.
For Manner 3, the anomaly detection model sequentially includes a pre-processing layer, a first GAT layer, a second GAT layer, a splicing layer, a gated recurrent unit GRU layer and a prediction layer; the processing of the splicing layer can improve the detection accuracy, and the GRU layer can further improve the processing speed.
After the target detection model is obtained, the data processing method provided by the embodiment of this application can also detect detection data through the target detection model to determine whether an anomaly occurs.
As shown in FIG. 5, the data processing method may further include, but is not limited to, the following S205 to S208.
S205: The electronic device obtains the detection data of the k indicators within a first period.
The first period is any period.
S206: The electronic device inputs the detection data into the target detection model, and the detection parameter values of the k indicators at a second moment are obtained through the processing of the target detection model.
The second moment is the moment following the first period.
S207: The electronic device determines the total score corresponding to the detection data based on the detection parameter values of the k indicators at the second moment.
The embodiment of this application does not limit the specific manner of determining the total score corresponding to the detection data from the detection values, which may be configured according to actual conditions.
S208: The electronic device determines that the detection data is abnormal when the total score is greater than or equal to a score threshold.
The embodiment of this application does not limit the value of the score threshold, which may be configured according to actual needs.
Exemplarily, the score threshold may be 1.
It should be noted that when the total score corresponding to the detection data is less than the score threshold, the detection data is determined to be normal.
In this way, whether the detection data is normal or abnormal can be obtained through the target detection model. Moreover, the application scenario is not limited; that is, for any scenario, a target detection model for that scenario can be obtained, thereby realizing anomaly detection for that scenario.
Below, the process in which the electronic device determines the total score corresponding to the detection data based on the detection parameter values of the k indicators at the second moment in S207 is described.
This process may include determining the total score corresponding to the detection data through a first formula.
The first formula (its full expression is rendered as an image in the original) combines the following quantities: Score represents the total score corresponding to the detection data; x_i represents the actual value of the i-th indicator at the second moment; x′_i represents the predicted value of the i-th indicator at the second moment; p′_i represents the reconstruction probability value of the i-th indicator at the second moment; γ represents a preset coefficient.
Exemplarily, γ may be 0.8.
Determining the total score corresponding to the detection data in this way achieves a high accuracy rate.
Furthermore, the data processing method provided by the embodiment of this application can also perform anomaly localization. As shown in FIG. 6, this process may include, but is not limited to, the following S209 and S210.
S209: The electronic device determines the anomaly scores of the k indicators at the second moment.
Exemplarily, the electronic device determines the anomaly scores of the k indicators at the second moment to be: 0.046884 for the first indicator; 0.409688 for the second indicator; 0.449229 for the third indicator; 0.021445 for the fourth indicator; 0.013142 for the fifth indicator; 0.437159 for the sixth indicator; and 0.051018 for the seventh indicator.
S210: The electronic device performs anomaly localization based on the anomaly scores of the k indicators at the second moment.
Based on the above example of S209, the electronic device determines that the second indicator, the third indicator and the sixth indicator are abnormal.
Furthermore, the electronic device can perform a finer anomaly analysis based on the abnormal second, third and sixth indicators.
Through this method, anomaly localization can be performed intuitively; the localization process is simple to implement and its accuracy is also high, as sketched below.
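An illustrative localization step using the per-indicator scores from the example above; the cut-off value is an assumption for illustration, not from the source:

```python
# Per-indicator anomaly scores at the second moment (from the example above).
scores = {1: 0.046884, 2: 0.409688, 3: 0.449229, 4: 0.021445,
          5: 0.013142, 6: 0.437159, 7: 0.051018}

suspects = [i for i, s in scores.items() if s > 0.1]   # assumed cut-off
print(suspects)   # -> [2, 3, 6]: the second, third and sixth indicators are flagged
```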
Below, taking an operation and maintenance process as an example, the data processing method provided by the embodiment of this application is described through one embodiment.
For ease of understanding, some technical terms are explained first.
The graph attention network (Graph Attention Network, GAT) introduces a self-attention mechanism, which can solve the problem that the graph convolutional network (Graph Convolutional Network, GCN) depends on the overall graph structure of the studied object and on spectral decomposition of the Laplacian matrix, and therefore cannot be applied to dynamic graphs.
GAT can assign weights based on node importance, allocating a different weight to each neighbor and avoiding the situation in GCN where all neighbor nodes are treated equally during convolution. Through GAT, the neighborhood features of each node can be obtained and different weights can be assigned to different nodes in the neighborhood; in this way, no computationally heavy matrix operations are needed, and the specific structure of the graph object does not need to be known in advance, providing strong robustness and applicability.
In this embodiment, each operation and maintenance time-series indicator is regarded as a separate feature, which constitutes a node in the graph object. GAT models the correlations between different features and the time dependencies within each time series, capturing the indicator features and temporal relationships of the multi-dimensional time series.
The gated recurrent unit (Gated Recurrent Unit, GRU) is a variant based on the long short-term memory network (Long Short-Term Memory, LSTM) that retains only an update gate and a reset gate, solving the problems of long LSTM training time, many parameters and high computational complexity. In this embodiment, various gate functions are used to mine the temporal change rules, such as relatively long intervals and delays, in the time series, and to extract dependency information in the time dimension, solving the gradient explosion and gradient vanishing problems that easily occur in the training of recurrent neural networks (Recurrent Neural Network, RNN).
In the related art, there are many kinds of time-series operation and maintenance indicators (Metrics) output according to industry standards, and the operation and maintenance process needs to monitor the various indicators of different businesses, applications, systems and clusters, specifically including business indicators (transaction volume, success rate, interface time consumption, etc.) and system service indicators (CPU, MEM, DISK, IO, etc.). The interactions between indicators are complex, and the health status of systems and services is often determined jointly by a series of indicators. Mining a single indicator can provide information at a single level, but determining system anomalies only through single-indicator anomalies often causes excessive false alarms.
Multi-indicator operation and maintenance time-series detection can mine the interactions between the components of a system; taking the topological relationship information between indicators and the monitoring data of the indicators themselves as input, it constitutes system-level representation information and provides a more comprehensive view of the entire running system and its services.
In the related art, multi-dimensional indicator anomaly detection often directly uses deep generative models such as generative adversarial networks (GAN), takes the original multi-indicator data as the overall model input, builds an encoder and a decoder based on probability density estimation and generated samples to calculate the reconstruction error, and discriminates anomalies by the reconstruction error.
It can be seen that multi-dimensional operation and maintenance time-series indicator detection based on deep generative models has the following problems:
1. Deep generative models are based on the assumption that the original input obeys a certain distribution and take the entire original data as input, making high-dimensional random vectors difficult to model directly. The related algorithms must add distribution approximation and conditional independence to simplify the model, which weakens the representation of the original data and ultimately leads to insufficient model robustness and versatility.
2. Deep generative models only output the reconstruction error to discriminate system anomalies. They are algorithmically weak in interpretability, and it is difficult to perform anomaly correlation analysis and localization based on the original model output.
3. Deep generative models use the ability of deep neural networks to approximate arbitrary functions to model the complex distributions of the encoder and decoder; their model structure is complicated and their algorithmic complexity is high. In operation and maintenance anomaly detection with massive data and fast data flow, detection is time-consuming and it is difficult to meet real-time anomaly detection requirements.
This embodiment of the application has the following characteristics:
1. Operation and maintenance indicators serve as the nodes of a graph object for anomaly detection, converting anomaly detection into graph object mining. The multi-dimensional operation and maintenance indicator data is separated horizontally (different operation and maintenance indicators at a single moment) and vertically (the operation and maintenance sequence of a single indicator at different moments) to serve as the nodes of the graph structure object, and the internal characteristics of the indicator time series and the dependencies between time series are mined based on GAT. The separation processing can preserve the original high-dimensional representation from the perspectives of time-series dependency and indicator type. GAT does not need a priori knowledge of the relationships and anomaly structures between input indicators; it learns the time dependencies and anomaly correlations between indicators, and can dynamically identify anomalies in different scenarios and adapt to scenario changes. Through the separation processing and the introduction of the GAT graph attention layer, this method retains the high-dimensional features of the original data and adapts to different anomaly types. Since the related art often takes the original data as a whole input without preprocessing such as data separation, the high-dimensional feature representation is poor; it also relies on input distribution assumptions, so model versatility is poor. Therefore, this embodiment of the application can effectively solve the problems of weakened high-dimensional data representation and poor model versatility in existing solutions.
2. An anomaly detection module is built on the graph attention network to analyze and learn the anomaly correlations and time dependencies of the indicators, solving the problem that existing deep learning methods only mine the internal temporal continuity patterns of indicators, with insufficient anomaly correlation analysis. At the same time, joint training based on the fully connected prediction model and the VAE reconstruction model obtains the anomaly score of each indicator to assist localization. Since the related art only outputs an overall anomaly score without correlation analysis of the anomalies of each indicator, anomaly localization is missing; therefore, this embodiment of the application can solve the problems of missing anomaly detection fault localization and the single anomaly localization correlation method in existing solutions.
3. The GRU network mines the long-term dependencies in the sequence data and captures the sequential patterns in the time series output by the graph attention layer. GRU uses an update gate to merge the forget gate and input gate of the long short-term memory network (LSTM), adds a reset gate, and removes the output gate. Its model structure is simple, with a small amount of computation and few parameters, and it converges well. It can be seen that this embodiment of the application can optimize the overall model structure: for the long-term dependency mining layer, the structurally simple GRU module is introduced, optimizing and reducing the original LSTM network structure, reducing the training load of the overall module and improving the real-time detection speed.
Analyzing the related art: for split mining of multi-dimensional time series, the anomaly detection method in the related art horizontally separates the multi-dimensional operation and maintenance time-series indicators, converts them into multiple single-dimensional time series, and uses anomaly detection algorithms from the corresponding domains according to the characteristics of each dimension's indicators. Different anomaly detection rules and algorithms are constructed for different scenarios and monitoring indicator types based on the experience of historical operation and maintenance experts; development and later maintenance costs are high and require a deep understanding of the business. Moreover, this ignores the cross-relationships between indicators and the overall characteristics of anomaly detection, and cannot vertically analyze the correlations between the indicators of each dimension; that is, split mining can mine the abnormal state of each indicator but cannot achieve cross-analysis, and the monitoring rule settings are one-sided with a high false alarm rate.
For holistic mining of multi-dimensional time series, the anomaly detection method in the related art uses deep generative models such as generative adversarial networks (GAN) alone, takes the multi-dimensional time-series indicators as the overall input to the generative model and reconstructs its output, and determines from the reconstruction probability or reconstruction error whether the input data is abnormal. Holistic mining based on multi-dimensional time-series indicators only determines the system state through the reconstructed output and does not explicitly mine the relationships between different time series; that is, holistic mining can mine the overall abnormal state of multi-dimensional indicators, but the fault localization corresponding to the abnormal state cannot provide the abnormal contribution of each indicator. The anomaly detection methods in the related art cannot obtain the direct potential interrelationships of the time series, and it is difficult to analyze the contribution of each indicator to an anomaly, which is not conducive to subsequent fault localization and problem repair.
Below, the processing provided by this embodiment of the application is described in detail.
As shown in FIG. 7, the process may include the processing of an offline module 71 and the processing of a real-time module 72.
The processing process of the offline module 71 (which may also be called the offline training module), equivalent to the training process, may include, but is not limited to, the following S711 to S719.
S711: Obtain historical multi-dimensional operation and maintenance indicator data;
S712: Data preprocessing;
S713: Processing by the graph attention layer oriented to inter-indicator features (equivalent to the first GAT layer);
S714: Processing by the graph attention layer oriented to intra-indicator time series (equivalent to the second GAT layer);
S715: Splicing processing by the multi-head attention layer (equivalent to the splicing layer);
S716: Mining of long-term temporal dependency features by the GRU layer;
S717: Processing by the prediction module based on the fully connected layer;
S718: Processing by the reconstruction module based on VAE (equivalent to the VAE layer);
S719: Processing by the joint anomaly detection discrimination module.
Specifically, the offline module 71 is used to: train on historical multi-dimensional operation and maintenance indicator data, which, after data preprocessing and data separation, is input into two parallel graph attention (GAT) layers to capture the feature relationships between multi-dimensional indicators and the timestamp relationships within a single indicator's time series. By learning the complex dependencies of the multivariate operation and maintenance indicator sequences in the time and feature dimensions, the result serves as the GRU layer input to capture the pattern features of the time series; various gate functions mine the temporal change rules, such as relatively long intervals and delays, in the time series and extract dependency information in the time dimension. Finally, prediction-based and reconstruction-based anomaly detection are combined, with their joint objective function as the optimization target, and the attention scores of the multiple time series learned by the graph attention layers are analyzed as the basis for judging the root cause of anomalies.
The processing process of the real-time module 72 (which may also be called the real-time detection module or real-time anomaly detection module), equivalent to the detection process, may include, but is not limited to, the following S721 to S725.
S721: Obtain real-time multi-dimensional operation and maintenance indicator data (equivalent to the detection data of the first period);
S722: Load the multi-dimensional anomaly detection model (equivalent to the target detection model);
S723: Calculate anomaly detection scores to discriminate anomalies;
S724: Operation and maintenance verification of the accuracy of anomaly detection;
S725: Locate anomalies based on the feature anomaly scores.
Specifically, the real-time module 72 is used to: collect real-time operation and maintenance indicators, load the multi-dimensional anomaly detection model for detection, calculate anomaly detection scores to judge the system health status, and feed results back to operation and maintenance personnel to verify the accuracy of the alarms. Operation and maintenance personnel confirm the abnormal state based on the alarm content and locate the cause of the anomaly through the anomaly score of each feature.
Below, the processing process of the offline training module is described in detail.
This process may include, but is not limited to:
Step 1: Data preprocessing;
Step 2: Building a graph attention network based on GAT;
Step 3: Mining long-term time-series data based on GRU;
Step 4: Building a joint anomaly detection model based on prediction and reconstruction.
Below, the process of data preprocessing in step 1 is described.
The data preprocessing part mainly includes data normalization and data interception. Data normalization can reduce the amount of model computation, and indicator data interception can convert the normalized multi-dimensional data into the data form required for unsupervised model training.
The specific processing flow may include, but is not limited to, the following steps 11 and 12.
Step 11: Data normalization.
To improve robustness and training convergence speed, data normalization needs to be performed on each original indicator sequence (which may also be called original data). Data normalization is applied to both the training set and the test set. The original data is shown in FIG. 8; it can be seen from FIG. 8 that in the original data, the maximum and minimum values of each indicator type are unknown, and the dimensional scales of the various indicators differ.
The embodiment of this application uses the maximum-minimum (MAX-MIN) normalization method to process the original data; after normalization, the original indicator values are mapped into [0, 1]. The specific processing can refer to the following Formula 1:
x′ = (x − min) / (max − min)    (Formula 1)
In Formula 1, x′ is the output after data normalization, x is the original sequence value, max is the maximum value of the original sequence, and min is the minimum value of the original sequence.
The normalized data may be as shown in FIG. 9.
Step 12: Indicator data interception.
Anomalous events in operation and maintenance indicator data often do not occur at fixed time points but at random time points within a certain period. Multi-dimensional operation and maintenance indicator anomaly detection needs to intercept the original sequence and convert the astronomically large raw data into a multi-sample, multi-dimensional indicator matrix that conforms to the input form of the unsupervised learning model.
Based on daily operation and maintenance experience, the normalized data uses a sliding window of length n = 90 and a sliding step d of length 50 to generate fixed-length sequences as the detection model input, which can provide better feedback on system anomalies.
Therefore, the main task of this application can be summarized as: the multi-dimensional indicator anomaly monitoring module takes an input sequence X ∈ R^(n*k*m) and produces a binary output vector y ∈ {0, 1}.
Exemplarily, k is the dimension of the operation and maintenance indicators themselves, i.e., the indicator types; in this example k is 7. m is the number of recombined samples, i.e., the number of training set samples. The operation and maintenance indicator data is collected at the minute level, that is, the original sequence length of a single day is 1440, and the number of recombined samples is m = (1440 − n) / d = (1440 − 90) / 50 = 27.
The final preprocessed matrix is R^(90*7*27); as shown in FIG. 10, the intercepted data input to the GAT graph attention layer is the high-dimensional data matrix X ∈ R^(90*7*27).
Below, the process of building the graph attention network based on GAT in step 2 is described.
In daily business operation and maintenance, many monitoring items are deployed at each link of the service call chain, and these indicators are correlated in complex nonlinear ways. As monitoring items increase and the complexity and dimensionality of indicator types keep growing, it is difficult for operation and maintenance personnel to quickly detect anomalies in high-dimensional data and analyze the anomaly correlations within it. This embodiment of the application takes the various indicators as graph object nodes, builds a graph attention layer based on GAT to learn the structural features between indicator nodes and the features of the nodes themselves, and introduces an attention mechanism to assign different weights to neighboring nodes, reducing the attention paid to non-key indicators. Through the weighted summation of the features of each GAT node, the importance of indicator correlations is distinguished.
Step 2 may include:
Step 21: Splitting the input data matrix;
Step 22: General construction of the graph attention layer.
Below, the process of splitting the input data matrix in step 21 is described.
This embodiment of the application builds a graph attention network based on GAT, which requires horizontal and vertical separation of the preprocessed multi-dimensional indicator time series. When separating the data, it can be processed from the two major dimensions of time-series dependency and indicator type, thereby retaining the representational form of the original high-dimensional data. The separated data can be directly used for modeling the graph attention layers. Horizontal separation divides the preprocessed data into the sequences of different operation and maintenance indicators at a single moment; vertical separation divides the preprocessed data into the operation and maintenance sequences of a single indicator at different moments.
The horizontally separated sequences of different indicators at a single moment can serve as graph object nodes to build the graph attention layer oriented to temporal dependencies within indicators; the vertically separated sequences of a single indicator at different moments can serve as graph object nodes to build the graph attention layer oriented to feature relationships between indicators.
Exemplarily, the input dimension of the data matrix after vertical separation is n×m×k = 90×27×7 (equivalent to the first training data set); the specific structure is shown in FIG. 11.
Exemplarily, the input dimension of the data matrix after horizontal separation is m×k×n = 27×7×90 (equivalent to the second training data set); the specific structure is shown in FIG. 12.
下面,对步骤22图注意力层通用构建的过程进行说明。
首先,对GAT层输出特征的通用处理过程进行说明。
面向指标间特征关系的图注意力层与面向指标内时序依赖的图注意力层,均基于通用的GAT层处理构建。通过控制不同输入,分别动态学习时间序列特征和时间依赖之间的关系。
其中,GAT将注意力机制attention引入图神经网络,能够在更新图对象中某一个节点的嵌入(Embedding)特征完成降维时,图上的每一个顶点i都参与到attention运算。GAT能够对图中的节点关系进行建模,对于k个节点特征的图对象{v 1,v 2,v 3....v k},其中v i为节点i自身的特征向量。GAT层的每个节点i输出特征h i可以通过公式2表示。
Figure PCTCN2022120467-appb-000007
在公式2中,a ij为节点i和节点j之间的注意力系数;σ为激活函数,例如为sigmoid函数。
The specific computation of the output feature h_i is described in detail below, in the following first through third parts.
First, compute the similarity coefficient (attention) between nodes.
The computation of the similarity coefficient e_ij between node i and a neighboring node j is taken as an example. The core principle is to learn a suitable W (the learning weight shared by the model) from the training data so as to effectively capture the correlation between nodes i and j. The computation of e_ij mainly includes, without limitation, the following 1) to 3), and can be written as
e_ij = LeakyReLU( a⃗ᵀ [ W h⃗_i ‖ W h⃗_j ] )
1) Node feature data augmentation.
h⃗_i and h⃗_j are the feature-vector representations of node i and node j respectively, each of dimension 1×F, where F is the input node dimension, i.e., the input dimension after data splitting. For the graph attention layer oriented to intra-metric temporal dependence, F is 27×7; for the graph attention layer oriented to inter-metric feature relations, F is 90×27. W is the learning weight shared by the model, used mainly to augment the h⃗_i and h⃗_j features; its dimension is F×F′, where F′ is the output node dimension of this GAT layer, so W h⃗_i and W h⃗_j both have dimension (1×F)×(F×F′) = 1×F′.
This embodiment of the application can use the common numpy matrix-processing Python library, calling the linalg.eig method to compute the eigenvectors of each node. The numpy library suits the processing of large-volume, high-dimensional metrics, simplifying the computation and supporting the model's real-time detection capability.
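As an illustration only, the sketch below shows the numpy call mentioned above; the 7×7 node similarity matrix is hypothetical and merely stands in for whatever node-relation matrix an implementation would decompose.

import numpy as np

A = np.random.rand(7, 7)                  # hypothetical matrix over 7 metric nodes
A = (A + A.T) / 2                         # symmetrize for a well-behaved example
eigenvalues, eigenvectors = np.linalg.eig(A)
# eigenvectors[:, i] is the eigenvector associated with eigenvalues[i]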
2) Node feature concatenation.
The (·‖·) operation concatenates the transformed features; the dimension after concatenation is 1×2F′.
3) Nonlinear activation to produce the similarity coefficient.
The function a(·) denotes a single-layer feed-forward neural network parameterized by a⃗ᵀ, generally using LeakyReLU as the nonlinear activation function. a⃗ᵀ is the attention kernel, of dimension 2F′×1; its main role is to map the concatenated features onto a real number, i.e., a⃗ᵀ [ W h⃗_i ‖ W h⃗_j ] ∈ R, yielding the final node similarity coefficient e_ij.
Second, normalize the similarity coefficients to obtain the attention coefficients a_ij.
Because the feature scales of the nodes are inconsistent, to make the similarity coefficients easy to compare across nodes, they must be normalized before the attention-weighted summation. As shown in Figure 13, this embodiment uses the normalized exponential function (softmax) to compute the attention coefficient a_ij, i.e., a_ij = softmax(e_ij).
The specific computation of a_ij can be as shown in formula 3 below.
a_ij = softmax(e_ij) = exp( LeakyReLU(e_ij) ) / Σ_{l∈N_i} exp( LeakyReLU(e_il) )    (Formula 3)
In formula 3, LeakyReLU is the nonlinear function y = max(0, x) + leak × min(0, x), where leak is a constant used to preserve information on the negative half-axis. This method uses LeakyReLU as the activation function when computing attention coefficients mainly to avoid removing a node's own information during the normalization over neighboring nodes before GAT aggregates neighbor information.
Third, attention-weighted summation yields the output feature.
The normalized attention coefficients are used to weight the corresponding features, giving each node i a final output feature h′_i that fuses neighborhood information. The computation is shown in formula 4 below, where a_ij is the attention coefficient between metric node i and node j.
h′_i = σ( Σ_j a_ij · v_j )    (Formula 4)
In formula 4, v_j is node j's own feature. Specifically, for the graph attention layer oriented to inter-metric features, v_j is the concrete value of a given type of operations metric at one moment; for the graph attention layer oriented to intra-metric temporal dependence, v_j is the concrete value of a given operations metric at different moments.
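As an illustration only, the PyTorch sketch below implements one attention head following formulas 2-4 (LeakyReLU-scored attention, softmax normalization, and a sigmoid-activated weighted sum over the node features v_j); the class name, the dimensions, and the choice to weight the raw features v_j rather than the transformed features W v_j are assumptions of this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)   # shared weight W
        self.a = nn.Linear(2 * out_dim, 1, bias=False)    # attention kernel a

    def forward(self, v):                                  # v: (N nodes, in_dim)
        h = self.W(v)                                      # W v_i, shape (N, F')
        N = h.size(0)
        hi = h.unsqueeze(1).expand(N, N, -1)               # rows: W v_i
        hj = h.unsqueeze(0).expand(N, N, -1)               # cols: W v_j
        e = F.leaky_relu(self.a(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        attn = torch.softmax(e, dim=-1)                    # a_ij, formula 3
        return torch.sigmoid(attn @ v)                     # h'_i, formula 4

layer = GATLayer(in_dim=90, out_dim=90)
out = layer(torch.rand(7, 90))                             # 7 metric nodes, 90-dim series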
Next, the output structure of the graph attention layers is described.
For the graph attention layer oriented to inter-metric features, this embodiment splits the raw sequence data matrix vertically and treats the multi-dimensional time series as one complete graph object: each node represents a metric feature, and each edge represents the relation between the two corresponding features. The input dimension of this layer is k×n, and the feature vector of each node is x_i = {x_{i,t} | t ∈ [0, n]}. The mutual relations between nodes (between metrics) are measured by the attention scores a_ij learned through the graph attention mechanism, and the layer output h′_i is obtained by weighted summation.
For example, for feature v_1, the determination of its corresponding h_1 (corresponding to the first feature) can be as shown in Figure 14.
For the graph attention layer oriented to intra-metric temporal dependence, this embodiment splits the raw sequence data matrix horizontally and treats the operations metric features at all moments within the sliding window as one complete graph object: each timestamp in the sliding window is a node, node x_t is the feature vector of all operations metrics at moment t, and each edge represents the relation between metric values at different moments; the GAT attention layer mines the temporal dependence of the time series. The input dimension of this layer is n×k; the mutual relations between nodes (between moments) are measured by the attention scores a_ij learned through the graph attention mechanism, and the layer output h′_i is obtained by weighted summation.
For example, for feature x_1, the determination of its corresponding h_1 (corresponding to the second feature) can be as shown in Figure 15.
Next, the processing of the multi-head attention extension mechanism (corresponding to the concatenation layer) is described.
By analogy with convolutional neural networks (CNNs), where each feature-map layer is given multiple mutually independent convolution kernels so that the output feature map has more channels representing the raw data, this embodiment of the application, as shown in Figure 16, introduces a multi-head attention extension into the overall model: the output of the feature-oriented graph attention layer, the output of the time-oriented graph attention layer, and the preprocessed data are concatenated into an n×3k×m matrix, i.e., each row represents a 3k-dimensional feature vector for each timestamp.
Its main optimization effects may include, without limitation, the following first and second points.
First point: specifying the mining pattern through data splitting.
Highly nonlinear, complex multi-dimensional metric data contains multiple latent patterns. Deep generative models perform holistic anomaly detection and do not explicitly specify which latent interrelations among metrics and timestamps to learn. For example, a GAN converts its encoder optimization problem into approximately imitating the true data distribution, minimizing the KL divergence between the distribution of generated samples and the true data distribution; when modeling the input, a GAN must assume the real data follows some distribution, with the encoder and decoder jointly learning to generate samples of certain patterns.
This embodiment of the application uses the data-splitting preprocessing and, guided by the two prior operations patterns of inter-metric relations and intra-metric temporal order, explicitly converts nonlinear mining into structural mining based on graph attention; the representation is markedly more pronounced and accurate on the metric features and temporal dependencies that operations anomaly detection cares about most.
Second point: fusing information through multi-head attention.
For the pattern mining of the split graph attention layers, this embodiment of the application uses the multi-head attention mechanism to fuse information from different sources as the input to the GRU. Reintroducing the original data compensates for the weakening of long-term temporal patterns caused by data splitting and reduces the loss of the original high-dimensional representation. The specific fusion process can be as shown in Figure 17, from which it can be seen that after the multi-head attention extension the GRU input gains dimension k, while the training iteration batch stays unchanged (m = 27), so the impact on the model's overall training performance is small.
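As an illustration only, the fusion can be expressed as a concatenation along the metric axis; the torch sketch below uses random stand-ins for the two graph attention outputs and the preprocessed data.

import torch

n, k, m = 90, 7, 27
h_feature = torch.rand(n, k, m)    # feature-oriented graph attention output
h_time    = torch.rand(n, k, m)    # time-oriented graph attention output
x_pre     = torch.rand(n, k, m)    # preprocessed original data

fused = torch.cat([h_feature, h_time, x_pre], dim=1)   # (90, 21, 27) = n x 3k x m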
The process of step 3, mining long-term time-series data based on GRU (corresponding to the GRU layer), is described below.
The multi-head attention extension mechanism fuses features from different sources as the GRU layer's input, with input dimension n×3k×m, which contains redundant short-term interference. To capture long-range temporal patterns in the time series and pass the important features on to the subsequent joint anomaly detection layer, this embodiment of the application uses a GRU to capture the interdependence among long sequences and extract dependence information in the time dimension. The GRU has a simple network structure; it mines medium-to-long-term time-series data as input to the anomaly detection module, preventing short-term disturbances such as spikes from interfering with anomaly detection.
Step 3 may specifically include, without limitation, the following steps 31 and 32.
Step 31: constructing the GRU's network structure.
The GRU is a gated recurrent structure that mitigates the gradient explosion and gradient vanishing problems; its network structure is shown in Figure 17.
From the GRU network structure it can be seen that the input at moment t is the current network input x_t, the hidden state output by the GRU layer at the previous moment is h_{t−1}, and the candidate hidden layer of memory content at the current moment is h′_t. The gating states are obtained from the previous hidden state h_{t−1} and the current network input x_t, and the gate functions carry out the transfer, resetting and updating of information, where r is the gate controlling information reset and z is the gate controlling information update. The update gate controls how much of the previous moment's state information is retained in the current state, avoiding the gradient vanishing caused by carrying over all historical information; the larger the update gate's value, the more of the previous state is carried in. The reset gate controls which information of the previous state is forgotten and which is written into the current candidate hidden layer h′_t; the smaller the reset gate, the less of the previous state is written in.
Step 32: constructing the GRU network model.
This embodiment of the application builds the GRU medium-to-long-period network model on the common neural network framework PyTorch, calling the torch.nn.GRU module to accept the three-dimensional sequence tensor produced by the multi-head attention extension. During model training, the GRU network construction may include, without limitation: building the GRU network layer, the GRU network layer input, and the GRU network layer output.
For building the GRU network layer, this embodiment calls gru(input_dim, output_dim, layer_num) to set up the GRU layer, where input_dim is the input feature dimension, for example input_dim = 3k = 21; output_dim is the hidden-layer dimension, i.e., the final output dimension, for example output_dim = 10; and layer_num is the number of GRU layers, for example layer_num = 2.
For the GRU network layer input, the GRU layer's input is the three-dimensional sequence X[seq_len, batch_size, input_dim], where seq_len is the sequence length, for example seq_len = n = 90, and batch_size is the number of reassembled samples, for example batch_size = m = 27. The final output is, for example, output_dim = h′_t = nn.gru(X). It can be seen that the GRU network takes the whole multi-dimensional data as input and, with batch_size independent GRU training components, mines medium-to-long-period sequences over seq_len time steps, the input to the GRU module at each time step having dimension [batch_size, input_dim].
For the GRU network layer output, the GRU layer's output output_dim is a three-dimensional time series [seq_len, batch_size, output_size] containing the medium-to-long-period dependence, for example, as shown in Figure 18, [seq_len, batch_size, output_size] = [90×27×10]. The hidden-layer output h′ has dimension [num_layer, batch_size, hidden_size] = [2×27×10]. From the GRU network described above, the hidden-layer output h′ at moment t serves as the GRU module's input at moment t+1 for computing the next moment's output.
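As an illustration only, the following PyTorch sketch instantiates the layer described above; note that the gru(input_dim, output_dim, layer_num) call mentioned in the text corresponds to the actual torch.nn.GRU(input_size, hidden_size, num_layers) signature, and the tensors here are random stand-ins.

import torch
import torch.nn as nn

seq_len, batch_size, input_dim = 90, 27, 21    # n, m, 3k
hidden_size, num_layers = 10, 2

gru = nn.GRU(input_size=input_dim, hidden_size=hidden_size,
             num_layers=num_layers)            # (seq, batch, feature) layout

X = torch.rand(seq_len, batch_size, input_dim)
output, h_n = gru(X)
# output: (90, 27, 10)  sequence features with medium/long-period dependence
# h_n:    (2, 27, 10)   final hidden state of each layer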
The process of step 4, building the joint prediction-based and reconstruction-based anomaly detection model, is described below.
The long-term temporal features mined by the GRU represent the original multi-dimensional metric series well and are fed into the anomaly detection module. Prediction-based anomaly detection is commonly used to extract operations metric features at time granularity for future-moment prediction and anomaly detection; reconstruction-based anomaly detection excels at capturing the overall data distribution of a time series. Therefore, optimizing for multi-scenario adaptation, this embodiment feeds the medium-to-long-period temporal dependence data mined by the GRU into both in parallel and combines the two into one overall anomaly detection model for joint optimization; the overall structure of the anomaly detection module is shown in the figure below. The anomaly detection module is trained like a black box: the model's objective function is defined as the sum of the two loss functions, i.e., LOSS_total = LOSS_pred + LOSS_recon, and the parameters of the two modules are updated synchronously.
As shown in Figure 19, the training and anomaly discrimination flow of the anomaly detection module may include, without limitation, processing by the prediction-based anomaly detection module and processing by the reconstruction-based anomaly detection module.
The processing of the prediction-based anomaly detection module (corresponding to the prediction layer) is described below.
With the raw multi-dimensional metric data already processed by the graph attention layers and the GRU, this embodiment uses fully connected layers, which are structurally simple and cheap to train, for the prediction module.
By stacking three fully connected hidden layers of dimension d, the metric values at the next moment are predicted. The mean squared error is used as the loss function LOSS_pred, where x_{n,i} is the actual value of the i-th metric variable at moment n and x′_{n,i} is its predicted value, with Euclidean distance as the similarity measure. The loss function can be computed by formula 5 below, and its optimization follows the usual optimization of fully connected networks.
LOSS_pred = √( Σ_{i=1}^{k} ( x_{n,i} − x′_{n,i} )² )    (Formula 5)
In formula 5, LOSS_pred is the loss function of the prediction-based anomaly detection module, x_{n,i} is the actual value of the i-th metric variable at moment n, and x′_{n,i} is the predicted value of the i-th metric variable at moment n.
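As an illustration only, the sketch below stacks three hidden layers of width d on top of the 10-dimensional GRU output and computes the loss of formula 5; the hidden width d = 64 and the ReLU activations are assumptions, since the text specifies only the number of hidden layers.

import torch
import torch.nn as nn

k, d = 7, 64
forecaster = nn.Sequential(
    nn.Linear(10, d), nn.ReLU(),    # 10 = GRU output dimension
    nn.Linear(d, d), nn.ReLU(),
    nn.Linear(d, d), nn.ReLU(),
    nn.Linear(d, k),                # predicted value of each metric at the next moment
)

def prediction_loss(x_true, x_pred):
    # Formula 5: Euclidean distance between actual and predicted metric values.
    return torch.sqrt(torch.sum((x_true - x_pred) ** 2))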
The processing of the reconstruction-based anomaly detection module (corresponding to the VAE layer) is described below.
The reconstruction-based anomaly detection module mainly learns the latent features of the sequence data; this embodiment builds the reconstruction module on a variational autoencoder (VAE), an improvement on the autoencoder. The VAE uses neural networks to model the encoder and the decoder separately. The encoder maps the input sequence x to a low-dimensional multivariate Gaussian distribution q_φ(z|x) whose dimensions are mutually independent, learning the latent feature z and generating the latent-space layer. The decoder reconstructs the probability distribution p_θ(x|z) of the original input data from the latent features captured by the encoder. The difference between the original sequence distribution and the reconstructed output distribution is called the reconstruction probability, and sequences with a small reconstruction probability are judged anomalous.
The determination of the VAE's objective loss function can include the following.
The VAE's goal is for the decoder's reconstruction distribution p_θ(x) to approximate the original sequence distribution; after processing by the encoding and decoding modules, the final objective can be written as formula 6.
log p_θ(x) = D_KL( q_φ(z|x) ‖ p_θ(z|x) ) + L(θ, φ; x)    (Formula 6)
In formula 6, let
L(θ, φ; x) = E_{q_φ(z|x)}[ log p_θ(x|z) ] − D_KL( q_φ(z|x) ‖ p_θ(z) )
be the variational evidence lower bound (ELBO). The VAE objective thus decomposes into the KL divergence between the encoder's approximate posterior and the true posterior plus the variational evidence lower bound, where E_{q_φ(z|x)} denotes the expectation over the low-dimensional distribution that the encoder outputs into the latent space. Because the KL divergence is non-negative, optimizing the objective can be converted into maximally fitting the variational evidence lower bound.
The optimization of the VAE objective can include the following.
The optimization relies on the common techniques of neural-network approximate fitting and Monte Carlo estimation; the variational evidence lower bound decomposes as
L(θ, φ; x) = L_1 + L_2,  with  L_1 = −D_KL( q_φ(z|x) ‖ p_θ(z) )  and  L_2 = E_{q_φ(z|x)}[ log p_θ(x|z) ].
For the L_1 term, fitted by the neural network, formula 7 can be used:
L_1 = −D_KL( q_φ(z|x) ‖ p_θ(z) ) = ½ Σ_j ( 1 + log σ_j² − μ_j² − σ_j² )    (Formula 7)
In formula 7, the VAE fixes the latent variable z to a simple prior distribution so that the objective function can be optimized and solved: p_θ(z) is set to the standard normal distribution N(0, I), and the approximate posterior q_φ(z|x) is taken to follow the normal distribution N(μ, σ²I) with mean μ and variance σ², fitted by a neural network.
For the Monte Carlo estimation of the L_2 term, formula 8 applies:
L_2 = E_{q_φ(z|x)}[ log p_θ(x|z) ]    (Formula 8)
In formula 8, because the gradient variance of L_2 is large, the expectation of this complex integral is commonly handled with the Monte Carlo method, sampling z^(l) ~ q_φ(z|x); that is, L_2 can be obtained through formula 9:
L_2 ≈ (1/L) Σ_{l=1}^{L} log p_θ( x | z^(l) )    (Formula 9)
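As an illustration only, the PyTorch sketch below wires together the encoder, the reparameterized sampling, and the two ELBO terms of formulas 7-9; the single-linear-layer encoder and decoder, the latent width, and the squared-error stand-in for the Gaussian log-likelihood are all assumptions of this sketch.

import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, in_dim, latent_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, 2 * latent_dim)   # outputs mu and log sigma^2
        self.dec = nn.Linear(latent_dim, in_dim)

    def forward(self, x):
        mu, log_var = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)   # one MC sample
        return self.dec(z), mu, log_var

def negative_elbo(x, x_rec, mu, log_var):
    l1 = 0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp())   # formula 7
    l2 = -torch.sum((x - x_rec) ** 2)   # formula 9: Gaussian log-likelihood up to a constant
    return -(l1 + l2)                   # minimize the negative ELBO

vae = VAE(in_dim=10, latent_dim=3)
x = torch.rand(27, 10)                  # batch of GRU-layer features (stand-in)
x_rec, mu, log_var = vae(x)
loss = negative_elbo(x, x_rec, mu, log_var)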
The processing of the real-time anomaly detection module is described below.
The real-time anomaly detection module of this embodiment loads the asynchronously trained joint anomaly detection model; given real-time data for a given moment, the anomaly detection model outputs the predicted values {x′_i | i = 1, 2, ..., k} from the prediction module and the reconstruction probabilities {p′_i | i = 1, 2, ..., k} from the reconstruction module. The prediction module deterministically predicts the actual values at the next timestamp and is sensitive to the stochastic aspects of the time series; the reconstruction module captures the global data distribution in the form of random variables, ignoring noise that disrupts the periodicity of the time series.
The specific processing of the real-time anomaly detection module is shown in Figure 20.
The process may include, without limitation, anomaly score computation and judgment, and anomaly verification and localization analysis.
For anomaly score computation and judgment, this embodiment considers the anomalous behavior of each metric feature comprehensively: combining the outputs of the two anomaly modules, it computes the anomaly score s_i of each metric feature and sums them as the final anomaly judgment score Score; if the final anomaly judgment score is greater than a certain threshold, an anomaly is determined.
The computation of the anomaly judgment score can refer to formula 10 below.
Score = Σ_{i=1}^{k} s_i = Σ_{i=1}^{k} [ ( x′_i − x_i )² + γ × ( 1 − p′_i ) ] / ( 1 + γ )    (Formula 10)
In formula 10, Score is the final anomaly judgment score, and γ is a hyperparameter used mainly to combine the prediction-based and reconstruction-based probabilities; its value is found by grid-searching for the optimal parameter on the training set.
For example, in this embodiment of the application, the joint anomaly detection model's hyperparameter is set to γ = 0.8 and the anomaly threshold to 1, because on the training data the anomaly detection model's overall recall and precision perform best at γ = 0.8. For the real-time metric input at moment T₁, the offline asynchronously trained anomaly detection model is loaded and outputs the predicted value and reconstruction probability of each metric at moment T₁; the anomaly score s_i of each metric is computed through formula 10, and the sum gives the final anomaly score Score = 1.428525. Since 1.428525 is greater than the threshold 1, the system is judged anomalous at moment T₁.
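As an illustration only, the sketch below evaluates formula 10; the metric values are made-up numbers, while γ = 0.8 and the threshold 1 follow the example above.

import torch

def anomaly_scores(x, x_pred, p_rec, gamma=0.8):
    # Formula 10: per-metric score s_i; their sum is the final Score.
    s = ((x_pred - x) ** 2 + gamma * (1 - p_rec)) / (1 + gamma)
    return s, s.sum()

x      = torch.tensor([0.52, 0.61, 0.10])   # actual values (hypothetical)
x_pred = torch.tensor([0.50, 0.90, 0.11])   # prediction-module outputs
p_rec  = torch.tensor([0.95, 0.20, 0.97])   # reconstruction probabilities
s, score = anomaly_scores(x, x_pred, p_rec)
is_anomaly = (score >= 1.0)                  # compare against the threshold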
For anomaly verification and localization analysis: after the multi-dimensional anomaly detection module judges the system anomalous, verification is performed. If the alarm is verified as a false positive, the hyperparameter γ and the anomaly threshold can be adjusted to balance the proportion of prediction-based versus reconstruction-based detection and the discrimination sensitivity of the anomaly module. If a genuine system anomaly is verified, the per-metric anomaly scores enable rapid localization and correlation analysis. As shown in Figure 20, the system anomaly manifests as a rise in business transaction latency, with latency anomaly score S_latency = 0.437159; meanwhile S_CPU and S_memory are close to the latency anomaly score and clearly larger than the anomaly scores of the other metrics, so the cause of the abnormal transaction latency can be diagnosed as anomalies in the system's CPU and memory. It can be seen that, beyond detecting anomalies in multivariate time series, this embodiment also provides correlation analysis for anomaly diagnosis.
This embodiment of the application has the following technical effects.
First, this embodiment consists mainly of three core parts: the graph attention module, the GRU module, and the joint anomaly detection module. After preprocessing such as truncation and splitting, the multi-dimensional operations metric data undergoes mining oriented to inter-metric features and intra-metric temporal order, and prediction and reconstruction are used for joint anomaly detection.
Second, this embodiment builds a graph neural network with the attention mechanism introduced via GAT, splitting the raw multi-dimensional metric series horizontally and vertically to form the nodes of the graph objects. The embodiment preserves the original high-dimensional data form and builds the graph attention layers directly on GAT. With no prior graph structure (i.e., no known interrelations among metrics), the attention mechanism assigns weights to different metrics, and the model dynamically learns the latent anomaly correlations in temporal dependence and feature type.
Third, this embodiment uses the GRU to mine temporal variation regularities, such as relatively long intervals and delays, in the graph neural network's output, preventing metric spikes from affecting the overall anomaly detection model. The GRU's simple structure also accelerates model convergence, simplifies the model structure, and improves the real-time processing efficiency of anomaly detection.
Fourth, this embodiment's joint anomaly detection based on prediction and VAE reconstruction suits operations metrics of many types and scenarios, and joint anomaly discrimination improves the detection model's robustness. By providing per-metric anomaly scores, it remedies the missing anomaly correlation analysis in existing multi-dimensional detection schemes.
In a second aspect, to implement the above data processing method, an embodiment of this application provides a data processing apparatus, described below with reference to the schematic structural diagram of the data processing apparatus shown in Figure 21.
As shown in Figure 21, the data processing apparatus 210 includes an obtaining unit 2101, a first preprocessing unit 2102, a second preprocessing unit 2103 and a training unit 2104, wherein:
the obtaining unit 2101 is configured to obtain a pre-training data set, the pre-training data set being an n×k×m matrix, where m represents the number of time periods corresponding to the pre-training data set, k represents the number of metrics corresponding to the pre-training data set, one time period includes n moments, n is greater than 1, k is greater than 1, and m is greater than 1;
the first preprocessing unit 2102 is configured to vertically split the pre-training data set through a preprocessing layer of an anomaly detection model to obtain a first training data set, the first training data set being an n×m×k matrix, where k represents the number of first samples included in the first training data set, and one first sample characterizes the values of one metric over n×m time dimensions;
the second preprocessing unit 2103 is configured to horizontally split the pre-training data set through the preprocessing layer to obtain a second training data set, the second training data set being an m×k×n matrix, where n represents the number of second samples included in the second training data set, and one second sample characterizes the values of the k metrics corresponding to one moment over m time periods;
the training unit 2104 is configured to train the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model, the target detection model being used to determine detection parameters of detection data within a time period, and the detection parameters being used to determine whether the detection data is anomalous.
In some embodiments, the anomaly detection model further includes a first graph neural network (GAT) layer, a second GAT layer and a prediction layer, and the training unit 2104 is specifically configured to:
input the k first samples in the first training data set into the first GAT layer and obtain k×n first features through processing by the first GAT layer, one first feature characterizing the relation between the values of the metric corresponding to that first feature and the values of the other k−1 metrics;
input the n second samples in the second training data set into the second GAT layer and obtain n×k second features through processing by the second GAT layer, one second feature characterizing the relation between the values at the moment corresponding to that second feature and the values at the other n−1 moments;
concatenate at least the k×n first features and the n×k second features to obtain a prediction data set, the prediction data set being an n×s×m matrix, where s is greater than or equal to k;
input the prediction data set into the prediction layer and obtain a first prediction result through processing by the prediction layer;
adjust parameters in the anomaly detection model based on the first prediction result to obtain the target detection model.
In some embodiments, for one first sample, the training unit 2104 is further configured to:
take each row of the first sample as a first node, obtaining n first nodes;
perform, for each of the n first nodes, first processing to obtain the first feature corresponding to that first node;
wherein the first processing includes: determining the similarity coefficients between the first node and each of the n first nodes, obtaining the n similarity coefficients;
converting the n similarity coefficients into n attention coefficients;
determining the first feature corresponding to the first node based on the data corresponding to the n nodes and the n attention coefficients.
In some embodiments, the prediction layer includes a fully connected layer and a variational autoencoder (VAE) layer, the prediction data set includes m prediction samples, and the training unit 2104 is further configured to:
input each of the m prediction samples into the fully connected layer and obtain m groups of predicted values through processing by the fully connected layer;
wherein, for each prediction sample, one group of predicted values is obtained through processing by the fully connected layer; one group of predicted values includes the predicted values of the k metrics at the next moment, the next moment being the moment following the time period corresponding to the prediction sample;
input each of the m prediction samples into the VAE layer and obtain m groups of reconstruction probabilities through processing by the VAE layer;
wherein, for each prediction sample, one group of reconstruction probabilities is obtained through processing by the VAE layer; one group of reconstruction probabilities includes the reconstruction probabilities of the k metrics at the next moment;
determine that the first prediction result includes the m groups of predicted values and the m groups of reconstruction probabilities.
In some embodiments, in the case where the prediction layer includes a fully connected layer and a variational autoencoder (VAE) layer and the first prediction result includes the m groups of predicted values and the m groups of reconstruction probabilities, the training unit 2104 is further configured to:
determine a first loss function corresponding to the fully connected layer and a second loss function corresponding to the VAE layer, the first loss function being different from the second loss function;
determine m target losses based at least on the m groups of predicted values, the m groups of reconstruction probabilities, the first loss function and the second loss function, wherein one target loss is determined for one group of predicted values and the reconstruction probabilities corresponding to that group of predicted values;
adjust the parameters in the anomaly detection model based on the m target losses to obtain the target detection model.
In some embodiments, the anomaly detection model further includes a concatenation layer, and the training unit 2104 is further configured to:
input the k×n first features, the n×k second features and the pre-training data set into the concatenation layer and obtain concatenated data through processing by the concatenation layer, wherein the prediction data set is the concatenated data, and s is equal to 3 times k.
In some embodiments, the anomaly detection model further includes a concatenation layer and a gated recurrent unit (GRU) layer, and the training unit 2104 is further configured to:
input the k×n first features, the n×k second features and the pre-training data set into the concatenation layer and obtain concatenated data through processing by the concatenation layer;
input the concatenated data into the GRU layer and filter interference in the metric dimension of the concatenated data through the GRU layer to obtain the prediction data set, where s is less than 3 times k.
In some embodiments, the data processing apparatus 210 may further include a prediction unit configured to, after the anomaly detection model is trained based on the first training data set and the second training data set to obtain the target detection model:
acquire detection data of k metrics within a first time period, the first time period being any time period;
input the detection data into the target detection model and obtain the detection parameter values of the k metrics at a second moment through processing by the target detection model, the second moment being the moment following the first time period;
determine a total score corresponding to the detection data based on the detection parameter values of the k metrics at the second moment;
determine that the detection data is anomalous in the case where the total score is greater than or equal to a score threshold.
In some embodiments, in the case where the detection parameters include reconstruction probabilities and predicted values, the prediction unit is further configured to:
determine the total score corresponding to the detection data through a first formula;
the first formula including:
Score = Σ_{i=1}^{k} [ ( x′_i − x_i )² + γ × ( 1 − p′_i ) ] / ( 1 + γ )
where Score represents the total score corresponding to the detection data, x_i represents the actual value of the i-th metric at the second moment, x′_i represents the predicted value of the i-th metric at the second moment, p′_i represents the reconstruction probability value of the i-th metric at the second moment, and γ represents a preset coefficient.
In some embodiments, the data processing apparatus 210 may further include a localization unit configured to:
determine the anomaly scores of the k metrics at the second moment;
perform anomaly localization based on the anomaly scores of the k metrics at the second moment.
It should be noted that the units included in the data processing apparatus provided by the embodiments of this application may be implemented by a processor in an electronic device, or of course by specific logic circuits; in implementation, the processor may be a central processing unit (CPU), a micro processor unit (MPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), or the like.
The description of the above apparatus embodiments is similar to that of the method embodiments, with beneficial effects similar to those of the method embodiments. For technical details not disclosed in the apparatus embodiments of this application, refer to the description of the method embodiments of this application.
It should be noted that, in the embodiments of this application, if the above data processing method is implemented in the form of software function modules and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk, or an optical disc. Thus, the embodiments of this application are not limited to any specific combination of hardware and software.
In a third aspect, to implement the above data processing method, an embodiment of this application provides an electronic device, including a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the steps of the data processing method provided in the above embodiments.
The structure of the electronic device is described below with reference to the electronic device 220 shown in Figure 22.
In one example, the electronic device 220 may be the above electronic device. As shown in Figure 22, the electronic device 220 includes: a processor 2201, at least one communication bus 2202, a user interface 2203, at least one external communication interface 2204 and a memory 2205. The communication bus 2202 is configured to implement connection and communication between these components. The user interface 2203 may include a display screen, and the external communication interface 2204 may include standard wired and wireless interfaces.
The memory 2205 is configured to store instructions and applications executable by the processor 2201, and may also cache data to be processed or already processed by the processor 2201 and the modules in the electronic device (for example, image data, audio data, voice communication data and video communication data); it may be implemented by flash memory (FLASH) or random access memory (RAM).
In a fourth aspect, an embodiment of this application provides a storage medium, i.e., a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data processing method provided in the above embodiments.
It should be pointed out here that the description of the above storage medium and device embodiments is similar to that of the method embodiments, with similar beneficial effects. For technical details not disclosed in the storage medium and device embodiments of this application, refer to the description of the method embodiments of this application.
It should be understood that references throughout the specification to "one embodiment" or "an embodiment" mean that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of this application. Therefore, occurrences of "in one embodiment" or "in some embodiments" throughout the specification do not necessarily refer to the same embodiment. Furthermore, these particular features, structures or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of this application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of this application. The above embodiment numbers of this application are for description only and do not represent the superiority or inferiority of the embodiments.
It should be noted that, herein, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes that element.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division of the units is only a logical functional division, and in actual implementation there may be other divisions, such as: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be electrical, mechanical or in other forms.
The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solutions of this embodiment.
In addition, the functional units in the embodiments of this application may all be integrated in one processing unit, or each unit may separately serve as one unit, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium and, when executed, performs the steps including the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as removable storage devices, read-only memory (ROM), magnetic disks or optical discs.
Alternatively, if the above integrated unit of this application is implemented in the form of a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application, in essence or in the part contributing to the related art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as removable storage devices, ROM, magnetic disks or optical discs.
The above is only an implementation of this application, but the scope of protection of this application is not limited thereto. Any person skilled in the art can readily think of changes or substitutions within the technical scope disclosed in this application, which should all be covered within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.

Claims (12)

  1. A data processing method, the method comprising:
    obtaining a pre-training data set, the pre-training data set being an n×k×m matrix, where m represents the number of time periods corresponding to the pre-training data set, k represents the number of metrics corresponding to the pre-training data set, one of the time periods comprises n moments, n is greater than 1, k is greater than 1, and m is greater than 1;
    vertically splitting the pre-training data set through a preprocessing layer of an anomaly detection model to obtain a first training data set, the first training data set being an n×m×k matrix, where k represents the number of first samples comprised in the first training data set, and one of the first samples characterizes the values of one metric over n×m time dimensions;
    horizontally splitting the pre-training data set through the preprocessing layer to obtain a second training data set, the second training data set being an m×k×n matrix, where n represents the number of second samples comprised in the second training data set, and one of the second samples characterizes the values of the k metrics corresponding to one moment over m time periods;
    training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model, the target detection model being used to determine detection parameters of detection data within a time period, and the detection parameters being used to determine whether the detection data is anomalous.
  2. The method according to claim 1, wherein the anomaly detection model further comprises a first graph neural network (GAT) layer, a second GAT layer and a prediction layer, and the training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model comprises:
    inputting the k first samples in the first training data set into the first GAT layer, and obtaining k×n first features through processing by the first GAT layer, one of the first features characterizing the relation between the values of the metric corresponding to the first feature and the values of the other k-1 metrics;
    inputting the n second samples in the second training data set into the second GAT layer, and obtaining n×k second features through processing by the second GAT layer, one of the second features characterizing the relation between the values at the moment corresponding to the second feature and the values at the other n-1 moments;
    concatenating at least the k×n first features and the n×k second features to obtain a prediction data set, the prediction data set being an n×s×m matrix, where s is greater than or equal to k;
    inputting the prediction data set into the prediction layer, and obtaining a first prediction result through processing by the prediction layer;
    adjusting parameters in the anomaly detection model based on the first prediction result to obtain the target detection model.
  3. The method according to claim 2, wherein, for one of the first samples, the inputting the k first samples in the first training data set into the first GAT layer to obtain k×n first features comprises:
    taking each row of the first sample as a first node to obtain n first nodes;
    performing, for each of the n first nodes, first processing to obtain the first feature corresponding to the first node;
    wherein the first processing comprises: determining the similarity coefficients between the first node and each of the n first nodes to obtain the n similarity coefficients;
    converting the n similarity coefficients into n attention coefficients;
    determining the first feature corresponding to the first node based on the data corresponding to the n nodes and the n attention coefficients.
  4. The method according to claim 2 or 3, wherein the prediction layer comprises a fully connected layer and a variational autoencoder (VAE) layer, the prediction data set comprises m prediction samples, and the inputting the prediction data set into the prediction layer and obtaining a first prediction result through processing by the prediction layer comprises:
    inputting each of the m prediction samples into the fully connected layer, and obtaining m groups of predicted values through processing by the fully connected layer;
    wherein, for each of the prediction samples, one group of the predicted values is obtained through processing by the fully connected layer; one group of the predicted values comprises the predicted values of the k metrics at the next moment, the next moment being the moment following the time period corresponding to the prediction sample;
    inputting each of the m prediction samples into the VAE layer, and obtaining m groups of reconstruction probabilities through processing by the VAE layer;
    wherein, for each of the prediction samples, one group of the reconstruction probabilities is obtained through processing by the VAE layer; one group of the reconstruction probabilities comprises the reconstruction probabilities of the k metrics at the next moment;
    determining that the first prediction result comprises the m groups of predicted values and the m groups of reconstruction probabilities.
  5. The method according to any one of claims 2-4, wherein, in the case where the prediction layer comprises a fully connected layer and a variational autoencoder (VAE) layer and the first prediction result comprises the m groups of predicted values and the m groups of reconstruction probabilities, the adjusting parameters in the anomaly detection model based on the first prediction result to obtain the target detection model comprises:
    determining a first loss function corresponding to the fully connected layer and a second loss function corresponding to the VAE layer, the first loss function being different from the second loss function;
    determining m target losses based at least on the m groups of predicted values, the m groups of reconstruction probabilities, the first loss function and the second loss function, wherein one of the target losses is determined for one group of predicted values and the reconstruction probabilities corresponding to that group of predicted values;
    adjusting the parameters in the anomaly detection model based on the m target losses to obtain the target detection model.
  6. The method according to any one of claims 2-5, wherein the anomaly detection model further comprises a concatenation layer, and the concatenating at least the k×n first features and the n×k second features to obtain a prediction data set comprises:
    inputting the k×n first features, the n×k second features and the pre-training data set into the concatenation layer, and obtaining concatenated data through processing by the concatenation layer, wherein the prediction data set is the concatenated data, and s is equal to 3 times k.
  7. The method according to any one of claims 2-6, wherein the anomaly detection model further comprises a concatenation layer and a gated recurrent unit (GRU) layer, and the concatenating at least the k×n first features and the n×k second features to obtain a prediction data set comprises:
    inputting the k×n first features, the n×k second features and the pre-training data set into the concatenation layer, and obtaining concatenated data through processing by the concatenation layer;
    inputting the concatenated data into the GRU layer, and filtering interference in the metric dimension of the concatenated data through the GRU layer to obtain the prediction data set, where s is less than 3 times k.
  8. The method according to any one of claims 1-7, wherein, after the training the anomaly detection model based on the first training data set and the second training data set to obtain a target detection model, the method further comprises:
    acquiring detection data of k metrics within a first time period, the first time period being any time period;
    inputting the detection data into the target detection model, and obtaining detection parameter values of the k metrics at a second moment through processing by the target detection model, the second moment being the moment following the first time period;
    determining a total score corresponding to the detection data based on the detection parameter values of the k metrics at the second moment;
    determining that the detection data is anomalous in the case where the total score is greater than or equal to a score threshold.
  9. The method according to claim 8, wherein, in the case where the detection parameters comprise reconstruction probabilities and predicted values, the determining a total score corresponding to the detection data based on the detection parameter values of the k metrics at the second moment comprises:
    determining the total score corresponding to the detection data through a first formula;
    the first formula comprising:
    Score = Σ_{i=1}^{k} [ ( x′_i − x_i )² + γ × ( 1 − p′_i ) ] / ( 1 + γ )
    where Score represents the total score corresponding to the detection data, x_i represents the actual value of the i-th metric at the second moment, x′_i represents the predicted value of the i-th metric at the second moment, p′_i represents the reconstruction probability value of the i-th metric at the second moment, and γ represents a preset coefficient.
  10. The method according to claim 8 or 9, the method further comprising:
    determining the anomaly scores of the k metrics at the second moment;
    performing anomaly localization based on the anomaly scores of the k metrics at the second moment.
  11. An electronic device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the program, implements the data processing method according to any one of claims 1 to 10.
  12. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the data processing method according to any one of claims 1 to 10.
PCT/CN2022/120467 2022-06-29 2022-09-22 Data processing method, apparatus, device and storage medium WO2024000852A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210760323.5A CN115063588A (zh) Data processing method, apparatus, device and storage medium
CN202210760323.5 2022-06-29

Publications (1)

Publication Number Publication Date
WO2024000852A1 true WO2024000852A1 (zh) 2024-01-04

Family

ID=83204857

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/120467 WO2024000852A1 (zh) 2022-06-29 2022-09-22 数据处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115063588A (zh)
WO (1) WO2024000852A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115063588A (zh) * 2022-06-29 2022-09-16 深圳前海微众银行股份有限公司 一种数据处理方法、装置、设备及存储介质
CN116361635B (zh) * 2023-06-02 2023-10-10 中国科学院成都文献情报中心 一种多维时序数据异常检测方法
CN116383096B (zh) * 2023-06-06 2023-08-18 安徽思高智能科技有限公司 基于多指标时序预测的微服务***异常检测方法及装置
CN116628508B (zh) * 2023-07-20 2023-12-01 科大讯飞股份有限公司 模型训练过程异常检测方法、装置、设备及存储介质
CN117150407A (zh) * 2023-09-04 2023-12-01 国网上海市电力公司 一种工业碳排放数据的异常检测方法
CN117113259B (zh) * 2023-10-19 2023-12-22 华夏天信智能物联(大连)有限公司 用于安全隐患预测的煤矿状态数据处理方法及***
CN117093947B (zh) * 2023-10-20 2024-02-02 深圳特力自动化工程有限公司 一种发电柴油机运行异常监测方法及***

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109978379A (zh) Time-series data anomaly detection method, apparatus, computer device and storage medium
US20190312898A1 (en) SPATIO-TEMPORAL ANOMALY DETECTION IN COMPUTER NETWORKS USING GRAPH CONVOLUTIONAL RECURRENT NEURAL NETWORKS (GCRNNs)
CN111708739A (zh) Anomaly detection method and apparatus for time-series data, electronic device and storage medium
US20210056430A1 (en) Intelligent time-series analytic engine
CN114221790A (zh) BGP anomaly detection method and system based on graph attention networks
CN115063588A (zh) Data processing method, apparatus, device and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117973683A (zh) Equipment system-of-systems effectiveness evaluation method and apparatus based on evaluation knowledge representation
CN117830750A (zh) Graph-Transformer-based mechanical fault prediction method
CN117830750B (zh) Graph-Transformer-based mechanical fault prediction method
CN117952564A (zh) Schedule simulation optimization method and system based on progress prediction
CN117952564B (zh) Schedule simulation optimization method and system based on progress prediction

Also Published As

Publication number Publication date
CN115063588A (zh) 2022-09-16

Similar Documents

Publication Publication Date Title
WO2024000852A1 (zh) Data processing method, apparatus, device and storage medium
Tian et al. An intrusion detection approach based on improved deep belief network
CN111178456B (zh) Anomaly indicator detection method and apparatus, computer device and storage medium
CN111310672A (zh) Video emotion recognition method, apparatus and medium based on time-series multi-model fusion modeling
Ibrahim et al. Short‐Time Wind Speed Forecast Using Artificial Learning‐Based Algorithms
Nizam et al. Real-time deep anomaly detection framework for multivariate time-series data in industrial iot
CN112966714B (zh) Edge time-series data anomaly detection and network programmable control method
CN113905391A (zh) Ensemble learning network traffic prediction method, system, device, terminal and medium
Tan et al. Multi-node load forecasting based on multi-task learning with modal feature extraction
He et al. MTAD‐TF: Multivariate Time Series Anomaly Detection Using the Combination of Temporal Pattern and Feature Pattern
KR102359090B1 (ko) Method and system for providing a real-time enterprise information system anomalous behavior detection service
Legrand et al. Study of autoencoder neural networks for anomaly detection in connected buildings
Liu et al. Ship navigation behavior prediction based on AIS data
CN115329799A (zh) Bridge safety state monitoring method and apparatus, computer device and storage medium
KR102352954B1 (ko) System and method for detecting anomalous user behavior in real-time enterprise information systems based on predictive autoregression
Li et al. A lstm-based method for comprehension and evaluation of network security situation
CN116910573B (zh) Training method and apparatus for an anomaly diagnosis model, electronic device and storage medium
Li et al. A framework for predicting network security situation based on the improved LSTM
CN117095460A (zh) Self-supervised group behavior recognition method based on long- and short-term relational predictive coding, and recognition system thereof
CN115762783A (zh) Acute kidney injury prediction system
ABBAS A survey of research into artificial neural networks for crime prediction
WO2022022059A1 (en) Context aware anomaly detection
Chen et al. Machine learning-based anomaly detection of ganglia monitoring data in HEP Data Center
CN112232557B (zh) Short-term health prediction method for railway switch machines based on long short-term memory networks
CN113723660A (zh) Specific behavior type prediction method and system based on a DNN-LSTM fusion model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22948941

Country of ref document: EP

Kind code of ref document: A1