WO2022026022A1 - Model selection and parameter estimation for anomaly detection - Google Patents

Model selection and parameter estimation for anomaly detection Download PDF

Info

Publication number
WO2022026022A1
Authority
WO
WIPO (PCT)
Prior art keywords
anomaly detection
time
detection model
series data
parameter set
Prior art date
Application number
PCT/US2021/030761
Other languages
French (fr)
Inventor
Yuanxiang YING
Juanyong DUAN
Congrui Huang
Yujing WANG
Chunlei Wang
Bixiong Xu
Guodong XING
Jie Tong
Jing Bai
Qi Zhang
Original Assignee
Microsoft Technology Licensing, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing, LLC
Publication of WO2022026022A1 publication Critical patent/WO2022026022A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0751Error or fault detection not based on redundancy
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2433Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/18Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/02Knowledge representation; Symbolic representation

Definitions

  • Anomaly detection may discover abnormal or unexpected incidents in a time-series data.
  • a time-series data refers to a data sequence recorded in a chronological order, and data points in the data sequence reflect state or degree of changes of a particular phenomenon, index, matter, etc. along with time.
  • Anomaly detection is widely implemented in many industrial applications, and is an important research area in data mining. Accurate anomaly detection may trigger prompt troubleshooting, help to avoid loss in revenue, maintain the reputation and branding of an operating entity, etc. Many large companies have built their own anomaly detection systems, to monitor healthiness of their business, products, services, etc.
  • an internet company may monitor in real time various metrics of its applications and services, e.g., page views, sales revenue, etc.
  • An anomaly detection system may monitor time-series data continuously, and alert promptly about potential incidents. When an administrator or a user receives an alert about an incident, they may make a timely decision for this incident.
  • Embodiments of the present disclosure propose method and apparatus for implementing anomaly detection.
  • a time-series data including a plurality of data points may be obtained.
  • a feature representation of a data segment corresponding to a current data point may be generated.
  • An anomaly detection model may be selected from a set of candidate anomaly detection models based on the feature representation.
  • a parameter set associated with the selected anomaly detection model may be estimated.
  • Anomaly detection may be performed on the current data point with the selected anomaly detection model and the estimated parameter set.
  • FIG. 1 illustrates an exemplary process for implementing anomaly detection according to an embodiment of the present disclosure.
  • FIG.2 illustrates an exemplary process for feature generation according to an embodiment of the present disclosure.
  • FIG.3 illustrates another exemplary process for feature generation according to an embodiment of the present disclosure.
  • FIG.4 illustrates an exemplary process for training a primary classifier and a primary estimator according to an embodiment of the present disclosure.
  • FIG.5 illustrates an exemplary process for anomaly detection result tuning according to an embodiment of the present disclosure.
  • FIG.6 illustrates exemplary user interfaces for anomaly detection result tuning according to an embodiment of the present disclosure.
  • FIG.7 is a flowchart of an exemplary method for implementing anomaly detection according to an embodiment of the present disclosure.
  • FIG.8 illustrates an exemplary apparatus for implementing anomaly detection according to an embodiment of the present disclosure.
  • FIG.9 illustrates an exemplary apparatus for implementing anomaly detection according to an embodiment of the present disclosure.
  • An anomaly detection model is usually employed to perform anomaly detection on a time-series data.
  • Existing anomaly detection models may be categorized into statistical models, supervised models, unsupervised models, etc.
  • a statistical model may be easily adopted online, but its accuracy is not sufficient for industrial applications.
  • a supervised model has high accuracy, but it is difficult to adapt to online applications because it relies on a large amount of continuously labeled data.
  • An unsupervised model may reconstruct probabilities of normal time-series data, and report anomaly points when a reconstruction error is higher than a threshold, but it is time-consuming or parameter-sensitive.
  • Time-series data have various patterns, and their data distributions are usually stochastic and change continuously over time.
  • a specific anomaly detection model may usually only exhibit good performance for a specific pattern of time-series data. It is desirable to select a suitable anomaly detection model and parameters for different patterns of time-series data.
  • a heuristic classifier is usually employed to perform the selection of anomaly detection model.
  • the heuristic classifier may be a rule-based classifier, which may select a suitable anomaly detection model for a time-series data based on some predefined rules. These predefined rules are based on, for example, periodicity, granularity, and line type, etc. of the time-series data.
  • an empirical estimator may be employed to estimate suitable parameters for the selected anomaly detection model.
  • the empirical estimator may be a rule-based estimator, which may estimate suitable parameters for a time-series data based on some predefined rules. These predefined rules are based on, for example, a number of data points, granularity, and line type, etc. of the time-series data.
  • Embodiments of the present disclosure propose a scalable anomaly detection system capable of performing anomaly detection on different patterns of time-series data from various data sources.
  • the anomaly detection system may automatically determine a suitable anomaly detection model and corresponding parameter set based on features of a time-series data, and perform anomaly detection on the time-series data with the determined anomaly detection model and parameter set.
  • the embodiments of the present disclosure propose to select dynamically an anomaly detection model and a parameter set as data points in a time-series data change. For example, a feature representation of a data segment corresponding to a current data point of a time-series data may be generated in real time, a suitable anomaly detection model may be selected for the current data point based on the feature representation, and a parameter set associated with the selected anomaly detection model may be estimated. Therefore, pattern changes of the time-series data can be sensed in real time and the employed anomaly detection model and parameter set can be dynamically adjusted.
  • the current data point may refer to a data point that is currently being detected during the sequential execution of anomaly detection on a series of data points in the time-series data. It should be appreciated that although the current data point is one data point as an example in the following discussion, the embodiments of the present disclosure may also cover the situation where the current data point includes more than one data point.
  • the embodiments of the present disclosure propose to perform a selection of an anomaly detection model from a plurality of candidate anomaly detection models through a pre-trained classifier, and perform an estimation of a parameter set through a pre-trained estimator.
  • the classifier and the estimator may be trained using a training dataset including a plurality of time-series data samples.
  • the embodiments of the present disclosure propose to train a classifier and an estimator with time-series data samples having feedback-based labels.
  • the feedback-based labels may be provided by users.
  • the embodiments of the present disclosure propose a customized anomaly detection result tuning mechanism.
  • An anomaly detection result of a time-series data may be tuned with a single tuning parameter, to update a number of anomaly points included in the anomaly detection result.
  • the tuning parameter may be set through a specific user interface, thereby realizing intuitive and efficient customization of the anomaly detection result.
  • FIG. 1 illustrates an exemplary process 100 for implementing anomaly detection according to an embodiment of the present disclosure.
  • the process 100 may be performed by an anomaly detection system according to an embodiment of the present disclosure, which intends to perform anomaly detection on a time-series data 102.
  • the time-series data 102 may come from various types of data sources, such as online streaming data, cloud storage, databases, etc., and may be targeted at various types of application scenarios or businesses, such as service traffic, product sales revenue, Internet of Things (IoT) data, machine operation data, business operation data, etc. It should be appreciated that the embodiments of the present disclosure are not limited to any specific types of time-series data, but can be applied to different types of time-series data in different application scenarios in the same or similar manner.
  • the time-series data 102 may be updated according to a specified time granularity.
  • the time-series data 102 may include a plurality of data points, and the time granularity indicates an update frequency of these data points, such as a minute, an hour, a day, etc.
  • Each data point includes a metric collected at a corresponding time point.
  • the anomaly detection system may determine a suitable anomaly detection model and an associated parameter set for each data point in a time-series data according to a data segment where the data point is located, and perform anomaly detection on the data point with the determined anomaly detection model and parameter set.
  • the following takes a current data point 104 in the time-series data 102 as an example to illustrate an exemplary process of performing anomaly detection.
  • a data segment 106 corresponding to the current data point 104 may be obtained.
  • a limited length of historical data points adjacent to the current data point 104 may be cut from the time-series data 102. These historical data points and the current data point 104 may form the data segment 106 corresponding to the current data point 104.
  • the length of the data segment 106 may also be referred to as a detection window size.
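As a rough illustration of how such a detection window might be cut from the series, consider the sketch below; the function name and the window size of 30 are illustrative assumptions, not values taken from this document.

```python
import numpy as np

def get_detection_segment(series: np.ndarray, current_index: int, window_size: int = 30) -> np.ndarray:
    """Return the data segment for the current data point: up to `window_size - 1`
    historical points adjacent to the current point, plus the current point itself."""
    start = max(0, current_index - window_size + 1)
    return series[start:current_index + 1]

# Example: the segment for the latest point of a 100-point series
series = np.random.rand(100)
segment = get_detection_segment(series, current_index=99, window_size=30)
print(segment.shape)  # (30,)
```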
  • a feature generation operation may be performed on the data segment 106 through a feature generation module 108, in order to generate a feature representation 110 of the data segment 106.
  • An exemplary process of feature generation will be described later in conjunction with FIGS. 2 and 3.
  • the feature representation 110 may be provided to a model selector 112.
  • the model selector 112 may select an anomaly detection model from a set of candidate anomaly detection models 114 based on the feature representation 110.
  • the set of candidate anomaly detection models 114 may include various anomaly detection models. Since statistical models and unsupervised models can efficiently achieve stable performance, the set of candidate anomaly detection models 114 preferably includes these two types of models, such as a Spectral Residual (SR) model, a Histogram-based Outlier Score (HBOS) model, a Seasonal Hybrid Extreme Studentized Deviate test (SH-ESD) model, etc.
  • the selected anomaly detection model is, for example, an anomaly detection model 116.
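Conceptually, the candidate set can be held in a simple registry keyed by model name. In the sketch below, plain z-score and IQR detectors stand in for the SR, HBOS, and SH-ESD models named above, whose actual implementations are not given in this document.

```python
import numpy as np

def zscore_detector(segment: np.ndarray, threshold: float = 3.0) -> bool:
    """Flag the last point of the segment when it deviates from the mean of the
    preceding points by more than `threshold` standard deviations."""
    mean, std = segment[:-1].mean(), segment[:-1].std() + 1e-8
    return abs(segment[-1] - mean) / std > threshold

def iqr_detector(segment: np.ndarray, k: float = 1.5) -> bool:
    """Flag the last point when it falls outside the interquartile-range fence."""
    q1, q3 = np.percentile(segment[:-1], [25, 75])
    iqr = q3 - q1
    return not (q1 - k * iqr <= segment[-1] <= q3 + k * iqr)

# Extensible registry of candidate detectors (stand-ins for SR, HBOS, SH-ESD, ...)
CANDIDATE_MODELS = {
    "zscore": zscore_detector,
    "iqr": iqr_detector,
}

segment = np.r_[np.random.default_rng(0).normal(size=29), 8.0]
print({name: detect(segment) for name, detect in CANDIDATE_MODELS.items()})
```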
  • the model selector 112 may, for example, include a primary classifier and an optional secondary classifier.
  • the primary classifier may be a multi-classifier trained with a training dataset including a plurality of time-series data samples. An exemplary process of training the primary classifier will be described later in conjunction with FIG. 4.
  • the secondary classifier may be a rule-based heuristic classifier.
  • the secondary classifier according to the embodiments of the present disclosure may select a suitable anomaly detection model for a current data point based on some predefined rules according to a data segment corresponding to the current data point. These predefined rules are, for example, based on periodicity, granularity, and line type of the data segment corresponding to the current data point.
  • the primary classifier may individually perform a selection of an anomaly detection model.
  • an anomaly detection model may be selected through the primary classifier, and the anomaly detection model selected through the primary classifier may be directly determined as the selected anomaly detection model 116.
  • the primary classifier and the secondary classifier may be combined with each other to perform the selection of the anomaly detection model.
  • a preliminary anomaly detection model may be selected first through the primary classifier, and a confidence score associated with the preliminary anomaly detection model may be provided at the same time. The confidence score may indicate a credibility of the selected preliminary anomaly detection model. If the confidence score is higher than a predetermined threshold, the preliminary anomaly detection model may be determined as the selected anomaly detection model 116. Alternatively, if the confidence score is not higher than the predetermined threshold, the selected anomaly detection model 116 may be determined through the secondary classifier.
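The confidence-gated combination of a trained primary classifier and a rule-based secondary classifier could look like the following sketch; the callables, the 0.7 threshold, and the model names are hypothetical placeholders, and the same pattern would apply to the primary/secondary parameter estimators described below.

```python
from typing import Callable, Tuple
import numpy as np

def select_model(
    features: np.ndarray,
    primary: Callable[[np.ndarray], Tuple[str, float]],   # returns (model name, confidence)
    secondary: Callable[[np.ndarray], str],                # rule-based fallback
    confidence_threshold: float = 0.7,
) -> str:
    """Use the trained primary classifier when it is confident enough,
    otherwise fall back to the rule-based secondary classifier."""
    model_name, confidence = primary(features)
    if confidence > confidence_threshold:
        return model_name
    return secondary(features)

# Toy usage with stand-in classifiers
primary = lambda f: ("zscore", 0.55)          # low-confidence prediction
secondary = lambda f: "iqr"                   # e.g., a rule on periodicity/granularity
print(select_model(np.zeros(8), primary, secondary))  # -> "iqr"
```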
  • Each anomaly detection model may be equipped with an associated parameter estimator, which is used to calculate a parameter set required to perform anomaly detection.
  • the parameter set may include values of one or more parameters associated with the anomaly detection model.
  • the parameters include, for example, an anomaly detection threshold, an anomaly ratio, a detection window size, etc. These parameters may also be referred to as hyperparameters.
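For illustration only, such a parameter set might be represented as a plain mapping; the parameter names and values below are hypothetical examples, not values prescribed by this document.

```python
# Hypothetical parameter set (hyperparameters) for a selected detector
parameter_set = {
    "anomaly_detection_threshold": 3.0,  # e.g., score cutoff used by the detector
    "anomaly_ratio": 0.01,               # expected fraction of anomaly points
    "detection_window_size": 30,         # length of the data segment
}
```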
  • when the anomaly detection model 116 is selected, a parameter estimator 118 associated with it may be invoked.
  • the parameter estimator 118 may estimate a parameter set 120 associated with the anomaly detection model 116 based on the feature representation 110 of the data segment 106.
  • the parameter estimator 118 may include a primary estimator and an optional secondary estimator.
  • the primary estimator may be an estimator trained with a training dataset including a plurality of time-series data samples. An exemplary process of training the primary estimator will be described later in conjunction with FIG. 4.
  • the secondary estimator may be a rule-based empirical estimator.
  • the secondary estimator according to the embodiments of the present disclosure may estimate a suitable parameter set for a current data point based on some predefined rules according to a data segment corresponding to the current data point. These predefined rules are, for example, based on a number of data points, granularity, line type, etc. of the data segment corresponding to the current data point.
  • the primary estimator may individually perform an estimation of a parameter set.
  • a parameter set may be estimated through the primary estimator, and the parameter set estimated through the primary estimator may be directly determined as the estimated parameter set 120.
  • the primary estimator and the secondary estimator may be combined with each other to perform the estimation of the parameter set.
  • a preliminary parameter set may be estimated first through the primary estimator, and a confidence score associated with the preliminary parameter set may be provided at the same time. The confidence score may indicate a credibility of the estimated preliminary parameter set. If the confidence score is higher than a predetermined threshold, the preliminary parameter set may be determined as the estimated parameter set 120. Alternatively, if the confidence score is not higher than the predetermined threshold, the estimated parameter set 120 may be determined through the secondary estimator.
  • the selected anomaly detection model 116 and the estimated parameter set 120 may be used to perform anomaly detection on the current data point 104 to obtain an anomaly detection value 122 corresponding to the current data point 104.
  • after anomaly detection has been performed on a plurality of data points in the time-series data 102, a plurality of anomaly detection values corresponding to these data points may be obtained. These anomaly detection values may be combined into a preliminary anomaly detection result 124 corresponding to the time-series data 102.
  • a single tuning parameter, such as a sensitivity value 128, may be used to tune the preliminary anomaly detection result 124 to update a number of anomaly points included in the preliminary anomaly detection result.
  • the sensitivity value may refer to a parameter used to tune the anomaly detection result. The sensitivity value may be used to change an acceptable range of metric at each time point, thereby modifying anomaly detection values of one or more data points.
  • when a collected metric of a data point falls within the acceptable range of metric at the corresponding time point, as determined by the sensitivity value, the data point may be determined to be normal; otherwise, the data point may be determined to be abnormal.
  • the preliminary anomaly detection result 124 may be provided to a result tuning module 126.
  • the result tuning module 126 may receive an indication of the sensitivity value 128 for the preliminary anomaly detection result 124, tune the preliminary anomaly detection result 124 based on the sensitivity value 128, and provide the tuned anomaly detection result 130.
  • An exemplary process of tuning the anomaly detection result will be described later in conjunction with FIG. 5.
  • the process 100 in FIG. 1 is only an example of the process for implementing anomaly detection. According to actual application requirements, the process for implementing anomaly detection may include any other steps, and may include more or fewer steps.
  • the tuning process of the preliminary anomaly detection result may be omitted from the process 100, so that the preliminary anomaly detection result 124 is directly used as the final anomaly detection result.
  • the specific order or hierarchy of the steps in the process 100 is only exemplary, and the process for implementing anomaly detection may be performed in an order different from the described order.
  • FIG. 2 illustrates an exemplary process 200 for feature generation according to an embodiment of the disclosure.
  • the process 200 may generate a feature representation of a data segment by directly performing a feature extraction operation on the data segment.
  • the process 200 may be executed, for example, by the feature generation module 108 in FIG. 1, to generate a feature representation 204 of a data segment 202.
  • the data segment 202 may correspond to the data segment 106 in FIG. 1, and the feature representation 204 may correspond to the feature representation 110 in FIG. 1.
  • the data segment 202 may be represented as v ∈ R^m, wherein m is a number of data points included in the data segment 202.
  • a set of feature extraction operations may be performed on the data segment 202 through a set of feature extractors, such as feature extractors 210-1, 210-2, ..., 210-k, to extract a set of sub feature representations from the data segment 202, such as sub feature representations 220-1, 220-2, ..., 220-k.
  • the set of feature extractors may include one or more feature extractors. These feature extractors may extract features such as variance, autocorrelation, entropy, period, etc. given by the sub feature representations.
  • the feature extractor 210-i may be represented as f_i(·), and the sub feature representation 220-i may be represented as f_i(v), i ∈ [1, k].
  • the sub feature representations 220-1, 220-2, ..., 220-k may be combined into a feature representation 204 corresponding to the data segment 202.
  • the feature representation 204 may be represented as U, as shown in the following formula: U ← [f_1(v), f_2(v), ..., f_k(v)] (1)
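A minimal sketch of this direct feature-extraction path (process 200), assuming an illustrative extractor set of variance, lag-1 autocorrelation, histogram entropy, and a dominant FFT period; the extractors actually used by the system are only exemplified, not fully specified, above.

```python
import numpy as np

def variance(v: np.ndarray) -> float:
    return float(np.var(v))

def autocorr_lag1(v: np.ndarray) -> float:
    v = v - v.mean()
    denom = float(np.dot(v, v)) + 1e-12
    return float(np.dot(v[:-1], v[1:]) / denom)

def hist_entropy(v: np.ndarray, bins: int = 10) -> float:
    counts, _ = np.histogram(v, bins=bins)
    p = counts / max(counts.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def dominant_period(v: np.ndarray) -> float:
    spectrum = np.abs(np.fft.rfft(v - v.mean()))
    if len(spectrum) <= 1:
        return 0.0
    k = int(np.argmax(spectrum[1:]) + 1)   # skip the DC component
    return len(v) / k

FEATURE_EXTRACTORS = [variance, autocorr_lag1, hist_entropy, dominant_period]

def generate_features(segment: np.ndarray) -> np.ndarray:
    """Formula (1): U = [f_1(v), f_2(v), ..., f_k(v)]."""
    return np.array([f(segment) for f in FEATURE_EXTRACTORS])

U = generate_features(np.sin(np.linspace(0, 6 * np.pi, 60)))
print(U.shape)  # (4,)
```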
  • FIG. 3 illustrates another exemplary process 300 for feature generation according to an embodiment of the present disclosure.
  • the process 300 may generate a feature representation of a data segment by first performing transformation operations on the data segment, and then performing feature extraction operations on the transformed data segments. The transformation operations help to obtain a richer feature representation.
  • the process 300 may be performed by, for example, the feature generation module 108 in FIG. 1, to generate a feature representation 304 of a data segment 302.
  • the data segment 302 may correspond to the data segment 106 in FIG. 1
  • the feature representation 304 may correspond to the feature representation 110 in FIG. 1.
  • the data segment 302 may be represented as v ∈ R^m, wherein m is a number of data points included in the data segment 302.
  • a set of transformation operations may be performed on the data segment 302 through a set of transformers, such as transformers 310-1, 310-2, ..., 310-n, to obtain a set of transformed data segments, such as transformed data segments 320-1, 320-2, ..., 320-n.
  • the set of transformation operations includes, for example, Fast Fourier Transform (FFT), Detrend, De-seasonality, Spectral Residual (SR), etc.
  • FFT Fast Fourier Transform
  • SR Spectral Residual
  • the transformed data segment 320-i may be represented as w_i, i ∈ [1, n], as shown in the following formula: w_i ← t_i(v) (2)
  • a set of feature extraction operations are performed on each transformed data segment in the set of transformed data segments through a set of feature extractors, such as feature extractors 330-1, 330-2, ..., 330-k, to obtain a set of initial feature representations corresponding to the transformed data segment.
  • the initial feature representations correspond to variance, autocorrelation, entropy, period, etc., for example.
  • a set of initial feature representations corresponding to the transformed data segment 320-i may be represented as f_1(w_i), f_2(w_i), ..., f_k(w_i).
  • for the transformed data segment 320-1, a set of initial feature representations 340-1-1, 340-1-2, ..., 340-1-k may be obtained;
  • for the transformed data segment 320-2, a set of initial feature representations 340-2-1, 340-2-2, ..., 340-2-k may be obtained; and
  • for the transformed data segment 320-n, a set of initial feature representations 340-n-1, 340-n-2, ..., 340-n-k may be obtained.
  • the set of initial feature representations f_1(w_i), f_2(w_i), ..., f_k(w_i) corresponding to each transformed data segment 320-i may be combined into a sub feature representation 350-i corresponding to the transformed data segment.
  • the sub feature representation 350-i may be represented as u_i, as shown in the following formula: u_i ← [f_1(w_i), f_2(w_i), ..., f_k(w_i)] (3)
  • a set of initial feature representations 340-1-1, 340-1-2, ..., 340-1-k may be combined into a sub feature representation 350-1 corresponding to a transformed data segment 320-1; a set of initial feature representations 340-2-1, 340-2-2, ..., 340-2-k may be combined into a sub feature representation 350-2 corresponding to a transformed data segment 320-2; and a set of initial feature representations 340-n-1, 340-n-2, ..., 340-n-k may be combined into a sub feature representation 350-n corresponding to a transformed data segment 320-n.
  • the set of sub feature representations 350-1, 350-2, ..., 350-n corresponding to the set of transformed data segments 320-1, 320-2, ..., 320-n may be combined into a feature representation 304 corresponding to the data segment 302.
  • the feature representation 304 may be represented as U, as shown in the following formula: U ← [u_1, u_2, ..., u_n] (4)
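A sketch of the transform-then-extract path (process 300), using detrending and FFT magnitude as stand-ins for the FFT/Detrend/De-seasonality/SR transformations and a small illustrative extractor list; the comments map the steps to formulas (2) through (4) as reconstructed above.

```python
import numpy as np

# Illustrative transformers t_i (stand-ins for FFT, Detrend, De-seasonality, SR)
def detrend(v: np.ndarray) -> np.ndarray:
    x = np.arange(len(v))
    slope, intercept = np.polyfit(x, v, 1)
    return v - (slope * x + intercept)

def fft_magnitude(v: np.ndarray) -> np.ndarray:
    return np.abs(np.fft.rfft(v - v.mean()))

TRANSFORMERS = [detrend, fft_magnitude]

# Illustrative extractors f_j applied to each transformed segment w_i
EXTRACTORS = [np.var, lambda w: float(np.mean(np.abs(np.diff(w)))), np.max]

def generate_features_with_transforms(segment: np.ndarray) -> np.ndarray:
    sub_representations = []
    for t in TRANSFORMERS:
        w = t(segment)                                   # formula (2): w_i = t_i(v)
        u_i = np.array([f(w) for f in EXTRACTORS])       # formula (3): u_i = [f_1(w_i), ..., f_k(w_i)]
        sub_representations.append(u_i)
    return np.concatenate(sub_representations)           # formula (4): U = [u_1, ..., u_n]

U = generate_features_with_transforms(np.sin(np.linspace(0, 6 * np.pi, 60)) + 0.01 * np.arange(60))
print(U.shape)  # (6,)
```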
  • although the process 200 in FIG. 2 and the process 300 in FIG. 3 are shown as separate processes, they may be executed in combination with each other according to actual application requirements.
  • feature extraction operations may be directly performed on an original data segment according to the process 200, and transformation operations may be performed on the data segment and then feature extraction operations may be performed on the transformed data segment according to the process 300.
  • a result of the feature extraction operations according to the process 200 and a result of the feature extraction operations according to the process 300 are combined to obtain a feature representation corresponding to the data segment.
  • FIG. 4 illustrates an exemplary process 400 for training a primary classifier and a primary estimator according to an embodiment of the present disclosure.
  • the process 400 may be performed offline.
  • the trained primary classifier may be included in a model selector for selecting an anomaly detection model, e.g., included in the model selector 112 in FIG. 1, and the trained primary estimator may be included in a parameter estimator for estimating a parameter set associated with an anomaly detection model, e.g., included in the parameter estimator 118 in FIG. 1.
  • the primary classifier 428 and the primary estimator 420 may be trained with a training dataset 402.
  • the training dataset 402 may include a plurality of time-series data samples, and each time-series data sample may have an anomaly label for indicating anomaly points in the time-series data sample.
  • the following takes a time-series data sample 404 in the training dataset 402 as an example to illustrate an exemplary process for training the primary classifier and the primary estimator.
  • a feature representation 418 of the time- series data sample 404 may be generated.
  • the feature representation 418 of the time-series data sample 404 may be generated, for example, through the process 200 in FIG. 2 and/or the process 300 in FIG. 3.
  • Each candidate anomaly detection model in a set of candidate anomaly detection models 406 may be used to perform anomaly detection on the time-series data sample 404.
  • the set of candidate anomaly detection models 406 may correspond to the set of candidate anomaly detection models 114 in FIG. 1.
  • Each candidate anomaly detection model may be equipped with an associated primary estimator, which is used to calculate a parameter set required to perform anomaly detection. The following takes a candidate anomaly detection model 408 in the set of candidate anomaly detection models 406 as an example to illustrate an exemplary process for training the primary estimator associated with it.
  • the candidate anomaly detection model 408 may have multiple associated candidate parameter sets 410, and each candidate parameter set includes a possible combination of values of various parameters associated with the candidate anomaly detection model 408.
  • the candidate anomaly detection model 408 may use each candidate parameter set in the plurality of candidate parameter sets 410 to perform anomaly detection on the time-series data sample 404 to obtain a set of anomaly detection results 412.
  • a best anomaly detection result 414 may be identified from the set of anomaly detection results 412 based on the anomaly label in the time-series data sample 404.
  • the best anomaly detection result 414 may be, for example, an anomaly detection result closest to the anomaly label.
  • the candidate parameter set, among the candidate parameter sets 410, that produces the best anomaly detection result 414 may be determined as a best parameter set 416.
  • Each candidate anomaly detection model in the set of candidate anomaly detection models 406 may use each candidate parameter set in its associated multiple candidate parameter sets to perform anomaly detection on the time-series data sample 404.
  • the best anomaly detection results generated by each candidate anomaly detection model may form a set of best anomaly detection results 422 corresponding to the set of candidate anomaly detection models 406.
  • An optimal anomaly detection result 424 for the time-series data sample 404 may be identified from the set of best anomaly detection results 422. Subsequently, a candidate anomaly detection model that produces the optimal anomaly detection result 424 in the set of candidate anomaly detection models 406 may be determined as a best anomaly detection model 426.
  • At 428, at least the feature representation 418 of the time-series data sample 404 and the identification of the best anomaly detection model 426 may be used to train the primary classifier, so that when actually deployed, the trained classifier may select a best anomaly detection model based on a feature representation of a time-series data.
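The offline search of process 400 amounts to: run every candidate model with every candidate parameter set on each labeled sample, keep the best-scoring combination, and fit the primary classifier on (feature representation, best model identification) pairs. The sketch below follows that outline; the stand-in detectors, parameter grids, F1 scoring, synthetic samples, and the scikit-learn classifier are all illustrative assumptions rather than choices prescribed by this document.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

# Stand-in point-wise detectors: each returns a 0/1 anomaly value per data point
def zscore_detect(series, threshold):
    z = np.abs((series - series.mean()) / (series.std() + 1e-8))
    return (z > threshold).astype(int)

def quantile_detect(series, q):
    return (series > np.quantile(series, q)).astype(int)

CANDIDATES = {
    "zscore": (zscore_detect, [{"threshold": 2.5}, {"threshold": 3.0}]),
    "quantile": (quantile_detect, [{"q": 0.97}, {"q": 0.99}]),
}

def best_model_for_sample(series, labels):
    """Return (best model name, best parameter set) under the F1 criterion."""
    best = (None, None, -1.0)
    for name, (detect, param_sets) in CANDIDATES.items():
        for params in param_sets:
            score = f1_score(labels, detect(series, **params), zero_division=0)
            if score > best[2]:
                best = (name, params, score)
    return best[0], best[1]

def simple_features(series):
    return np.array([series.mean(), series.std(), np.abs(np.diff(series)).mean()])

# Synthetic training set: (series, anomaly labels) pairs with a few injected spikes
rng = np.random.default_rng(0)
samples = []
for _ in range(40):
    s = rng.normal(size=120)
    labels = np.zeros(120, dtype=int)
    idx = rng.integers(0, 120, size=3)
    s[idx] += 6.0
    labels[idx] = 1
    samples.append((s, labels))

X = np.stack([simple_features(s) for s, _ in samples])
y = [best_model_for_sample(s, l)[0] for s, l in samples]

primary_classifier = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(primary_classifier.predict(X[:3]))
```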
  • the set of candidate anomaly detection models 406 is extensible. For example, when an additional candidate anomaly detection model other than the set of candidate anomaly detection models 406 is obtained, anomaly detection may be performed on the time-series data sample 404 with the additional candidate anomaly detection model to obtain an additional anomaly detection result.
  • the additional candidate anomaly detection model may use each candidate parameter set of multiple candidate parameter sets associated with it to perform anomaly detection on the time-series data sample 404 to obtain a set of anomaly detection results. Subsequently, a best anomaly detection result may be identified, based on the anomaly label in the time-series data sample 404, from the set of anomaly detection results as an additional anomaly detection result corresponding to the additional candidate anomaly detection model.
  • This additional anomaly detection result may be compared with the optimal anomaly detection result 424 for the time-series data sample 404. If the additional anomaly detection result is better than the optimal anomaly detection result 424, the best anomaly detection model 426 may be updated as the additional candidate anomaly detection model. Further, at least the feature representation 418 and the identification of the additional candidate anomaly detection model may be used to retrain the primary classifier.
  • the anomaly detection system also has a feedback mechanism.
  • the feedback mechanism may update a training dataset by adding feedback-based labels to one or more time-series data samples in the training dataset, and retrain a primary classifier and a primary estimator with the updated training dataset.
  • a feedback-based label may include, for example, an anomaly label given by a user for a time-series data sample.
  • the user may broadly refer to a company, an organization, an individual, etc. that uses the anomaly detection system according to the embodiments of the present disclosure.
  • the user may own, produce, provide, or manage a time-series data, and want to monitor anomaly events in the time-series data.
  • an acceptable range of metric at each time point may be changed with a sensitivity value, to modify anomaly detection values of one or more data points, thereby updating a number of anomaly points included in an anomaly detection result.
  • a user may set the sensitivity value through a user interface, so that an anomaly detection result may be customized intuitively and efficiently.
  • FIG. 5 illustrates an exemplary process 500 for anomaly detection result tuning according to an embodiment of the present disclosure.
  • the process 500 may be performed by the result tuning module 126 in FIG. 1, for example.
  • the result tuning module 126 may determine a tolerance range of at least one data point in a time-series data based on a sensitivity value, and modify an anomaly detection value of the at least one data point based at least on the tolerance range.
  • the tolerance range may refer to an acceptable range of metric at each time point.
  • the data point may be represented as v_i, wherein i ∈ [0, m] and m is a number of data points included in the time-series data.
  • the data point v_i may have a preliminary anomaly detection value a_i, such as the anomaly detection value 122 in FIG. 1.
  • an indication of a sensitivity value for an anomaly detection result for a time-series data including a plurality of data points may be received.
  • the sensitivity value may be represented as α ∈ [0, 100], wherein a larger α means a larger number of anomaly points to be reported. It should be appreciated that representing the sensitivity value in this way is only an example, and the embodiments of the present disclosure are not limited to this; the sensitivity value may be represented in other ways.
  • the data point v_i of the time-series data may be decomposed into at least a trend component g_i and a loss component e_i.
  • a Seasonal-Trend decomposition procedure based on LOESS may be used to decompose the data point.
  • STL Seasonal-Trend decomposition based on LOESS
  • a seasonality component s_i of the data point may also be obtained at the same time.
  • a tolerance range of the data point may be determined based on the delta unit g_i and the sensitivity value α, as shown in the following formula: tolerance_i ← factor(α) · g_i (6), wherein factor(α) is an exponentially decreasing function w.r.t. α.
  • an anomaly detection value of the data point may be modified based on the tolerance range and the loss component e_i of the data point.
  • when the absolute value of the loss component e_i of the data point is within the tolerance range of the data point, the data point will be labeled as a normal point, as shown in the following formula: a'_i ← a_i · 1(|e_i| > tolerance_i) (7)
  • the anomaly detection value of at least one data point in the time-series data may be modified.
  • Current anomaly detection values of various data points may be combined together as the tuned anomaly detection result of the time-series data.
  • the tuned anomaly detection result may be provided.
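A sketch of the sensitivity-based tuning of process 500, using the STL implementation from statsmodels for the decomposition. The exact form of factor(α) is not given here, so the exponential below is only an assumed decreasing function, the absolute value of the trend is taken to keep the tolerance non-negative, and the relabeling follows formulas (6) and (7) as reconstructed above.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def factor(alpha: float) -> float:
    # Assumed exponentially decreasing function of the sensitivity alpha in [0, 100]
    return float(np.exp(-3.0 * alpha / 100.0))

def tune_result(series: np.ndarray, preliminary: np.ndarray, alpha: float, period: int) -> np.ndarray:
    """Relabel points whose residual falls inside the tolerance range as normal:
    a'_i = a_i * 1(|e_i| > tolerance_i), with tolerance_i = factor(alpha) * |g_i|."""
    decomposition = STL(series, period=period).fit()
    trend, resid = decomposition.trend, decomposition.resid
    tolerance = factor(alpha) * np.abs(trend)
    return (preliminary.astype(bool) & (np.abs(resid) > tolerance)).astype(int)

# Toy usage: a higher sensitivity keeps more of the preliminary anomaly points
t = np.arange(200)
series = 10 + np.sin(2 * np.pi * t / 24) + np.random.default_rng(1).normal(0, 0.1, 200)
series[150] += 3.0
preliminary = np.zeros(200, dtype=int)
preliminary[150] = 1
print(tune_result(series, preliminary, alpha=30, period=24).sum(),
      tune_result(series, preliminary, alpha=90, period=24).sum())
```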
  • a user may set a sensitivity value for tuning an anomaly detection result through a user interface.
  • FIG. 6 illustrates exemplary user interfaces 600a and 600b for tuning an anomaly detection result according to an embodiment of the present disclosure.
  • the user interfaces 600a and 600b are only examples of user interfaces for tuning the anomaly detection result, and the embodiments of the present disclosure are not limited to this, and other forms of user interface may be used to tune the anomaly detection result.
  • the user interfaces 600a and 600b are only a part of the user interface for tuning the anomaly detection result.
  • the user interface for tuning the anomaly detection result may also include other parts, such as options for setting display parameters, etc.
  • anomaly detection results are labeled on a time-series data curve.
  • the horizontal axis represents the time of each data point, which may adopt various time granularities, such as a minute, an hour, a day, etc.
  • the projection of each data point on the vertical axis indicates a metric collected for that data point.
  • a shaded area surrounding the time-series data curve is composed of the tolerance ranges of the data points on the time-series data curve.
  • the tolerance range of each data point may be determined, for example, through steps 520-540 in FIG. 5. When a metric of a data point falls outside the shaded area, the data point will be labeled as an anomaly point, as indicated by the symbol " X " in the figure.
  • a tolerance range of each data point may be controlled by a sensitivity value, as shown in the above formula (6).
  • the user interface 600a illustrates the anomaly detection result when the sensitivity value is 65.
  • the user interface 600b illustrates the anomaly detection result when the sensitivity value is 80. It may be intuitively seen from FIG. 6 that the shaded area of the user interface 600b is smaller than the shaded area of the user interface 600a, and the number of anomaly points reported in the user interface 600b is greater than the number of anomaly points reported in the user interface 600a.
  • when an anomaly detection system is used to perform anomaly detection on a time-series data, if an anomaly detection result does not meet a predefined criterion, a reselection of an anomaly detection model and a re-estimation of a parameter set may be triggered automatically.
  • the predefined criterion may be based on an anomaly ratio and/or a false alert rate of the anomaly detection result, for example.
  • the anomaly ratio and/or false alert rate of the anomaly detection result may be provided to a model selector.
  • the model selector may reselect an anomaly detection model for the time-series data based on the anomaly ratio and/or false alert rate.
  • a parameter estimator associated with the reselected anomaly detection model may also receive the anomaly ratio and/or false alert rate, and re-estimate a parameter set for the time-series data. In this way, the time cost for improving the anomaly detection result that does not meet the predefined criterion may be reduced, thereby achieving more efficient anomaly detection.
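The predefined criterion could be as simple as upper bounds on the anomaly ratio and the false alert rate; the function and thresholds below are hypothetical.

```python
def meets_criterion(anomaly_ratio: float, false_alert_rate: float,
                    max_anomaly_ratio: float = 0.05, max_false_alert_rate: float = 0.1) -> bool:
    """Hypothetical predefined criterion: the result is acceptable when both the
    anomaly ratio and the false alert rate stay below their bounds."""
    return anomaly_ratio <= max_anomaly_ratio and false_alert_rate <= max_false_alert_rate

# If the criterion fails, the anomaly ratio / false alert rate would be fed back to the
# model selector and parameter estimator to trigger reselection and re-estimation.
print(meets_criterion(anomaly_ratio=0.12, false_alert_rate=0.02))  # False -> reselect
```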
  • the model selector selecting an inappropriate model for the time-series data and/or the parameter estimator estimating inappropriate parameters may cause the anomaly detection result of the time-series data to not meet the predefined criterion.
  • the time-series data may be used as a training sample to retrain a primary classifier in the model selector and a primary estimator in the parameter estimator.
  • the primary classifier and primary estimator may be retrained in a process similar to the process 400 in FIG. 4.
  • an anomaly label may be added to the time-series data.
  • the anomaly label may be provided by a user.
  • Each candidate anomaly detection model in a set of candidate anomaly detection models and each candidate parameter set in a plurality of candidate parameter sets associated therewith may be used to perform anomaly detection on the time-series data to obtain a set of anomaly detection results. It may be determined whether a best anomaly detection result in the set of anomaly detection results meets a predefined criterion based on the anomaly label of the time-series data. If the best anomaly detection result meets the predefined criterion, the identification of the anomaly detection model that produces the best anomaly detection result may be used to retrain the primary classifier, and the parameter set that produces the best anomaly detection result may be used to retrain the primary estimator associated with the anomaly detection model.
  • a new anomaly detection model that may process the time-series data may be added to the set of candidate anomaly detection models to obtain an updated set of candidate anomaly detection models, and the updated set of candidate anomaly detection models may be used to retrain the primary classifier and primary estimator.
  • FIG.7 is a flowchart of an exemplary method 700 for implementing anomaly detection according to an embodiment of the present disclosure.
  • a time-series data including a plurality of data points may be obtained.
  • a feature representation of a data segment corresponding to a current data point may be generated.
  • an anomaly detection model may be selected from a set of candidate anomaly detection models based on the feature representation.
  • a parameter set associated with the selected anomaly detection model may be estimated.
  • anomaly detection may be performed on the current data point with the selected anomaly detection model and the estimated parameter set.
  • the generating a feature representation may comprise: performing a set of feature extraction operations on the data segment to obtain a set of sub feature representations; and combining the set of sub feature representations into the feature representation.
  • the generating a feature representation may comprise: performing a set of transformation operations on the data segment to obtain a set of transformed data segments; for each transformed data segment in the set of transformed data segments, performing a set of feature extraction operations to obtain a set of initial feature representations, and combining the set of initial feature representations into a sub feature representation corresponding to the transformed data segment; and combining a set of sub feature representations corresponding to the set of transformed data segments into the feature representation.
  • the selecting an anomaly detection model may be performed through a model selector.
  • the model selector may include a primary classifier.
  • the selecting an anomaly detection model may comprise: selecting a preliminary anomaly detection model through the primary classifier; determining whether a confidence score associated with the preliminary anomaly detection model is higher than a threshold; and in response to determining that the confidence score is higher than the threshold, determining the preliminary anomaly detection model as the selected anomaly detection model.
  • the model selector may further comprise a secondary classifier.
  • the selecting an anomaly detection model may further comprise: in response to determining that the confidence score is not higher than the threshold, determining the selected anomaly detection model through the secondary classifier.
  • the primary classifier may be trained by the following operations: for a time-series data sample in a training dataset, identifying a best anomaly detection model for the time-series data sample from the set of candidate anomaly detection models; generating a feature representation of the time-series data sample; and training the primary classifier at least using the feature representation of the time-series data sample and an identification of the best anomaly detection model.
  • the time-series data sample may have an anomaly label.
  • the identifying a best anomaly detection model may comprise: performing anomaly detection on the time-series data sample with each candidate anomaly detection model in the set of candidate anomaly detection models, respectively, to obtain a set of anomaly detection results; identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and determining a candidate anomaly detection model that produces the best anomaly detection result as the best anomaly detection model.
  • the identifying a best anomaly detection model may further comprise: obtaining an additional candidate anomaly detection model; performing anomaly detection on the time-series data sample with the additional candidate anomaly detection model to obtain an additional anomaly detection result; determining that the additional anomaly detection result is better than the best anomaly detection result; and updating the best anomaly detection model as the additional candidate anomaly detection model.
  • the estimating a parameter set may be performed through a parameter estimator associated with the selected anomaly detection model.
  • the parameter estimator may include a primary estimator.
  • the estimating a parameter set may comprise: estimating a preliminary parameter set through the primary estimator; determining whether a confidence score associated with the preliminary parameter set is higher than a threshold; and in response to determining that the confidence score is higher than the threshold, determining the preliminary parameter set as the estimated parameter set.
  • the parameter estimator may further comprise a secondary estimator.
  • the estimating a parameter set may further comprise: in response to determining that the confidence score is not higher than the threshold, determining the estimated parameter set through the secondary estimator.
  • the primary estimator may be trained by the following operations: for a time-series data sample in a training dataset, identifying a best parameter set for the time-series data sample from candidate parameter sets associated with the selected anomaly detection model; generating a feature representation of the time-series data sample; and training the primary estimator at least using the feature representation of the time-series data sample and the best parameter set.
  • the time-series data sample may have an anomaly label.
  • the identifying a best parameter set may comprise: for each candidate parameter set in the candidate parameter sets, performing anomaly detection on the time-series data sample with the selected anomaly detection model and the candidate parameter set, respectively, to obtain a set of anomaly detection results; identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and determining a candidate parameter set that produces the best anomaly detection result as the best parameter set.
  • the time-series data sample may have a feedback-based label.
  • the feedback- based label may include at least an anomaly label.
  • the method 700 may further comprise: receiving an indication of a sensitivity value of an anomaly detection result of the time-series data; tuning the anomaly detection result based on the sensitivity value; and providing the tuned anomaly detection result.
  • the tuning the anomaly detection result may comprise: determining a tolerance range of at least one data point in the time-series data based on the sensitivity value; and modifying an anomaly detection value of the at least one data point based at least on the tolerance range.
  • the method 700 may further comprise any steps/processes for implementing customized anomaly detection according to the embodiments of the present disclosure as mentioned above.
  • FIG.8 illustrates an exemplary apparatus 800 for implementing anomaly detection according to an embodiment of the present disclosure.
  • the apparatus 800 may comprise: a time-series data obtaining module 810, for obtaining a time-series data including a plurality of data points; a feature representation generating module 820, for generating a feature representation of a data segment corresponding to a current data point; a detection model selecting module 830, for selecting an anomaly detection model from a set of candidate anomaly detection models based on the feature representation; a parameter set estimating module 840, for estimating a parameter set associated with the selected anomaly detection model; and an anomaly detection performing module 850, for performing anomaly detection on the current data point with the selected anomaly detection model and the estimated parameter set.
  • the apparatus 800 may further comprise a result tuning module, the result tuning module being for: receiving an indication of a sensitivity value of an anomaly detection result of the time-series data; tuning the anomaly detection result based on the sensitivity value; and providing the tuned anomaly detection result.
  • the apparatus 800 may further comprise any modules for performing steps of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
  • FIG.9 illustrates an exemplary apparatus 900 for implementing anomaly detection according to an embodiment of the present disclosure.
  • the apparatus 900 may comprise at least one processor 910 and a memory 920 storing computer-executable instructions.
  • the at least one processor 910 may perform any operations of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
  • the embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium.
  • the non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
  • modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
  • processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system.
  • a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described throughout the present disclosure.
  • DSP digital signal processor
  • FPGA field- programmable gate array
  • PLD programmable logic device
  • the functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.
  • Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc.
  • the software may reside on a computer-readable medium.
  • a computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk.
  • although memory is shown separate from the processors in the various aspects presented throughout the present disclosure, the memory may be internal to the processors, e.g., cache or register.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Analysis (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Pure & Applied Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Optimization (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present disclosure provides method and apparatus for implementing anomaly detection. A time-series data including a plurality of data points may be obtained. A feature representation of a data segment corresponding to a current data point may be generated. An anomaly detection model may be selected from a set of candidate anomaly detection models based on the feature representation. A parameter set associated with the selected anomaly detection model may be estimated. Anomaly detection may be performed on the current data point with the selected anomaly detection model and the estimated parameter set.

Description

MODEL SELECTION AND PARAMETER ESTIMATION FOR ANOMALY DETECTION
BACKGROUND
[0001] Anomaly detection may discover abnormal or unexpected incidents in a time-series data. Herein, a time-series data refers to a data sequence recorded in a chronological order, and data points in the data sequence reflect state or degree of changes of a particular phenomenon, index, matter, etc. along with time. Anomaly detection is widely implemented in many industrial applications, and is an important research area in data mining. Accurate anomaly detection may trigger prompt troubleshooting, help to avoid loss in revenue, maintain the reputation and branding of an operating entity, etc. Many large companies have built their own anomaly detection systems to monitor the healthiness of their business, products, services, etc. For example, through an anomaly detection system, an internet company may monitor in real time various metrics of its applications and services, e.g., page views, sales revenue, etc. An anomaly detection system may monitor time-series data continuously, and alert promptly about potential incidents. When an administrator or a user receives an alert about an incident, they may make a timely decision for this incident.
SUMMARY
[0002] This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0003] Embodiments of the present disclosure propose method and apparatus for implementing anomaly detection. A time-series data including a plurality of data points may be obtained. A feature representation of a data segment corresponding to a current data point may be generated. An anomaly detection model may be selected from a set of candidate anomaly detection models based on the feature representation. A parameter set associated with the selected anomaly detection model may be estimated. Anomaly detection may be performed on the current data point with the selected anomaly detection model and the estimated parameter set.
[0004] It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.
BRIEF DESCRIPTION OF THE DRAWINGS
[0005] The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.
[0006] FIG.1 illustrates an exemplary process for implementing anomaly detection according to an embodiment of the present disclosure.
[0007] FIG.2 illustrates an exemplary process for feature generation according to an embodiment of the present disclosure.
[0008] FIG.3 illustrates another exemplary process for feature generation according to an embodiment of the present disclosure.
[0009] FIG.4 illustrates an exemplary process for training a primary classifier and a primary estimator according to an embodiment of the present disclosure.
[0010] FIG.5 illustrates an exemplary process for anomaly detection result tuning according to an embodiment of the present disclosure.
[0011] FIG.6 illustrates exemplary user interfaces for anomaly detection result tuning according to an embodiment of the present disclosure.
[0012] FIG.7 is a flowchart of an exemplary method for implementing anomaly detection according to an embodiment of the present disclosure.
[0013] FIG.8 illustrates an exemplary apparatus for implementing anomaly detection according to an embodiment of the present disclosure.
[0014] FIG.9 illustrates an exemplary apparatus for implementing anomaly detection according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
[0015] The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.
[0016] An anomaly detection model is usually employed to perform anomaly detection on a time-series data. Existing anomaly detection models may be categorized into statistical models, supervised models, unsupervised models, etc. A statistical model may be easily adopted online, but its accuracy is not sufficient for industrial applications. A supervised model has high accuracy, but it is difficult to adapt to online applications because it relies on a large amount of continuously labeled data. An unsupervised model may reconstruct probabilities of normal time-series data, and report anomaly points when a reconstruction error is higher than a threshold, but is time-consuming or parameter-sensitive.
[0017] Time-series data have various patterns, and their data distributions are usually stochastic and change continuously over time. A specific anomaly detection model may usually only exhibit good performance for a specific pattern of time-series data. It is desirable to select a suitable anomaly detection model and parameters for different patterns of time-series data. Currently, a heuristic classifier is usually employed to perform the selection of an anomaly detection model. The heuristic classifier may be a rule-based classifier, which may select a suitable anomaly detection model for a time-series data based on some predefined rules. These predefined rules are based on, for example, the periodicity, granularity, line type, etc. of the time-series data. Moreover, an empirical estimator may be employed to estimate suitable parameters for the selected anomaly detection model. The empirical estimator may be a rule-based estimator, which may estimate suitable parameters for a time-series data based on some predefined rules. These predefined rules are based on, for example, the number of data points, granularity, line type, etc. of the time-series data. In the case of employing a heuristic classifier to select an anomaly detection model and employing an empirical estimator to estimate parameters, when a new time-series data with a different pattern appears, the heuristic classifier and the empirical estimator must be manually tuned by defining new rules for that time-series data. This work is cumbersome and time-consuming.
[0018] Embodiments of the present disclosure propose a scalable anomaly detection system capable of performing anomaly detection on different patterns of time-series data from various data sources. The anomaly detection system according to the embodiments of the present disclosure may automatically determine a suitable anomaly detection model and corresponding parameter set based on features of a time-series data, and perform anomaly detection on the time-series data with the determined anomaly detection model and parameter set.
[0019] In an aspect, the embodiments of the present disclosure propose to dynamically select an anomaly detection model and a parameter set as data points in a time-series data change. For example, a feature representation of a data segment corresponding to a current data point of a time-series data may be generated in real time, a suitable anomaly detection model may be selected for the current data point based on the feature representation, and a parameter set associated with the selected anomaly detection model may be estimated. Therefore, pattern changes of the time-series data can be sensed in real time and the employed anomaly detection model and parameter set can be dynamically adjusted. Herein, the current data point may refer to a data point that is currently being detected during the sequential execution of anomaly detection on a series of data points in the time-series data. It should be appreciated that although the current data point is described as a single data point in the following discussion, the embodiments of the present disclosure also cover the situation where the current data point includes more than one data point.
[0020] In another aspect, the embodiments of the present disclosure propose to perform a selection of an anomaly detection model from a plurality of candidate anomaly detection models through a pre-trained classifier, and perform an estimation of a parameter set through a pre-trained estimator. The classifier and the estimator may be trained using a training dataset including a plurality of time-series data samples.
[0021] In yet another aspect, the embodiments of the present disclosure propose to train a classifier and an estimator with time-series data samples having feedback-based labels. The feedback-based labels may be provided by users.
[0022] In still another aspect, the embodiments of the present disclosure propose a customized anomaly detection result tuning mechanism. An anomaly detection result of a time-series data may be tuned with a single tuning parameter, to update a number of anomaly points included in the anomaly detection result. The tuning parameter may be set through a specific user interface, thereby realizing intuitive and efficient customization of the anomaly detection result.
[0023] FIG. 1 illustrates an exemplary process 100 for implementing anomaly detection according to an embodiment of the present disclosure. The process 100 may be performed by an anomaly detection system according to an embodiment of the present disclosure, which intends to perform anomaly detection on a time-series data 102.
[0024] The time-series data 102 may come from various types of data sources, such as online streaming data, cloud storage, databases, etc., and may be targeted at various types of application scenarios or businesses, such as service traffic, product sales revenue, Internet of Things (IoT) data, machine operation data, business operation data, etc. It should be appreciated that the embodiments of the present disclosure are not limited to any specific types of time-series data, but can be applied to different types of time-series data in different application scenarios in the same or similar manner. The time-series data 102 may be updated according to a specified time granularity. The time-series data 102 may include a plurality of data points, and the time granularity indicates an update frequency of these data points, such as a minute, an hour, a day, etc. Each data point includes a metric collected at a corresponding time point.
[0025] The anomaly detection system according to the embodiments of the present disclosure may determine a suitable anomaly detection model and an associated parameter set for each data point in a time-series data according to a data segment where the data point is located, and perform anomaly detection on the data point with the determined anomaly detection model and parameter set. The following takes a current data point 104 in the time-series data 102 as an example to illustrate an exemplary process of performing anomaly detection.
[0026] A data segment 106 corresponding to the current data point 104 may be obtained. In an implementation, a limited length of historical data points adjacent to the current data point 104 may be cut from the time-series data 102. These historical data points and the current data point 104 may form the data segment 106 corresponding to the current data point 104. The length of the data segment 106 may also be referred to as a detection window size. By considering the data segment 106 in the process 100, the historical data points in the data segment 106 may be used to capture the actual context of the current data point 104, so that anomaly detection can be performed on the current data point more accurately. [0027] Subsequently, a feature generation operation may be performed on the data segment 106 through a feature generation module 108, in order to generate a feature representation 110 of the data segment 106. An exemplary process of feature generation will be described later in conjunction with FIGS. 2 and 3.
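For illustration, the data segment construction described in paragraph [0026] above may be sketched as follows. This is a minimal example rather than part of the present disclosure; the window size of 30 and the (timestamp, value) representation are assumptions introduced here.

```python
from typing import List, Tuple

def get_data_segment(series: List[Tuple[str, float]], current_index: int,
                     window_size: int = 30) -> List[Tuple[str, float]]:
    # The segment holds up to window_size - 1 historical points adjacent to
    # the current data point, plus the current data point itself.
    start = max(0, current_index - (window_size - 1))
    return series[start:current_index + 1]

# Example: a minute-granularity metric represented as (timestamp, value) pairs.
series = [(f"2021-05-04T10:{m:02d}", 100.0 + m) for m in range(60)]
segment = get_data_segment(series, current_index=45, window_size=30)
print(len(segment))  # 30 data points ending at the current data point
```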
[0028] The feature representation 110 may be provided to a model selector 112. The model selector 112 may select an anomaly detection model from a set of candidate anomaly detection models 114 based on the feature representation 110. The set of candidate anomaly detection models 114 may include various anomaly detection models. Since statistical models and unsupervised models can efficiently achieve stable performance, the set of candidate anomaly detection models 114 preferably includes these two types of models, such as a Spectral Residual (SR) model, a Histogram-based Outlier Score (HBOS) model, a Seasonal Hybrid Extreme Studentized Deviate test (SH-ESD) model, etc. The selected anomaly detection model is, for example, the anomaly detection model 116. The model selector 112 may, for example, include a primary classifier and an optional secondary classifier. The primary classifier may be a multi-classifier trained with a training dataset including a plurality of time-series data samples. An exemplary process of training the primary classifier will be described later in conjunction with FIG. 4. The secondary classifier may be a rule-based heuristic classifier. The secondary classifier according to the embodiments of the present disclosure may select a suitable anomaly detection model for a current data point based on some predefined rules according to a data segment corresponding to the current data point. These predefined rules are, for example, based on the periodicity, granularity, and line type of the data segment corresponding to the current data point.
[0029] In an implementation, the primary classifier may individually perform a selection of an anomaly detection model. For example, an anomaly detection model may be selected through the primary classifier, and the anomaly detection model selected through the primary classifier may be directly determined as the selected anomaly detection model 116. In another implementation, the primary classifier and the secondary classifier may be combined with each other to perform the selection of the anomaly detection model. For example, a preliminary anomaly detection model may be selected first through the primary classifier, and a confidence score associated with the preliminary anomaly detection model may be provided at the same time. The confidence score may indicate a credibility of the selected preliminary anomaly detection model. If the confidence score is higher than a predetermined threshold, the preliminary anomaly detection model may be determined as the selected anomaly detection model 116. Alternatively, if the confidence score is not higher than the predetermined threshold, the selected anomaly detection model 116 may be determined through the secondary classifier.
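For illustration, the combination of a primary classifier and a secondary classifier described above may be sketched as follows. The confidence threshold of 0.8 and the toy classifiers are assumptions introduced for this example and are not prescribed by the present disclosure.

```python
from typing import Callable, Sequence, Tuple

def select_model(features: Sequence[float],
                 primary_classifier: Callable[[Sequence[float]], Tuple[str, float]],
                 secondary_classifier: Callable[[Sequence[float]], str],
                 confidence_threshold: float = 0.8) -> str:
    # The primary classifier proposes a preliminary model with a confidence
    # score; the secondary classifier is used only when that score is too low.
    model_id, confidence = primary_classifier(features)
    if confidence > confidence_threshold:
        return model_id
    return secondary_classifier(features)

# Toy stand-ins for the trained multi-classifier and the heuristic classifier.
primary = lambda f: ("SR", 0.65)     # low confidence in this example
secondary = lambda f: "HBOS"         # rule-based fallback
print(select_model([0.1, 0.3], primary, secondary))  # -> "HBOS"
```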
[0030] Each anomaly detection model may be equipped with an associated parameter estimator, which is used to calculate a parameter set required to perform anomaly detection. The parameter set may include values of one or more parameters associated with the anomaly detection model. The parameters include, for example, an anomaly detection threshold, an anomaly ratio, a detection window size, etc. These parameters may also be referred to as hyperparameters. After the anomaly detection model 116 is selected, a parameter estimator 118 associated with it may be invoked. The parameter estimator 118 may estimate a parameter set 120 associated with the anomaly detection model 116 based on the feature representation 110 of the data segment 106. The parameter estimator 118 may include a primary estimator and an optional secondary estimator. The primary estimator may be an estimator trained with a training dataset including a plurality of time-series data samples. An exemplary process of training the primary estimator will be described later in conjunction with FIG. 4. The secondary estimator may be a rule-based empirical estimator. The secondary estimator according to the embodiments of the present disclosure may estimate a suitable parameter set for a current data point based on some predefined rules according to a data segment corresponding to the current data point. These predefined rules are, for example, based on the number of data points, granularity, line type, etc. of the data segment corresponding to the current data point.
[0031] In an implementation, the primary estimator may individually perform an estimation of a parameter set. For example, a parameter set may be estimated through the primary estimator, and the parameter set estimated through the primary estimator may be directly determined as the estimated parameter set 120. In another implementation, the primary estimator and the secondary estimator may be combined with each other to perform the estimation of the parameter set. For example, a preliminary parameter set may be estimated first through the primary estimator, and a confidence score associated with the preliminary parameter set may be provided at the same time. The confidence score may indicate a credibility of the estimated preliminary parameter set. If the confidence score is higher than a predetermined threshold, the preliminary parameter set may be determined as the estimated parameter set 120. Alternatively, if the confidence score is not higher than the predetermined threshold, the estimated parameter set 120 may be determined through the secondary estimator.
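The parameter estimation follows the same two-stage pattern, as sketched below. The parameter names and the confidence threshold are placeholders introduced for this example.

```python
from typing import Callable, Dict, Sequence, Tuple

def estimate_parameter_set(features: Sequence[float],
                           primary_estimator: Callable[[Sequence[float]], Tuple[Dict, float]],
                           secondary_estimator: Callable[[Sequence[float]], Dict],
                           confidence_threshold: float = 0.8) -> Dict:
    # Prefer the trained primary estimator; fall back to the rule-based
    # secondary estimator when the confidence score is not high enough.
    params, confidence = primary_estimator(features)
    if confidence > confidence_threshold:
        return params
    return secondary_estimator(features)

primary = lambda f: ({"anomaly_threshold": 3.0, "window_size": 30}, 0.9)
secondary = lambda f: {"anomaly_threshold": 2.5, "window_size": 60}
print(estimate_parameter_set([0.1, 0.3], primary, secondary))
```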
[0032] The selected anomaly detection model 116 and the estimated parameter set 120 may be used to perform anomaly detection on the current data point 104 to obtain an anomaly detection value 122 corresponding to the current data point 104.
[0033] Through iteratively performing the above process on each data point in the time-series data 102, a plurality of anomaly detection values corresponding to these data points may be obtained. These anomaly detection values may be combined into a preliminary anomaly detection result 124 corresponding to the time-series data 102. Optionally, according to an embodiment of the present disclosure, a single tuning parameter, such as a sensitivity value 128, may be used to tune the preliminary anomaly detection result 124 to update a number of anomaly points included in the preliminary anomaly detection result. Herein, the sensitivity value may refer to a parameter used to tune the anomaly detection result. The sensitivity value may be used to change an acceptable range of metric at each time point, thereby modifying anomaly detection values of one or more data points. When a collected metric of a data point falls within the acceptable range of metric at a time point corresponding to the data point determined by the sensitivity value, the data point may be determined to be normal, otherwise, the data point may be determined to be abnormal. For example, the preliminary anomaly detection result 124 may be provided to a result tuning module 126. The result tuning module 126 may receive an indication of the sensitivity value 128 for the preliminary anomaly detection result 124, tune the preliminary anomaly detection result 124 based on the sensitivity value 128, and provide the tuned anomaly detection result 130. An exemplary process of tuning the anomaly detection result will be described later in conjunction with FIG. 5.
[0034] It should be appreciated that the process 100 in FIG. 1 is only an example of the process for implementing anomaly detection. According to actual application requirements, the process for implementing anomaly detection may include any other steps, and may include more or fewer steps. For example, the tuning process of the preliminary anomaly detection result may be omitted from the process 100, so that the preliminary anomaly detection result 124 is directly used as the final anomaly detection result. In addition, the specific order or hierarchy of the steps in the process 100 is only exemplary, and the process for implementing anomaly detection may be performed in an order different from the described order.
[0035] FIG. 2 illustrates an exemplary process 200 for feature generation according to an embodiment of the disclosure. The process 200 may generate a feature representation of a data segment by directly performing a feature extraction operation on the data segment. The process 200 may be executed, for example, by the feature generation module 108 in FIG. 1, to generate a feature representation 204 of a data segment 202. The data segment 202 may correspond to the data segment 106 in FIG. 1, and the feature representation 204 may correspond to the feature representation 110 in FIG. 1.
[0036] The data segment 202 may be represented as v ∈ R^m, wherein m is a number of data points included in the data segment 202. A set of feature extraction operations may be performed on the data segment 202 through a set of feature extractors, such as feature extractors 210-1, 210-2, ..., 210-k, to extract a set of sub feature representations from the data segment 202, such as sub feature representations 220-1, 220-2, ..., 220-k. It should be appreciated that the set of feature extractors may include one or more feature extractors. These feature extractors may extract features such as variance, autocorrelation, entropy, period, etc., given by the sub feature representations. The feature extractor 210-i may be represented as f_i(·), and the sub feature representation 220-i may be represented as f_i(v), i ∈ [1, k]. The sub feature representations 220-1, 220-2, ..., 220-k may be combined into a feature representation 204 corresponding to the data segment 202. The feature representation 204 may be represented as U, as shown in the following formula:

U ← (f_1(v), f_2(v), ..., f_k(v))   (1)
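For illustration, the feature generation of process 200 may be sketched as follows. The particular extractors (variance, lag-1 autocorrelation, a histogram-based entropy proxy) are simplified stand-ins for the features named above.

```python
import numpy as np

def extract_features(v: np.ndarray, extractors) -> np.ndarray:
    # Apply each feature extractor f_i to the data segment v and concatenate
    # the resulting sub feature representations into U, as in formula (1).
    return np.concatenate([np.atleast_1d(f(v)) for f in extractors])

def lag1_autocorrelation(v: np.ndarray) -> float:
    return float(np.corrcoef(v[:-1], v[1:])[0, 1])

def histogram_entropy(v: np.ndarray) -> float:
    # A rough entropy proxy computed from a normalized histogram.
    counts, _ = np.histogram(v, bins=10)
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

extractors = [np.var, lag1_autocorrelation, histogram_entropy]
v = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.1 * np.random.randn(120)
U = extract_features(v, extractors)
print(U.shape)  # (3,) -- one value per extractor in this toy setup
```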
[0037] FIG. 3 illustrates another exemplary process 300 for feature generation according to an embodiment of the present disclosure. The process 300 may generate a feature representation of a data segment by first performing transformation operations on the data segment, and then performing feature extraction operations on the transformed data segments. The transformation operations help to obtain a richer feature representation. The process 300 may be performed by, for example, the feature generation module 108 in FIG. 1, to generate a feature representation 304 of a data segment 302. The data segment 302 may correspond to the data segment 106 in FIG. 1, and the feature representation 304 may correspond to the feature representation 110 in FIG. 1.
[0038] The data segment 302 may be represented as v ∈ R^m, wherein m is a number of data points included in the data segment 302. A set of transformation operations may be performed on the data segment 302 through a set of transformers, such as transformers 310-1, 310-2, ..., 310-n, to obtain a set of transformed data segments, such as transformed data segments 320-1, 320-2, ..., 320-n. The set of transformation operations includes, for example, Fast Fourier Transform (FFT), Detrend, De-seasonality, Spectral Residual (SR), etc. It should be appreciated that the set of transformers may include one or more transformers. The transformed data segment 320-i may be represented as w_i, i ∈ [1, n], as shown in the following formula:

w_i ← t_i(v)   (2)
[0039] Subsequently, a set of feature extraction operations are performed on each transformed data segment in the set of transformed data segments through a set of feature extractors, such as feature extractors 330-1, 330-2, ..., 330-k, to obtain a set of initial feature representations corresponding to the transformed data segment. The initial feature representations correspond to variance, autocorrelation, entropy, period, etc., for example. A set of initial feature representations corresponding to the transformed data segment 320-i may be represented as f_1(w_i), f_2(w_i), ..., f_k(w_i). For example, for the transformed data segment 320-1, a set of initial feature representations 340-1-1, 340-1-2, ..., 340-1-k may be obtained; for the transformed data segment 320-2, a set of initial feature representations 340-2-1, 340-2-2, ..., 340-2-k may be obtained; and for the transformed data segment 320-n, a set of initial feature representations 340-n-1, 340-n-2, ..., 340-n-k may be obtained.
[0040] The set of initial feature representations f_1(w_i), f_2(w_i), ..., f_k(w_i) corresponding to each transformed data segment 320-i may be combined into a sub feature representation 350-i corresponding to the transformed data segment. The sub feature representation 350-i may be represented as u_i, as shown in the following formula:

u_i ← (f_1(w_i), f_2(w_i), ..., f_k(w_i))   (3)

[0041] For example, the set of initial feature representations 340-1-1, 340-1-2, ..., 340-1-k may be combined into a sub feature representation 350-1 corresponding to the transformed data segment 320-1; the set of initial feature representations 340-2-1, 340-2-2, ..., 340-2-k may be combined into a sub feature representation 350-2 corresponding to the transformed data segment 320-2; and the set of initial feature representations 340-n-1, 340-n-2, ..., 340-n-k may be combined into a sub feature representation 350-n corresponding to the transformed data segment 320-n.
[0042] Next, the set of sub feature representations 350-1, 350-2, ..., 350-n corresponding to the set of transformed data segments 320-1, 320-2, ..., 320-n may be combined into a feature representation 304 corresponding to the data segment 302. The feature representation 304 may be represented as U, as shown in the following formula:

U ← (u_1, u_2, ..., u_n)   (4)
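For illustration, the transform-then-extract pipeline of process 300 may be sketched as follows. The FFT-magnitude and linear-detrend transformers, as well as the simple extractors, are simplified placeholders for the transformations and features named above.

```python
import numpy as np

def transform_then_extract(v: np.ndarray, transformers, extractors) -> np.ndarray:
    sub_reps = []
    for t in transformers:
        w = t(v)                                     # w_i <- t_i(v), formula (2)
        u = np.array([f(w) for f in extractors])     # u_i, formula (3)
        sub_reps.append(u)
    return np.concatenate(sub_reps)                  # U <- (u_1, ..., u_n), formula (4)

def linear_detrend(v: np.ndarray) -> np.ndarray:
    x = np.arange(len(v))
    return v - np.polyval(np.polyfit(x, v, 1), x)

transformers = [lambda v: np.abs(np.fft.rfft(v)), linear_detrend]
extractors = [np.var, np.mean, lambda w: float(w.max() - w.min())]

v = np.sin(np.linspace(0, 6 * np.pi, 120)) + 0.1 * np.random.randn(120)
print(transform_then_extract(v, transformers, extractors).shape)  # (6,)
```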
[0043] It should be appreciated that although the process 200 in FIG. 2 and the process 300 in FIG. 3 are shown as separate processes, the process 200 and the process 300 may be executed in combination with each other according to actual application requirements. For example, for a data segment, feature extraction operations may be directly performed on an original data segment according to the process 200, and transformation operations may be performed on the data segment and then feature extraction operations may be performed on the transformed data segment according to the process 300. Finally, a result of the feature extraction operations according to the process 200 and a result of the feature extraction operations according to the process 300 are combined to obtain a feature representation corresponding to the data segment.
[0044] FIG. 4 illustrates an exemplary process 400 for training a primary classifier and a primary estimator according to an embodiment of the present disclosure. The process 400 may be performed offline. The trained primary classifier may be included in a model selector for selecting an anomaly detection model, e.g., included in the model selector 112 in FIG. 1, and the trained primary estimator may be included in a parameter estimator for estimating a parameter set associated with an anomaly detection model, e.g., included in the parameter estimator 118 in FIG. 1.
[0045] The primary classifier 428 and the primary estimator 420 may be trained with a training dataset 402. The training dataset 402 may include a plurality of time-series data samples, and each time-series data sample may have an anomaly label for indicating anomaly points in the time-series data sample. The following takes a time-series data sample 404 in the training dataset 402 as an example to illustrate an exemplary process for training the primary classifier and the primary estimator. A feature representation 418 of the time-series data sample 404 may be generated. The feature representation 418 of the time-series data sample 404 may be generated, for example, through the process 200 in FIG. 2 and/or the process 300 in FIG. 3.
[0046] Each candidate anomaly detection model in a set of candidate anomaly detection models 406 may be used to perform anomaly detection on the time-series data sample 404. The set of candidate anomaly detection models 406 may correspond to the set of candidate anomaly detection models 114 in FIG. 1. Each candidate anomaly detection model may be equipped with an associated primary estimator, which is used to calculate a parameter set required to perform anomaly detection. The following takes a candidate anomaly detection model 408 in the set of candidate anomaly detection models 406 as an example to illustrate an exemplary process for training the primary estimator associated with it.
[0047] The candidate anomaly detection model 408 may be associated with multiple candidate parameter sets 410, and each candidate parameter set includes a possible combination of values of various parameters associated with the candidate anomaly detection model 408. The candidate anomaly detection model 408 may use each candidate parameter set in the plurality of candidate parameter sets 410 to perform anomaly detection on the time-series data sample 404 to obtain a set of anomaly detection results 412.
[0048] A best anomaly detection result 414 may be identified from the set of anomaly detection results 412 based on the anomaly label in the time-series data sample 404. The best anomaly detection result 414 may be, for example, an anomaly detection result closest to the anomaly label. Subsequently, the candidate parameter set that produces the best anomaly detection result 414, among the candidate parameter sets 410, may be determined as a best parameter set 416.
[0049] At 420, at least the feature representation 418 of the time-series data sample 404 and the best parameter set 416 may be used to train a primary estimator associated with the candidate anomaly detection model 408, so that when actually deployed, the trained primary estimator may estimate a best parameter set of a corresponding anomaly detection model based on a feature representation of a time-series data.
[0050] Each candidate anomaly detection model in the set of candidate anomaly detection models 406 may use each candidate parameter set in its associated multiple candidate parameter sets to perform anomaly detection on the time-series data sample 404. The best anomaly detection results generated by each candidate anomaly detection model may form a set of best anomaly detection results 422 corresponding to the set of candidate anomaly detection models 406. An optimal anomaly detection result 424 for the time-series data sample 404 may be identified from the set of best anomaly detection results 422. Subsequently, a candidate anomaly detection model that produces the optimal anomaly detection result 424 in the set of candidate anomaly detection models 406 may be determined as a best anomaly detection model 426. At 428, at least the feature representation 418 of the time-series data sample 404 and the identification of the best anomaly detection model 426 may be used to train the primary classifier, so that when actually deployed, the trained classifier may select a best anomaly detection model based on a feature representation of a time-series data.
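For illustration, the per-sample search of FIG. 4 may be sketched as follows. The F1 score used to decide which result is closest to the anomaly label, and the toy threshold detector, are assumptions introduced for this example; the present disclosure does not fix a particular scoring metric.

```python
def f1_score(pred, label):
    tp = sum(1 for p, l in zip(pred, label) if p and l)
    fp = sum(1 for p, l in zip(pred, label) if p and not l)
    fn = sum(1 for p, l in zip(pred, label) if not p and l)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def build_training_targets(sample, anomaly_label, candidate_models):
    # For one time-series data sample, find the best parameter set of each
    # candidate model (training target of its primary estimator) and the
    # overall best model (training target of the primary classifier).
    best_params_per_model, best_model, best_score = {}, None, -1.0
    for model_id, (detect_fn, param_grid) in candidate_models.items():
        model_best_params, model_best_score = None, -1.0
        for params in param_grid:
            score = f1_score(detect_fn(sample, **params), anomaly_label)
            if score > model_best_score:
                model_best_params, model_best_score = params, score
        best_params_per_model[model_id] = model_best_params
        if model_best_score > best_score:
            best_model, best_score = model_id, model_best_score
    return best_model, best_params_per_model

# Toy candidate model: a plain threshold detector with a small parameter grid.
def threshold_detector(sample, anomaly_threshold):
    return [abs(x) > anomaly_threshold for x in sample]

candidate_models = {
    "threshold": (threshold_detector,
                  [{"anomaly_threshold": t} for t in (1.0, 2.0, 3.0)]),
}
sample = [0.1, 0.2, 5.0, 0.1, -4.0, 0.3]
label = [0, 0, 1, 0, 1, 0]
print(build_training_targets(sample, label, candidate_models))
```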
[0051] In an implementation, the set of candidate anomaly detection models 406 is extensible. For example, when an additional candidate anomaly detection model other than the set of candidate anomaly detection models 406 is obtained, anomaly detection may be performed on the time-series data sample 404 with the additional candidate anomaly detection model to obtain an additional anomaly detection result. Specifically, the additional candidate anomaly detection model may use each candidate parameter set of multiple candidate parameter sets associated with it to perform anomaly detection on the time-series data sample 404 to obtain a set of anomaly detection results. Subsequently, a best anomaly detection result may be identified, based on the anomaly label in the time-series data sample 404, from the set of anomaly detection results as an additional anomaly detection result corresponding to the additional candidate anomaly detection model. This additional anomaly detection result may be compared with the optimal anomaly detection result 424 for the time-series data sample 404. If the additional anomaly detection result is better than the optimal anomaly detection result 424, the best anomaly detection model 426 may be updated to be the additional candidate anomaly detection model. Further, at least the feature representation 418 and the identification of the additional candidate anomaly detection model may be used to retrain the primary classifier.
[0052] The anomaly detection system according to the embodiments of the present disclosure also has a feedback mechanism. The feedback mechanism may update a training dataset by adding feedback-based labels to one or more time-series data samples in the training dataset, and retrain a primary classifier and a primary estimator with the updated training dataset. A feedback-based label may include, for example, an anomaly label given by a user for a time-series data sample. Herein, the user may broadly refer to a company, an organization, an individual, etc. that uses the anomaly detection system according to the embodiments of the present disclosure. For example, the user may own, produce, provide, or manage a time-series data, and want to monitor anomaly events in the time-series data. [0053] According to an embodiment of the present disclosure, after a preliminary anomaly detection result of a time-series data is obtained with an anomaly detection model and an associated parameter set, an acceptable range of metric at each time point may be changed with a sensitivity value, to modify anomaly detection values of one or more data points, thereby updating a number of anomaly points included in an anomaly detection result. A user may set the sensitivity value through a user interface, so that an anomaly detection result may be customized intuitively and efficiently.
[0054] FIG. 5 illustrates an exemplary process 500 for anomaly detection result tuning according to an embodiment of the present disclosure. The process 500 may be performed by the result tuning module 126 in FIG. 1, for example. The result tuning module 126 may determine a tolerance range of at least one data point in a time-series data based on a sensitivity value, and modify an anomaly detection value of the at least one data point based at least on the tolerance range. Herein, the tolerance range may refer to an acceptable range of metric at each time point. A data point may be represented as v_i, wherein i ∈ [0, m] and m is a number of data points included in the time-series data. The data point v_i may have a preliminary anomaly detection value a_i, such as the anomaly detection value 122 in FIG. 1. As an example, when v_i is an anomaly point, a_i = 1, and when v_i is a normal point, a_i = 0.
[0055] At 510, an indication of a sensitivity value for an anomaly detection result for a time-series data including a plurality of data points may be received. As an example, the sensitivity value may be represented as α ∈ [0, 100], wherein a larger α means a larger number of anomaly points to be reported. It should be appreciated that representing the sensitivity value in this way is only an example, and the embodiments of the present disclosure are not limited to this; the sensitivity value may also be represented in other ways.
[0056] At 520, the data point of the time-series data may be decomposed into at least a trend component g_i and a loss component e_i. In an implementation, a Seasonal-Trend decomposition procedure based on LOESS (STL) may be used to decompose the data point. Using STL, a seasonality component s_i of the data point may also be obtained at the same time.
[0057] At 530, a delta unit μ_i may be calculated based on the trend component g_i of the data point, as shown in the following formula:

μ_i ← 0.5 · |g_i| + 0.5 · (Σ_{j=1}^{m} |g_j|) / m   (5)
[0058] At 540, a tolerance range δ_i of the data point may be determined based on the delta unit μ_i and the sensitivity value α, as shown in the following formula:

δ_i ← factor(α) · μ_i   (6)

wherein factor(α) is an exponentially decreasing function with respect to α.
[0059] At 550, an anomaly detection value of the data point may be modified based on the tolerance range and the loss component e_i of the data point. In an implementation, when the absolute value of the loss component e_i of the data point is within the tolerance range δ_i of the data point, the data point will be labeled as a normal point, as shown in the following formula:

a'_i ← a_i · 1(|e_i| > δ_i)   (7)

wherein a'_i is the modified anomaly detection value of the data point v_i, and 1(·) is an indicator function.
[0060] Alternatively, when the absolute value of the loss component e_i of the data point v_i is not within the tolerance range δ_i of the data point, the data point will be labeled as an anomaly point.
[0061] Through steps 520-550, the anomaly detection value of at least one data point in the time-series data may be modified. Current anomaly detection values of various data points may be combined together as the tuned anomaly detection result of the time-series data. At 560, the tuned anomaly detection result may be provided.
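For illustration, steps 520-550 may be sketched as follows. The STL call relies on the statsmodels package, the symbols follow formulas (5)-(7) as reconstructed above, and the specific exponential form of factor(α) and the period of 24 are assumptions introduced for this example, since the disclosure only states that factor(α) decreases exponentially with α.

```python
import numpy as np
from statsmodels.tsa.seasonal import STL

def tune_anomaly_result(values: np.ndarray, preliminary: np.ndarray,
                        sensitivity: float, period: int = 24) -> np.ndarray:
    # Step 520: decompose the series into trend (g_i), seasonality and loss (e_i).
    decomposition = STL(values, period=period).fit()
    trend, loss = decomposition.trend, decomposition.resid
    # Step 530: delta unit, formula (5).
    delta_unit = 0.5 * np.abs(trend) + 0.5 * np.mean(np.abs(trend))
    # Step 540: tolerance range, formula (6); factor(alpha) decreases exponentially.
    tolerance = np.exp(-sensitivity / 20.0) * delta_unit
    # Step 550: keep an anomaly label only if the loss exceeds the tolerance, formula (7).
    return preliminary * (np.abs(loss) > tolerance).astype(int)

# Toy hourly series with one injected spike flagged by the preliminary result.
t = np.arange(24 * 14, dtype=float)
values = 10 + np.sin(2 * np.pi * t / 24) + 0.05 * np.random.randn(t.size)
values[100] += 5.0
preliminary = np.zeros(t.size, dtype=int)
preliminary[100] = 1
print(tune_anomaly_result(values, preliminary, sensitivity=80).sum())
```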
[0062] According to an embodiment of the present disclosure, a user may set a sensitivity value for tuning an anomaly detection result through a user interface. FIG. 6 illustrates exemplary user interfaces 600a and 600b for tuning an anomaly detection result according to an embodiment of the present disclosure. It should be appreciated that the user interfaces 600a and 600b are only examples of user interfaces for tuning the anomaly detection result; the embodiments of the present disclosure are not limited to this, and other forms of user interface may be used to tune the anomaly detection result. In addition, the user interfaces 600a and 600b are only a part of the user interface for tuning the anomaly detection result. According to actual application requirements, the user interface for tuning the anomaly detection result may also include other parts, such as options for setting display parameters, etc.
[0063] In the user interfaces 600a and 600b, anomaly detection results are labeled on a time-series data curve. The horizontal axis represents the time of each data point, which may adopt various time granularities, such as a minute, an hour, a day, etc. The projection of each data point on the vertical axis indicates a metric collected for that data point. A shaded area surrounding the time-series data curve is composed of the tolerance ranges of the data points on the time-series data curve. The tolerance range of each data point may be determined, for example, through steps 520-540 in FIG. 5. When a metric of a data point falls outside the shaded area, the data point will be labeled as an anomaly point, as indicated by the symbol "X" in the figure.
[0064] According to an embodiment of the present disclosure, a tolerance range of each data point may be controlled by a sensitivity value, as shown in the above formula (6). When the sensitivity value increases, the tolerance range will decrease, and the shaded area will narrow accordingly, so that more data points will be reported as anomaly points. The user interface 600a illustrates the anomaly detection result when the sensitivity value is 65, and the user interface 600b illustrates the anomaly detection result when the sensitivity value is 80. It may be intuitively seen from FIG. 6 that the shaded area of the user interface 600b is smaller than the shaded area of the user interface 600a, and the number of anomaly points reported in the user interface 600b is larger than the number of anomaly points reported in the user interface 600a.
[0065] When the anomaly detection system according to the embodiments of the present disclosure is used to perform anomaly detection on a time-series data, if an anomaly detection result does not meet a predefined criterion, it may automatically trigger a reselection of an anomaly detection model and a re-estimation of a parameter set. The predefined criterion may be based on an anomaly ratio and/or a false alert rate of the anomaly detection result, for example. In an implementation, the anomaly ratio and/or false alert rate of the anomaly detection result may be provided to a model selector. The model selector may reselect an anomaly detection model for the time-series data based on the anomaly ratio and/or false alert rate. A parameter estimator associated with the reselected anomaly detection model may also receive the anomaly ratio and/or false alert rate, and re-estimate a parameter set for the time-series data. In this way, the time cost for improving the anomaly detection result that does not meet the predefined criterion may be reduced, thereby achieving more efficient anomaly detection.
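For illustration, such a predefined criterion check may be sketched as follows. The anomaly-ratio and false-alert-rate thresholds are placeholders; the present disclosure does not specify their values.

```python
def needs_reselection(detections, false_alert_flags=None,
                      max_anomaly_ratio=0.05, max_false_alert_rate=0.2):
    # detections: 0/1 anomaly detection values for the time-series data.
    # false_alert_flags: optional 0/1 flags marking reported anomalies that
    # users identified as false alerts.
    if not detections:
        return False
    if sum(detections) / len(detections) > max_anomaly_ratio:
        return True
    if false_alert_flags and sum(detections) > 0:
        if sum(false_alert_flags) / sum(detections) > max_false_alert_rate:
            return True
    return False

print(needs_reselection([0, 1, 1, 1, 0, 1, 0, 0, 0, 0]))  # anomaly ratio 0.4 -> True
```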
[0066] In addition, the model selector selecting an inappropriate model for the time-series data and/or the parameter estimator estimating inappropriate parameters may cause the anomaly detection result of the time-series data to fail to meet the predefined criterion. According to an embodiment of the present disclosure, the time-series data may be used as a training sample to retrain a primary classifier in the model selector and a primary estimator in the parameter estimator. The primary classifier and primary estimator may be retrained in a process similar to the process 400 in FIG. 4. For example, an anomaly label may be added to the time-series data. The anomaly label may be provided by a user. Each candidate anomaly detection model in a set of candidate anomaly detection models and each candidate parameter set in a plurality of candidate parameter sets associated therewith may be used to perform anomaly detection on the time-series data to obtain a set of anomaly detection results. It may be determined, based on the anomaly label of the time-series data, whether a best anomaly detection result in the set of anomaly detection results meets a predefined criterion. If the best anomaly detection result meets the predefined criterion, the identification of the anomaly detection model that produces the best anomaly detection result may be used to retrain the primary classifier, and the parameter set that produces the best anomaly detection result may be used to retrain the primary estimator associated with the anomaly detection model. Conversely, if the best anomaly detection result in the set of anomaly detection results still does not meet the predefined criterion, a new anomaly detection model that can process the time-series data may be added to the set of candidate anomaly detection models to obtain an updated set of candidate anomaly detection models, and the updated set of candidate anomaly detection models may be used to retrain the primary classifier and the primary estimator.
[0067] FIG.7 is a flowchart of an exemplary method 700 for implementing anomaly detection according to an embodiment of the present disclosure.
[0068] At step 710, a time-series data including a plurality of data points may be obtained.
[0069] At step 720, a feature representation of a data segment corresponding to a current data point may be generated.
[0070] At step 730, an anomaly detection model may be selected from a set of candidate anomaly detection models based on the feature representation.
[0071] At step 740, a parameter set associated with the selected anomaly detection model may be estimated.
[0072] At step 750, anomaly detection may be performed on the current data point with the selected anomaly detection model and the estimated parameter set.
[0073] In an implementation, the generating a feature representation may comprise: performing a set of feature extraction operations on the data segment to obtain a set of sub feature representations; and combining the set of sub feature representations into the feature representation.
[0074] In an implementation, the generating a feature representation may comprise: performing a set of transformation operations on the data segment to obtain a set of transformed data segments; for each transformed data segment in the set of transformed data segments, performing a set of feature extraction operations to obtain a set of initial feature representations, and combining the set of initial feature representations into a sub feature representation corresponding to the transformed data segment; and combining a set of sub feature representations corresponding to the set of transformed data segments into the feature representation.
[0075] In an implementation, the selecting an anomaly detection model may be performed through a model selector. The model selector may include a primary classifier. [0076] The selecting an anomaly detection model may comprise: selecting a preliminary anomaly detection model through the primary classifier; determining whether a confidence score associated with the preliminary anomaly detection model is higher than a threshold; and in response to determining that the confidence score is higher than the threshold, determining the preliminary anomaly detection model as the selected anomaly detection model.
[0077] The model selector may further comprise a secondary classifier. The selecting an anomaly detection model may further comprise: in response to determining that the confidence score is not higher than the threshold, determining the selected anomaly detection model through the secondary classifier.
[0078] The primary classifier may be trained by the following operations: for a time- series data sample in a training dataset, identifying a best anomaly detection model for the time-series data sample from the set of candidate anomaly detection models; generating a feature representation of the time-series data sample; and training the primary classifier at least using the feature representation of the time-series data sample and an identification of the best anomaly detection model.
[0079] The time-series data sample may have an anomaly label. The identifying a best anomaly detection model may comprise: performing anomaly detection on the time-series data sample with each candidate anomaly detection model in the set of candidate anomaly detection models, respectively, to obtain a set of anomaly detection results; identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and determining a candidate anomaly detection model that produces the best anomaly detection result as the best anomaly detection model.
[0080] The identifying a best anomaly detection model may further comprise: obtaining an additional candidate anomaly detection model; performing anomaly detection on the time-series data sample with the additional candidate anomaly detection model to obtain an additional anomaly detection result; determining that the additional anomaly detection result is better than the best anomaly detection result; and updating the best anomaly detection model as the additional candidate anomaly detection model.
[0081] The estimating a parameter set may be performed through a parameter estimator associated with the selected anomaly detection model. The parameter estimator may include a primary estimator.
[0082] The estimating a parameter set may comprise: estimating a preliminary parameter set through the primary estimator; determining whether a confidence score associated with the preliminary parameter set is higher than a threshold; and in response to determining that the confidence score is higher than the threshold, determining the preliminary parameter set as the estimated parameter set.
[0083] The parameter estimator may further comprise a secondary estimator. The estimating a parameter set may further comprise: in response to determining that the confidence score is not higher than the threshold, determining the estimated parameter set through the secondary estimator.
[0084] The primary estimator may be trained by the following operations: for a time- series data sample in a training dataset, identifying a best parameter set for the time-series data sample from candidate parameter sets associated with the selected anomaly detection model; generating a feature representation of the time-series data sample; and training the primary estimator at least using the feature representation of the time-series data sample and the best parameter set.
[0085] The time-series data sample may have an anomaly label. The identifying a best parameter set may comprise: for each candidate parameter set in the candidate parameter sets, performing anomaly detection on the time-series data sample with the selected anomaly detection model and the candidate parameter set, respectively, to obtain a set of anomaly detection results; identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and determining a candidate parameter set that produces the best anomaly detection result as the best parameter set.
[0086] The time-series data sample may have a feedback-based label. The feedback-based label may include at least an anomaly label.
[0087] In an implementation, the method 700 may further comprise: receiving an indication of a sensitivity value of an anomaly detection result of the time-series data; tuning the anomaly detection result based on the sensitivity value; and providing the tuned anomaly detection result.
[0088] The tuning the anomaly detection result may comprise: determining a tolerance range of at least one data point in the time-series data based on the sensitivity value; and modifying an anomaly detection value of the at least one data point based at least on the tolerance range.
[0089] It should be appreciated that the method 700 may further comprise any steps/processes for implementing customized anomaly detection according to the embodiments of the present disclosure as mentioned above.
[0090] FIG.8 illustrates an exemplary apparatus 800 for implementing anomaly detection according to an embodiment of the present disclosure.
[0091] The apparatus 800 may comprise: a time-series data obtaining module 810, for obtaining a time-series data including a plurality of data points; a feature representation generating module 820, for generating a feature representation of a data segment corresponding to a current data point; a detection model selecting module 830, for selecting an anomaly detection model from a set of candidate anomaly detection models based on the feature representation; a parameter set estimating module 840, for estimating a parameter set associated with the selected anomaly detection model; and an anomaly detection performing module 850, for performing anomaly detection on the current data point with the selected anomaly detection model and the estimated parameter set.
[0092] In an implementation, the apparatus 800 may further comprise a result tuning module, the result tuning module being for: receiving an indication of a sensitivity value of an anomaly detection result of the time-series data; tuning the anomaly detection result based on the sensitivity value; and providing the tuned anomaly detection result.
[0093] Moreover, the apparatus 800 may further comprise any modules for performing steps of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
[0094] FIG.9 illustrates an exemplary apparatus 900 for implementing anomaly detection according to an embodiment of the present disclosure.
[0095] The apparatus 900 may comprise at least one processor 910 and a memory 920 storing computer-executable instructions. When executing the computer-executable instructions, the at least one processor 910 may perform any operations of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
[0096] The embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for implementing anomaly detection according to the embodiments of the present disclosure as mentioned above.
[0097] It should be appreciated that all the operations in the methods described above are merely exemplary, and the present disclosure is not limited to any operations in the methods or sequence orders of these operations, and should cover all other equivalents under the same or similar concepts.
[0098] It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.
[0099] Processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described throughout the present disclosure. The functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.
[00100] Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk. Although memory is shown separate from the processors in the various aspects presented throughout the present disclosure, the memory may be internal to the processors, e.g., cache or register.
[00101] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein and are intended to be encompassed by the claims.

Claims

1. A method for implementing anomaly detection, comprising: obtaining a time-series data including a plurality of data points; generating a feature representation of a data segment corresponding to a current data point; selecting an anomaly detection model from a set of candidate anomaly detection models based on the feature representation; estimating a parameter set associated with the selected anomaly detection model; and performing anomaly detection on the current data point with the selected anomaly detection model and the estimated parameter set.
2. The method of claim 1, wherein the generating a feature representation comprises: performing a set of transformation operations on the data segment to obtain a set of transformed data segments; for each transformed data segment in the set of transformed data segments, performing a set of feature extraction operations to obtain a set of initial feature representations, and combining the set of initial feature representations into a sub feature representation corresponding to the transformed data segment; and combining a set of sub feature representations corresponding to the set of transformed data segments into the feature representation.
3. The method of claim 1, wherein the selecting an anomaly detection model is performed through a model selector, the model selector including a primary classifier.
4. The method of claim 3, wherein the selecting an anomaly detection model comprises: selecting a preliminary anomaly detection model through the primary classifier; determining whether a confidence score associated with the preliminary anomaly detection model is higher than a threshold; and in response to determining that the confidence score is higher than the threshold, determining the preliminary anomaly detection model as the selected anomaly detection model.
5. The method of claim 3, wherein the primary classifier is trained by the following operations: for a time-series data sample in a training dataset, identifying a best anomaly detection model for the time-series data sample from the set of candidate anomaly detection models; generating a feature representation of the time-series data sample; and training the primary classifier at least using the feature representation of the time- series data sample and an identification of the best anomaly detection model.
6. The method of claim 5, wherein the time-series data sample has an anomaly label, and the identifying a best anomaly detection model comprises: performing anomaly detection on the time-series data sample with each candidate anomaly detection model in the set of candidate anomaly detection models, respectively, to obtain a set of anomaly detection results; identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and determining a candidate anomaly detection model that produces the best anomaly detection result as the best anomaly detection model.
7. The method of claim 6, wherein the identifying a best anomaly detection model further comprises: obtaining an additional candidate anomaly detection model; performing anomaly detection on the time-series data sample with the additional candidate anomaly detection model to obtain an additional anomaly detection result; determining that the additional anomaly detection result is better than the best anomaly detection result; and updating the best anomaly detection model as the additional candidate anomaly detection model.
8. The method of claim 1, wherein the estimating a parameter set is performed through a parameter estimator associated with the selected anomaly detection model, the parameter estimator including a primary estimator.
9. The method of claim 8, wherein the estimating a parameter set comprises:
estimating a preliminary parameter set through the primary estimator;
determining whether a confidence score associated with the preliminary parameter set is higher than a threshold; and
in response to determining that the confidence score is higher than the threshold, determining the preliminary parameter set as the estimated parameter set.
10. The method of claim 8, wherein the primary estimator is trained by the following operations:
for a time-series data sample in a training dataset, identifying a best parameter set for the time-series data sample from candidate parameter sets associated with the selected anomaly detection model;
generating a feature representation of the time-series data sample; and
training the primary estimator at least using the feature representation of the time-series data sample and the best parameter set.
11. The method of claim 10, wherein the time-series data sample has an anomaly label, and the identifying a best parameter set comprises:
for each candidate parameter set in the candidate parameter sets, performing anomaly detection on the time-series data sample with the selected anomaly detection model and the candidate parameter set, respectively, to obtain a set of anomaly detection results;
identifying a best anomaly detection result from the set of anomaly detection results based on the anomaly label; and
determining a candidate parameter set that produces the best anomaly detection result as the best parameter set.
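Claim 11 reads naturally as a search over candidate parameter sets for the already-selected model. The sketch below scores each candidate set against the anomaly label with F1, which again is an assumed metric, and a toy z-score detector stands in for whichever model was selected.

```python
import numpy as np
from sklearn.metrics import f1_score

def zscore_detect_series(series, threshold, min_std):
    """Toy stand-in for the selected anomaly detection model."""
    sigma = max(series.std(), min_std)
    z = np.abs(series - series.mean()) / sigma
    return (z > threshold).astype(int)

def best_parameter_set(sample, labels, candidate_parameter_sets):
    scores = [
        f1_score(labels, zscore_detect_series(sample, **params), zero_division=0)
        for params in candidate_parameter_sets
    ]
    return candidate_parameter_sets[int(np.argmax(scores))]

candidate_parameter_sets = [
    {"threshold": t, "min_std": s} for t in (2.5, 3.0, 3.5) for s in (1e-3, 1e-1)
]

rng = np.random.default_rng(2)
sample = rng.normal(size=400)
labels = np.zeros(400, dtype=int)
sample[[50, 250]] += 9.0                          # two labelled anomalies
labels[[50, 250]] = 1

print(best_parameter_set(sample, labels, candidate_parameter_sets))  # parameter set with the best F1 on this toy sample
```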
12. The method of claim 1, further comprising:
receiving an indication of a sensitivity value of an anomaly detection result of the time-series data;
tuning the anomaly detection result based on the sensitivity value; and
providing the tuned anomaly detection result.
13. The method of claim 12, wherein the tuning the anomaly detection result comprises:
determining a tolerance range of at least one data point in the time-series data based on the sensitivity value; and
modifying an anomaly detection value of the at least one data point based at least on the tolerance range.
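Claims 12 and 13 describe tuning a detection result with a user-supplied sensitivity. A minimal interpretation, assumed here, is that lower sensitivity widens a tolerance range around an expected value so that borderline points are no longer reported as anomalies; the disclosure's exact mapping from sensitivity to tolerance range may differ, and the formula below is only a placeholder.

```python
import numpy as np

def tune_result(values, expected, anomaly_flags, sensitivity):
    """Suppress anomaly flags for points that stay inside a sensitivity-dependent band.

    sensitivity is in [0, 100]; lower sensitivity -> wider tolerance range -> fewer anomalies.
    """
    scale = (100.0 - sensitivity) / 100.0
    tolerance = scale * np.maximum(np.abs(expected), 1.0)     # assumed tolerance-range formula
    lower, upper = expected - tolerance, expected + tolerance
    inside = (values >= lower) & (values <= upper)
    return np.where(inside, 0, anomaly_flags)                 # modify the anomaly detection value

values = np.array([10.0, 10.5, 13.0, 30.0])
expected = np.full(4, 10.0)
flags = np.array([0, 0, 1, 1])                                # raw detection result
print(tune_result(values, expected, flags, sensitivity=90))   # narrow band keeps both flagged points
print(tune_result(values, expected, flags, sensitivity=50))   # wider band suppresses the 13.0 point
```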
14. An apparatus for implementing anomaly detection, comprising:
a time-series data obtaining module, for obtaining a time-series data including a plurality of data points;
a feature representation generating module, for generating a feature representation of a data segment corresponding to a current data point;
a detection model selecting module, for selecting an anomaly detection model from a set of candidate anomaly detection models based on the feature representation;
a parameter set estimating module, for estimating a parameter set associated with the selected anomaly detection model; and
an anomaly detection performing module, for performing anomaly detection on the current data point with the selected anomaly detection model and the estimated parameter set.
15. An apparatus for implementing anomaly detection, comprising:
at least one processor; and
a memory storing computer-executable instructions that, when executed, cause the at least one processor to:
obtain a time-series data including a plurality of data points,
generate a feature representation of a data segment corresponding to a current data point,
select an anomaly detection model from a set of candidate anomaly detection models based on the feature representation,
estimate a parameter set associated with the selected anomaly detection model, and
perform anomaly detection on the current data point with the selected anomaly detection model and the estimated parameter set.
PCT/US2021/030761 2020-07-31 2021-05-05 Model selection and parameter estimation for anomaly detection WO2022026022A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010761106.9A CN111914932A (en) 2020-07-31 2020-07-31 Model selection and parameter estimation for anomaly detection
CN202010761106.9 2020-07-31

Publications (1)

Publication Number Publication Date
WO2022026022A1 true WO2022026022A1 (en) 2022-02-03

Family

ID=73288046

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/030761 WO2022026022A1 (en) 2020-07-31 2021-05-05 Model selection and parameter estimation for anomaly detection

Country Status (2)

Country Link
CN (1) CN111914932A (en)
WO (1) WO2022026022A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114595124B (en) * 2022-05-09 2022-07-15 北京瑞莱智慧科技有限公司 Time sequence abnormity detection model evaluation method, related device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180189128A1 (en) * 2017-01-03 2018-07-05 International Business Machines Corporation Hybrid and hierarchical outlier detection system and method for large scale data protection
US20180219889A1 (en) * 2017-01-31 2018-08-02 Splunk Inc. Anomaly detection based on relationships between multiple time series
US20180247215A1 (en) * 2017-02-24 2018-08-30 Oracle International Corporation Multiscale method for predictive alerting
WO2018160177A1 (en) * 2017-03-01 2018-09-07 Visa International Service Association Predictive anomaly detection framework

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115098862A (en) * 2022-06-06 2022-09-23 中国船舶集团有限公司***工程研究院 Water surface unmanned system intelligent algorithm model safety credibility evaluation method and device
CN115098862B (en) * 2022-06-06 2023-12-08 中国船舶集团有限公司***工程研究院 Safety and credibility assessment method and device for intelligent algorithm model of unmanned water surface system
CN117149846A (en) * 2023-08-16 2023-12-01 湖北中恒电测科技有限公司 Power data analysis method and system based on data fusion
CN117149846B (en) * 2023-08-16 2024-05-24 上海永天科技股份有限公司 Power data analysis method and system based on data fusion
CN117170995A (en) * 2023-11-02 2023-12-05 中国科学院深圳先进技术研究院 Performance index-based interference anomaly detection method, device, equipment and medium
CN117170995B (en) * 2023-11-02 2024-05-17 中国科学院深圳先进技术研究院 Performance index-based interference anomaly detection method, device, equipment and medium
CN117312997A (en) * 2023-11-21 2023-12-29 乾程电力有限公司 Intelligent diagnosis method and system for power management system
CN117312997B (en) * 2023-11-21 2024-03-08 乾程电力有限公司 Intelligent diagnosis method and system for power management system
CN117349778A (en) * 2023-12-04 2024-01-05 湖南蓝绿光电科技有限公司 Online real-time monitoring system of consumer based on thing networking
CN117349778B (en) * 2023-12-04 2024-02-20 湖南蓝绿光电科技有限公司 Online real-time monitoring system of consumer based on thing networking

Also Published As

Publication number Publication date
CN111914932A (en) 2020-11-10

Similar Documents

Publication Publication Date Title
WO2022026022A1 (en) Model selection and parameter estimation for anomaly detection
US10904276B2 (en) Systems and methods for anomaly detection
US20230385034A1 (en) Automated decision making using staged machine learning
JP6815480B2 (en) Methods and systems for discovering precursor substrings in time series
EP2854053A1 (en) Defect prediction method and device
US11900393B2 (en) Guided computing device repair systems, methods, and apparatus
CN110399400B (en) Method, device, equipment and medium for detecting abnormal data
CN111105786B (en) Multi-sampling-rate voice recognition method, device, system and storage medium
Moniz et al. Resampling strategies for imbalanced time series
WO2016089933A1 (en) Quality control engine for complex physical systems
CN113705726A (en) Traffic classification method and device, electronic equipment and computer readable medium
US20230029794A1 (en) Customized anomaly detection
CN115659411A (en) Method and device for data analysis
CN107590244B (en) Method and device for identifying offline activity scene of mobile equipment
US20210157831A1 (en) Method and analytical engine for a semantic analysis of textual data
CN111194027A (en) Network connection method, device and system
CN112965876A (en) Monitoring alarm method and device
CN115495274B (en) Exception handling method based on time sequence data, network equipment and readable storage medium
WO2023136871A1 (en) Multi-dimensional root cause analysis based on cross-metrics
Lee et al. Maat: Performance metric anomaly anticipation for cloud services with conditional diffusion
CN110688273B (en) Classification model monitoring method and device, terminal and computer storage medium
US20220253051A1 (en) Method and assistance system for detecting an abnormal behaviour of a device
EP2911109B1 (en) Method and system for identifying energy consumers
CN111815463A (en) Credit investigation method and device based on edge calculation
Hewa Nadungodage et al. Online multi-dimensional regression analysis on concept-drifting data streams

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 21728695; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: PCT application non-entry in European phase
Ref document number: 21728695; Country of ref document: EP; Kind code of ref document: A1