WO2021139253A1 - Method, apparatus, and computer device for obtaining KPI abnormal data samples - Google Patents

Method, apparatus, and computer device for obtaining KPI abnormal data samples

Info

Publication number
WO2021139253A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
kpi
abnormal data
candidate
path
Prior art date
Application number
PCT/CN2020/118528
Other languages
English (en)
French (fr)
Inventor
邓悦
郑立颖
徐亮
Original Assignee
平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139253A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/3051: Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3452: Performance evaluation by statistical analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Definitions

  • This application relates to the technical field of artificial intelligence, and in particular to a method, device, computer equipment, and storage medium for obtaining KPI abnormal data samples.
  • KPIs: key performance indicators.
  • KPI anomaly detection is an important part of intelligent operation and maintenance.
  • KPI anomaly detection algorithms are usually used to predict KPI anomalies. Training and evaluating these algorithms requires a large amount of KPI anomaly sample data, but labeling abnormal KPI data currently requires professional knowledge and a large amount of manual work, which consumes considerable human and material resources and is inefficient.
  • The main purpose of this application is to provide a method, device, computer equipment, and storage medium for obtaining KPI abnormal data samples, aiming to solve the technical problem in the prior art that labeling abnormal KPI samples is time-consuming and labor-intensive.
  • This application proposes a method for obtaining KPI abnormal data samples, including: obtaining KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • performing similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • marking the candidate KPI abnormal data as KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • This application also provides a device for obtaining KPI abnormal data samples, including:
  • a data acquisition unit, used to acquire KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • an abnormality detection unit, configured to perform anomaly detection on the KPI data to obtain the corresponding potential abnormal data points and, taking each potential abnormal data point as the end point, intercept a specified segment of KPI data backward in chronological order as candidate KPI abnormal data;
  • a time adjustment unit, used to adjust the candidate KPI abnormal data and the preset known KPI abnormal data so that the time series of the candidate KPI abnormal data and of the known KPI abnormal data have the same time length;
  • a distance calculation unit, used to perform similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • an abnormality judging unit, used to judge whether the candidate KPI abnormal data is KPI abnormal data according to each regular path distance;
  • an abnormality marking unit, used to mark the candidate KPI abnormal data as KPI abnormal data, once it is determined to be KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • The present application also provides a computer device, including a memory and a processor. The memory stores a computer program, and when the processor executes the computer program, a method for obtaining KPI abnormal data samples is implemented.
  • The method for obtaining KPI abnormal data samples includes:
  • obtaining KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • performing similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • marking the candidate KPI abnormal data as KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • This application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, a method for obtaining KPI abnormal data samples is implemented.
  • The method for obtaining KPI abnormal data samples includes:
  • obtaining KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • performing similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • marking the candidate KPI abnormal data as KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • FIG. 1 is a schematic diagram of the steps of a method for obtaining KPI abnormal data samples in an embodiment of this application;
  • FIG. 2 is a schematic block diagram of the structure of an apparatus for obtaining KPI abnormal data samples in an embodiment of this application;
  • FIG. 3 is a schematic block diagram of the structure of a computer device according to an embodiment of the application.
  • The method for obtaining KPI abnormal data samples in this embodiment includes:
  • Step S1: Obtain KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • Step S2: Perform anomaly detection on the KPI data to obtain the corresponding potential abnormal data points and, taking each potential abnormal data point as the end point, intercept a specified segment of KPI data backward in chronological order as candidate KPI abnormal data;
  • Step S3: Adjust the candidate KPI abnormal data and the preset known KPI abnormal data so that both have the same time length;
  • Step S4: Perform similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • Step S5: Judge whether the candidate KPI abnormal data is KPI abnormal data according to each regular path distance;
  • Step S6: If yes, mark the candidate KPI abnormal data as KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • The intelligent KPI anomaly recognition model is usually applied in intelligent operation and maintenance, for example in Internet services.
  • The model can identify abnormal KPI data directly from millions to tens of millions of data records.
  • The abnormal KPI samples used to train and evaluate the model can be obtained through the method for obtaining abnormal KPI data samples provided in this application.
  • KPI (key performance indicator) data is time series data. A piece of KPI data consists of multiple data points generated in chronological order, each corresponding to one unit of time. Take CPU usage as an example: if the data points for each second within 5 seconds are 20%, 23%, 21%, 21%, 20%, then the piece of KPI data is 20%, 23%, 21%, 21%, 20%, and the corresponding times are the first, second, third, fourth, and fifth seconds. If one data point differs significantly from the rest, the segment can be considered abnormal data. For example, in 20%, 23%, 23%, 21%, 81%, the value 81% is abnormal, so the segment is abnormal data.
  • In step S1, KPI data for a preset time period is obtained.
  • The preset time period may be one minute.
  • Multiple servers can be monitored continuously at the same time, with KPI data obtained every minute, for example by acquiring multiple pieces of KPI data from the same server at the same time, or by acquiring multiple pieces of different KPI data from multiple servers at the same time.
  • In step S2, anomaly detection is performed on the acquired KPI data to obtain candidate KPI abnormal data.
  • Anomaly detection, for example calculation by a preset algorithm or filtering by preset rules, yields the potential abnormal data points of each piece of KPI data. Each potential abnormal data point is then used as the end point of a segment, and a specified segment of KPI data is intercepted backward in chronological order as candidate KPI abnormal data. That is, each piece of candidate KPI abnormal data is a segment of data.
  • One or more pieces of candidate KPI abnormal data can be obtained from a single piece of KPI data.
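  • The backward interception in step S2 can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `backtrack_segments`, the fixed segment `length`, and the choice to skip anomalies without enough preceding history are all assumptions.

```python
def backtrack_segments(series, anomaly_indices, length):
    """Cut one fixed-length segment per potential anomaly, ending at that anomaly.

    `series` is one piece of KPI data in chronological order, and
    `anomaly_indices` holds the positions of potential abnormal data points.
    """
    segments = []
    for end in anomaly_indices:
        start = end - length + 1
        if start < 0:
            # Assumption: skip anomalies that lack enough preceding history.
            continue
        segments.append(series[start:end + 1])
    return segments


cpu_usage = [20, 23, 21, 21, 20, 81, 22]
candidates = backtrack_segments(cpu_usage, anomaly_indices=[5], length=4)
print(candidates)
```

  • Each returned segment ends at a potential abnormal data point, so a single piece of KPI data with several potential anomalies yields several candidate segments.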
  • The abnormal template can then be matched against the candidate KPI abnormal data; candidate data with high similarity to the abnormal template is KPI abnormal data.
  • The abnormal template is known KPI abnormal data, so it is also time series data.
  • The dynamic time warping (DTW) algorithm can be used for the matching. Dynamic time warping obtains the similarity between two time series by comparing the similarity of their individual points.
  • The candidate KPI abnormal data and the known KPI abnormal data are first adjusted so that their time series have the same length, that is, both are stretched or shrunk to the same time length. Multiple regular path distances between the two are then calculated.
  • Each regular path distance is the sum of the distances between all similar data points in the candidate KPI abnormal data and the known KPI abnormal data. The distance can be the Euclidean distance.
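  • One way to stretch or shrink a series to a target length is linear interpolation over evenly spaced positions. The text does not say which resampling scheme is used, so the sketch below is an assumption, and the helper name `resample` is hypothetical.

```python
def resample(series, target_len):
    """Stretch or shrink `series` to `target_len` points by linear interpolation."""
    n = len(series)
    if n == 0 or target_len < 1:
        raise ValueError("series must be non-empty and target_len positive")
    if n == 1 or target_len == 1:
        return [float(series[0])] * target_len
    out = []
    for k in range(target_len):
        # Map output position k onto the index range [0, n - 1] of the input.
        pos = k * (n - 1) / (target_len - 1)
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append(series[lo] * (1 - frac) + series[hi] * frac)
    return out


candidate = [20, 23, 21, 21, 81]
template = [20, 80]
# Bring the shorter template up to the candidate's length before matching.
print(resample(template, len(candidate)))
```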
  • In steps S5-S6, whether the candidate KPI abnormal data is KPI abnormal data is judged according to each regular path distance. The smaller the regular path distance, the more likely the candidate KPI abnormal data is abnormal data.
  • The minimum regular path distance is compared with a preset value. If it is less than the preset value, the candidate KPI abnormal data corresponding to that regular path distance is taken as KPI abnormal data. The KPI abnormal data is then labeled and used as a sample for training the intelligent KPI anomaly recognition model.
  • The method for obtaining KPI abnormal data samples can obtain a large number of KPI abnormal samples from a small number of known KPI abnormal data samples by combining anomaly detection with a similarity algorithm. Compared with the traditional collection of labeled samples, it can reduce the operator's labeling time by more than 90%, greatly reducing operation and maintenance costs.
  • Step S2 includes:
  • Step S21: Preprocess the KPI data to obtain multiple pieces of standardized KPI data;
  • Step S22: Perform feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
  • Step S23: Process the feature information with a preset isolation forest algorithm to obtain a feature score corresponding to the feature information;
  • Step S24: Compare the feature score with a preset first threshold. When the feature score exceeds the first threshold, the data point corresponding to the feature score is taken as a potential abnormal data point; otherwise, it is regarded as a normal data point.
  • The KPI data is preprocessed. Specifically, the KPI data is first checked for missing values, because the monitoring system may fail to receive some data, so that the values of some data points are lost. If a missing data point is found, it can be filled in by linear interpolation from adjacent data points, for example by taking the adjacent values, averaging them, and inserting the average at the position of the missing data point. In addition, since KPI data may come from different servers, it can be standardized to eliminate scale differences and prepare for subsequent calculations. For example, the mean and standard deviation of the KPI data collected from each server can be calculated, and the mean is then subtracted from each piece of KPI data and the result divided by the standard deviation to obtain standardized KPI data.
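  • The preprocessing described above can be sketched as follows. This is an illustration under stated assumptions: missing values are represented as `None`, the population standard deviation is used (the text does not say which variant), and the function names `fill_missing` and `standardize` are hypothetical.

```python
import statistics


def fill_missing(series):
    """Replace each None with the average of its nearest known neighbours."""
    filled = list(series)
    for i, value in enumerate(filled):
        if value is not None:
            continue
        before = next((filled[j] for j in range(i - 1, -1, -1)
                       if filled[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        neighbours = [x for x in (before, after) if x is not None]
        if not neighbours:
            raise ValueError("series contains no known values")
        filled[i] = sum(neighbours) / len(neighbours)
    return filled


def standardize(series):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)  # assumption: population standard deviation
    if std == 0:
        return [0.0 for _ in series]
    return [(x - mean) / std for x in series]


raw = [20, None, 22, 21, 81]
print(standardize(fill_missing(raw)))
```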
  • In step S22, anomaly features must be extracted before the isolation forest algorithm can be applied. That is, feature extraction is performed on the standardized KPI data to obtain feature information corresponding to each data point.
  • The feature information includes feature values.
  • A time series prediction model can be used as the feature extractor.
  • Step S22 includes:
  • Step S221: Obtain historical data corresponding to the KPI data, where the historical data is KPI data prior to the preset time period in chronological order;
  • Step S222: Input the historical data into a preset time series prediction model for calculation to obtain predicted KPI data corresponding to the preset time period;
  • Step S223: Compare each data point in the predicted KPI data with the corresponding data point in the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
  • Step S224: Use the absolute difference as the feature information of each data point.
  • The predicted KPI data can be calculated by inputting historical data into the time series prediction model. Specifically, the historical data corresponding to each piece of KPI data is obtained first.
  • The historical data is the KPI data that precedes the preset time period in chronological order.
  • For example, if the preset time period covers the data within the current minute, the historical data can be the data from one minute earlier or from two minutes earlier; the time length of the historical data is also one minute. The historical data is then input into the preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period.
  • The time series prediction model can use existing algorithms, such as the moving average, which are not described further here.
  • Normal data points conform to expected behavior and can be predicted well, with small errors relative to the predicted data points, while abnormal data points with unexpected patterns are difficult to predict and produce larger prediction errors. Each data point in the predicted KPI data is therefore compared with the corresponding data point in the standardized KPI data.
  • Each data point of the KPI data is a numerical value representing a preset index, such as 20%, 23%, 21%, 21%, 20%.
  • The absolute difference between the actual value and the predicted value of each data point is obtained and used as the feature information of that data point. With $p_i$ denoting the predicted value of data point $x_i$, the feature is the absolute difference $|x_i - p_i|$.
  • Multiple time series prediction models can be used to make predictions at the same time.
  • For example, six algorithms with low computational complexity and good performance can be used together,
  • including moving average, weighted moving average, exponentially weighted moving average, autoregressive integrated moving average, and Holt-Winters. The six corresponding predicted values are obtained, and each is compared with the actual value of the data point to obtain the feature matrix $X$ of size $n \times 6$.
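  • The prediction-error features can be sketched with three of the simpler predictors named above. This is a simplified illustration, not the application's implementation: it makes one-step-ahead predictions from the preceding values, it covers only three of the six models (so the matrix has three columns), and the window size, smoothing factor `alpha`, and function names are assumptions.

```python
def moving_average(past, window):
    recent = past[-window:]
    return sum(recent) / len(recent)


def weighted_moving_average(past, window):
    recent = past[-window:]
    weights = range(1, len(recent) + 1)  # newer points get larger weights
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


def exponential_moving_average(past, alpha):
    level = past[0]
    for x in past[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


def feature_matrix(history, actual, window=3, alpha=0.5):
    """Return one row per actual data point: |x_i - p_i| for each predictor."""
    past = list(history)
    rows = []
    for x in actual:
        predictions = (
            moving_average(past, window),
            weighted_moving_average(past, window),
            exponential_moving_average(past, alpha),
        )
        rows.append([abs(x - p) for p in predictions])
        past.append(x)  # simplification: the observed value joins the history
    return rows


print(feature_matrix(history=[20, 20, 20], actual=[20, 80]))
```

  • A point that all predictors miss by a wide margin produces a row of large values, which is what the isolation forest later separates quickly.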
  • The feature information is processed by the preset isolation forest algorithm to obtain the feature score corresponding to the feature information. The isolation forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum values of that feature. Normally, only a few conditions are needed to separate abnormal data, while more conditions are needed to separate normal data. The feature score of KPI abnormal data can therefore be based on the number of conditions required to separate a given observation.
  • The recursive partitioning of the isolation forest algorithm can be represented by a tree structure (iTree). The number of splits required to isolate a data point is equivalent to the path length from the root node to the terminal node.
  • Abnormal KPI data has a shorter average path length on the iTrees.
  • Each terminal node in the isolation forest algorithm has a score between 0 and 1.
  • The score is calculated from the path length; the higher the score, the greater the possibility of an anomaly. Since the isolation forest algorithm is existing technology, the specific calculation process is not repeated here.
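  • For readers unfamiliar with the algorithm, a compact pure-Python sketch of isolation forest scoring follows. It is a teaching sketch rather than the implementation used in the application: the function names, tree count, and sample size are assumptions, and a branch simply stops when the randomly chosen feature is constant.

```python
import math
import random


def average_path_length(n):
    """Expected path length of an unsuccessful binary search tree lookup."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_itree(rows, depth, max_depth, rng):
    """Split rows recursively on a random feature at a random value."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    low = min(r[feature] for r in rows)
    high = max(r[feature] for r in rows)
    if low == high:
        return {"size": len(rows)}  # simplification: stop on a constant feature
    split = rng.uniform(low, high)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_itree(left, depth + 1, max_depth, rng),
            "right": build_itree(right, depth + 1, max_depth, rng)}


def path_length(tree, row, depth=0):
    if "feature" not in tree:
        return depth + average_path_length(tree["size"])
    branch = "left" if row[tree["feature"]] < tree["split"] else "right"
    return path_length(tree[branch], row, depth + 1)


def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Return one score in (0, 1) per row; higher means more anomalous."""
    if len(rows) < 2:
        raise ValueError("need at least two rows")
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_itree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = average_path_length(size)
    return [2.0 ** (-sum(path_length(t, row) for t in trees) / n_trees / norm)
            for row in rows]


features = [[0.10], [0.20], [0.15], [0.12], [0.18], [0.11], [0.16], [0.13], [5.0]]
print(isolation_scores(features, n_trees=50, seed=1))
```

  • The last row sits far from the others, so a random split usually isolates it at depth 1, giving it the shortest average path and the highest score.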
  • An appropriate threshold can then be selected.
  • This threshold is called the first threshold, and the feature score is compared with it. If the feature score $X_i$ of a data point is greater than the first threshold, the data point is regarded as a potential abnormal data point; otherwise, it is regarded as a normal data point.
  • Selecting the first threshold requires a trade-off between high recall and high precision, and the F-score is generally used as the metric. The aim is to select as many candidate KPI abnormal data as possible from the KPI data to avoid misses. Because abnormal data points make up a very small fraction of KPI data, usually about 1%, most data points are normal and have low feature scores, and only a few have high feature scores. The 85% quantile of the anomaly scores can therefore be selected as the first threshold.
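  • Choosing the first threshold as a quantile of the scores can be sketched as follows. The nearest-rank quantile definition and the function names are assumptions; other quantile definitions would give slightly different thresholds.

```python
def quantile_threshold(scores, percent=85):
    """Return the nearest-rank `percent` quantile of the scores."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    rank = max(1, (percent * len(ordered) + 99) // 100)  # ceil(percent * n / 100)
    return ordered[rank - 1]


def potential_anomalies(scores, percent=85):
    """Indices of data points whose score exceeds the quantile threshold."""
    threshold = quantile_threshold(scores, percent)
    return [i for i, s in enumerate(scores) if s > threshold]


scores = [i / 100 for i in range(1, 101)]
print(quantile_threshold(scores), len(potential_anomalies(scores)))
```

  • With a true anomaly rate near 1%, an 85% quantile deliberately flags far more points than are abnormal; the later template matching is what filters these candidates down.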
  • Step S4 includes:
  • Step S41: Find multiple regular paths according to the data points in the candidate KPI abnormal data and in the known KPI abnormal data, where a regular path is expressed as $W = w_1, w_2, \ldots, w_K$, $w_k$ represents the k-th element of the path, and $K \ge \max(|c|, |q|)$,
  • where $|c|$ and $|q|$ are the adjusted time lengths of the candidate KPI abnormal data and the known KPI abnormal data;
  • $w_k = (i, j)$ means that the i-th data point in the time series of the candidate KPI abnormal data and the j-th point in the time series of the known KPI abnormal data form the k-th element of the path;
  • Step S42: Calculate the sum of the distances of the corresponding data points along each regular path to obtain each regular path distance.
  • In this embodiment, dynamic time warping is used to obtain the regular path distances.
  • First, multiple regular paths are found based on the candidate KPI abnormal data and the known KPI abnormal data.
  • Both are time series data.
  • The candidate KPI abnormal data is denoted as the first time series c,
  • with data points $c_j$;
  • the known KPI abnormal data is denoted as the second time series q,
  • with data points $q_i$.
  • The regular path is expressed as $W = w_1, w_2, \ldots, w_k$, where $w_1$ represents the first step of the path,
  • $w_2$ represents the second step, and so on.
  • An element $w_k = (i, j)$ means that the i-th point in the first time series and the j-th point in the second time series are similar. The sum of the distances of all similar points is used as the regular path distance, which measures the similarity of the two time series.
  • After the element (i, j), the next element of the path can only be one of (i+1, j), (i, j+1), or (i+1, j+1). The first step, (1, 1), calculates the distance between the first data point of the first time series and the first data point of the second time series. The second step is (1, 2), (2, 1), or (2, 2): it calculates the distance between the first data point of the first time series and the second data point of the second time series, or between the second data point of the first time series and the first data point of the second time series, or between the second data points of both time series. This continues until the k-th step $w_k$, ensuring that every data point of the first and second time series is covered, which yields multiple regular paths.
  • In step S42, the sum of the distances of the corresponding data points is calculated along each regular path to obtain each regular path distance.
  • $Dist(i, j)$ represents the distance between the i-th point in the first time series and the j-th point in the second time series.
  • The cumulative distance $D(i, j) = Dist(i, j) + \min\{D(i-1, j),\ D(i, j-1),\ D(i-1, j-1)\}$ measures the similarity between the first i points of the first time series and the first j points of the second time series; the final regular path distance is $D(|c|, |q|)$.
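  • The cumulative-distance recurrence of dynamic time warping can be sketched with a small dynamic programming table. This is the textbook formulation, offered as an illustration rather than the application's code; the point distance is taken as the absolute difference, which is the Euclidean distance for scalar data points, and the function name `dtw_distance` is an assumption.

```python
import math


def dtw_distance(c, q):
    """Minimum regular path distance between series c and q."""
    n, m = len(c), len(q)
    # D[i][j] is the best cumulative distance aligning c[:i] with q[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dist = abs(c[i - 1] - q[j - 1])
            # The previous path element is (i-1, j), (i, j-1) or (i-1, j-1).
            D[i][j] = dist + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


template = [20, 23, 21, 81]
print(dtw_distance([20, 23, 23, 21, 81], template))  # stretched copy of the template
print(dtw_distance([20, 21, 20, 22, 21], template))  # normal-looking segment
```

  • The table finds the minimum over all regular paths without enumerating them, which is why the minimum regular path distance can be compared directly with the preset value.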
  • In one embodiment, step S42 includes:
  • Step S421: Calculate the distances between the data points of the candidate KPI abnormal data and the known KPI abnormal data in sequence along each regular path;
  • Step S422: When the data point distance is greater than the second threshold, stop calculating the current regular path and move on to the next regular path, until all regular paths have been calculated and each regular path distance is obtained.
  • Calculating distances along regular paths is computationally expensive. To improve efficiency and speed up the calculation, if the data points of the two time series on a regular path are too far apart, the calculation of that path can be stopped immediately and the next regular path calculated instead. Specifically, the distance between the candidate KPI abnormal data and the known KPI abnormal data is calculated along each regular path. If data point i of the first time series and data point j of the second time series are too far apart, the matching degree of $q_i$ and $c_j$ is too low, and continuing along that regular path would waste resources, so its calculation can be stopped directly.
  • The second threshold is determined by the regular time window w.
  • When calculating the distance, it is first judged whether the distance between data point i of the first time series and data point j of the second time series is less than w.
  • Here m is the length of the abnormal template, that is, the length of the known KPI abnormal data.
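  • A common way to apply such a window is to restrict the dynamic programming table to cells whose index offset satisfies |i - j| <= w. The sketch below assumes that this index offset is what the text means by the distance between data point i and data point j; the function name `dtw_windowed` and the example values of w are also assumptions.

```python
import math


def dtw_windowed(c, q, w):
    """DTW distance where path elements (i, j) must satisfy |i - j| <= w."""
    n, m = len(c), len(q)
    w = max(w, abs(n - m))  # the end cell (n, m) must stay reachable
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Cells outside the window are never computed and stay at infinity.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            dist = abs(c[i - 1] - q[j - 1])
            D[i][j] = dist + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


c = [0, 1, 1, 1]
q = [0, 0, 0, 1]
for window in (0, 1, 2):
    print(window, dtw_windowed(c, q, window))
```

  • A narrower window skips more cells and so costs less, but it can only raise the reported distance, because fewer regular paths remain eligible.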
  • In another embodiment, step S42 includes:
  • Step S423: Calculate the path distance value of each step along the current regular path, and calculate a specified value from the feature information of the candidate KPI abnormal data;
  • Step S424: Use the minimum of the path distance values as the optimal value, and determine whether the specified value is greater than the optimal value;
  • Step S425: If yes, stop calculating the current regular path and move on to the next regular path, until all regular paths have been calculated and each regular path distance is obtained.
  • In this embodiment, the path distance value of each step along the current regular path is calculated, and the minimum of these values is used as the optimal value. At the same time, a specified value is calculated from the feature information of the candidate KPI abnormal data. If the specified value is greater than the optimal value, calculation of the current regular path stops and the next regular path is calculated. These steps are repeated until all regular paths have been calculated and each regular path distance is obtained, which reduces unnecessary calculation and avoids wasting resources.
  • The specified value can be calculated through the following steps:
  • Step S421': Acquire, in sequence, the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point in the candidate KPI abnormal data;
  • Step S422': Calculate the maximum squared difference over the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point to obtain the specified value.
  • The feature information includes feature values. The largest and smallest feature values in the time series of the candidate KPI abnormal data are obtained, together with the feature values of the first and last data points in the time series. The maximum squared difference over these four feature values is the specified value.
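  • The four-feature bound can be sketched as follows. The text does not state what each feature value is differenced against; the sketch assumes each of the candidate's four values is compared with the same feature of the known template, which resembles the LB_Kim lower bound from the time series literature. The function names are hypothetical.

```python
def four_feature_bound(candidate, template):
    """Maximum squared difference over the first, last, largest and smallest values."""
    pairs = [
        (candidate[0], template[0]),        # first data point
        (candidate[-1], template[-1]),      # last data point
        (max(candidate), max(template)),    # largest value
        (min(candidate), min(template)),    # smallest value
    ]
    return max((a - b) ** 2 for a, b in pairs)


def can_skip(candidate, template, best_so_far):
    """True when the cheap bound already exceeds the best distance found so far."""
    return four_feature_bound(candidate, template) > best_so_far


template = [1, 3, 2]
print(four_feature_bound([1, 5, 2], template))
print(can_skip([1, 5, 2], template, best_so_far=1.0))
```

  • The bound costs one pass over each series. It is a valid lower bound when the path distance accumulates squared point differences, so a candidate whose bound already exceeds the current optimal value can be skipped without running the full path calculation.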
  • The specified value can also be obtained through the following preset rules.
  • The first time series can be exchanged with the second time series, that is, q and c in the above formula can be exchanged. The first time series can also be deformed, with the regular time window w as the moving distance: moving each data point forward by w gives the upper limit of the first time series c, and moving each data point backward by w gives its lower limit. The upper limit U and lower limit L of the candidate KPI abnormal data are thus obtained, and another specified value is then obtained according to the above formula.
  • Through screening during similarity matching, the method for obtaining KPI abnormal data samples provided by this application offers good robustness and fast iteration, avoids wasting computing resources and computing time, and significantly shortens the response time.
  • Its hardware requirements are also low and its memory footprint is small, which makes it convenient for operation and maintenance personnel to deploy.
  • This embodiment provides a device for obtaining KPI abnormal data samples.
  • The device corresponds to the method for obtaining KPI abnormal data samples described above.
  • The device includes:
  • a data acquisition unit 1, used to acquire KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds to one data point in chronological order;
  • an abnormality detection unit 2, used to perform anomaly detection on the KPI data to obtain the corresponding potential abnormal data points and, taking each potential abnormal data point as the end point, intercept a specified segment of KPI data backward in chronological order as candidate KPI abnormal data;
  • a time adjustment unit 3, configured to adjust the candidate KPI abnormal data and the preset known KPI abnormal data so that both have the same time length;
  • a distance calculation unit 4, configured to perform similarity distance calculations on the adjusted candidate KPI abnormal data and the corresponding data points in the known KPI abnormal data to obtain multiple regular path distances, where each regular path distance is the sum of the distances between all similar data points in the adjusted candidate KPI abnormal data and the known KPI abnormal data;
  • an abnormality determining unit 5, configured to determine whether the candidate KPI abnormal data is KPI abnormal data according to each regular path distance;
  • an abnormality marking unit 6, used to mark the candidate KPI abnormal data as KPI abnormal data, once it is determined to be KPI abnormal data, to serve as a sample for training the intelligent KPI anomaly recognition model.
  • The abnormality detection unit 2 includes:
  • a data processing subunit, used to preprocess the KPI data to obtain multiple pieces of standardized KPI data;
  • a feature extraction subunit, used to perform feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
  • a score calculation subunit, configured to process the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
  • a score comparison subunit, used to compare the feature score with a preset first threshold. When the feature score exceeds the first threshold, the data point corresponding to the feature score is taken as a potential abnormal data point; otherwise, it is regarded as a normal data point.
  • In one embodiment, the above-mentioned feature extraction subunit includes:
  • the history acquisition module, configured to acquire historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
  • the data prediction module, configured to input the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
  • the difference calculation module, configured to compare each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
  • the feature information module, configured to use the absolute difference as the feature information of each data point.
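  • In one embodiment, the aforementioned distance calculation unit 4 includes:
  • the path finding subunit, configured to find multiple warping (regular) paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where each warping path is expressed as a sequence of steps, w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, X being the actual time length of the candidate KPI abnormal data and Y the actual time length of the known KPI abnormal data; w k=(i,j) means that the i-th data point of the candidate KPI abnormal data and the j-th data point of the known KPI abnormal data form the k-th step;
  • the distance calculation subunit, configured to compute, for each warping path, the sum of the distances between the corresponding data points, thereby obtaining each warping path distance.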
  • In one embodiment, the foregoing distance calculation subunit includes:
  • the point distance calculation module, configured to compute, following each warping (regular) path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
  • the first calculation module, configured to, when a data point distance is greater than the second threshold, stop computing the current warping path and move on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.
  • In another embodiment, the foregoing distance calculation subunit includes:
  • the path value calculation module, configured to compute the path distance value of each step along the current warping (regular) path, and to compute a specified value according to the feature information of the candidate KPI abnormal data;
  • the determination module, configured to take the minimum of the path distance values as the optimal value and determine whether the specified value is greater than the optimal value;
  • the second calculation module, configured to, if the specified value is greater than the optimal value, stop computing the current warping path and move on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.
  • the feature acquisition module, used in computing the specified value, is configured to acquire in turn the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point in the candidate KPI abnormal data;
  • the variance module, configured to find the maximum squared difference over the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point, thereby obtaining the specified value.
  • an embodiment of the present application also provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium.
  • the database of the computer equipment is used to store all the data required for obtaining the abnormal KPI samples described above.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer program is executed by the processor to implement the method for obtaining KPI abnormal data samples in any of the above embodiments.
  • FIG. 3 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium.
  • The computer-readable storage medium may be non-volatile or volatile.
  • A computer program is stored thereon; when executed by a processor, the computer program implements the method for obtaining KPI abnormal data samples of any of the foregoing embodiments.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Hardware Design (AREA)
  • Debugging And Monitoring (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

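This application relates to the field of artificial intelligence, with application in smart cities, and proposes a method, an apparatus, a computer device, and a storage medium for obtaining KPI abnormal data samples. The method includes: acquiring KPI data for a preset time period; performing anomaly detection on the KPI data to obtain potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data; adjusting the candidate KPI abnormal data and known KPI abnormal data so that the two have the same time length, and computing similarity distances between their corresponding data points to obtain multiple warping path distances; determining from the warping path distances whether the candidate KPI abnormal data is KPI abnormal data; and, if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model. A large number of KPI abnormal data samples can thus be obtained from a small number of known KPI abnormal data samples, greatly reducing operation and maintenance costs.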

Description

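Method, apparatus, and computer device for obtaining KPI abnormal data samples
This application claims priority to Chinese patent application No. 202010762945.2, filed with the China Patent Office on July 31, 2020 and entitled "Method, apparatus, and computer device for obtaining KPI abnormal data samples", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of artificial intelligence, and in particular to a method, an apparatus, a computer device, and a storage medium for obtaining KPI abnormal data samples.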
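Background
Internet services such as search engines and online shopping have become an indispensable part of daily life. To keep services uninterrupted, operators usually need to closely monitor various KPIs (key performance indicators), such as search response time and CPU utilization, so that KPI anomalies can be detected accurately and troubleshooting and remediation can be started in time.
KPI anomaly detection is a very important part of intelligent operation and maintenance. The inventors found that, in this field, KPI anomaly algorithms are usually used to predict KPI anomalies, which requires a large amount of KPI abnormal sample data to help train and evaluate the algorithms. At present, however, labeling abnormal KPI data requires expert knowledge and a great deal of labor, which consumes substantial human and material resources and is inefficient.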
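Technical Problem
The main purpose of this application is to provide a method, an apparatus, a computer device, and a storage medium for obtaining KPI abnormal data samples, aiming to solve the technical problem that labeling KPI abnormal samples in the prior art is time-consuming and labor-intensive.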
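Technical Solution
Based on the above purpose, this application proposes a method for obtaining KPI abnormal data samples, including: acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.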
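This application also provides an apparatus for obtaining KPI abnormal data samples, including:
a data acquisition unit, configured to acquire KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
an anomaly detection unit, configured to perform anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, to backtrack in chronological order and intercept a specified segment of KPI data as candidate KPI abnormal data;
a time adjustment unit, configured to adjust the candidate KPI abnormal data and preset known KPI abnormal data so that the time series in the candidate KPI abnormal data and in the known KPI abnormal data correspond to the same time length;
a distance calculation unit, configured to compute similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
an anomaly determination unit, configured to determine, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
an anomaly marking unit, configured to, when the candidate KPI abnormal data is determined to be KPI abnormal data, mark the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.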
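This application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements a method for obtaining KPI abnormal data samples;
where the method for obtaining KPI abnormal data samples includes:
acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.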
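This application also provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements a method for obtaining KPI abnormal data samples;
where the method for obtaining KPI abnormal data samples includes:
acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.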
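Brief Description of the Drawings
FIG. 1 is a schematic diagram of the steps of a method for obtaining KPI abnormal data samples in an embodiment of this application;
FIG. 2 is a schematic structural block diagram of an apparatus for obtaining KPI abnormal data samples in an embodiment of this application;
FIG. 3 is a schematic structural block diagram of a computer device in an embodiment of this application.
The realization of the purpose, the functional characteristics, and the advantages of this application are further described with reference to the embodiments and the accompanying drawings.
Best Mode for Carrying Out the Invention
It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it.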
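Referring to FIG. 1, the method for obtaining KPI abnormal data samples in this embodiment includes:
Step S1: acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
Step S2: performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
Step S3: adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
Step S4: computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
Step S5: determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
Step S6: if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.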
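In this embodiment, the intelligent KPI anomaly recognition model is typically applied in intelligent operation and maintenance, for example in Internet services, where such a model can directly identify abnormal KPI data among millions to tens of millions of records. The KPI abnormal samples used to train and evaluate the model can be obtained by the method provided in this application. The KPI (key performance indicator) data is time series data: a segment of KPI data consists of multiple data points generated in chronological order, each corresponding to one unit of time. Take CPU utilization as an example: if the data points for each second within 5 seconds are 20%, 23%, 21%, 21%, and 20%, then the segment of KPI data is 20%, 23%, 21%, 21%, 20%, and the corresponding time series is the first, second, third, fourth, and fifth seconds. If one value differs greatly from the others, the segment can be regarded as abnormal data; for example, in 20%, 23%, 23%, 21%, 81%, the outlying value 81% makes the segment abnormal data.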
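As described in step S1, KPI data is acquired for a preset time period, which may be one minute. In this embodiment, multiple different servers can be monitored continuously at the same time, and KPI data is acquired once every minute; for example, multiple pieces of KPI data are acquired simultaneously for the same server, or multiple different pieces of KPI data are acquired simultaneously from multiple servers.
As described in step S2, anomaly detection is performed on the acquired KPI data to obtain candidate KPI abnormal data. First, anomaly detection is performed on the KPI data, for example by computation with a preset algorithm or by screening with preset rules, to obtain the potential abnormal data points of each piece of KPI data. Then, taking a potential abnormal data point as the end point of a segment, a specified segment of KPI data is intercepted by backtracking in chronological order and used as candidate KPI abnormal data. In other words, candidate KPI abnormal data may be one segment of a piece of data. In this embodiment, one or more segments of candidate KPI abnormal data may be obtained from one piece of KPI data.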
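The backtracking interception of step S2 can be illustrated with a short sketch. The function name `extract_candidate` and the parameter `segment_length` are hypothetical and do not appear in this application; the sketch only assumes that a KPI series is held as an in-memory list and that the specified segment is a fixed number of points ending at the potential abnormal data point.

```python
def extract_candidate(series, anomaly_index, segment_length):
    """Return the segment of `series` that ends at `anomaly_index` (inclusive).

    The segment is shorter than `segment_length` when the potential abnormal
    data point lies too close to the start of the series.
    """
    if not 0 <= anomaly_index < len(series):
        raise IndexError("anomaly_index is outside the series")
    start = max(0, anomaly_index - segment_length + 1)
    return series[start:anomaly_index + 1]


# CPU utilisation in percent; index 4 (81%) is the potential abnormal data point.
cpu = [20, 23, 23, 21, 81, 22, 20]
print(extract_candidate(cpu, anomaly_index=4, segment_length=3))  # [23, 21, 81]
```

Several potential abnormal data points in one series simply yield several calls, and therefore several candidate segments.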
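As described in steps S3 and S4, once candidate KPI abnormal data is obtained, similarity matching can be performed between an anomaly template and the candidate KPI abnormal data, so as to find the candidates that are highly similar to the template, that is, the KPI abnormal data. The anomaly template is known KPI abnormal data, which is also time series data. For example, dynamic time warping (DTW) can be used to obtain the KPI abnormal data: DTW compares the similarity between points of two time series and from this derives the similarity of the two series. In this embodiment, for ease of computation, the candidate KPI abnormal data and the known KPI abnormal data can be adjusted so that their time series correspond to the same time length, that is, the two are first stretched or compressed to the same time length, and then multiple warping path distances between them are computed. A warping path distance is the sum of the distances between all similar data points of the candidate KPI abnormal data and the known KPI abnormal data; the distance may be the Euclidean distance.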
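As described in steps S5 and S6, whether the candidate KPI abnormal data is KPI abnormal data is determined according to the warping path distances. The smaller the warping path distance, the more likely the candidate is abnormal data, so the smallest warping path distance can be compared with a preset value; if it is smaller than the preset value, the candidate KPI abnormal data corresponding to that warping path distance is taken as KPI abnormal data. After the KPI abnormal data is obtained, it is labeled, so that the labeled KPI abnormal data serves as a sample for training the intelligent KPI anomaly recognition model.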
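The decision of steps S5 and S6 reduces to comparing the smallest warping path distance with a preset value. The following is a minimal sketch under that reading; the names `label_candidate` and `collect_samples`, and the representation of a candidate as a `(segment, distances)` pair, are illustrative assumptions rather than anything defined in this application.

```python
def label_candidate(warping_distances, preset_value):
    """Return True when the smallest warping path distance is below the preset value."""
    if not warping_distances:
        return False  # nothing was compared, so nothing can be labelled
    return min(warping_distances) < preset_value


def collect_samples(candidates, preset_value):
    """Keep the candidate segments that are close enough to a known anomaly.

    `candidates` is a list of (segment, warping_distances) pairs.
    """
    return [segment for segment, distances in candidates
            if label_candidate(distances, preset_value)]
```

The kept segments are the ones that would be marked as KPI abnormal data and added to the training set.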
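With the method for obtaining KPI abnormal data samples provided in this application, a large number of KPI abnormal samples can be obtained from a small number of known KPI abnormal data samples by combining anomaly detection with a similarity algorithm. Compared with traditional manual collection and labeling of samples, this can reduce the operator's labeling time by more than 90% and greatly reduce operation and maintenance costs.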
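In one embodiment, step S2 includes:
Step S21: preprocessing the KPI data to obtain multiple pieces of standardized KPI data;
Step S22: performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
Step S23: processing the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
Step S24: comparing the feature score with a preset first threshold; when the feature score exceeds the first threshold, taking the data point corresponding to the feature score as a potential abnormal data point; otherwise, taking the data point corresponding to the feature score as a normal data point.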
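As described in step S21, the KPI data is preprocessed. Specifically, the KPI data is first scanned for gaps: the monitoring system may occasionally fail to receive data, so some data point values are lost. If missing data is found, it can be filled by linear interpolation based on the adjacent data points, for example by taking the adjacent values, computing their mean, and inserting the mean at the position of the missing data point. In addition, because KPIs may come from different servers, the KPI data can be standardized to remove differences in scale and prepare for subsequent computation. For example, for the KPI data collected from each server, the mean and the standard deviation are computed, and then each segment of KPI data has the mean subtracted and is divided by the standard deviation, giving the standardized KPI data.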
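The preprocessing of step S21 can be sketched as follows. The sketch assumes that a missing point is represented by `None` and that the population standard deviation is used; both are illustrative choices, as are the function names `fill_missing` and `standardize`, none of which is specified in this application.

```python
from statistics import mean, pstdev


def fill_missing(values):
    """Fill None entries by linear interpolation between the nearest known neighbours.

    Gaps at either end copy the nearest known value. For a single missing
    point this equals the mean of its two neighbours.
    """
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("series has no observed points")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def standardize(values):
    """Subtract the mean and divide by the standard deviation (z-score)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # a constant series carries no scale information
    return [(v - mu) / sigma for v in values]
```

For example, `fill_missing([20, None, 22])` gives `[20, 21.0, 22]`, and `standardize` would then be applied per server so that KPIs with different scales become comparable.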
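As described in step S22, anomaly features must be extracted before the isolation forest algorithm can be applied. That is, feature extraction is performed on the standardized KPI data to obtain the feature information of each data point, where the feature information includes a feature value. Specifically, a time series prediction model can be used as the feature extractor. In one embodiment, step S22 includes:
Step S221: acquiring historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
Step S222: inputting the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
Step S223: comparing each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
Step S224: using the absolute difference as the feature information of each data point.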
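In this embodiment, the actual candidate KPI data is compared with predicted KPI data to obtain the feature information, and the predicted KPI data is obtained by feeding historical data into a time series prediction model. Specifically, the historical data corresponding to each piece of KPI data is acquired first; the historical data is KPI data that precedes the preset time period in chronological order. For example, if the preset time period is the current minute, the historical data may be the data of the previous minute or of the minute two minutes earlier, and the historical data is also one minute long. The historical data is then input into the preset time series prediction model to obtain the predicted KPI data for the preset time period. The time series prediction model may use an existing algorithm, such as a moving average, which is not described further here. In general, normal data points conform to expected behavior and can be predicted well, so their prediction error is small, whereas abnormal data points with unexpected patterns are hard to predict and produce large prediction errors. The predicted KPI data is therefore compared point by point with the standardized KPI data. In this embodiment, each data point of the KPI data is a numerical value representing a preset indicator, such as 20%, 23%, 21%, 21%, 20% in the example above. Comparing the data points of the two segments yields the absolute difference between the actual value and the predicted value of each data point, which is used as the feature information of that data point: if the predicted value of data point x i is p i, the absolute difference |p i-x i| is computed and used as the feature information of the data point.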
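As an illustration of the prediction-error feature |p i-x i|, the sketch below uses a one-step-ahead moving average as the time series prediction model. The rolling scheme (each prediction is the mean of the `window` most recent actual values, starting from the history), the default window of 3, and the function name `moving_average_features` are assumptions made for this example; the application only states that an existing forecasting algorithm such as a moving average may be used.

```python
def moving_average_features(history, current, window=3):
    """Return |predicted - actual| for every point of `current`.

    Each prediction is the mean of the `window` most recent actual values,
    which initially come from `history`.
    """
    if window < 1 or len(history) < window:
        raise ValueError("history must contain at least `window` points")
    past = list(history)
    features = []
    for actual in current:
        predicted = sum(past[-window:]) / window
        features.append(abs(predicted - actual))
        past.append(actual)  # the actual value becomes history for the next point
    return features


print(moving_average_features([20, 20, 20], [20, 80, 20]))  # [0.0, 60.0, 20.0]
```

The spike at 80 produces the largest feature value, and the point after it also receives a non-zero value because the spike has entered the averaging window.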
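In another embodiment, to improve accuracy, several different time series prediction models can be used for prediction at the same time. For example, six algorithms with low computational complexity and good performance can be used together: differencing, moving average, weighted moving average, exponentially weighted moving average, autoregressive integrated moving average (ARIMA), and Holt-Winters. This gives six predicted values, each of which is compared with the actual value of the data point, yielding a feature matrix X (n×6).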
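As described in step S23, the feature information is processed with a preset isolation forest algorithm to obtain the corresponding feature score. The isolation forest algorithm isolates observations by randomly selecting a feature and then randomly selecting a split value between the minimum and maximum of that feature. Separating abnormal data usually takes only a few conditions, whereas separating normal data takes more, so the feature score of KPI abnormal data can be based on the number of conditions needed to separate a given observation. The recursive partitioning of the isolation forest can be represented by a tree structure (iTree): the number of splits needed to separate a KPI data point equals the path length from the root node to the terminating node, and abnormal KPI data are the instances with a short average path length over the iTrees. Each terminal node in the isolation forest has a score between 0 and 1 computed from its path length; the higher the score, the more likely the anomaly. Since the isolation forest algorithm is prior art, the detailed computation is not repeated here.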
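For reference, a score between 0 and 1 derived from the path length matches the anomaly score defined in the original isolation forest formulation by Liu, Ting, and Zhou. This application does not spell the formula out, so the following is background rather than part of the disclosed method. Here h(x) is the path length of sample x in one iTree, E[h(x)] is its average over all trees, n is the subsample size used to build each tree, and c(n) is the average path length of an unsuccessful search in a binary search tree of n nodes:

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln i + 0.5772156649
```

A score close to 1 corresponds to a short average path and therefore a likely anomaly, while scores well below 0.5 correspond to normal points.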
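As described in step S24, to detect potential anomalies from the feature scores, a suitable threshold θ can be chosen, referred to here as the first threshold. The feature score is compared with the preset first threshold: if the feature score X i of a data point is greater than θ, the data point is taken as a potential abnormal data point; otherwise, it is taken as a normal data point.
In general, choosing the first threshold involves a trade-off between high recall and high precision, and the F-score is commonly used as the metric. Here the aim is to select from the KPI data as many candidate segments with abnormal behavior as possible, so that none are missed. In KPI data the percentage of abnormal data points is very small, usually about 1%, so most data points are likely to be normal and have low feature scores, and only a few points have high feature scores. The anomaly score at the 85th percentile can therefore be chosen as the first threshold.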
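Choosing the first threshold as a percentile of the scores can be sketched as below. The nearest-rank definition of the percentile and the function names `percentile_threshold` and `potential_anomalies` are assumptions for illustration; the application only states that the score at the 85th percentile may be used.

```python
import math


def percentile_threshold(scores, percentile=85.0):
    """Return the nearest-rank percentile of `scores`, used as the first threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def potential_anomalies(scores, threshold):
    """Return the indices of the data points whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


scores = list(range(1, 21))          # 20 illustrative scores: 1, 2, ..., 20
theta = percentile_threshold(scores)  # 17
print(potential_anomalies(scores, theta))  # [17, 18, 19]
```

Because roughly 15% of the points exceed this threshold while only about 1% are expected to be truly abnormal, the choice deliberately favors recall; the later similarity matching against known anomalies is what filters the candidates.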
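In one embodiment, step S4 includes:
Step S41: finding multiple warping paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where a warping path is expressed as
Figure PCTCN2020118528-appb-000001
w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the candidate KPI abnormal data and the known KPI abnormal data after adjustment; w k=(i,j) means that the i-th data point of the time series of the candidate KPI abnormal data and the j-th point of the time series of the known KPI abnormal data form the k-th step of the path;
Step S42: computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance.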
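In this embodiment, dynamic time warping is used to obtain the warping path distances. Specifically, multiple warping paths are found according to the data points of the candidate KPI abnormal data and the known KPI abnormal data. Since both are time series data, for ease of description the candidate KPI abnormal data is denoted as the first time series c with data points c j, and the known KPI abnormal data is denoted as the second time series q with data points q i. The warping path is expressed as
Figure PCTCN2020118528-appb-000002
w 1 denotes the first step of the path, w 2 the second step, and w k the k-th step, where w k=(i,j) means that at step k the i-th point of the first time series and the j-th point of the second time series are considered similar. The sum of the distances between all similar points is the warping path distance, which is used to measure the similarity of the two time series: the smaller the warping path distance, the higher the similarity.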
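For a warping path, max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the two after adjustment. The warping path must start from
Figure PCTCN2020118528-appb-000003
and end at
Figure PCTCN2020118528-appb-000004
and the indices i and j in the warping path
Figure PCTCN2020118528-appb-000005
must be monotonically increasing, which means that for w k=(i,j) and w k+1=(i′,j′), i≤i′≤i+1 and j≤j′≤j+1.
If the path has passed through element (i,j), the next element of the path can only be one of (i+1,j), (i,j+1), and (i+1,j+1). That is, the first step computes the distance between the first data point of the first time series and the first data point of the second time series; the second step is
Figure PCTCN2020118528-appb-000006
or (2,1) or (2,2), that is, the second step computes the distance between the first data point of the first time series and the second data point of the second time series, or between the second data point of the first time series and the first data point of the second time series, or between the second data point of the first time series and the second data point of the second time series; and so on until step k, w k, is computed, which ensures that every data point of the first and second time series has been used at least once, thereby obtaining multiple warping paths.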
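As described in step S42, the sum of the distances between the corresponding data points is computed for each warping path to obtain each warping path distance. Specifically, the distance matrix corresponding to the warping path distance is: D(i,j)=Dist(i,j)+min{D(i-1,j),D(i,j-1),D(i-1,j-1)};
where Dist(i,j) is the distance between the i-th point of the first time series and the j-th point of the second time series, and D(i,j) measures the similarity between the first i points of the first time series and the first j points of the second time series. The final warping path distance is D(|X|,|Y|); the smaller its value, the more similar the two time series.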
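The recurrence D(i,j)=Dist(i,j)+min{D(i-1,j),D(i,j-1),D(i-1,j-1)} can be evaluated with a standard dynamic programming table. The sketch below assumes scalar data points and uses the absolute difference as Dist (the one-dimensional Euclidean distance); the function name `dtw_distance` is illustrative.

```python
import math


def dtw_distance(c, q, dist=lambda a, b: abs(a - b)):
    """Return D(|c|, |q|), the smallest warping path distance between c and q."""
    n, m = len(c), len(q)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # D[i][j] holds the best cumulative distance for c[:i] and q[:j];
    # row 0 and column 0 are padding so that the recurrence needs no special cases.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(c[i - 1], q[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1]
            )
    return D[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0: the repeated 2 is absorbed by warping
print(dtw_distance([0, 0], [1, 1]))           # 2.0
```

The table implicitly considers every admissible warping path, so the returned value is the minimum over all of them, which is the quantity compared with the preset value in step S5.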
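In one embodiment, step S42 includes:
Step S421: computing, following each warping path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
Step S422: when a data point distance is greater than a second threshold, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.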
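In this embodiment, because computing distances along warping paths is a large and complex computation, efficiency can be improved by stopping early: if, on a given warping path, the data points of the two time series are too far apart, the computation of that path is stopped and the next warping path is computed instead. Specifically, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data are computed along each warping path in turn. If data point i of the first time series and data point j of the second time series are too far apart, q i and c j match poorly, and continuing along that warping path wastes resources, so the computation of that path can be stopped directly. The second threshold is determined by the warping time window w. When computing distances, it is first checked whether the distance between data point i of the first time series and data point j of the second time series is within w: if |i-j|≤w, the computation continues along the warping path; if it is greater than w, the computation of that warping path stops and the next warping path is computed. These steps are repeated until all warping paths have been processed and each warping path distance is obtained.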
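In a dynamic programming implementation, the condition |i-j|≤w is usually enforced by filling only a band of the distance matrix around the diagonal (commonly called a Sakoe-Chiba band). The sketch below takes that approach instead of enumerating paths one by one, which is an implementation choice not stated in this application; the function name `dtw_banded` and the widening of the band to |n-m| for series of unequal length are likewise assumptions.

```python
import math


def dtw_banded(c, q, w):
    """DTW distance over paths whose steps all satisfy |i - j| <= w."""
    n, m = len(c), len(q)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # The band must be at least |n - m| wide, otherwise the end cell is unreachable.
    w = max(w, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        # Cells outside the band stay at infinity and are never chosen by min().
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(c[i - 1] - q[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


c, q = [0, 1, 1, 1], [0, 0, 0, 1]
print(dtw_banded(c, q, w=0))  # 2.0: point-by-point comparison only
print(dtw_banded(c, q, w=1))  # 1.0
print(dtw_banded(c, q, w=3))  # 0.0: same as unconstrained DTW for this length
```

With w=0 only the diagonal is used, so the result is the plain point-by-point distance; widening the window allows more warping at a higher computational cost.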
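Here, when w=0,
Figure PCTCN2020118528-appb-000007
where m is the length of the anomaly template, that is, the length of the known KPI abnormal data. When w≥m, the above formula D(i,j)=Dist(i,j)+min{D(i-1,j),D(i,j-1),D(i-1,j-1)} can be changed to D(i,j)=Dist(i,j)+min{D(i-1,j-1),D(i-2,j-1),D(i-1,j-1)}.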
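In one embodiment, step S42 includes:
Step S423: computing the path distance value of each step along the current warping path, and computing a specified value according to the feature information of the candidate KPI abnormal data;
Step S424: taking the minimum of the path distance values as the optimal value, and determining whether the specified value is greater than the optimal value;
Step S425: if so, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.
In this embodiment, after the warping paths are obtained, the path distance value of each step of the current warping path is computed and the minimum of these values is taken as the optimal value. At the same time, a specified value is computed from the feature information of the candidate KPI abnormal data, and it is determined whether the specified value is greater than the optimal value. If it is, the computation of the current warping path stops and the next warping path is computed. These steps are repeated until all warping paths have been processed and each warping path distance is obtained, which reduces unnecessary computation and avoids wasting resources. The specified value can be computed by the following steps: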
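Step S421': acquiring in turn the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point in the candidate KPI abnormal data;
Step S422': finding the maximum squared difference over the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point to obtain the specified value.
In this embodiment, the feature information includes a feature value. The largest and smallest feature values in the time series of the candidate KPI abnormal data are acquired, together with the feature values of the first and last data points of that time series, and the maximum squared difference over the four feature values is then computed; this maximum squared difference is the specified value.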
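The text does not state what the four feature values are differenced against. One plausible reading, which resembles the LB_Kim lower bound from the DTW literature, compares the first, last, smallest, and largest values of the candidate with the same four values of the known anomaly template and keeps the largest squared difference. The sketch below follows that reading and should be treated as an interpretation, not as the method of this application; the function name `four_point_bound` is hypothetical.

```python
def four_point_bound(candidate, template):
    """Largest squared difference between the (first, last, min, max) of two series."""
    if not candidate or not template:
        raise ValueError("both series must be non-empty")

    def features(series):
        return (series[0], series[-1], min(series), max(series))

    return max((a - b) ** 2
               for a, b in zip(features(candidate), features(template)))


print(four_point_bound([1, 5, 2], [1, 9, 2]))  # 16: only the maxima differ (5 vs 9)
```

Under this reading the value never exceeds the DTW distance computed with squared point differences, because the first and last points are always aligned with each other and the extreme value of one series must be matched with some point of the other. It can therefore be checked before the full computation, and a candidate whose bound already exceeds the best distance found so far can be skipped.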
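In another embodiment, the specified value may also be obtained by the following preset rule. First, two new time series U and L are defined using the warping time window: U i=max j q j; L i=min j q j; where j∈[i-w,i+w]. That is, the second time series is transformed using the warping time window w as the shift distance: shifting each data point forward by w gives the upper bound of the second time series q, and shifting each data point backward by w gives the lower bound of the second time series q, where
Figure PCTCN2020118528-appb-000008
and the specified value is
Figure PCTCN2020118528-appb-000009
where I(·) is the indicator function.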
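The envelope definitions U i=max q j and L i=min q j over j∈[i-w,i+w], combined with an indicator function, correspond to what the DTW literature calls the LB_Keogh lower bound. Since the formula images are not available here, the sketch below assumes the standard form of that bound: the sum of squared amounts by which c i lies above U i or below L i. It also assumes that the two series have equal length, as they do after the adjustment of step S3; the function names are illustrative.

```python
def envelope(q, w):
    """Return (U, L): the running max and min of q over the window [i - w, i + w]."""
    n = len(q)
    upper, lower = [], []
    for i in range(n):
        window = q[max(0, i - w):min(n, i + w + 1)]
        upper.append(max(window))
        lower.append(min(window))
    return upper, lower


def lb_keogh(c, q, w):
    """Sum of squared excursions of c outside the envelope of q (assumed standard form)."""
    if len(c) != len(q):
        raise ValueError("series must have equal length")
    upper, lower = envelope(q, w)
    total = 0.0
    for ci, u, l in zip(c, upper, lower):
        if ci > u:        # indicator: c_i lies above the upper envelope
            total += (ci - u) ** 2
        elif ci < l:      # indicator: c_i lies below the lower envelope
            total += (ci - l) ** 2
    return total


q = [1, 2, 3, 2, 1]
print(envelope(q, 1))                    # ([2, 3, 3, 3, 2], [1, 1, 2, 1, 1])
print(lb_keogh([1, 2, 5, 2, 0], q, 1))   # 5.0: 5 exceeds 3 by 2, and 0 is below 1 by 1
```

Computing this value takes linear time, so it is a cheap screen to apply before the windowed warping path computation for the same w.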
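In another embodiment, the first time series and the second time series can be exchanged, that is, q and c are swapped in the above formula. The first time series can likewise be transformed using the warping time window w as the shift distance: shifting each data point forward by w gives the upper bound of the first time series c, and shifting each data point backward by w gives its lower bound, so that the upper bound U and the lower bound L of the candidate KPI abnormal data are obtained, and another specified value is then obtained from the above formula.
In another embodiment, when computing a warping path distance, if a distance value computed during the process is greater than the specified value, the computation terminates early, the candidate KPI abnormal data is discarded, and the computation does not continue.
In the method for obtaining KPI abnormal data samples provided in this application, screening during similarity matching gives good robustness and fast iteration, avoids wasting computing resources and computing time, and significantly shortens the response time. It also places low demands on hardware and uses little memory, which makes it easy for operation and maintenance personnel to deploy.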
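Referring to FIG. 2, this embodiment provides an apparatus for obtaining KPI abnormal data samples, which corresponds to the above method for obtaining KPI abnormal data samples and includes:
a data acquisition unit 1, configured to acquire KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
an anomaly detection unit 2, configured to perform anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, to backtrack in chronological order and intercept a specified segment of KPI data as candidate KPI abnormal data;
a time adjustment unit 3, configured to adjust the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
a distance calculation unit 4, configured to compute similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
an anomaly determination unit 5, configured to determine, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
an anomaly marking unit 6, configured to, when the candidate KPI abnormal data is determined to be KPI abnormal data, mark the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.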
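In one embodiment, the anomaly detection unit 2 includes:
a data processing subunit, configured to preprocess the KPI data to obtain multiple pieces of standardized KPI data;
a feature extraction subunit, configured to perform feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
a score calculation subunit, configured to process the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
a score comparison subunit, configured to compare the feature score with a preset first threshold; when the feature score exceeds the first threshold, the data point corresponding to the feature score is taken as a potential abnormal data point; otherwise, the data point corresponding to the feature score is taken as a normal data point.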
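As described for the feature extraction subunit, anomaly features must be extracted before the isolation forest algorithm can be applied. That is, feature extraction is performed on the standardized KPI data to obtain the feature information of each data point, where the feature information includes a feature value. Specifically, a time series prediction model can be used as the feature extractor. In one embodiment, the feature extraction subunit includes:
a history acquisition module, configured to acquire historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
a data prediction module, configured to input the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
a difference calculation module, configured to compare each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
a feature information module, configured to use the absolute difference as the feature information of each data point.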
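In one embodiment, the distance calculation unit 4 includes:
a path finding subunit, configured to find multiple warping paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where a warping path is expressed as
Figure PCTCN2020118528-appb-000010
w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the candidate KPI abnormal data and the known KPI abnormal data after adjustment; w k=(i,j) means that the i-th data point of the time series of the candidate KPI abnormal data and the j-th point of the time series of the known KPI abnormal data form the k-th step of the path;
a distance calculation subunit, configured to compute, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance.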
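In one embodiment, the distance calculation subunit includes:
a point distance calculation module, configured to compute, following each warping path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
a first calculation module, configured to, when a data point distance is greater than a second threshold, stop computing the current warping path and move on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.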
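In one embodiment, the distance calculation subunit includes:
a path value calculation module, configured to compute the path distance value of each step along the current warping path, and to compute a specified value according to the feature information of the candidate KPI abnormal data;
a determination module, configured to take the minimum of the path distance values as the optimal value and determine whether the specified value is greater than the optimal value;
a second calculation module, configured to, if so, stop computing the current warping path and move on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.
The specified value can be computed by the following modules:
a feature acquisition module, configured to acquire in turn the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point in the candidate KPI abnormal data;
a variance module, configured to find the maximum squared difference over the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point to obtain the specified value.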
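Referring to FIG. 3, an embodiment of this application also provides a computer device, which may be a server and whose internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all the data required for obtaining the KPI abnormal samples described above. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer program implements the method for obtaining KPI abnormal data samples of any of the above embodiments.
Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied.
An embodiment of this application also provides a computer-readable storage medium, which may be non-volatile or volatile and on which a computer program is stored; when executed by a processor, the computer program implements the method for obtaining KPI abnormal data samples of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes of the above method embodiments can be completed by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).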
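It should be noted that, in this document, the terms "include", "comprise", and any other variants are intended to cover non-exclusive inclusion, so that a process, apparatus, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article, or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article, or method that includes the element.
The above are only preferred embodiments of this application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of this application.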

Claims (20)

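  1. A method for obtaining KPI abnormal data samples, comprising:
    acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
    performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
    adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
    computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
    determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
    if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.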
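  2. The method for obtaining KPI abnormal data samples according to claim 1, wherein the step of performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points comprises:
    preprocessing the KPI data to obtain standardized KPI data;
    performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
    processing the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
    comparing the feature score with a preset first threshold; when the feature score exceeds the first threshold, taking the data point corresponding to the feature score as a potential abnormal data point; otherwise, taking the data point corresponding to the feature score as a normal data point.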
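  3. The method for obtaining KPI abnormal data samples according to claim 2, wherein each data point of the KPI data is a numerical value representing a preset indicator, and the step of performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point comprises:
    acquiring historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
    inputting the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
    comparing each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
    using the absolute difference as the feature information of each data point.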
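  4. The method for obtaining KPI abnormal data samples according to claim 1, wherein the step of computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances comprises:
    finding multiple warping paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where a warping path is expressed as
    Figure PCTCN2020118528-appb-100001
    w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the candidate KPI abnormal data and the known KPI abnormal data after adjustment; w k=(i,j) means that the i-th data point of the time series of the candidate KPI abnormal data and the j-th point of the time series of the known KPI abnormal data form the k-th step of the path;
    computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance.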
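  5. The method for obtaining KPI abnormal data samples according to claim 4, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing, following each warping path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
    when a data point distance is greater than a second threshold, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.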
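  6. The method for obtaining KPI abnormal data samples according to claim 4, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing the path distance value of each step along the current warping path, and computing a specified value according to the feature information of the candidate KPI abnormal data;
    taking the minimum of the path distance values as the optimal value, and determining whether the specified value is greater than the optimal value;
    if so, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.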
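  7. The method for obtaining KPI abnormal data samples according to claim 6, wherein the feature information includes a feature value, and the step of computing a specified value according to the feature information of the candidate KPI abnormal data comprises:
    acquiring in turn the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point in the candidate KPI abnormal data;
    finding the maximum squared difference over the largest feature value, the smallest feature value, the feature value of the first data point, and the feature value of the last data point to obtain the specified value.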
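  8. An apparatus for obtaining KPI abnormal data samples, comprising:
    a data acquisition unit, configured to acquire KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
    an anomaly detection unit, configured to perform anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, to backtrack in chronological order and intercept a specified segment of KPI data as candidate KPI abnormal data;
    a time adjustment unit, configured to adjust the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
    a distance calculation unit, configured to compute similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
    an anomaly determination unit, configured to determine, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
    an anomaly marking unit, configured to, when the candidate KPI abnormal data is determined to be KPI abnormal data, mark the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.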
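  9. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements a method for obtaining KPI abnormal data samples;
    wherein the method for obtaining KPI abnormal data samples comprises:
    acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
    performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
    adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
    computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
    determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
    if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.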
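  10. The computer device according to claim 9, wherein the step of performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points comprises:
    preprocessing the KPI data to obtain standardized KPI data;
    performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
    processing the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
    comparing the feature score with a preset first threshold; when the feature score exceeds the first threshold, taking the data point corresponding to the feature score as a potential abnormal data point; otherwise, taking the data point corresponding to the feature score as a normal data point.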
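  11. The computer device according to claim 10, wherein each data point of the KPI data is a numerical value representing a preset indicator, and the step of performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point comprises:
    acquiring historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
    inputting the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
    comparing each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
    using the absolute difference as the feature information of each data point.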
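  12. The computer device according to claim 9, wherein the step of computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances comprises:
    finding multiple warping paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where a warping path is expressed as
    Figure PCTCN2020118528-appb-100002
    w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the candidate KPI abnormal data and the known KPI abnormal data after adjustment; w k=(i,j) means that the i-th data point of the time series of the candidate KPI abnormal data and the j-th point of the time series of the known KPI abnormal data form the k-th step of the path;
    computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance.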
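  13. The computer device according to claim 12, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing, following each warping path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
    when a data point distance is greater than a second threshold, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.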
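  14. The computer device according to claim 13, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing the path distance value of each step along the current warping path, and computing a specified value according to the feature information of the candidate KPI abnormal data;
    taking the minimum of the path distance values as the optimal value, and determining whether the specified value is greater than the optimal value;
    if so, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.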
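  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a method for obtaining KPI abnormal data samples;
    wherein the method for obtaining KPI abnormal data samples comprises:
    acquiring KPI data for a preset time period, where the KPI data is time series data, each piece of KPI data includes multiple data points, and each unit of time corresponds, in chronological order, to one data point;
    performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points and, taking a potential abnormal data point as the end point, backtracking in chronological order to intercept a specified segment of KPI data as candidate KPI abnormal data;
    adjusting the candidate KPI abnormal data and preset known KPI abnormal data so that the candidate KPI abnormal data and the known KPI abnormal data correspond to the same time length;
    computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances, where a warping path distance is the sum of the distances between all similar data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data;
    determining, according to the warping path distances, whether the candidate KPI abnormal data is KPI abnormal data;
    if so, marking the candidate KPI abnormal data as KPI abnormal data to serve as a sample for training an intelligent KPI anomaly recognition model.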
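  16. The computer-readable storage medium according to claim 15, wherein the step of performing anomaly detection on the KPI data to obtain corresponding potential abnormal data points comprises:
    preprocessing the KPI data to obtain standardized KPI data;
    performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point;
    processing the feature information with a preset isolation forest algorithm to obtain the feature score corresponding to the feature information;
    comparing the feature score with a preset first threshold; when the feature score exceeds the first threshold, taking the data point corresponding to the feature score as a potential abnormal data point; otherwise, taking the data point corresponding to the feature score as a normal data point.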
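  17. The computer-readable storage medium according to claim 16, wherein each data point of the KPI data is a numerical value representing a preset indicator, and the step of performing feature extraction on the standardized KPI data to obtain feature information corresponding to each data point comprises:
    acquiring historical data corresponding to the KPI data, where the historical data is KPI data that precedes the preset time period in chronological order;
    inputting the historical data into a preset time series prediction model to obtain the predicted KPI data corresponding to the preset time period;
    comparing each data point of the predicted KPI data with the corresponding data point of the standardized KPI data to obtain the absolute difference between the actual value and the predicted value of each data point;
    using the absolute difference as the feature information of each data point.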
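  18. The computer-readable storage medium according to claim 15, wherein the step of computing similarity distances between corresponding data points of the adjusted candidate KPI abnormal data and the known KPI abnormal data to obtain multiple warping path distances comprises:
    finding multiple warping paths according to the data points of the candidate KPI abnormal data and the known KPI abnormal data, where a warping path is expressed as
    Figure PCTCN2020118528-appb-100003
    w k denotes the k-th step of the path, and max(|X|,|Y|)≤k≤|X|+|Y|, where X is the actual time length of the candidate KPI abnormal data, Y is the actual time length of the known KPI abnormal data, and k is the time length of the candidate KPI abnormal data and the known KPI abnormal data after adjustment; w k=(i,j) means that the i-th data point of the time series of the candidate KPI abnormal data and the j-th point of the time series of the known KPI abnormal data form the k-th step of the path;
    computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance.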
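  19. The computer-readable storage medium according to claim 18, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing, following each warping path in turn, the data point distances between the candidate KPI abnormal data and the known KPI abnormal data;
    when a data point distance is greater than a second threshold, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.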
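  20. The computer-readable storage medium according to claim 19, wherein the step of computing, for each warping path, the sum of the distances between the corresponding data points to obtain each warping path distance comprises:
    computing the path distance value of each step along the current warping path, and computing a specified value according to the feature information of the candidate KPI abnormal data;
    taking the minimum of the path distance values as the optimal value, and determining whether the specified value is greater than the optimal value;
    if so, stopping the computation of the current warping path and moving on to the next warping path, until all warping paths have been processed and each warping path distance is obtained.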
PCT/CN2020/118528 2020-07-31 2020-09-28 获取kpi异常数据样本的方法、装置和计算机设备 WO2021139253A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010762945.2 2020-07-31
CN202010762945.2A CN111897695B (zh) 2020-07-31 2020-07-31 获取kpi异常数据样本的方法、装置和计算机设备

Publications (1)

Publication Number Publication Date
WO2021139253A1 true WO2021139253A1 (zh) 2021-07-15

Family

ID=73184128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118528 WO2021139253A1 (zh) 2020-07-31 2020-09-28 获取kpi异常数据样本的方法、装置和计算机设备

Country Status (2)

Country Link
CN (1) CN111897695B (zh)
WO (1) WO2021139253A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722176A (zh) * 2021-07-30 2021-11-30 银清科技有限公司 一种自适应的异常性能指标确定方法及装置
CN116074876A (zh) * 2023-03-07 2023-05-05 南京邮电大学 一种基于小波变换的通讯基站异常检测方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446647A (zh) * 2020-12-14 2021-03-05 上海众源网络有限公司 异常元素的定位方法、装置、电子设备及存储介质
CN112559602B (zh) * 2021-02-21 2021-07-13 北京工业大数据创新中心有限公司 一种工业设备征兆的目标样本的确定方法及***
CN116712049B (zh) * 2023-08-09 2023-10-20 临沂大学 一种运动数据采集处理方法及***

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107133343A (zh) * 2017-05-19 2017-09-05 哈工大大数据产业有限公司 基于时间序列近似匹配的大数据异常状态检测方法及装置
US9886559B1 (en) * 2012-02-24 2018-02-06 Cerner Innovation, Inc. Assessing fitness by entropy and bispectral analysis
CN109726731A (zh) * 2018-07-26 2019-05-07 平安科技(深圳)有限公司 用电量异常检测方法、装置、设备及可读存储介质
CN110286656A (zh) * 2019-05-07 2019-09-27 清华大学 一种错误数据容忍的虚警过滤方法和装置
CN111092891A (zh) * 2019-12-20 2020-05-01 杭州安恒信息技术股份有限公司 一种网络中异常点的检测方法、检测***及相关装置

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5074919B2 (ja) * 2007-12-27 2012-11-14 アズビル株式会社 標準時系列データ算出方法、異常検出方法、標準時系列データ算出装置、および異常検出装置
KR101621019B1 (ko) * 2015-01-28 2016-05-13 한국인터넷진흥원 시계열 통계 기반 공격의심 이상징후를 탐지하기 위한 방법
CN106572493B (zh) * 2016-10-28 2018-07-06 南京华苏科技有限公司 Lte网络中的异常值检测方法及***
JP7059684B2 (ja) * 2018-02-23 2022-04-26 トヨタ自動車株式会社 異常検知データ識別装置、および異常識別システム
CN109102005B (zh) * 2018-07-23 2020-08-18 杭州电子科技大学 基于浅层模型知识迁移的小样本深度学习方法
CN109032829B (zh) * 2018-07-23 2020-12-08 腾讯科技(深圳)有限公司 数据异常检测方法、装置、计算机设备及存储介质
US11579951B2 (en) * 2018-09-27 2023-02-14 Oracle International Corporation Disk drive failure prediction with neural networks
US10834106B2 (en) * 2018-10-03 2020-11-10 At&T Intellectual Property I, L.P. Network security event detection via normalized distance based clustering
CN111046892A (zh) * 2018-10-12 2020-04-21 北京京东尚科信息技术有限公司 异常识别方法和装置
CN109902721B (zh) * 2019-01-28 2024-07-02 平安科技(深圳)有限公司 异常点检测模型验证方法、装置、计算机设备及存储介质
CN110175247B (zh) * 2019-03-13 2021-06-08 北京邮电大学 一种优化基于深度学习的异常检测模型的方法
CN109992479A (zh) * 2019-03-31 2019-07-09 西安电子科技大学 一种多维度kpi数据异常定位方法、装置及计算机设备
CN110400052A (zh) * 2019-06-28 2019-11-01 北京汉迪移动互联网科技股份有限公司 关键性能指标的监控方法和装置
CN111198906A (zh) * 2019-12-20 2020-05-26 天阳宏业科技股份有限公司 一种数据处理方法、装置、***及存储介质
CN111065106B (zh) * 2019-12-31 2022-04-26 重庆大学 移动通信网络中基于异常检测和核密度估计kde的指标突变小区检测方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHAO NENGWEN; ZHU JING; LIU RONG; LIU DAPENG; ZHANG MING; PEI DAN: "Label-Less: A Semi-Automatic Labelling Tool for KPI Anomalies", IEEE INFOCOM 2019 - IEEE CONFERENCE ON COMPUTER COMMUNICATIONS, 29 April 2019 (2019-04-29), pages 1882 - 1890, XP033561170, DOI: 10.1109/INFOCOM.2019.8737429 *

Also Published As

Publication number Publication date
CN111897695A (zh) 2020-11-06
CN111897695B (zh) 2022-06-17

Similar Documents

Publication Publication Date Title
WO2021139253A1 (zh) 获取kpi异常数据样本的方法、装置和计算机设备
CN111210024B (zh) 模型训练方法、装置、计算机设备和存储介质
WO2021139252A1 (zh) 运维故障根因识别方法、装置、计算机设备和存储介质
CN111259625B (zh) 意图识别方法、装置、设备及计算机可读存储介质
TWI543102B (zh) 異因分析與校正方法與系統
US20210365342A1 (en) Method and system for power equipment diagnosis based on windowed feature and hilbert visualization
WO2020015089A1 (zh) 身份信息风险评定方法、装置、计算机设备和存储介质
CN105677791B (zh) 用于分析风力发电机组的运行数据的方法和***
WO2020015104A1 (zh) 风险旅客流量预测方法、装置、计算机设备和存储介质
US10712733B2 (en) Methods and systems for discovery of prognostic subsequences in time series
CN110263326B (zh) 一种用户行为预测方法、预测装置、存储介质及终端设备
CN111080117A (zh) 设备风险标签的构建方法、装置、电子设备及存储介质
CN113284000B (zh) 用户用电数据异常检测方法、装置、计算机设备
CN115858794B (zh) 用于网络运行安全监测的异常日志数据识别方法
JP2009068946A (ja) 欠陥分類装置および方法並びにプログラム
CN116796275A (zh) 一种工业设备多模态时序异常检测方法
CN113110961B (zh) 设备异常检测方法、装置、计算机设备及可读存储介质
CN106844152B (zh) 银行后台任务跑批时间的相关性分析方法及装置
US20200279148A1 (en) Material structure analysis method and material structure analyzer
CN110930012B (zh) 基于敏感性分析及改进阴性选择法的能耗异常定位方法
CN117290404A (zh) 一种主配网故障处理方法快速检索实用方法及***
CN104820691A (zh) 一种交通流预测中数据库设计方法及其查询向量得出方法
CN117592865B (zh) 一种装备零备件质量状态预测方法及装置
US20230342402A1 (en) Data analysis apparatus, data analysis system, and non-transitory computer-readable storage medium
CN117541562A (zh) 基于不确定性估计的半监督无参考图像质量评价方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20912811; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20912811; Country of ref document: EP; Kind code of ref document: A1)