CN113205146A

CN113205146A - Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison

Info

Publication number: CN113205146A
Application number: CN202110545508.XA
Authority: CN
Inventors: 孙栓柱; 周春蕾; 李逗; 孙彬; 王林; 王其祥; 高进; 李春岩; 沈洋; 黄治军; 张磊; 傅高健
Original assignee: Jiangsu Fangtian Power Technology Co Ltd
Current assignee: Jiangsu Fangtian Power Technology Co Ltd
Priority date: 2021-05-19
Filing date: 2021-05-19
Publication date: 2021-08-03

Abstract

The invention discloses a new abnormal fluctuation detection method, namely a time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison. The algorithm has five parts: preparing data; constructing data segments; calculating segment features; executing detection; and outputting the result. It meets the timeliness and accuracy requirements for detecting abnormal data fluctuation in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time series data streams and combines the advantages of detection methods based on statistical models and on similarity measurement. Because the amount of calculation is reduced, the method suits large data streams and can detect and identify abnormal fluctuation in real-time data streams in time.

Description

Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison

Technical Field

The invention relates to a new algorithm for detecting abnormal data fluctuation, in particular to a time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison.

Background

Data anomaly detection is an important research topic in data analysis and mining. Since the industrial internet concept was introduced, industry has accumulated a large amount of industrial equipment operating data. Analyzing and mining these data to find anomalies, and from them analyzing the operating characteristics of industrial equipment and understanding its operating state, has become an important research topic. In particular, finding data anomalies and diagnosing potential equipment abnormalities in time is of great significance for guaranteeing safe equipment operation.

In conventional equipment safety work, equipment is usually maintained and overhauled manually at regular intervals. When industrial equipment is safeguarded through data diagnosis techniques, the primary task is to discover anomalies in the data. In recent years, industry has also begun research that uses accumulated operating data and real-time monitoring data of industrial equipment to achieve more efficient equipment monitoring and data anomaly diagnosis.

Currently, algorithms for detecting data anomalies fall mainly into the following categories: statistical model-based, clustering-based, similarity measurement-based, constraint rule-based, and the like. However, these common anomaly detection methods are usually applied to data sets of limited scale and have no real-time detection capability, so they generally struggle to meet the requirements of anomaly detection on large-volume real-time industrial data streams. In complex industrial scenes in particular, both their amount of calculation and their detection performance need improvement. Traditional anomaly detection methods are not designed for industrial real-time big data. For example, statistics-based anomaly detection is suitable for detecting outliers, fluctuation anomalies, and similar cases, but cannot effectively identify the continuous anomaly intervals that appear in industrial production. Clustering-based anomaly detection judges outliers mainly by quantifying the distance between anomalous points and normal points; it is difficult to apply to large data sets and real-time data streams, its amount of calculation is generally large, and its detection performance depends on the quality of the clustering. Similarity measurement-based anomaly detection judges whether the target data are abnormal mainly by calculating the similarity between sequences, but it has a high computation time cost and poor timeliness. Constraint rule-based anomaly detection mainly exploits the temporal characteristics of a time series through sequence dependence and speed constraint techniques and repairs highly anomalous data, but it generally struggles with anomaly detection on sequences whose patterns vary.

Disclosure of Invention

The purpose of the invention is as follows: to address the need for abnormal fluctuation detection in industrial time series data streams, the invention provides a new data abnormal fluctuation detection algorithm, namely a time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison. The algorithm meets the timeliness and accuracy requirements for detecting abnormal data fluctuation in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time series data streams and combines the advantages of detection methods based on statistical models and on similarity measurement. Because the amount of calculation is reduced, the method suits large data streams and can detect and identify abnormal fluctuation in real-time data streams in time. Practical application shows that the algorithm has higher accuracy, adapts to changes in working conditions in industrial data streams, reduces the false alarm rate, and accurately identifies abnormal fluctuation in industrial time series data streams.

The technical scheme is as follows: a time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison mainly comprises the following steps:

Step 1: preparing data;

Step 2: constructing data segments;

Step 3: calculating the statistical features of the data segments;

Step 4: executing detection;

Step 5: outputting the result.

According to one aspect of the invention, the target data segment is constructed as: F_t^T: ⟨D, T, t⟩.

According to one aspect of the invention, the statistical feature calculation for data segments includes, but is not limited to, the data itself, the time stamp, the mean, and the variance, and the target data segment features are constructed into a target data segment feature group:

According to one aspect of the invention, N neighbor-set data segment feature groups G are generated according to the data features contained in G_d:

According to one aspect of the invention, after G and G_d are obtained, the target data segment G_d is detected mainly by using the Minkowski distance to evaluate the similarity between G_d and its neighbor set G, yielding the result data set R = ∪ S_i.

According to one aspect of the invention, a detection result data set R is obtained and used to determine whether the data segment F_t^T containing the detection target is abnormal:

Set the parameters ε (ε > 0) and λ (λ ∈ (0,1)), where ε is the upper bound on the allowed distance between G_i and G_d; that is, when r < ε (r ∈ R), the data G_d are considered normal with respect to that G_i;

Count the number n of G_i in the result data set R for which G_d is considered normal; when n/N is lower than λ, the data segment F_t^T is considered abnormal.

Beneficial effects: the notable advantage of the method is that abnormal fluctuation of the detection target is detected by constructing statistical features of data segments. Compared with existing anomaly detection, the target data segment can be checked against a limited set of neighbor data segments, which reduces calculation time, improves detection efficiency, and meets the timeliness requirement of industrial big data detection. In addition, the detection target is changed from the traditional single point to a segment, which improves the accuracy of identifying abnormal fluctuation of the detection target.

Drawings

Fig. 1 is a general structural view of the present invention.

Fig. 2 is a flowchart of abnormal fluctuation detection of the present invention.

Detailed Description

As shown in Fig. 1, in this embodiment the time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison mainly comprises five parts:

Preparing data: prepare the data for the detection task, determine the detection target, and make preparations for executing the algorithm;

Constructing data segments: this has two parts, namely constructing the data segment of the detection target and constructing the neighbor-set data segments of the detection target;

Calculating segment features: calculate statistical features for the constructed data segments;

Detection execution: calculate the similarity between the features of the neighbor-set data segments and those of the detection target data segment, and judge from the similarity measure whether the data fluctuate abnormally;

Outputting the result: output the result and determine whether the detection target is abnormal.

The following is a detailed description.

Step 1: Preparing data.

In the real-time data stream D_t = {..., x_{t-3}, x_{t-2}, x_{t-1}, x_t}, every data point has a corresponding unique time t. Data fluctuation anomaly detection takes the data at time t in the real-time data stream as the detection target.

Step 2: Constructing data segments.

In the real-time data stream D_t = {..., x_{t-3}, x_{t-2}, x_{t-1}, x_t}, a detection target data segment of time length T is constructed with x_t as the target to be detected. The target data segment is constructed as F_t^T: ⟨D, T, t⟩, where D is the current data stream, T is the segment size, and t is the end time of the segment. The constructed data segment is F_t^T = x_{t-T}, ..., x_t.
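The segment construction in step 2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_segment` is hypothetical, and it assumes the stream is held in memory and indexed by integer time steps starting at 0, which the text does not specify.

```python
from typing import List, Sequence


def build_segment(stream: Sequence[float], T: int, t: int) -> List[float]:
    """Return F_t^T = x_{t-T}, ..., x_t.

    Assumption: x_t is stored at stream[t]; the source does not fix
    an indexing scheme or a storage format for the stream.
    """
    if T < 0 or t - T < 0 or t >= len(stream):
        raise ValueError("segment [t-T, t] falls outside the stream")
    # Both end points are included, so the segment holds T + 1 values.
    return list(stream[t - T : t + 1])


stream = [10.0, 10.2, 9.9, 10.1, 10.0, 14.5, 10.3]
segment = build_segment(stream, T=3, t=5)  # x_2, x_3, x_4, x_5
```

Note that, read literally, x_{t-T}, ..., x_t contains T + 1 points; an implementation that wants exactly T points would start at x_{t-T+1} instead.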

Step 3: Calculating the statistical features of the data segments.

The data segment features are the information describing the data segments constructed by the algorithm, including statistical features: the data itself, the time stamp, the mean, the variance, and so on.

Here, the data itself refers to the original data in the segment, i.e. F_t^T = x_{t-T}, ..., x_t;

The time stamp refers to the time t of the segment F_t^T = x_{t-T}, ..., x_t;

Mean: the mean reflects the general trend of the data. The calculation formula is as follows:
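The formula itself is not preserved in this text. One reading consistent with the segment definition F_t^T = x_{t-T}, ..., x_t is the ordinary arithmetic mean over the T + 1 points of the segment; the symbol \mu_F and the normalization by T + 1 are assumptions made here, not taken from the source:

```latex
\mu_F = \frac{1}{T+1} \sum_{k=t-T}^{t} x_k
```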

Variance: the variance is an important measure of the dispersion of the data; if data deviate from the general trend, the variance of the whole data segment changes considerably. The calculation formula is as follows:
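The formula itself is not preserved in this text. A plausible reconstruction, assuming the population form of the variance over the T + 1 points of the segment (the source may instead normalize by T, as in the sample variance), with \mu_F denoting the segment mean, is:

```latex
\mu_F = \frac{1}{T+1} \sum_{k=t-T}^{t} x_k,
\qquad
\sigma_F^2 = \frac{1}{T+1} \sum_{k=t-T}^{t} \left( x_k - \mu_F \right)^2
```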

Maximum: reflects the upper limit of the data: max_F;

Minimum: reflects the lower limit of the data: min_F;

In practice, other statistical features that reflect the characteristics of the data can be selected as needed, and the target data segment features are constructed into a target data segment feature group G_d:

By constructing the target data segment feature group, abnormal fluctuation detection on the data is converted into detection on the feature group of the data segment, so that the data features can be mined effectively and the detection accuracy is improved.
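As an illustration of step 3, the feature group of one segment could be computed as below. This is a sketch under stated assumptions: the function name `segment_features` and the dictionary layout are hypothetical, the population variance is assumed, and the raw data are left out of the returned group for brevity even though the text lists them as a feature.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def segment_features(segment: Sequence[float], t: int) -> Dict[str, float]:
    """Build a feature group for the segment that ends at time t.

    Assumptions: population variance (divide by the number of points),
    and a plain dict as the container; neither is fixed by the source.
    """
    return {
        "time": float(t),                 # time stamp of the segment
        "mean": fmean(segment),           # general trend of the data
        "variance": pvariance(segment),   # dispersion of the data
        "max": max(segment),              # upper limit, max_F
        "min": min(segment),              # lower limit, min_F
    }


g_d = segment_features([1.0, 2.0, 3.0, 4.0], t=5)
```

Further statistics (for example the median or the range) could be added to the dictionary in the same way if a given data stream calls for them.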

Step 4: Detection execution.

First, following step 3, N neighbor-set data segment feature groups G_i, which together form the neighbor set G, are generated from the historical data D_{t-1} = {..., x_{t-4}, x_{t-3}, x_{t-2}, x_{t-1}} according to the data features contained in G_d:

In constructing the neighbor-set data segment feature groups G, the following four aspects are mainly considered:

First, timeliness: for time series data, when the time stamp of G_d is t, the time stamp of G_i should lie within a valid range and should not be too far from t;

Second, periodicity: for time series data, whether the data are periodic should be fully considered, and the corresponding G_i should be constructed according to the periodic characteristics;

Third, randomness: subject to the first two requirements, G_i should be constructed as randomly as possible in the time dimension;

Fourth, a reasonable parameter N should be set.

After G and G_d are obtained, the target data segment G_d is detected mainly by using the Minkowski distance to evaluate the similarity between G_d and its neighbor set G:

where i_u denotes the u-th feature of G_i and d_u denotes the u-th feature of G_d;

The advantage of using the weighted Minkowski distance is that the parameter ω_u can be adjusted as needed, thereby adjusting the sensitivity of the algorithm to a given feature;
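The distance formula is not preserved in this text. Using the symbols the source does define (i_u, d_u, ω_u, and S_i as the per-neighbor result), one common form of the weighted Minkowski distance is given below. The order p and the placement of the weight inside the sum are assumptions; the source may use a different but equivalent weighting.

```latex
S_i = \left( \sum_{u} \omega_u \, \lvert i_u - d_u \rvert^{p} \right)^{1/p},
\qquad p \ge 1
```

With p = 2 and all ω_u = 1 this reduces to the Euclidean distance between the two feature groups, and increasing a single ω_u makes S_i more sensitive to differences in that feature.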

After the distances are calculated for the feature groups, the result data set R = ∪ S_i is obtained.
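Step 4 as a whole might be sketched as follows. All names here (`features`, `neighbor_end_times`, `weighted_minkowski`, `result_set`) and the parameters `lookback`, `period`, `p`, and `seed` are hypothetical. The sketch assumes that timeliness is enforced by a look-back window, periodicity by restricting end times to multiples of a known period, and randomness by uniform sampling; the source names these three considerations but does not prescribe how to implement them.

```python
import random
from statistics import fmean, pvariance
from typing import List, Sequence


def features(segment: Sequence[float]) -> List[float]:
    """Numeric feature vector [mean, variance, max, min] (an assumed choice)."""
    return [fmean(segment), pvariance(segment), max(segment), min(segment)]


def neighbor_end_times(t: int, T: int, N: int, lookback: int,
                       period: int, rng: random.Random) -> List[int]:
    """Pick N historical segment end times s < t.

    Timeliness: s >= t - lookback. Periodicity: t - s is a multiple of
    `period` (period=1 disables this). Randomness: uniform sampling.
    """
    earliest = max(T, t - lookback)  # a segment ending at s needs s - T >= 0
    candidates = [s for s in range(earliest, t) if (t - s) % period == 0]
    if len(candidates) < N:
        raise ValueError("not enough history for N neighbor segments")
    return sorted(rng.sample(candidates, N))


def weighted_minkowski(g_i: Sequence[float], g_d: Sequence[float],
                       weights: Sequence[float], p: float = 2.0) -> float:
    """Weighted Minkowski distance between two feature vectors."""
    if not (len(g_i) == len(g_d) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    total = sum(w * abs(a - b) ** p for a, b, w in zip(g_i, g_d, weights))
    return total ** (1.0 / p)


def result_set(stream: Sequence[float], t: int, T: int, N: int, lookback: int,
               weights: Sequence[float], period: int = 1, p: float = 2.0,
               seed: int = 0) -> List[float]:
    """Return R: one distance S_i per sampled neighbor segment."""
    rng = random.Random(seed)
    g_d = features(stream[t - T : t + 1])
    ends = neighbor_end_times(t, T, N, lookback, period, rng)
    return [weighted_minkowski(features(stream[s - T : s + 1]), g_d, weights, p)
            for s in ends]


# A flat stream followed by a short burst of fluctuation ending at t = 33.
stream = [5.0] * 30 + [5.0, 9.0, 1.0, 5.0]
R = result_set(stream, t=33, T=3, N=5, lookback=20, weights=[1.0, 1.0, 1.0, 1.0])
```

Because only N neighbor segments are compared, the cost per detection is proportional to N times the segment length rather than to the size of the full history, which is the efficiency argument the text makes.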

Step 5: Outputting the result.

Step 4 yields the detection result data set R, which is used to determine whether the data segment F_t^T containing the detection target is abnormal:

Set the parameters ε (ε > 0) and λ (λ ∈ (0,1)), where ε is the upper bound on the allowed distance between G_i and G_d; that is, when r < ε (r ∈ R), the data G_d are considered normal with respect to that G_i;

Count the number n of G_i in the result data set R for which G_d is considered normal; when n/N is lower than λ, the data segment F_t^T is considered abnormal.
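The decision rule of step 5 can be written down directly. The function name `is_abnormal` is hypothetical, and the sketch assumes R is a plain list holding one distance per neighbor, so that N = len(R).

```python
from typing import Sequence


def is_abnormal(R: Sequence[float], eps: float, lam: float) -> bool:
    """Flag the target segment when too few neighbors lie within eps.

    n counts the distances r in R with r < eps; the segment is judged
    abnormal when n / N < lam, where N = len(R).
    """
    if eps <= 0 or not (0 < lam < 1):
        raise ValueError("need eps > 0 and 0 < lam < 1")
    if not R:
        raise ValueError("R must contain at least one distance")
    n = sum(1 for r in R if r < eps)
    return n / len(R) < lam


# 2 of 5 neighbors are within eps: 0.4 < 0.5, so the segment is abnormal.
flag_a = is_abnormal([0.1, 0.2, 3.0, 4.0, 5.0], eps=1.0, lam=0.5)
# 3 of 4 neighbors are within eps: 0.75 >= 0.5, so the segment is normal.
flag_b = is_abnormal([0.1, 0.2, 0.3, 4.0], eps=1.0, lam=0.5)
```

A larger λ demands agreement with more neighbors before a segment is accepted as normal, so it raises sensitivity at the cost of more alarms; a larger ε has the opposite effect.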

Although preferred embodiments of the present invention have been described in detail, the invention is not limited to the details of these embodiments. Various equivalent modifications can be made within its technical concept, and all such modifications fall within the protection scope of the invention.

Claims

1. A time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison, characterized by mainly comprising the following steps:

Step 1: preparing data;

Step 2: constructing data segments;

Step 3: calculating the statistical features of the data segments;

Step 4: executing detection;

Step 5: outputting the result.

2. The time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, wherein the target data segment is constructed as:

F_t^T: ⟨D, T, t⟩.

3. The time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, wherein the statistical feature calculation for data segments includes, but is not limited to, the data itself, the time stamp, the mean, and the variance, and the target data segment features are constructed into a target data segment feature group:

4. The time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, wherein N neighbor-set data segment feature groups G are generated according to the data features contained in G_d:

5. The time series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, wherein, after G and G_d are obtained, the target data segment G_d is detected mainly by using the Minkowski distance to evaluate the similarity between G_d and its neighbor set G, yielding the result data set R = ∪ S_i.

6. The algorithm of claim 1, wherein a detection result data set R is obtained and used to determine whether the data segment F_t^T containing the detection target is abnormal: