CN113205146A - Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison - Google Patents
- Publication number
- CN113205146A
- Authority
- CN
- China
- Prior art keywords
- data
- algorithm
- abnormal fluctuation
- segment
- detection
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/11—Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Operations Research (AREA)
- Evolutionary Computation (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a new abnormal fluctuation detection method: a time-series data abnormal fluctuation detection algorithm based on comparing the statistical features of data segments. The algorithm comprises five parts: data preparation, data segment construction, segment feature calculation, detection execution, and result output. It is intended to meet the timeliness and accuracy requirements of abnormal fluctuation detection in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time-series data streams and combines the advantages of detection methods based on statistical models and on similarity measurement. The amount of computation is reduced, so the method is suitable for large data streams and can detect and identify abnormal fluctuations in real-time data streams promptly.
Description
Technical Field
The invention relates to a novel data abnormal fluctuation detection algorithm, in particular to a time sequence data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison.
Background
Data anomaly detection is an important research topic in data analysis and mining. Since the industrial internet concept was introduced, industry has accumulated a large amount of operational data from industrial equipment. Analyzing and mining these data to find anomalies, and from them to analyze the operating characteristics of the equipment and understand its operating state, has become an important research topic. In particular, finding data anomalies and diagnosing potential equipment faults in time is important for keeping equipment operating safely.
In conventional equipment safety work, equipment is usually maintained and overhauled manually at regular intervals. When industrial equipment is instead safeguarded through data diagnosis techniques, the primary task is to discover anomalies in the data. In recent years, industry has also begun to use accumulated operating data and real-time monitoring data from industrial equipment to achieve more efficient equipment monitoring and data anomaly diagnosis.
Currently, algorithms for detecting data anomalies fall mainly into the following categories: statistical-model-based, clustering-based, similarity-measurement-based, constraint-rule-based, and so on. However, these common anomaly detection methods are usually applied to data sets of limited scale and have no real-time detection capability, so they generally struggle to meet the requirements of anomaly detection on high-volume, real-time industrial data streams. In complex industrial scenarios in particular, both their computational cost and their detection performance need improvement. Traditional anomaly detection methods were not designed for real-time industrial big data. For example, statistics-based methods are suitable for detecting outliers, fluctuation anomalies, and similar cases, but cannot effectively identify the continuous anomalous intervals that appear in industrial production. Clustering-based methods judge outliers mainly by quantifying the distance between anomalous points and normal points; they are difficult to apply to large data sets and real-time data streams, their computational cost is generally high, and their detection performance depends on the quality of the clustering. Similarity-measurement-based methods judge whether the target data are anomalous mainly by calculating the similarity between sequences, but they have a high computation-time cost and poor timeliness. Constraint-rule-based methods mainly exploit the temporal characteristics of a time series through techniques such as sequential dependencies and speed constraints, and repair highly anomalous data, but they generally struggle with anomaly detection on sequences whose pattern changes.
Disclosure of Invention
Purpose of the invention: to address the need for abnormal fluctuation detection in industrial time-series data streams, the invention provides a new data abnormal fluctuation detection algorithm, namely a time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison. The algorithm is intended to meet the timeliness and accuracy requirements of abnormal fluctuation detection in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time-series data streams and combines the advantages of detection methods based on statistical models and on similarity measurement. The amount of computation is reduced, so the method is suitable for large data streams and can detect and identify abnormal fluctuations in real-time data streams promptly. Practical application shows that the algorithm achieves high accuracy, adapts to changes in operating conditions within the industrial data stream, reduces the false alarm rate, and accurately identifies abnormal fluctuations in industrial time-series data streams.
Technical solution: a time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison mainly comprises the following steps:
step 1: preparing data;
step 2: constructing data segments;
step 3: calculating the statistical features of the data segments;
step 4: executing detection;
step 5: outputting the result.
According to one aspect of the invention, the target data segment is constructed as: $F_t^T: \langle D, T, t \rangle$.
According to one aspect of the invention, the statistical features calculated for a data segment include, but are not limited to, the data itself, the time stamp, the mean, and the variance, and the features of the target data segment are assembled into a target data segment feature group $G_d$.
According to one aspect of the invention, $N$ neighbor-set data segment feature groups $G_i$, which together form the neighbor set $G$, are generated according to the data features contained in $G_d$.
According to one aspect of the invention, after $G$ and $G_d$ are obtained, the target data segment feature group $G_d$ is examined mainly by using the Minkowski distance to evaluate the similarity between $G_d$ and its neighbor set $G$, which yields the result data set $R = \bigcup S_i$.
According to one aspect of the invention, the detection result data set $R$ is used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
set parameters $\varepsilon$ ($\varepsilon > 0$) and $\lambda$ ($\lambda \in (0,1)$), where $\varepsilon$ is the upper bound on the allowed distance between $G_i$ and $G_d$; that is, when $r < \varepsilon$ ($r \in R$), the data $G_d$ is considered normal;
count the number $n$ of $G_i$ in the result data set $R$ for which $G_d$ is considered normal; when $n/N$ is lower than $\lambda$, the data segment $F_t^T$ is considered anomalous.
Advantageous effects: the main advantage of the method is that abnormal fluctuation detection is performed on the detection target by constructing statistical features of data segments. Compared with existing anomaly detection, the target data segment can be checked against a limited set of neighbor data segments, which reduces computation time, improves detection efficiency, and meets the timeliness requirements of industrial big data detection. In addition, detection moves from the traditional single-point target to a segment, which improves the accuracy of identifying abnormal fluctuations of the detection target.
Drawings
Fig. 1 is a general structural view of the present invention.
Fig. 2 is a flowchart of abnormal fluctuation detection of the present invention.
Detailed Description
As shown in Fig. 1, in this embodiment the time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison mainly comprises five parts:
data preparation: prepare data for the detection task, determine the detection target, and make preparations for running the algorithm;
data segment construction: this has two parts, constructing the data segment of the detection target and constructing the neighbor-set data segments of the detection target;
segment feature calculation: calculate statistical features for the constructed data segments;
detection execution: calculate the feature similarity between the neighbor-set data segment features and the detection target's data segment features, and judge from the similarity measure whether the data fluctuate abnormally;
result output: output the result and decide whether the detection target is anomalous.
The following is a detailed description.
Step 1: Data preparation.
In the real-time data stream $D_t = \{\dots, x_{t-3}, x_{t-2}, x_{t-1}, x_t\}$, every datum has a corresponding unique time $t$. Data fluctuation anomaly detection takes the datum at time $t$ in the real-time data stream as the detection target.
Step 2: Data segment construction.
In the real-time data stream $D_t = \{\dots, x_{t-3}, x_{t-2}, x_{t-1}, x_t\}$, a detection target data segment of time length $T$ is constructed with $x_t$ as the target to be detected. The target data segment is constructed as $F_t^T: \langle D, T, t \rangle$, where $D$ is the current data stream, $T$ is the segment size, and $t$ is the end time of the segment. The constructed data segment is $F_t^T = x_{t-T}, \dots, x_t$.
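The segment construction in step 2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_segment` is hypothetical, and the sketch assumes the stream is held in memory with time $t$ mapped directly to a list index.

```python
from typing import List, Sequence


def build_segment(stream: Sequence[float], T: int, t: int) -> List[float]:
    """Return F_t^T = x_{t-T}, ..., x_t, i.e. the T + 1 values ending at index t.

    Assumption: time t is the position of the datum in `stream`.
    """
    if T < 0:
        raise ValueError("T must be non-negative")
    if t - T < 0 or t >= len(stream):
        raise IndexError("segment falls outside the available stream")
    return list(stream[t - T : t + 1])


stream = [1.0, 1.1, 0.9, 1.0, 1.2, 5.0]
print(build_segment(stream, T=3, t=5))  # [0.9, 1.0, 1.2, 5.0]
```

Note that with the endpoints $x_{t-T}$ and $x_t$ both included, as the source writes them, a segment of size $T$ holds $T + 1$ values.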
Step 3: Calculating the statistical features of the data segments.
The data segment features are the information, including statistical features, that the algorithm derives from a constructed data segment, such as the data itself, the time mark, the mean, and the variance.
The data itself is the original data in the segment, i.e. $F_t^T = x_{t-T}, \dots, x_t$;
the time mark is the time $t$ of the segment $F_t^T = x_{t-T}, \dots, x_t$;
mean: the mean reflects the general trend of the data and is calculated as the arithmetic average of the values in the segment;
variance: the variance is an important measure of how dispersed the data are; if data deviate from the general trend, the variance of the whole data segment changes substantially;
maximum: reflects the upper limit of the data, $\max F$;
minimum: reflects the lower limit of the data, $\min F$.
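The formulas for the mean and variance are not reproduced in this text. Under the standard definitions, and assuming the segment $F_t^T = x_{t-T}, \dots, x_t$ holds $T+1$ values and that the population (rather than sample) variance is intended, they would read as below. The symbols $\bar{F}$ and $\sigma_F^2$ are introduced here for illustration only.

```latex
\bar{F} = \frac{1}{T+1} \sum_{k=t-T}^{t} x_k ,
\qquad
\sigma_F^2 = \frac{1}{T+1} \sum_{k=t-T}^{t} \left( x_k - \bar{F} \right)^2
```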
In practice, other statistical features that reflect the characteristics of the data can be selected as needed, and the target data segment features are assembled into a target data segment feature group $G_d$.
By constructing the target data segment feature group, abnormal fluctuation detection on the data is converted into detection on the data segment feature group, so that data features can be mined effectively and detection accuracy is improved.
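A feature group of this kind might be computed as in the sketch below. The function name `segment_features` and the dictionary layout are assumptions for illustration. The sketch keeps only the scalar statistics (mean, variance, maximum, minimum) and uses the population variance, which the source does not specify.

```python
from statistics import fmean, pvariance
from typing import Dict, Sequence


def segment_features(segment: Sequence[float]) -> Dict[str, float]:
    """Build a feature group from one data segment.

    Assumption: population variance; the source does not state the normalization.
    """
    if not segment:
        raise ValueError("segment must not be empty")
    return {
        "mean": fmean(segment),          # general trend of the data
        "variance": pvariance(segment),  # degree of dispersion
        "max": max(segment),             # upper limit of the data
        "min": min(segment),             # lower limit of the data
    }


g_d = segment_features([1.0, 2.0, 3.0, 4.0])
print(g_d)
```

Keeping the feature group as a name-to-value mapping makes it easy to add further statistics later, as the text allows.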
Step 4: Detection execution.
First, following step 3, $N$ neighbor-set data segment feature groups $G_i$, which together form the neighbor set $G$, are generated from the historical data $D_{t-1} = \{\dots, x_{t-4}, x_{t-3}, x_{t-2}, x_{t-1}\}$ according to the data features contained in $G_d$.
In constructing the neighbor-set data segment feature groups $G$, the following four aspects are mainly considered:
first, timeliness: for time-series data, when the time mark of $G_d$ is $t$, the time mark of $G_i$ should be within the valid range and should not be far from $t$;
second, periodicity: for time-series data, whether the data are periodic should be fully considered, and the corresponding $G_i$ should be constructed according to the periodic characteristics;
third, randomness: subject to the first two requirements, $G_i$ should be constructed as randomly as possible in the time dimension;
fourth, a reasonable parameter $N$ should be set.
After $G$ and $G_d$ are obtained, the target data segment feature group $G_d$ is examined mainly by using the Minkowski distance to evaluate the similarity between $G_d$ and its neighbor set $G$, where $i_u$ denotes the $u$-th feature of $G_i$ and $d_u$ denotes the $u$-th feature of $G_d$;
the advantage of the weighted Minkowski distance is that the parameters $\omega_u$ can be adjusted as needed to tune the algorithm's sensitivity to individual features;
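The distance formula itself is not reproduced in this text. In its standard form, a weighted Minkowski distance between $G_i$ and $G_d$ can be written as below. The order $p$ is an assumption (the source does not give it); $p = 2$ gives a weighted Euclidean distance and $p = 1$ a weighted Manhattan distance.

```latex
S_i = \left( \sum_{u} \omega_u \, \left| i_u - d_u \right|^{p} \right)^{1/p},
\qquad \omega_u \ge 0, \; p \ge 1
```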
after the distances to the detection point's feature group have been calculated, the result data set $R = \bigcup S_i$ is obtained.
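Step 4 as a whole might look like the following sketch. Everything named here is hypothetical: `max_age` stands in for the "valid range" of the timeliness requirement, `period` restricts neighbors to end times a whole number of periods before $t$, and random sampling covers the randomness requirement. The order `p=2.0` and the population variance are likewise assumptions, and neighbor segments are allowed to overlap the target segment.

```python
import random
from statistics import fmean, pvariance
from typing import Dict, List, Optional, Sequence


def features(segment: Sequence[float]) -> Dict[str, float]:
    """Scalar statistics from step 3 (population variance is an assumption)."""
    return {"mean": fmean(segment), "variance": pvariance(segment),
            "max": max(segment), "min": min(segment)}


def weighted_minkowski(g_i: Dict[str, float], g_d: Dict[str, float],
                       weights: Dict[str, float], p: float = 2.0) -> float:
    """Weighted Minkowski distance S_i between two feature groups."""
    total = sum(w * abs(g_i[u] - g_d[u]) ** p for u, w in weights.items())
    return total ** (1.0 / p)


def neighbor_end_times(t: int, T: int, N: int, max_age: int,
                       period: Optional[int] = None,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Randomly pick up to N historical end times s with t - max_age <= s < t."""
    if rng is None:
        rng = random.Random()
    earliest = max(T, t - max_age)  # a neighbor needs T earlier points
    candidates = [s for s in range(earliest, t)
                  if period is None or (t - s) % period == 0]
    return sorted(rng.sample(candidates, min(N, len(candidates))))


def result_set(stream: Sequence[float], t: int, T: int, N: int, max_age: int,
               weights: Dict[str, float], period: Optional[int] = None,
               rng: Optional[random.Random] = None) -> List[float]:
    """Return R, the distances S_i between G_d and each sampled neighbor G_i."""
    g_d = features(stream[t - T : t + 1])
    ends = neighbor_end_times(t, T, N, max_age, period, rng)
    return [weighted_minkowski(features(stream[s - T : s + 1]), g_d, weights)
            for s in ends]
```

Passing a seeded `random.Random` makes the neighbor selection reproducible, which helps when tuning $N$ and the weights on recorded data.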
Step 5: Result output.
Step 4 yields the detection result data set $R$, which is used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
set parameters $\varepsilon$ ($\varepsilon > 0$) and $\lambda$ ($\lambda \in (0,1)$), where $\varepsilon$ is the upper bound on the allowed distance between $G_i$ and $G_d$; that is, when $r < \varepsilon$ ($r \in R$), the data $G_d$ is considered normal;
count the number $n$ of $G_i$ in the result data set $R$ for which $G_d$ is considered normal; when $n/N$ is lower than $\lambda$, the data segment $F_t^T$ is considered anomalous.
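The decision rule in step 5 can be expressed compactly. In this sketch the function name `is_anomalous` is hypothetical, and $N$ is taken to be the number of distances in $R$.

```python
from typing import Sequence


def is_anomalous(R: Sequence[float], eps: float, lam: float) -> bool:
    """Flag the target segment when fewer than a fraction lam of neighbors agree.

    R   : distances S_i between G_d and each neighbor G_i
    eps : upper bound on the allowed distance (eps > 0)
    lam : minimum fraction of neighbors that must be within eps (0 < lam < 1)
    """
    if eps <= 0 or not 0 < lam < 1:
        raise ValueError("require eps > 0 and 0 < lam < 1")
    if not R:
        raise ValueError("R must contain at least one distance")
    n = sum(1 for r in R if r < eps)  # neighbors under which G_d looks normal
    return n / len(R) < lam


print(is_anomalous([0.1, 0.2, 3.0, 4.0, 5.0], eps=1.0, lam=0.5))  # True
print(is_anomalous([0.1, 0.2, 0.3, 4.0], eps=1.0, lam=0.5))       # False
```

In the first call only 2 of 5 neighbors lie within $\varepsilon$, so $n/N = 0.4 < \lambda$ and the segment is flagged; in the second, $n/N = 0.75$ and it is not.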
Although the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the details of those embodiments. Various equivalent modifications can be made within the technical spirit of the present invention, and such modifications all fall within the scope of protection of the present invention.
Claims (6)
1. A time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison, characterized by mainly comprising the following steps:
step 1: preparing data;
step 2: constructing data segments;
step 3: calculating the statistical features of the data segments;
step 4: executing detection;
step 5: outputting the result.
2. The time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, characterized in that the target data segment is constructed as:
$F_t^T: \langle D, T, t \rangle$.
3. The time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, characterized in that the statistical features calculated for a data segment include, but are not limited to, the data itself, the time stamp, the mean, and the variance, and the features of the target data segment are assembled into a target data segment feature group $G_d$.
5. The time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison according to claim 1, characterized in that after $G$ and $G_d$ are obtained, the target data segment feature group $G_d$ is examined mainly by using the Minkowski distance to evaluate the similarity between $G_d$ and its neighbor set $G$, which yields the result data set $R = \bigcup S_i$.
6. The algorithm according to claim 1, characterized in that the detection result data set $R$ is used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
set parameters $\varepsilon$ ($\varepsilon > 0$) and $\lambda$ ($\lambda \in (0,1)$), where $\varepsilon$ is the upper bound on the allowed distance between $G_i$ and $G_d$; that is, when $r < \varepsilon$ ($r \in R$), the data $G_d$ is considered normal;
count the number $n$ of $G_i$ in the result data set $R$ for which $G_d$ is considered normal; when $n/N$ is lower than $\lambda$, the data segment $F_t^T$ is considered anomalous.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110545508.XA CN113205146A (en) | 2021-05-19 | 2021-05-19 | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110545508.XA CN113205146A (en) | 2021-05-19 | 2021-05-19 | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113205146A (en) | 2021-08-03 |
Family
ID=77031772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110545508.XA Withdrawn CN113205146A (en) | 2021-05-19 | 2021-05-19 | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113205146A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648232A (en) * | 2023-12-11 | 2024-03-05 | 武汉天宝莱信息技术有限公司 | Application program data monitoring method, device and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648232A (en) * | 2023-12-11 | 2024-03-05 | 武汉天宝莱信息技术有限公司 | Application program data monitoring method, device and storage medium |
CN117648232B (en) * | 2023-12-11 | 2024-05-24 | 武汉天宝莱信息技术有限公司 | Application program data monitoring method, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109614576B (en) | Transformer anomaly detection method based on multi-dimensional Gaussian distribution and trend segmentation | |
CN110018670B (en) | Industrial process abnormal working condition prediction method based on dynamic association rule mining | |
US10719577B2 (en) | System analyzing device, system analyzing method and storage medium | |
JP6141235B2 (en) | How to detect anomalies in time series data | |
US8566070B2 (en) | Apparatus abnormality monitoring method and system | |
JP4394286B2 (en) | Multidimensional method and system for statistical process management | |
CN110895526A (en) | Method for correcting data abnormity in atmosphere monitoring system | |
CN111353482A (en) | LSTM-based fatigue factor recessive anomaly detection and fault diagnosis method | |
US20150220847A1 (en) | Information Processing Apparatus, Diagnosis Method, and Program | |
Zhang et al. | Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns | |
CN112766429B (en) | Method, device, computer equipment and medium for anomaly detection | |
CN109784668B (en) | Sample feature dimension reduction processing method for detecting abnormal behaviors of power monitoring system | |
CN110011990B (en) | Intelligent analysis method for intranet security threats | |
Xu et al. | A lof-based method for abnormal segment detection in machinery condition monitoring | |
CN112949735A (en) | Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining | |
CN113269327A (en) | Flow anomaly prediction method based on machine learning | |
CN116304957A (en) | On-line identification method for monitoring state mutation of power supply and transformation equipment | |
CN114004331A (en) | Fault analysis method based on key indexes and deep learning | |
CN113205146A (en) | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison | |
CN110308713A (en) | A kind of industrial process failure identification variables method based on k neighbour reconstruct | |
CN117093944A (en) | Time sequence data template self-adaptive abnormal mode identification method and system | |
CN115935285A (en) | Multi-element time series anomaly detection method and system based on mask map neural network model | |
Bach et al. | Automatic case capturing for problematic drilling situations | |
JP7128232B2 (en) | Factor analysis device and factor analysis method | |
CN114638039A (en) | Structural health monitoring characteristic data interpretation method based on low-rank matrix recovery |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WW01 | Invention patent application withdrawn after publication | Application publication date: 20210803 |