CN113205146A - Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison - Google Patents


Info

Publication number
CN113205146A
Authority
CN
China
Prior art keywords
data
algorithm
abnormal fluctuation
segment
detection
Legal status
Withdrawn
Application number
CN202110545508.XA
Other languages
Chinese (zh)
Inventor
孙栓柱
周春蕾
李逗
孙彬
王林
王其祥
高进
李春岩
沈洋
黄治军
张磊
傅高健
Current Assignee
Jiangsu Fangtian Power Technology Co Ltd
Original Assignee
Jiangsu Fangtian Power Technology Co Ltd
Priority date
2021-05-19
Filing date
2021-05-19
Publication date
2021-08-03
Application filed by Jiangsu Fangtian Power Technology Co Ltd
Priority to CN202110545508.XA
Publication of CN113205146A
Withdrawn (current legal status)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Operations Research (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a novel method for detecting abnormal fluctuations: a time-series data abnormal fluctuation detection algorithm based on comparing the statistical characteristics of data segments. The algorithm comprises five parts: data preparation, data segment construction, segment feature calculation, detection execution, and result output. It meets the timeliness and accuracy requirements for detecting abnormal data fluctuations in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time-series data streams and combines the strengths of statistical-model-based and similarity-measure-based detection methods. It reduces the amount of computation, scales to large data streams, and detects and identifies abnormal fluctuations in real-time data streams promptly.

Description

Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison
Technical Field
The invention relates to a data abnormal fluctuation detection algorithm, and in particular to a time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison.
Background
Data anomaly detection is an important topic in data analysis and mining. Since the concept of the industrial internet was introduced, industry has accumulated large amounts of operational data from industrial equipment. Analyzing and mining these data to find anomalies, and from them to understand the operating characteristics and state of the equipment, has become an important research topic. In particular, detecting data anomalies and diagnosing potential equipment faults in time is of great significance for ensuring safe equipment operation.
In conventional equipment safety work, equipment is usually maintained and overhauled manually on a regular schedule. When industrial equipment is protected with data diagnostic techniques instead, the primary task is to discover anomalies in the data. In recent years, industry has also begun to use accumulated operating data and real-time monitoring data from industrial equipment to achieve more efficient equipment monitoring and data anomaly diagnosis.
Current data anomaly detection algorithms fall mainly into the following categories: statistical-model-based, clustering-based, similarity-measure-based, and constraint-rule-based methods, among others. However, these common methods are usually applied to data sets of limited size and lack real-time detection capability, so they generally struggle to meet the requirements of anomaly detection on high-volume, real-time industrial data streams. In complex industrial scenarios in particular, both their computational cost and their detection performance leave room for improvement, because they were not designed for real-time industrial big data. For example, statistics-based methods suit outlier and fluctuation anomalies but cannot effectively identify the continuous anomalous intervals that occur in industrial production. Clustering-based methods judge outliers mainly by quantifying the distance between anomalous and normal points; they are hard to apply to large data sets and real-time data streams, their computational cost is generally high, and their detection performance depends on the quality of the clustering. Similarity-measure-based methods judge whether target data are anomalous mainly by computing the similarity between sequences, but their computation time is high and their timeliness poor. Constraint-rule-based methods mainly exploit the temporal characteristics of a sequence through sequential dependencies and speed constraints in order to repair highly anomalous data, but they generally cannot meet the requirements of anomaly detection in sequences whose patterns change.
Disclosure of Invention
Purpose of the invention: to address the need to detect abnormal fluctuations in industrial time-series data streams, the invention provides a new data abnormal fluctuation detection algorithm, namely a time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison. The algorithm meets the timeliness and accuracy requirements for detecting abnormal data fluctuations in industrial data streams. Unlike traditional anomaly detection algorithms, it is designed specifically for industrial time-series data streams and combines the strengths of statistical-model-based and similarity-measure-based detection methods. It reduces the amount of computation, scales to large data streams, and detects and identifies abnormal fluctuations in real-time data streams promptly. Practical application has shown that the algorithm achieves high accuracy, adapts to changes in operating conditions within industrial data streams, reduces the false alarm rate, and accurately identifies abnormal fluctuations in industrial time-series data streams.
Technical scheme: a time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison mainly comprises the following steps:
Step 1: prepare the data;
Step 2: construct the data segments;
Step 3: calculate the statistical characteristics of the data segments;
Step 4: execute the detection;
Step 5: output the result.
According to one aspect of the invention, the target data segment is constructed as $F_t^T:\langle D, T, t\rangle$.
According to one aspect of the invention, the statistical features calculated for a data segment include, but are not limited to, the data itself, the time stamp, the mean, and the variance, and these features are assembled into the target data segment feature group:
$$G_d = \{F_t^T,\ t,\ \mu,\ \sigma^2,\ \ldots\}$$
where $\mu$ and $\sigma^2$ are the mean and variance of the segment.
According to one aspect of the invention, N neighbor-set data segment feature groups G are generated according to the data features contained in $G_d$:
$$G = \{G_1, G_2, \ldots, G_N\}$$
According to one aspect of the invention, after G and $G_d$ are obtained, the target data segment $G_d$ is assessed mainly by evaluating the similarity between $G_d$ and its neighbor set G with a Minkowski distance, which yields the result data set $R = \bigcup_{i=1}^{N}\{S_i\}$, where $S_i$ is the distance between $G_d$ and the i-th neighbor $G_i$.
According to one aspect of the invention, the detection result data set R is used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
parameters ε (ε > 0) and λ (λ ∈ (0, 1)) are set, where ε is the upper bound on the distance allowed between $G_i$ and $G_d$; that is, when r < ε (r ∈ R), the data $G_d$ is considered normal with respect to that $G_i$;
the number n of $G_i$ in the result data set R for which $G_d$ is considered normal is counted; when n/N is below λ, the data segment $F_t^T$ is considered anomalous.
Beneficial effects: the notable advantage of the invention is that abnormal fluctuation detection is performed on the detection target by constructing statistical features of data segments. Compared with existing anomaly detection, the target data segment is checked against only a limited set of neighbor data segments, which reduces computation time, improves detection efficiency, and meets the timeliness requirements of industrial big data detection. At the same time, the detection target is changed from a traditional single point to a segment, which improves the accuracy of identifying abnormal fluctuations of the detection target.
Drawings
Fig. 1 is a general structural view of the present invention.
Fig. 2 is a flowchart of abnormal fluctuation detection of the present invention.
Detailed Description
As shown in Fig. 1, in this embodiment, the time-series data abnormal fluctuation detection algorithm based on segment statistical feature comparison mainly comprises five parts:
Data preparation: prepare the data for the detection task, determine the detection target, and make the preparations needed to run the algorithm;
Data segment construction: this has two parts, namely constructing the data segment of the detection target and constructing the neighbor-set data segments of the detection target;
Segment feature calculation: calculate the statistical features of the constructed data segments;
Detection execution: compute the feature similarity between the neighbor-set data segment features and the features of the detection target's data segment, and use this similarity measure to judge whether the data fluctuate abnormally;
Result output: output the result, i.e. the anomaly judgment for the detection target.
The following is a detailed description.
Step 1: prepare the data.
Each data point in the real-time data stream $D_t = \{\ldots, x_{t-3}, x_{t-2}, x_{t-1}, x_t\}$ has a unique corresponding time t. Data fluctuation anomaly detection takes the data point at time t in the real-time data stream as the detection target.
Step 2: construct the data segment.
In the real-time data stream $D_t = \{\ldots, x_{t-3}, x_{t-2}, x_{t-1}, x_t\}$, a detection target data segment of time length T is constructed with $x_t$ as the target to be detected. The target data segment is constructed as $F_t^T:\langle D, T, t\rangle$, where D is the current data stream, T is the segment size, and t is the end time of the segment. The constructed data segment is $F_t^T = \{x_{t-T}, \ldots, x_t\}$.
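Step 2 can be sketched in Python as follows. This is a minimal illustration, not the applicant's code: it assumes the stream D is held as a list indexed by integer time, so that position t holds $x_t$, and the function name `build_segment` is hypothetical. It follows the notation $F_t^T = \{x_{t-T}, \ldots, x_t\}$ literally, so a segment of size T holds T + 1 samples.

```python
from typing import List, Sequence


def build_segment(D: Sequence[float], T: int, t: int) -> List[float]:
    """Return F_t^T = [x_{t-T}, ..., x_t] from a stream D indexed by integer time.

    Assumption: D[t] holds x_t. The patent's notation is followed literally,
    so the returned segment contains T + 1 samples ending at time t.
    """
    if T < 0:
        raise ValueError("segment size T must be non-negative")
    if t < T or t >= len(D):
        raise IndexError("segment [t-T, t] falls outside the available stream")
    return list(D[t - T:t + 1])
```

For example, `build_segment([1.0, 2.0, 3.0, 4.0, 5.0], 2, 4)` returns `[3.0, 4.0, 5.0]`.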
Step 3: calculate the statistical characteristics of the data segment.
Data segment features are information about the data segments constructed by the algorithm, including statistical characteristics such as the data itself, the time stamp, the mean, and the variance.
The data itself refers to the original data in the segment, i.e. $F_t^T = \{x_{t-T}, \ldots, x_t\}$;
The time stamp refers to the time t of the segment $F_t^T = \{x_{t-T}, \ldots, x_t\}$;
Mean: the mean reflects the general trend of the data. It is calculated as:
$$\mu = \frac{1}{T+1}\sum_{i=t-T}^{t} x_i$$
Variance: the variance is an important measure of how dispersed the data are; if data in the segment deviate from the general trend, the variance of the whole segment changes considerably. It is calculated as:
$$\sigma^2 = \frac{1}{T+1}\sum_{i=t-T}^{t} \left(x_i - \mu\right)^2$$
Maximum: reflects the upper limit of the data: $\max F$;
Minimum: reflects the lower limit of the data: $\min F$.
In practice, other statistical features that reflect the characteristics of the data can be selected as needed. The target data segment features are assembled into the target data segment feature group:
$$G_d = \{F_t^T,\ t,\ \mu,\ \sigma^2,\ \max F,\ \min F,\ \ldots\}$$
By constructing the target data segment feature group, abnormal fluctuation detection on the data is converted into detection on the data segment feature group, which mines the data features effectively and improves detection accuracy.
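The Step 3 feature group can be sketched as below. This is an illustration under assumptions, not the applicant's implementation: the class and function names are hypothetical, the variance uses the population form (dividing by the number of samples), and only the mean, variance, maximum, and minimum form the numeric vector intended for distance comparison, while the raw data and time stamp are kept alongside.

```python
from dataclasses import dataclass
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SegmentFeatures:
    """Hypothetical container for the feature group G_d of one segment."""
    data: Tuple[float, ...]  # the data itself: x_{t-T}, ..., x_t
    t: int                   # time stamp: end time of the segment
    mean: float              # general trend of the data
    variance: float          # dispersion (population variance, assumed)
    maximum: float           # upper limit of the data
    minimum: float           # lower limit of the data

    def numeric_vector(self) -> List[float]:
        """Statistical features used for distance comparison (assumed subset)."""
        return [self.mean, self.variance, self.maximum, self.minimum]


def segment_features(segment: Sequence[float], t: int) -> SegmentFeatures:
    """Compute the feature group for a segment ending at time t."""
    if not segment:
        raise ValueError("segment must contain at least one sample")
    return SegmentFeatures(
        data=tuple(segment),
        t=t,
        mean=fmean(segment),
        variance=pvariance(segment),
        maximum=max(segment),
        minimum=min(segment),
    )
```

For the segment `[1.0, 2.0, 3.0, 4.0]` ending at t = 3, `numeric_vector()` gives `[2.5, 1.25, 4.0, 1.0]`. Extra features can be added as further fields if other statistics are selected.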
Step 4: execute the detection.
First, using the method of Step 3, N neighbor-set data segment feature groups G are generated from the historical data $D_{t-1} = \{\ldots, x_{t-4}, x_{t-3}, x_{t-2}, x_{t-1}\}$, with the same data features as those contained in $G_d$:
$$G = \{G_1, G_2, \ldots, G_N\}$$
The following four aspects are mainly considered when constructing the neighbor-set data segment feature group G:
First, timeliness: for time-series data, when the time stamp of $G_d$ is t, each $G_i$ should lie within a valid range and not be too far from t;
Second, periodicity: for time-series data, whether the data have periodic characteristics should be fully considered, and the corresponding $G_i$ should be constructed according to those characteristics;
Third, randomness: subject to the first two requirements, each $G_i$ should be constructed as randomly as possible in the time dimension;
Fourth, a reasonable value of the parameter N should be set.
After G and $G_d$ are obtained, the target data segment $G_d$ is assessed mainly by evaluating the similarity between $G_d$ and its neighbor set G with a weighted Minkowski distance:
$$S_i = \left(\sum_{u} \omega_u \left| i_u - d_u \right|^{p}\right)^{1/p}$$
where $i_u$ is the u-th feature of $G_i$, $d_u$ is the u-th feature of $G_d$, $\omega_u$ is the weight of the u-th feature, and p ≥ 1 is the order of the distance;
The advantage of the weighted Minkowski distance is that the weight $\omega_u$ can be adjusted as needed to tune the algorithm's sensitivity to the u-th feature;
After the distances have been calculated for the feature groups, the result data set $R = \bigcup_{i=1}^{N}\{S_i\}$ is obtained.
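A sketch of Step 4, under stated assumptions: neighbor segments are drawn from the history $D_{t-1}$, timeliness is modelled by a hypothetical `lookback` window, periodicity by an optional hypothetical `period` (only end times t - k*period are candidates), and randomness by sampling N end times without replacement. The feature vector (mean, population variance, maximum, minimum), the default order p = 2, and all function names are assumptions; the patent does not fix them.

```python
import random
from statistics import fmean, pvariance
from typing import List, Optional, Sequence


def features(segment: Sequence[float]) -> List[float]:
    """Assumed numeric feature vector: mean, population variance, max, min."""
    return [fmean(segment), pvariance(segment), max(segment), min(segment)]


def candidate_end_times(t: int, T: int, lookback: int, period: Optional[int]) -> List[int]:
    """End times s < t whose segments [s-T, s] lie entirely in the history D_{t-1}.

    Timeliness (assumed form): s must be within `lookback` steps of t.
    Periodicity (assumed form): if `period` is given, only s = t - k*period, k >= 1.
    """
    earliest = max(T, t - lookback)
    if period:
        return list(range(t - period, earliest - 1, -period))
    return list(range(earliest, t))


def weighted_minkowski(a: Sequence[float], b: Sequence[float],
                       weights: Sequence[float], p: float = 2.0) -> float:
    """S = (sum_u w_u * |a_u - b_u|**p) ** (1/p)."""
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("feature vectors and weights must have equal length")
    return sum(w * abs(x - y) ** p for x, y, w in zip(a, b, weights)) ** (1.0 / p)


def result_set(D: Sequence[float], T: int, t: int, N: int, weights: Sequence[float],
               lookback: int, period: Optional[int] = None, p: float = 2.0,
               seed: Optional[int] = None) -> List[float]:
    """Return R = [S_1, ..., S_N], the distances from G_d to N sampled neighbors G_i."""
    if t < T or t >= len(D):
        raise IndexError("target segment [t-T, t] falls outside the stream")
    g_d = features(D[t - T:t + 1])                   # features of target segment F_t^T
    candidates = candidate_end_times(t, T, lookback, period)
    if len(candidates) < N:
        raise ValueError("not enough history to draw N neighbor segments")
    rng = random.Random(seed)
    ends = rng.sample(candidates, N)                 # randomness in the time dimension
    return [weighted_minkowski(features(D[s - T:s + 1]), g_d, weights, p) for s in ends]
```

Raising a weight in `weights` makes the result more sensitive to the corresponding feature, which mirrors the role of $\omega_u$ described above. Whether neighbor segments may overlap the target segment is left open here, as the text does not specify it.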
Step 5: output the result.
Step 4 yields the detection result data set R, which is used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
Set parameters ε (ε > 0) and λ (λ ∈ (0, 1)), where ε is the upper bound on the distance allowed between $G_i$ and $G_d$; that is, when r < ε (r ∈ R), the data $G_d$ is considered normal with respect to that $G_i$;
Count the number n of $G_i$ in the result data set R for which $G_d$ is considered normal; when n/N is below λ, the data segment $F_t^T$ is considered anomalous.
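The Step 5 decision rule amounts to a vote over R. A minimal sketch, with the hypothetical function name `is_anomalous`; it assumes N equals the number of entries in R.

```python
from typing import Sequence


def is_anomalous(R: Sequence[float], epsilon: float, lam: float) -> bool:
    """Flag F_t^T as anomalous when fewer than a fraction lam of neighbors agree.

    A neighbor G_i considers G_d normal when its distance r satisfies r < epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be > 0")
    if not 0 < lam < 1:
        raise ValueError("lambda must lie in (0, 1)")
    if not R:
        raise ValueError("result set R must not be empty")
    n = sum(1 for r in R if r < epsilon)  # neighbors for which G_d is normal
    return n / len(R) < lam               # n/N below lambda => anomaly
```

With R = [0.1, 0.2, 5.0, 6.0, 7.0] and ε = 1.0, two of the five neighbors consider $G_d$ normal (n/N = 0.4), so the segment is flagged when λ = 0.5 but not when λ = 0.3.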
Although the preferred embodiments of the present invention have been described in detail above, the invention is not limited to their specific details. Various equivalent modifications can be made within the technical concept of the invention, and all such equivalent modifications fall within the scope of protection of the invention.

Claims (6)

1. A time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison, characterized by mainly comprising the following steps:
Step 1: preparing data;
Step 2: constructing data segments;
Step 3: calculating the statistical characteristics of the data segments;
Step 4: executing detection;
Step 5: outputting the result.
2. The time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison according to claim 1, wherein the constructed target data segment is represented as:
$F_t^T:\langle D, T, t\rangle$.
3. The time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison according to claim 1, wherein the statistical characteristics calculated for the data segments include, but are not limited to, the data itself, the time stamp, the mean, and the variance, and the target data segment features are assembled into the target data segment feature group:
$$G_d = \{F_t^T,\ t,\ \mu,\ \sigma^2,\ \ldots\}$$
where $\mu$ and $\sigma^2$ are the mean and variance of the segment.
4. The time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison according to claim 1, wherein N neighbor-set data segment feature groups G are generated according to the data features contained in $G_d$:
$$G = \{G_1, G_2, \ldots, G_N\}$$
5. The time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison according to claim 1, wherein, after G and $G_d$ are obtained, the target data segment $G_d$ is assessed mainly by evaluating the similarity between $G_d$ and its neighbor set G with a Minkowski distance, yielding the result data set $R = \bigcup_{i=1}^{N}\{S_i\}$.
6. The time-series data abnormal fluctuation detection algorithm based on segment statistical characteristic comparison according to claim 1, wherein the detection result data set R is obtained and used to determine whether the data segment $F_t^T$ containing the detection target is anomalous:
parameters ε (ε > 0) and λ (λ ∈ (0, 1)) are set, where ε is the upper bound on the distance allowed between $G_i$ and $G_d$; that is, when r < ε (r ∈ R), the data $G_d$ is considered normal with respect to that $G_i$;
the number n of $G_i$ in R for which $G_d$ is considered normal is counted; when n/N is below λ, the data segment $F_t^T$ is considered anomalous.
CN202110545508.XA 2021-05-19 2021-05-19 Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison Withdrawn CN113205146A

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202110545508.XA (CN113205146A) | 2021-05-19 | 2021-05-19 | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202110545508.XA (CN113205146A) | 2021-05-19 | 2021-05-19 | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison

Publications (1)

Publication Number | Publication Date
CN113205146A | 2021-08-03

Family

ID=77031772

Family Applications (1)

Application Number | Status | Title | Priority Date | Filing Date
CN202110545508.XA (CN113205146A) | Withdrawn | Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison | 2021-05-19 | 2021-05-19

Country Status (1)

Country | Link
CN | CN113205146A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN117648232A * | 2023-12-11 | 2024-03-05 | 武汉天宝莱信息技术有限公司 | Application program data monitoring method, device and storage medium
CN117648232B * | 2023-12-11 | 2024-05-24 | 武汉天宝莱信息技术有限公司 | Application program data monitoring method, device and storage medium

Similar Documents

Publication Publication Date Title
CN109614576B (en) Transformer anomaly detection method based on multi-dimensional Gaussian distribution and trend segmentation
CN110018670B (en) Industrial process abnormal working condition prediction method based on dynamic association rule mining
US10719577B2 (en) System analyzing device, system analyzing method and storage medium
JP6141235B2 (en) How to detect anomalies in time series data
US8566070B2 (en) Apparatus abnormality monitoring method and system
JP4394286B2 (en) Multidimensional method and system for statistical process management
CN110895526A (en) Method for correcting data abnormity in atmosphere monitoring system
CN111353482A (en) LSTM-based fatigue factor recessive anomaly detection and fault diagnosis method
US20150220847A1 (en) Information Processing Apparatus, Diagnosis Method, and Program
Zhang et al. Data anomaly detection for structural health monitoring by multi-view representation based on local binary patterns
CN112766429B (en) Method, device, computer equipment and medium for anomaly detection
CN109784668B (en) Sample feature dimension reduction processing method for detecting abnormal behaviors of power monitoring system
CN110011990B (en) Intelligent analysis method for intranet security threats
Xu et al. A lof-based method for abnormal segment detection in machinery condition monitoring
CN112949735A (en) Liquid hazardous chemical substance volatile concentration abnormity discovery method based on outlier data mining
CN113269327A (en) Flow anomaly prediction method based on machine learning
CN116304957A (en) On-line identification method for monitoring state mutation of power supply and transformation equipment
CN114004331A (en) Fault analysis method based on key indexes and deep learning
CN113205146A (en) Time sequence data abnormal fluctuation detection algorithm based on fragment statistical characteristic comparison
CN110308713A (en) A kind of industrial process failure identification variables method based on k neighbour reconstruct
CN117093944A (en) Time sequence data template self-adaptive abnormal mode identification method and system
CN115935285A (en) Multi-element time series anomaly detection method and system based on mask map neural network model
Bach et al. Automatic case capturing for problematic drilling situations
JP7128232B2 (en) Factor analysis device and factor analysis method
CN114638039A (en) Structural health monitoring characteristic data interpretation method based on low-rank matrix recovery

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210803