CN107682319B - Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method

Info

Publication number
CN107682319B
Authority
CN
China
Prior art keywords
data
cluster
point
points
neighborhood
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710823063.0A
Other languages
Chinese (zh)
Other versions
CN107682319A (en)
Inventor
首照宇
�田�浩
邹风波
张彤
程夏威
文辉
赵晖
莫建文
汪延国
曾情
李希成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin Yuhui Information Technology Co ltd
Guilin University of Electronic Technology
Original Assignee
Guilin Yuhui Information Technology Co ltd
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin Yuhui Information Technology Co ltd, Guilin University of Electronic Technology
Priority to CN201710823063.0A
Publication of CN107682319A
Application granted
Publication of CN107682319B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24568 Data stream processing; Continuous queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
  • External Artificial Organs (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method comprises the following steps: 1) processing a real-time data stream; 2) setting a data set S in a sliding window; 3) initializing the parameters k, r and ξ; 4) obtaining a distance matrix dist; 5) obtaining the r-neighborhood point set; 6) obtaining the angle factor and local density of the r-neighborhood point set; 7) obtaining the dissimilarity; 8) obtaining the cluster center factor of each data point; 9) obtaining the attribution matrix; 10) determining the cluster centers and clustering; 11) performing anomaly detection on each cluster separately; 12) performing multiple verification. The method applies sliding window and basic window techniques to construct an efficient data stream processing model; it reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy and low time complexity.

Description

Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method
Technical Field
The invention relates to data stream anomaly detection and data clustering, and in particular to a data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor.
Background
The rapid development of network technology and the steady growth of social informatization have caused an explosive increase in the quantity of information, so that many industries now generate massive, high-speed, dynamic stream data, for example in network intrusion monitoring, commercial transaction management and analysis, video monitoring, and sensor network monitoring. Because data streams are real-time, unbounded and dynamic, traditional anomaly detection methods for static data cannot analyze and process such large-scale, dynamically growing stream data accurately and effectively, so constructing a real-time, effective anomaly detection method suited to data streams has become particularly important.
Researchers have proposed different data stream anomaly detection methods for the practical problems faced at different stages. Existing methods can be roughly classified into density-based, angle-based and clustering-based data stream anomaly detection methods. Density-based methods use density as the basic anomaly measure and construct a dynamically updatable anomaly factor that measures the degree of abnormality of the data. Pokrajac et al. brought the static anomaly detection method LOF to data streams and developed INCLOF, an incremental local anomaly detection method applicable to dynamic data streams; as new data are inserted, INCLOF deletes historical data and dynamically updates the anomaly factor of each data point. Ke Gao et al. improved INCLOF by introducing the sliding window idea and proposed n-INCLOF, which updates only the anomaly factors of the data objects in the sliding window at the current moment. In some cases a data point appears abnormal at one moment but not at the next; to address this, Karimian S H et al. proposed I-IncLOF, which introduces the idea of multiple verification and judges as confirmed outliers only the data objects that appear abnormal throughout the whole sliding process of the window. I-IncLOF greatly reduces the misjudgment rate but is not very effective in multi-dimensional settings. xinjielilu et al. proposed INCLOCI, which introduces the multi-granularity anomaly factor MDEF and can detect not only scattered outliers but also anomalous clusters.
Similarity measures such as distance and density become less effective in high-dimensional data spaces. To solve this problem, some researchers have proposed angle-based measures. The basic idea is that the angles formed by an abnormal point with other points are generally small and fluctuate little, whereas the angles formed by a normal point with other points are large and fluctuate widely. HP Kriegel et al. proposed the angle-based anomaly detection method ABOD, which takes the variance of the angles as the anomaly factor ABOF that measures the degree of abnormality of a data point; ABOD retains high detection accuracy in high-dimensional spaces. Ye H proposed the angle-based data stream anomaly detection method DSABOD, which dynamically updates the anomaly factor of each data point relative to its neighborhood points as data points flow continuously into memory. DSABOD offers a new approach to anomaly detection in high-dimensional data streams, but traditional angle-based data stream anomaly detection methods suffer from a low anomaly detection rate.
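The angle idea described above can be illustrated with a short sketch. It is a simplified, unweighted variance of angles written for illustration only, not the ABOF of ABOD itself (which, as published, weights by distance); the function name `angle_variance` is hypothetical.

```python
import math
from itertools import combinations


def angle_variance(points, i):
    """Variance of the angles at points[i] formed with every pair of other points.

    Simplified illustration: plain (unweighted) variance of the angles in radians.
    """
    p = points[i]
    others = [q for j, q in enumerate(points) if j != i]
    angles = []
    for a, b in combinations(others, 2):
        u = [ak - pk for ak, pk in zip(a, p)]
        v = [bk - pk for bk, pk in zip(b, p)]
        nu = math.sqrt(sum(x * x for x in u))
        nv = math.sqrt(sum(x * x for x in v))
        if nu == 0.0 or nv == 0.0:
            continue  # skip duplicates of p, for which the angle is undefined
        cos = sum(x * y for x, y in zip(u, v)) / (nu * nv)
        angles.append(math.acos(max(-1.0, min(1.0, cos))))
    if not angles:
        return 0.0
    mean = sum(angles) / len(angles)
    return sum((t - mean) ** 2 for t in angles) / len(angles)


# A 3 x 3 grid of normal points plus one distant point.
grid = [(x, y) for x in range(3) for y in range(3)] + [(20, 20)]
center_score = angle_variance(grid, 4)   # point (1, 1), inside the grid
outlier_score = angle_variance(grid, 9)  # point (20, 20), far from the grid
```

In this toy data set the interior point sees the other points in all directions, so its angles vary widely, while the distant point sees them all in nearly the same direction, so its angles vary little.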
Clustering-based data stream anomaly detection comprises two stages: clustering the data points, and performing anomaly detection on the data points in each cluster. Elahi M et al. proposed a clustering-based method that combines K-Means and LOF, in which the anomaly factor is defined per region, improving the accuracy of anomaly detection. Thakran Y et al. proposed a method that combines DBSCAN with W-K-Means. The method clusters the data block at the current moment with DBSCAN to obtain candidate outliers and initial clusters, merges in the candidate outliers from the previous moment that still await multiple verification, and clusters again with W-K-Means to obtain the candidate outliers and normal-point clusters at the current moment. It also applies multiple verification to remove misjudged points from the candidate outliers and free memory, and throughout the process it dynamically adjusts the parameters MinPts and Epsilon required by DBSCAN and the attribute weights of W-K-Means. The method detects anomalies with high accuracy, but it requires too many manually set parameters and heavy manual intervention, its complexity is high, and it is not very effective in multi-dimensional spaces.
Data stream anomaly detection is currently a research hotspot and a difficult problem in data mining; its main aim is to detect, accurately and in real time, information that does not conform to the normal pattern in a complex, dynamically changing data environment.
Disclosure of Invention
To address the problems of traditional methods, such as high time complexity, large memory footprint, low efficiency, excessive manual parameter intervention and low effectiveness in multi-dimensional data environments, the invention provides a data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor. The method reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy and low time complexity.
The technical scheme for realizing the purpose of the invention is as follows:
A data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor comprises the following steps:
1) processing the real-time data stream: for the real-time data streams acquired by a data acquisition terminal, the data acquired within each time period are grouped into a data block according to the minimum unit of the data and the order of acquisition, and several data blocks form the data set S handled by a sliding window, in preparation for step 2);
2) setting the data set S in the sliding window: the processing in step 1) yields the data set S in the current sliding window, where $S = \{X_1, X_2, \dots, X_n\}$ contains n data points and each data point is represented by its attributes $X_i = (x_{i1}, x_{i2}, \dots, x_{id})$, where $x_{i1}, x_{i2}, \dots, x_{id}$ are the d attributes of data point $X_i$ used for subsequent clustering and anomaly detection;
3) setting the initialization parameters k, r and ξ: k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the adjustment coefficient of the anomaly decision threshold $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) obtaining the distance matrix dist: using the data set S from step 2), calculate the distances between all pairs of data points to obtain the n × n distance matrix $dist = [d_{ij}]_{n \times n}$, calculated by formula (1):
[Formula (1): equation image not available]
where $X_i$ and $Y_j$ are data points in the set S, $x_{ik}$ is the k-th attribute of data point $X_i$, and $y_{jk}$ is the k-th attribute of data point $Y_j$;
5) obtaining the r-neighborhood point set: according to the spatial neighborhood radius r, obtain the r-neighborhood point set of each data point, i.e. the set of all data points enclosed by the circle of radius r centered on that point;
6) obtaining the angle factor and local density of the r-neighborhood point set: using the distance matrix dist, obtain the angle factor of the r-neighborhood point set and the local density of the r-neighborhood point set, where $N_r$ is the r-neighborhood of data point $x_i$;
7) obtaining the dissimilarity $\delta(x_i)$: sort the local densities of the r-neighborhood point sets obtained in step 6) and then calculate the corresponding dissimilarity $\delta(x_i)$;
8) obtaining the cluster center factor $\tau(x_i)$ of each data point: combining step 6) and step 7), obtain the cluster center factor $\tau(x_i)$, calculated by formula (5):
[Formula (5): equation image not available]
the cluster center factor $\tau(x_i)$ measures the degree to which a data point lies at a cluster center;
9) acquiring the attribution matrix: sort the cluster center factors of all data points obtained in step 8) in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \dots \ge \tau(p_n)$, where $p_n$ is the original serial number of the corresponding data point, and obtain the attribution matrix $F = [f_1, f_2, \dots, f_n]$ used for clustering;
10) determining the cluster centers and clustering: use the cluster center factor and the attribution matrix to determine the cluster centers of the data set S and cluster it; all data points with the same class label form one set, i.e. one cluster, giving $m$ clusters $C_1, C_2, \dots, C_m$ with $m = C_{center\_id}$, where $C_{center\_id}$ is the serial number of the cluster center, which completes the clustering of the data set S;
11) performing anomaly detection on each cluster separately: for each cluster $C_i$ ($i = 1, 2, \dots, m$) obtained in step 10), perform anomaly detection on the clusters $C_1, C_2, \dots, C_m$ of the clustered data set S one by one to obtain the outlier set $O_i$ of each cluster, and finally obtain the set of all outliers in the data set S, $O = \{O_1, \dots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is given by formula (7):
[Formula (7): equation image not available]
where A, B and C are data points in the data set, [symbol image not available] denotes the i-th cluster, each data point of which has d-dimensional attributes, and [symbol image not available] is the number of data points in the neighborhood of data point q;
the local increment value $H(X_j)$ is given by formula (8):
[Formula (8): equation image not available]
the sum of the distances to the k nearest neighbors, $L(X_j)$, is given by formula (9):
[Formula (9): equation image not available]
where [symbol image not available] denotes the k-neighborhood of data point $X_j$, consisting of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is given by formula (10):
[Formula (10): equation image not available]
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance from data point $X_j$ to that cluster center, [symbol image not available] is the angle factor of each data point in cluster $C_i$ ($i = 1, 2, \dots, m$) relative to the cluster, and $H(X_j)$ is the local increment value;
12) multiple verification: verify all candidate outliers several times; a candidate that still appears abnormal after a limited number of verifications is judged a confirmed outlier, output and stored, while a candidate that appears as a normal point during verification is discarded directly.
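Steps 4) and 5) can be sketched as follows. Formula (1) is not legible in this text, so the sketch assumes the Euclidean distance over the d attributes; it also assumes that a point is not its own neighbor and that the neighborhood boundary is inclusive. The function names are hypothetical.

```python
import math


def distance_matrix(S):
    """n x n matrix of pairwise distances (Euclidean distance is an assumption)."""
    n = len(S)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(S[i], S[j])))
            dist[i][j] = dist[j][i] = d  # the matrix is symmetric
    return dist


def r_neighborhoods(dist, r):
    """For each point, the indices of all other points within radius r."""
    n = len(dist)
    return [[j for j in range(n) if j != i and dist[i][j] <= r] for i in range(n)]


S = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
dist = distance_matrix(S)
neighbors = r_neighborhoods(dist, r=1.0)
```

Computing dist once and reusing it for every later step matches the way steps 6) to 11) all refer back to the distance matrix.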
The processing in step 1) means that the data acquired by the data acquisition terminal are cached in stream form and the cached data are divided into data blocks $E_0, E_1, E_2, \dots$. Each data block is a basic window, and each sliding window W contains 2 basic windows. Data insertion and deletion are realized by combining the basic window and the sliding window as follows: in the transition from time $T_i$ to time $T_{i+1}$, the sliding window slides from $W_i$ to $W_{i+1}$, the new basic window $E_{i+1}$ is merged in and the historical basic window $E_{i-1}$ is removed, and the candidate outliers detected in $W_i$ at time $T_i$ are incorporated into $W_{i+1}$ for multiple verification.
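The window transition described above can be sketched with a bounded queue of basic windows. This is a minimal illustration, not the patented implementation; the class name `SlidingWindow` and its method names are hypothetical.

```python
from collections import deque


class SlidingWindow:
    """Sliding window made of a fixed number of basic windows (data blocks)."""

    def __init__(self, basic_per_window=2):
        # A deque with maxlen drops the oldest basic window automatically.
        self.blocks = deque(maxlen=basic_per_window)
        self.carried = []  # candidate outliers carried over from the previous window

    def slide(self, new_block, candidates_from_previous=()):
        """Merge the new basic window, drop the oldest, carry candidates forward."""
        self.blocks.append(list(new_block))
        self.carried = list(candidates_from_previous)
        return self.current()

    def current(self):
        """Data set S of the current window, plus carried-over candidates."""
        data = [x for block in self.blocks for x in block]
        return data + [c for c in self.carried if c not in data]


w = SlidingWindow()
w.slide([1, 2])                                      # E0
w.slide([3, 4])                                      # E1, window holds E0 and E1
S = w.slide([5, 6], candidates_from_previous=[2])    # E2 in, E0 out, candidate 2 kept
```

Only the two most recent blocks and the pending candidates stay in memory, which is where the reduced memory occupancy claimed for the method comes from.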
The angle factor of the r-neighborhood point set in step 6) is calculated by formula (2):
[Formula (2): equation image not available]
The local density of the r-neighborhood point set in step 6) is calculated by formula (3):
[Formula (3): equation image not available]
where $N_r(p)$ is the r-neighborhood of data point p and q is any data point in the r-neighborhood set of p. The local density depends on both the number and the positions of the neighborhood data points: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
The dissimilarity $\delta(x_i)$ in step 7) is calculated after sorting the local densities of all data points in descending order, by formula (4):
[Formula (4): equation image not available]
where $p_i$ and $p_j$ are the serial numbers of the corresponding data points; when $i = 1$, $j \ge 2$; when $i \ge 2$, $j < i$.
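Formula (4) is not legible in this text, but the index conditions suggest a definition in the style of density-peak clustering. The sketch below assumes that reading: for every point except the densest, the dissimilarity is the minimum distance to any point of higher density rank, and for the densest point it is the maximum distance to any other point. The function name `dissimilarity` is hypothetical, and the density values are taken as given.

```python
def dissimilarity(dist, density):
    """Dissimilarity per point under the assumed density-peak style definition."""
    n = len(density)
    # order[0] is the densest point; ties keep their original relative order.
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    delta = [0.0] * n
    for rank in range(1, n):
        i = order[rank]
        # i >= 2 case: nearest point among those ranked before it (j < i).
        delta[i] = min(dist[i][j] for j in order[:rank])
    if n > 1:
        top = order[0]
        # i = 1 case: farthest point among all the others (j >= 2).
        delta[top] = max(dist[top][j] for j in order[1:])
    return delta


# Three points on a line at 0, 1 and 10 with decreasing density.
dist = [[0.0, 1.0, 10.0], [1.0, 0.0, 9.0], [10.0, 9.0, 0.0]]
delta = dissimilarity(dist, density=[3.0, 2.0, 1.0])
```

Under this reading a point that is both dense and far from any denser point gets a large dissimilarity, which is consistent with the description of the cluster centers in FIG. 9.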
The attribution matrix $F = [f_1, f_2, \dots, f_n]$ in step 9) records the attribution relationships between data points; each element is given by formula (6):
[Formula (6): equation image not available]
where $\{p_i\}$ denotes the original subscripts of the cluster center factors $\tau(x_i)$ sorted in descending order.
The data stream anomaly detection method is divided into 2 processes: a data stream processing process and a data stream anomaly detection process. The data stream processing process converts the dynamic data stream into static data blocks, which facilitates subsequent anomaly detection and ensures that the whole detection is real-time and efficient. The data stream anomaly detection process performs anomaly detection on the static data set produced by the data stream processing process; to improve detection accuracy, it clusters first and then detects anomalies. In this technical scheme, the real-time data stream processing method combining the sliding window and the basic window is the core of the data stream processing process; it reduces memory occupancy and improves the quality of subsequent anomaly detection. The cluster center factor and the attribution matrix are two parameters newly introduced in this technical scheme for determining cluster centers and clustering; they allow the cluster centers of a multidimensional data space to be determined quickly and effectively and accurate clustering to be performed according to the determined cluster centers. The enhanced angle anomaly factor is another important parameter in this technical scheme; it makes up for some defects of the traditional anomaly factor while retaining the effectiveness of the angle measure in multi-dimensional space, and it is the core of the anomaly detection part.
The method applies sliding window and basic window techniques to construct an efficient data stream processing model; it reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy and low time complexity.
Drawings
FIG. 1 is a schematic flow chart of the method in the embodiment;
FIG. 2 is a schematic diagram of the data point distribution in the sliding window at time $t_1$ in the embodiment;
FIG. 3 is a schematic diagram of the data point distribution in the sliding window at time $t_2$ in the embodiment;
FIG. 4 is a schematic diagram of processing the real-time data stream with the combined sliding window and basic window, and of the multiple verification process, in the embodiment;
FIG. 5 is a schematic diagram of the angle measure of data points in the embodiment;
FIG. 6 is a schematic diagram of the data point distribution of U-shaped cluster data under the conventional angle measurement method in the embodiment;
FIG. 7 is a schematic diagram of the data point distribution of multi-cluster data misjudged by the conventional angle measurement method in the embodiment;
FIG. 8 is a schematic diagram of the distribution of the original coordinates of the data set in the embodiment;
FIG. 9 is a schematic diagram of the local density versus dissimilarity distribution in the embodiment;
FIG. 10 is a schematic diagram of the distribution of the cluster center factors in the embodiment;
FIG. 11a is a schematic diagram of the distribution of data set 1 in the embodiment;
FIG. 11b is a schematic diagram of the distribution of outliers in data set 1 in the embodiment;
FIG. 11c is a schematic diagram marking the abnormal points detected by anomaly detection on data set 1 in the embodiment;
FIG. 11d is a schematic diagram marking the normal points identified as abnormal points by anomaly detection on data set 1 in the embodiment;
FIG. 12a is a schematic diagram of the distribution of data set 2 in the embodiment;
FIG. 12b is a schematic diagram of the actual outlier distribution of data set 2 in the embodiment;
FIG. 12c is a schematic diagram marking the abnormal points detected by anomaly detection on data set 2 in the embodiment;
FIG. 12d is a schematic diagram marking the abnormal points identified as normal points by anomaly detection on data set 2 in the embodiment.
Detailed Description
The invention will be further illustrated, but not limited, by the following description of the embodiments with reference to the accompanying drawings.
Referring to FIG. 1, a data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor includes the following steps:
1) processing the real-time data stream: process the real-time data streams acquired by a data acquisition terminal. Real-time data streams are dynamic and changeable, so some data objects appear abnormal in the current sliding window but appear as normal points in the sliding window at the next moment, as shown in FIG. 2 and FIG. 3. FIG. 2 is the distribution of data points in the sliding window at time $t_1$, where point P' appears abnormal; as data points continue to flow in, more and more data points accumulate around point P'. FIG. 3 is the distribution of data points in the sliding window at time $t_2$, where point P' now appears normal;
2) setting the data set S in the sliding window: the processing in step 1) yields the data set S in the current sliding window; let $S = \{X_1, X_2, \dots, X_n\}$ contain n data points, each represented by its attributes $X_i = (x_{i1}, x_{i2}, \dots, x_{id})$, which are used for subsequent clustering and anomaly detection;
3) setting the initialization parameters k, r and ξ: k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the adjustment coefficient of the anomaly decision threshold $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) obtaining the distance matrix dist: using the data set S from step 2), calculate the distances between all pairs of data points to obtain the n × n distance matrix $dist = [d_{ij}]_{n \times n}$, calculated by formula (1):
[Formula (1): equation image not available]
where $X_i$ and $Y_j$ are data points in the set S, $x_{ik}$ is the k-th attribute of data point $X_i$, and $y_{jk}$ is the k-th attribute of data point $Y_j$;
5) obtaining the r-neighborhood point set: according to the spatial neighborhood radius r, obtain the r-neighborhood point set of each data point, i.e. the set of all data points enclosed by the circle of radius r centered on that point;
6) obtaining the angle factor and local density of the r-neighborhood point set: using the distance matrix dist, obtain the angle factor of the r-neighborhood point set and the local density of the r-neighborhood point set, where $N_r$ is the r-neighborhood of data point $x_i$. As shown in FIG. 5, the method builds on the angle measurement idea: compute the angles between a data point and every pair of other data points, then take their variance. For a core region point $A_1$, the angles formed with pairs of other points vary over a wide range, so the variance is large; for an abnormal point $A_3$, the angles vary over a very small range, so the variance is small; for a boundary point $A_2$, the range of angles lies between those of $A_1$ and $A_3$, so the variance lies between those of core region points and outliers. This measure has some defects, however, as shown in FIG. 6 and FIG. 7. In FIG. 6 the abnormal point $B_1$ lies at the center of the U-shaped cluster, and the angles it forms with pairs of surrounding points vary widely, i.e. the variance is large, whereas the edge point $B_2$ forms angles with pairs of other points that vary little, i.e. the variance is small. Similarly, in FIG. 7 the abnormal point $D_1$ lies between the two clusters, and the angles it forms with pairs of points from the two clusters vary widely, whereas the angles formed by the edge point $D_2$ with other points vary little. The results are the opposite of the actual situation, so missed detections and misjudgments occur;
7) obtaining the dissimilarity $\delta(x_i)$: sort the local densities of the r-neighborhood point sets obtained in step 6) and then calculate the corresponding dissimilarity $\delta(x_i)$;
8) obtaining the cluster center factor $\tau(x_i)$ of each data point: combining step 6) and step 7), obtain the cluster center factor $\tau(x_i)$, calculated by formula (5):
[Formula (5): equation image not available]
The cluster center factor $\tau(x_i)$ measures the degree to which a data point lies at a cluster center. It is an improved parameter in the method of this embodiment for quickly and effectively determining the cluster centers of a multidimensional data space, and obtaining it is a crucial step in clustering. The process is illustrated in FIG. 8, FIG. 9 and FIG. 10. FIG. 8 shows that the data set consists of two clusters, with point 13 and point 25 as their respective cluster centers; FIG. 9 is the ρ-δ (local density versus dissimilarity) distribution of the points obtained by formulas (3) and (4), in which points 13 and 25 have large local density and dissimilarity; FIG. 10 is the distribution of the cluster center factors obtained by formula (5), sorted in descending order, in which points 13 and 25 have the largest cluster center factors and are therefore the most likely cluster centers;
9) acquiring the attribution matrix: sort the cluster center factors of all data points obtained in step 8) in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \dots \ge \tau(p_n)$, where $p_n$ is the original serial number of the corresponding data point, and obtain the attribution matrix $F = [f_1, f_2, \dots, f_n]$ used for clustering;
10) determining the cluster centers and clustering: use the cluster center factor and the attribution matrix to determine the cluster centers of the data set S and cluster it; all data points with the same class label form one set, i.e. one cluster, giving $m$ ($m = C_{center\_id}$) clusters $C_1, C_2, \dots, C_m$, which completes the clustering of the data set S;
11) performing anomaly detection on each cluster separately: for each cluster $C_i$ ($i = 1, 2, \dots, m$) obtained in step 10), perform anomaly detection on the clusters $C_1, C_2, \dots, C_m$ of the clustered data set S one by one to obtain the outlier set $O_i$ of each cluster, and finally obtain the set of all outliers in the data set S, $O = \{O_1, \dots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is given by formula (7):
[Formula (7): equation image not available]
where A, B and C are data points in the data set, [symbol image not available] denotes the i-th cluster, each data point of which has d-dimensional attributes, and [symbol image not available] is the number of data points in the neighborhood of data point q;
the local increment value $H(X_j)$ is given by formula (8):
[Formula (8): equation image not available]
the sum of the distances to the k nearest neighbors, $L(X_j)$, is given by formula (9):
[Formula (9): equation image not available]
where [symbol image not available] denotes the k-neighborhood of data point $X_j$, consisting of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is given by formula (10):
[Formula (10): equation image not available]
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance from data point $X_j$ to that cluster center, [symbol image not available] is the angle factor of each data point in cluster $C_i$ ($i = 1, 2, \dots, m$) relative to the cluster, and $H(X_j)$ is the local increment value;
12) multiple verification: all candidate abnormal points are verified multiple times. A candidate that still appears abnormal after a limited number of verifications is judged to be a confirmed abnormal point and is output and stored; a candidate that appears normal at any point during verification is discarded directly. This improves the accuracy of anomaly detection.
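The multiple verification of step 12) can be illustrated with a small bookkeeping sketch. It is only one possible reading of the text: the class name `MultiVerifier`, the per-point identifiers, and the choice to count consecutive abnormal verdicts are assumptions made for illustration, and `required=3` mirrors the number of verifications used in the experiments described later.

```python
from dataclasses import dataclass, field


@dataclass
class MultiVerifier:
    """Hypothetical helper: counts how many times in a row each candidate looked abnormal."""
    required: int = 3                       # number of verifications before confirmation
    counts: dict = field(default_factory=dict)

    def update(self, flagged_ids, seen_ids):
        """Process one detection round.

        flagged_ids: ids flagged as abnormal in the current window.
        seen_ids: ids of all points examined in the current window.
        Returns the ids confirmed as abnormal in this round.
        """
        flagged = set(flagged_ids)
        confirmed = []
        for pid in seen_ids:
            if pid in flagged:
                self.counts[pid] = self.counts.get(pid, 0) + 1
                if self.counts[pid] >= self.required:
                    confirmed.append(pid)   # still abnormal after all verifications
                    del self.counts[pid]
            else:
                # The point appeared normal once, so the candidate is discarded.
                self.counts.pop(pid, None)
        return confirmed


verifier = MultiVerifier(required=3)
round1 = verifier.update([1, 2], [1, 2, 3])
round2 = verifier.update([1], [1, 2])       # point 2 looks normal and is dropped
round3 = verifier.update([1, 2], [1, 2])    # point 1 is confirmed on its third flag
```

In this sketch a discarded candidate starts again from zero if it is flagged in a later round; the source does not say how such a case should be handled.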
The processing in step 1) means that the data acquired by the data acquisition terminal is cached in stream form and the cached data is divided into data blocks E_0, E_1, E_2, ..., each of which represents a basic window. Each sliding window W contains 2 basic windows, and data insertion and deletion are realized by combining the basic window and the sliding window. The combination process is shown in Fig. 4: in the transition from time T_i to time T_{i+1}, the sliding window slides from W_i to W_{i+1}, the new basic window E_{i+1} is merged in and the historical basic window E_{i-1} is removed, and the candidate abnormal points detected in W_i at time T_i are incorporated into W_{i+1} for multiple verification.
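The window mechanism described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `SlidingWindow`, its method names, and the representation of data points as plain comparable values (for example numbers or tuples) are assumptions. Candidates carried over from W_i are appended only if they are no longer inside W_{i+1}, to avoid duplicating points that are still present in E_i.

```python
from collections import deque


class SlidingWindow:
    """Hypothetical sketch of a sliding window made of fixed-length basic windows."""

    def __init__(self, basic_len=20, n_basic=2):
        self.basic_len = basic_len
        # deque with maxlen drops the oldest basic window when a new one is merged
        self.blocks = deque(maxlen=n_basic)
        self.buffer = []       # points of the basic window currently being filled
        self.carried = []      # candidate abnormal points from the previous window

    def push(self, point):
        """Buffer one point; return the new window contents once a basic window is full."""
        self.buffer.append(point)
        if len(self.buffer) < self.basic_len:
            return None
        self.blocks.append(self.buffer)     # merge E_{i+1}, implicitly remove E_{i-1}
        self.buffer = []
        window = [p for block in self.blocks for p in block]
        extras = [c for c in self.carried if c not in window]
        return window + extras

    def carry(self, candidates):
        """Remember the candidates detected in the current window for re-verification."""
        self.carried = list(candidates)


sw = SlidingWindow(basic_len=2, n_basic=2)
outputs = [sw.push(p) for p in (0, 1, 2, 3)]
sw.carry([0, 3])                            # suppose points 0 and 3 were flagged in W_i
outputs += [sw.push(p) for p in (4, 5)]
```

Using `deque(maxlen=...)` keeps memory bounded to the configured number of basic windows, which matches the stated goal of inserting and deleting data block by block.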
The angle factor of the r-neighborhood point set in step 6) is calculated by formula (2):
Figure GDA0002479153140000111
The local density of the r-neighborhood point set in step 6) is calculated by formula (3):
Figure GDA0002479153140000112
where N_r(p) is the r-neighborhood of data point p and q is any data point in the r-neighborhood set of p. The local density depends on both the number and the positions of the neighborhood data points: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
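As an illustration of the r-neighborhood N_r(p) used above, the following sketch builds a Euclidean distance matrix and then collects, for every point, the indices of the points within radius r. The function names are hypothetical, and two details are assumptions because the text does not state them: the point itself is excluded from its own neighborhood, and the boundary is treated as inclusive (distance ≤ r).

```python
import numpy as np


def distance_matrix(X):
    """Pairwise Euclidean distances between the rows of X (n points, d attributes)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def r_neighborhoods(dist, r):
    """For each point, the indices of the other points within distance r (assumed inclusive)."""
    n = dist.shape[0]
    idx = np.arange(n)
    return [np.flatnonzero((dist[i] <= r) & (idx != i)) for i in range(n)]


points = [[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]]
dist = distance_matrix(points)
neighbors = r_neighborhoods(dist, r=1.5)
```

The angle factor and local density of formulas (2) and (3) would then be computed over each of these index sets; those formulas are not reproduced here.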
The dissimilarity δ(x_i) in step 7) is computed after the local densities of all data points are sorted in descending order. Dissimilarity measures the likelihood that data points belong to different clusters. For a given data set S, the local densities obtained in step 6) are sorted in descending order to obtain
Figure GDA0002479153140000113
where {p_i} denotes the original indices of the local densities
Figure GDA0002479153140000114
after sorting in descending order, and d(p_i, p_j) denotes the Euclidean distance between data points p_i and p_j. The dissimilarity of a data point p_i is then defined as follows, which is formula (4):
Figure GDA0002479153140000121
where p_i and p_j are the indices of the corresponding data points; when i = 1, j ≥ 2, and when i ≥ 2, j < i.
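The index ranges above (j ≥ 2 for the densest point, j < i otherwise) match the convention used in density-peak clustering. The sketch below assumes that convention: the densest point takes the maximum distance to all other points, and every other point takes the minimum distance to any point of higher density. The use of max and min is an assumption, since formula (4) itself is only available as an image; the function name `dissimilarity` is hypothetical, and at least two points are required.

```python
import numpy as np


def dissimilarity(dist, rho):
    """Assumed density-peak style dissimilarity for each point.

    dist: n x n Euclidean distance matrix.
    rho: local density of each point.
    Returns (delta, order) where order lists the indices p_1, ..., p_n by descending density.
    """
    rho = np.asarray(rho, dtype=float)
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(len(rho))
    first = order[0]
    delta[first] = dist[first, order[1:]].max()      # i = 1: compare with all j >= 2
    for i in range(1, len(order)):
        p_i = order[i]
        delta[p_i] = dist[p_i, order[:i]].min()      # i >= 2: compare with all j < i
    return delta, order


# Three points on a line at positions 0, 1 and 5, with decreasing density.
dist = np.array([[0.0, 1.0, 5.0],
                 [1.0, 0.0, 4.0],
                 [5.0, 4.0, 0.0]])
delta, order = dissimilarity(dist, rho=[3.0, 2.0, 1.0])
```

A point that is both dense and far from any denser point receives a large value, which is what makes it a plausible cluster center in the ρ-δ plot of Fig. 9.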
The attribution matrix F = [f_1, f_2, ..., f_n] in step 9) records the attribution relationships between data points; each element is given by formula (6):
Figure GDA0002479153140000122
where {p_i} denotes the original indices of the cluster center factors τ(x_i) sorted in descending order.
Determining cluster centers and clustering in step 10) means first defining the cluster center serial number as C_center_id and the class label of a data point as C_cluster_label, and initializing the cluster center serial number to 1, i.e., C_center_id = 1; the data point with the largest cluster center factor obtained in step 8) is given class label 1, i.e.,
Figure GDA0002479153140000123
Then, following the descending index order {p_i} obtained in step 8), the whole data set S is traversed conditionally: if
Figure GDA0002479153140000124
and
Figure GDA0002479153140000125
the distances of all points satisfy
Figure GDA0002479153140000126
(where r is the initial parameter, the neighborhood radius), the point is defined as a new cluster center and its class label is incremented by 1, thereby obtaining all cluster centers. Then, based on the obtained cluster centers, the attribution matrix F = [f_1, f_2, ..., f_n] from step 9) is used to give the points belonging to the same cluster center the same label (i.e., class label), as follows: following the descending index order {p_i} obtained in step 9), the whole data set S is traversed conditionally; if p_i is not a cluster center, then according to the attribution matrix the label corresponding to
Figure GDA0002479153140000127
is assigned to p_i; otherwise the label of p_i is its own. Finally, all data points with the same class label form a set, i.e., a cluster, yielding m (m = C_center_id) clusters C_1, C_2, ..., C_m and completing the clustering of the data set S.
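The label assignment pass described above can be sketched as a single traversal in descending order of the cluster center factor. The sketch takes the cluster centers as given, because the selection conditions are only available as formula images. The mapping `parent` is a hypothetical stand-in for the attribution matrix F, and the sketch assumes that the point a non-center is attributed to always appears earlier in the traversal order (or is itself a center), so that its label is already known.

```python
def assign_labels(order, centers, parent):
    """Propagate class labels from cluster centers to the remaining points.

    order: point indices sorted by descending cluster center factor.
    centers: indices of the cluster centers, in the order they were selected.
    parent: dict mapping each non-center point to the point it is attributed to
            (hypothetical stand-in for the attribution matrix F).
    Returns (labels, clusters).
    """
    # Each cluster center carries its own label: 1, 2, ..., m.
    labels = {c: k for k, c in enumerate(centers, start=1)}
    for p in order:
        if p not in labels:
            labels[p] = labels[parent[p]]   # inherit the label of the attributed point
    clusters = {}
    for p, lab in labels.items():
        clusters.setdefault(lab, []).append(p)
    return labels, clusters


order = [3, 0, 1, 4, 2]
centers = [3, 4]
parent = {0: 3, 1: 0, 2: 4}
labels, clusters = assign_labels(order, centers, parent)
```

Because labels are inherited along the attribution chain, the pass needs only one traversal of the data set once the centers are known.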
Step 11), in which anomaly detection is performed on each cluster separately, specifically includes the following steps:
① for any cluster C_i (i = 1, 2, …, m), calculate the angle factor of each data point in the cluster relative to the cluster,
Figure GDA0002479153140000131
which is formula (7):
Figure GDA0002479153140000132
where C_i (i = 1, 2, …, m) represents any cluster after clustering;
② calculate the local increment value H(X_j) of each data point in the cluster relative to its spatial r-neighborhood, which is formula (8):
Figure GDA0002479153140000133
the local increment value reflects how densely data points are distributed within the spatial neighborhood of a data point in the cluster to which it belongs, where
Figure GDA0002479153140000134
represents the number of data points in the r-neighborhood
Figure GDA0002479153140000135
of data point X_j within the cluster to which it belongs;
③ according to the cluster centers determined in step 10), calculate the distance dist(o, X_j) from each data point to the cluster center of its cluster;
④ calculate the sum of the distances from each data point to its k nearest neighbors, L(X_j), which is formula (9):
Figure GDA0002479153140000136
where
Figure GDA0002479153140000137
represents the k-neighborhood of data point X_j, consisting of its k nearest neighbors within the cluster to which it belongs. The sum of distances to the k nearest neighbors, L(X_j), reflects how far the data point is from its surrounding data points, which avoids the defect of the angle-based anomaly factor that arises for points like B_1 in Fig. 6;
⑤ calculate the enhanced angle anomaly factor EAOF(X_j) of each data point, which is formula (10):
Figure GDA0002479153140000141
where o is the cluster center of the cluster to which data point X_j belongs, dist(o, X_j) is the distance from data point X_j to the cluster center of its cluster,
Figure GDA0002479153140000142
is the angle factor of each data point in cluster C_i (i = 1, 2, …, m) relative to the cluster, and H(X_j) is the local increment value. The enhanced angle anomaly factor EAOF not only retains the excellent measurement performance of angle-based measures in multidimensional space, but also introduces the ideas of distance and density, compensating for the shortcomings of traditional methods based on the angle anomaly factor;
⑥ calculate the mean μ and standard deviation δ of the enhanced angle anomaly factors of all data points obtained in ⑤, and use them to calculate the anomaly decision threshold θ = μ + ξ·δ, where ξ is the initially set anomaly decision threshold adjustment coefficient;
⑦ compare the enhanced angle anomaly factor EAOF(X_j) obtained in ⑤ with the decision threshold θ obtained in ⑥; if EAOF(X_j) > θ, mark the point as a candidate abnormal object of the cluster and store it in the cluster's candidate abnormal point set O_i.
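Two of the steps above are fully specified in words and can be sketched directly: the sum of distances to the k nearest neighbors in ④, and the threshold decision in ⑥ and ⑦. The function names are hypothetical, the EAOF values are taken as given input because formula (10) is not reproduced here, and the use of the population standard deviation (NumPy's default) is an assumption, since the text does not say which estimator is meant.

```python
import numpy as np


def knn_distance_sum(dist_cluster, k=3):
    """L(X_j): sum of distances from each point to its k nearest neighbors in the cluster."""
    d = np.sort(np.asarray(dist_cluster, dtype=float), axis=1)
    # Column 0 of each sorted row is the zero distance of the point to itself.
    return d[:, 1:k + 1].sum(axis=1)


def flag_candidates(eaof, xi=2.5):
    """Return the indices with EAOF > theta, where theta = mu + xi * delta."""
    eaof = np.asarray(eaof, dtype=float)
    theta = eaof.mean() + xi * eaof.std()   # population standard deviation (assumed)
    return np.flatnonzero(eaof > theta), theta


# Points on a line at positions 0, 1 and 3.
dist_cluster = [[0.0, 1.0, 3.0],
                [1.0, 0.0, 2.0],
                [3.0, 2.0, 0.0]]
sums = knn_distance_sum(dist_cluster, k=2)

# Nine ordinary scores and one large score.
candidates, theta = flag_candidates([1.0] * 9 + [10.0], xi=2.5)
```

With very small clusters a single extreme value inflates the standard deviation enough that it may not exceed the threshold, so the choice of ξ interacts with cluster size.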
This embodiment provides a data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor. By combining sliding windows with basic windows, it builds an efficient real-time data stream processing technique, and by introducing the enhanced angle anomaly factor it addresses the high memory usage and low data processing efficiency of traditional methods while offering high real-time performance, high anomaly detection accuracy and low time complexity.
To verify the effectiveness of the method of this embodiment, simulation results are compared below:
In this embodiment, verification is performed on both artificially generated data sets and a real data set, and the method is compared with two traditional methods: I-IncLOF, and the weighted-clustering-based unsupervised data stream anomaly detection method proposed by Thakran et al. (abbreviated as Method I). The experimental data set information is shown in Table 1; the three data sets differ in dimension, data volume and data characteristics.
The data distribution of artificial data set 1 is shown in Fig. 11a. It contains 1615 data points in total and consists of 5 clusters and 15 discrete points. Clusters 1, 2 and 3 each consist of 500 data points generated by the Gaussian distributions N_1, N_2 and N_3 (with means μ_1, μ_2 and μ_3) respectively, and clusters 4 and 5 each consist of 50 data points generated by the Gaussian distributions N_4 and N_5 (with means μ_4 and μ_5) respectively. Because N_4 and N_5 generate very few data points, clusters 4 and 5 are regarded as abnormal clusters. In addition, 15 discrete abnormal points are randomly generated according to the distribution characteristics of the data set, so the data set contains 115 abnormal points in total; their distribution is shown in Fig. 11b, where abnormal points are marked with circles. During the experiment, the abnormal clusters and discrete abnormal points are randomly mixed into the normal clusters. The Gaussian parameters used to generate data set 1 are as follows:
μ1=[+1 +1],μ2=[-1 -1],μ3=[+1 -1],μ4=[-1 +1],μ5=[0 0]
Figure GDA0002479153140000151
The data distribution of artificial data set 2 is shown in Fig. 12a. It contains 860 data points, composed of 3 normal clusters, 1 abnormal cluster of 21 abnormal points, and 48 discrete abnormal points. The data set therefore contains 69 abnormal points, whose distribution is shown in Fig. 12b.
The real data set Breast Cancer is listed in Table 1. It comes from the UCI machine learning repository, contains 699 data points and consists of two normal clusters. To verify the validity of the method, 34 abnormal points, generated according to statistical characteristics such as the mean and variance, are added to the real data set for comparison and verification of anomaly detection.
In the verification experiments, the length of a basic window is set to 20 and two basic windows form one sliding window; the number of nearest neighbors is k = 3; the spatial neighborhood radius is set to the mean of the first 20% of the distance values between data points in the current sliding window after sorting in descending order; the anomaly decision threshold adjustment coefficient is 2.5; and the number of verifications is 3. The detection rate and the false positive rate, the two measures that best reflect the effectiveness of an anomaly detection method, are selected for comparison. Figs. 11a to 11d and Figs. 12a to 12d show the visualized experimental results for data set 1 and data set 2.
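The rule for choosing the spatial neighborhood radius can be sketched as follows. The function name `neighborhood_radius` is hypothetical, and two details are assumptions: each unordered pair of points contributes one distance, and the 20% cut is rounded up to a whole number of distances. Following the text literally, the distances are sorted in descending order, so the mean is taken over the largest distances in the window.

```python
import numpy as np


def neighborhood_radius(X, fraction=0.2):
    """Mean of the leading `fraction` of pairwise distances sorted in descending order."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    upper = np.triu_indices(len(X), k=1)        # each unordered pair once (assumed)
    d = np.sort(dist[upper])[::-1]              # descending order
    top = max(1, int(np.ceil(fraction * len(d))))
    return d[:top].mean()


# Five one-dimensional points give ten pairwise distances; the two largest are 10 and 9.
window = [[0.0], [1.0], [3.0], [6.0], [10.0]]
r = neighborhood_radius(window, fraction=0.2)
```

Recomputing r for every sliding window lets the radius follow changes in the scale of the stream rather than being fixed in advance.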
For artificial data set 1, Figs. 11a to 11d show that the method effectively detects the 2 abnormal clusters and 15 discrete abnormal points with zero missed detections. Fig. 11d shows that 3 normal points are mistakenly detected as abnormal points: although generated by the normal Gaussian distributions, they lie slightly far from the normal clusters and appear abnormal in all 3 consecutive verifications, so they are judged to be abnormal points;
For artificial data set 2, Figs. 12a to 12d show that the method remains effective in a three-dimensional data space. Figs. 12b, 12c and 12d show that all points in the abnormal cluster are detected and that 47 of the 48 discrete abnormal points are detected, with one missed. The missed point lies close to a normal cluster and therefore appears normal in one of the verifications, so it is judged to be a normal point.
In addition to verifying its effectiveness, the method of this embodiment is compared with traditional methods to further verify its advantages. Table 2 gives the detailed statistical results of the comparative experiments on the three data sets. As Table 2 shows, the method of this embodiment has a high detection rate and a low false positive rate, its effectiveness is significantly better than that of the other two methods, and its advantage becomes more pronounced as the dimension of the data set increases. Method I combines the W-K-Means and DBSCAN methods and dynamically updates the parameters required by DBSCAN and the weights of each dimension, so it adapts well to dynamic data streams; however, because it uses a traditional distance- and density-based anomaly measure, its effectiveness decreases as the dimension increases. The I-IncLOF method is based on local density and is likewise affected by the curse of dimensionality: it performs well when the data dimension is low but poorly when the dimension increases.
Through verification on different data sets and comparative analysis with traditional methods, it can be seen that the data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor provided by this embodiment has good effectiveness and feasibility.
TABLE 1
Figure GDA0002479153140000161
TABLE 2
Figure GDA0002479153140000162

Claims (6)

1. A data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor, characterized by comprising the following steps:
1) processing the real-time data stream: for the various real-time data streams acquired by a data acquisition terminal, the data obtained in a time period is formed into data blocks according to the minimum unit constituting the data and the time order of acquisition, and a plurality of data blocks form the data set S processed by a sliding window;
2) the result of the processing in step 1) is the data set S in the current sliding window, S = {X_1, X_2, ..., X_n}, containing n data points, each data point being represented by its attributes
Figure FDA0002479153130000011
where
Figure FDA0002479153130000012
represents the d attributes of data point x_i, which are used for subsequent clustering and anomaly detection;
3) setting initialization parameters k, r and ξ, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the anomaly decision threshold adjustment coefficient; the anomaly decision threshold is θ = μ + ξ·δ, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) obtaining a distance matrix dist: for the data set S of step 2), the distances between all data points are calculated to obtain an n × n distance matrix dist = [d_ij]_{n×n}, calculated by formula (1):
Figure FDA0002479153130000013
where X_i and Y_j are data points in the set S, x_ik denotes the k-th attribute of data point X_i, and y_jk denotes the k-th attribute of data point Y_j;
5) obtaining the r-neighborhood point set: according to the spatial neighborhood radius r, the r-neighborhood point set of each data point is obtained, i.e., the set of all data points enclosed by a circle centered at the point with radius r;
6) obtaining the angle factor of the r-neighborhood point set
Figure FDA0002479153130000014
and the local density
Figure FDA0002479153130000015
: combining the distance matrix dist, the angle factor of the r-neighborhood point set
Figure FDA0002479153130000016
and the local density of the r-neighborhood point set
Figure FDA0002479153130000017
are obtained, where N_r is the r-neighborhood of data point x_i;
7) obtaining the dissimilarity δ(x_i): after the local densities of the r-neighborhood point sets obtained in step 6)
Figure FDA0002479153130000018
are sorted, the corresponding dissimilarity δ(x_i) is calculated;
8) obtaining the cluster center factor τ(x_i) of each data point: combining step 6) and step 7), the cluster center factor τ(x_i) is formula (5):
Figure FDA0002479153130000019
the cluster center factor τ(x_i) measures the degree to which a data point lies at a cluster center;
9) acquiring an attribution matrix: all the cluster center factors obtained in step 8) are sorted in descending order to obtain τ(p_1) ≥ τ(p_2) ≥ … ≥ τ(p_n), where p_n is the original index of the corresponding data point, yielding the attribution matrix F = [f_1, f_2, ..., f_n] used for clustering;
10) determining cluster centers and clustering: cluster centers are determined and the data set S is clustered using the cluster center factors and the attribution matrix; all data points with the same class label form a set, i.e., a cluster, yielding m = C_center_id clusters C_1, C_2, ..., C_m, where C_center_id is the serial number of the cluster center, completing the clustering of the data set S;
11) performing anomaly detection on each cluster separately: after each cluster C_i (i = 1, 2, …, m) is obtained in step 10), anomaly detection is performed separately on each cluster C_1, C_2, ..., C_m of the clustered data set S to obtain the abnormal point set O_i of each cluster, and finally the set of all abnormal points in the data set S, O = {O_1, ..., O_m}, is obtained, the formulas involved in anomaly detection being as follows: the intra-cluster angle factor
Figure FDA0002479153130000021
is formula (7):
Figure FDA0002479153130000022
where A, B, C are data points in the data set,
Figure FDA0002479153130000023
represents the i-th cluster, each data point of which has d-dimensional attributes, and
Figure FDA0002479153130000024
is the number of data points in the neighborhood of data point q;
the local increment value H(X_j) is formula (8):
Figure FDA0002479153130000025
the sum of distances to the k nearest neighbors, L(X_j), is formula (9):
Figure FDA0002479153130000026
where
Figure FDA0002479153130000027
represents the k-neighborhood of data point X_j, consisting of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor EAOF(X_j) is formula (10):
Figure FDA0002479153130000031
where o is the cluster center of the cluster to which data point X_j belongs, dist(o, X_j) is the distance from data point X_j to the cluster center of its cluster,
Figure FDA0002479153130000032
is the angle factor of each data point in cluster C_i (i = 1, 2, …, m) relative to the cluster, and H(X_j) is the local increment value;
12) multiple verification: all candidate abnormal points are verified multiple times; a candidate that still appears abnormal after a limited number of verifications is judged to be a confirmed abnormal point and is output and stored, and a candidate that appears normal at any point during verification is discarded directly.
2. The data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor according to claim 1, characterized in that the processing in step 1) means that the data acquired by the data acquisition terminal is cached in stream form and the cached data is divided into data blocks E_0, E_1, E_2, ..., each of which represents a basic window; each sliding window W contains 2 basic windows, and data insertion and deletion are realized by combining the basic window and the sliding window, the combination process being as follows: in the transition from time T_i to time T_{i+1}, the sliding window slides from W_i to W_{i+1}, the new basic window E_{i+1} is merged in and the historical basic window E_{i-1} is removed, and the candidate abnormal points detected in W_i at time T_i are incorporated into W_{i+1} for multiple verification.
3. The data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor according to claim 1, characterized in that the angle factor of the r-neighborhood point set in step 6) is calculated by formula (2):
Figure FDA0002479153130000033
4. The data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor according to claim 1, characterized in that the local density of the r-neighborhood point set in step 6) is calculated by formula (3):
Figure FDA0002479153130000041
where N_r(p) is the r-neighborhood of data point p and q is any data point in the r-neighborhood set of p; the local density depends on both the number and the positions of the neighborhood data points: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
5. The data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor according to claim 1, characterized in that the dissimilarity δ(x_i) in step 7) is computed after the local densities of all data points are sorted in descending order, and the dissimilarity δ(x_i) is calculated by formula (4):
Figure FDA0002479153130000042
where p_i and p_j are the indices of the corresponding data points; when i = 1, j ≥ 2, and when i ≥ 2, j < i.
6. The data stream anomaly detection and multiple verification method based on the enhanced angle anomaly factor according to claim 1, characterized in that the attribution matrix F = [f_1, f_2, ..., f_n] in step 9) records the attribution relationships between data points, and each element is given by formula (6):
Figure FDA0002479153130000043
where {p_i} denotes the original indices of the cluster center factors τ(x_i) sorted in descending order.
CN201710823063.0A 2017-09-13 2017-09-13 Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method Active CN107682319B (en)

Publications (2)

Publication Number Publication Date
CN107682319A CN107682319A (en) 2018-02-09
CN107682319B true CN107682319B (en) 2020-07-03


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant