CN107682319B - Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method - Google Patents
- Publication number
- CN107682319B (application CN201710823063.0A)
- Authority
- CN
- China
- Prior art keywords
- data
- cluster
- point
- points
- neighborhood
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
- G06F16/24568—Data stream processing; Continuous queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Computer Security & Cryptography (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Medicines Containing Antibodies Or Antigens For Use As Internal Diagnostic Agents (AREA)
- External Artificial Organs (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The method comprises the following steps: 1) processing a real-time data stream; 2) setting a data set S in a sliding window; 3) initializing the parameters k, r and ξ; 4) obtaining a distance matrix dist; 5) obtaining an r-neighborhood point set; 6) obtaining the angle factor and the local density of the r-neighborhood point set; 7) obtaining the dissimilarity; 8) obtaining the cluster center factor of each data point; 9) obtaining an attribution matrix; 10) determining the cluster centers and clustering; 11) performing anomaly detection on each cluster separately; and 12) performing multiple verification. The method applies sliding-window and basic-window techniques to construct an efficient data stream processing model, which reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy and low time complexity.
Description
Technical Field
The invention relates to data flow anomaly detection and data clustering, in particular to a data flow anomaly detection and multiple verification method based on enhanced angle anomaly factors.
Background
The rapid development of network technology and the continuing spread of information systems have caused an explosive increase in the volume of information, so that many industries now generate massive, high-speed, dynamic stream data, for example in network intrusion monitoring, commercial transaction management and analysis, video surveillance and sensor network monitoring. Because data streams are real-time, unbounded and dynamic, traditional anomaly detection methods designed for static data cannot analyze and process such large-scale, dynamically growing stream data accurately and effectively. Constructing a real-time, effective anomaly detection method suited to data streams has therefore become particularly important.
Researchers have proposed different data stream anomaly detection methods for the practical problems faced at different stages. Conventional data stream anomaly detection methods can be roughly classified into density-based, angle-based and cluster-based methods. Density-based methods use density as the most basic anomaly measure and construct a dynamically updatable anomaly factor that measures the degree of anomaly of the data. Pokrajac et al. introduced the static anomaly detection method LOF into data streams and developed INCLOF, an incremental local anomaly detection method applicable to dynamic data streams; INCLOF deletes historical data and dynamically updates the anomaly factor of each data point as new data are inserted. Ke Gao et al. improved INCLOF by introducing the idea of a sliding window and proposed the n-INCLOF method, which updates only the anomaly factors of the data objects in the sliding window at the current moment. In some cases a data point appears anomalous at one moment but not at the next; to address this problem, Karimian S H et al. proposed the I-IncLOF method, which introduces the idea of multiple verification and judges as confirmed anomalies only those data objects that remain anomalous throughout the entire sliding process of the window. I-IncLOF greatly reduces the misjudgment rate but is not very effective in multi-dimensional settings. Xinjielilu et al. proposed the INCLOCI method, which introduces the multi-granularity anomaly factor MDEF and can detect not only scattered anomalies but also anomalous clusters.
To address the reduced effectiveness of similarity measures such as distance and density in high-dimensional data spaces, some researchers have proposed angle-based measures. The basic idea of the angle-based similarity measure is that the angles formed by an anomaly with other points are generally small and vary within a narrow range, whereas the angles formed by a normal point with other points are large and vary within a wide range. HP Kriegel et al. proposed the angle-based anomaly detection method ABOD, which takes the variance of the angles as the anomaly factor ABOF measuring the degree of anomaly of a data point; ABOD retains high detection accuracy in high-dimensional spaces. Ye H proposed DSABOD, an angle-based data stream anomaly detection method that dynamically updates the anomaly factor of each data point relative to its neighborhood points as the data points of the stream continuously flow into memory. DSABOD offers a new approach to anomaly detection in high-dimensional data streams, but traditional angle-based data stream anomaly detection methods suffer from a low anomaly detection rate.
Cluster-based data stream anomaly detection methods comprise two stages: clustering the data points, and performing anomaly detection on the data points within each cluster. Elahi M et al. proposed a cluster-based data stream anomaly detection method that combines K-Means and LOF; the method defines anomaly factors by region, which improves its anomaly detection accuracy. Thakran Y et al. proposed a method combining DBSCAN with W-K-Means. It clusters the data block at the current moment with DBSCAN to obtain candidate anomalies and initial clusters, merges in the candidate anomalies from the previous moment that still await multiple verification, and clusters again with W-K-Means to obtain the candidate anomalies and normal-point clusters at the current moment. It also applies multiple verification to the candidate anomalies, deleting misjudged ones to release memory, and throughout the process it dynamically adjusts the parameters MinPts and Epsilon required by DBSCAN and the attribute weights of W-K-Means. The method achieves high anomaly detection accuracy, but it requires too many manually set parameters and heavy manual intervention, its complexity is high, and its effectiveness in multi-dimensional spaces is poor.
Data flow anomaly detection is a research hotspot and difficulty in the field of data mining nowadays, and the main aim is to accurately detect information which does not conform to a conventional mode in real time from a complex data environment which is dynamically changed.
Disclosure of Invention
To address the problems of traditional methods, such as high time complexity, large memory occupation, low efficiency, excessive manual parameter intervention and low effectiveness in multi-dimensional data environments, the invention provides a data stream anomaly detection and multiple verification method based on an enhanced angle anomaly factor. The method reduces memory occupancy and offers good real-time performance, high anomaly detection accuracy and low time complexity.
The technical scheme for realizing the purpose of the invention is as follows:
a method for data flow abnormity detection and multiple verification based on an enhanced angle abnormity factor comprises the following steps:
1) processing the real-time data stream: for the real-time data streams acquired by a data acquisition terminal, the data acquired within a time period are formed into data blocks according to the minimum unit of the data and the time order of acquisition, and several data blocks form the data set S processed by the sliding window, in preparation for step 2);
2) setting the data set S in the sliding window: the processing in step 1) yields the data set S in the current sliding window, where $S = \{X_1, X_2, \ldots, X_n\}$ contains n data points and each data point $X_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ is represented by its d attributes, which are used for subsequent clustering and anomaly detection;
3) setting the initialization parameters k, r and ξ, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point and ξ is the adjustment coefficient of the anomaly decision threshold; the anomaly decision threshold is $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) obtaining a distance matrix dist: using the data set S from step 2), the distances between all pairs of data points are calculated to obtain an n × n distance matrix $dist = [d_{ij}]_{n \times n}$, calculated by formula (1):
$$d_{ij} = \sqrt{\sum_{k=1}^{d} (x_{ik} - y_{jk})^2} \qquad (1)$$
where $X_i$ and $Y_j$ are data points in the set S, and $x_{ik}$ and $y_{jk}$ denote the k-th attribute of $X_i$ and $Y_j$, respectively;
5) obtaining the r-neighborhood point set: according to the spatial neighborhood radius r, the r-neighborhood point set of each data point is obtained, i.e. the set of all data points enclosed by the circle of radius r centered at that point;
6) obtaining the angle factor and the local density of the r-neighborhood point set: using the distance matrix dist, the angle factor and the local density ρ of the r-neighborhood point set are obtained, where $N_r(x_i)$ denotes the r-neighborhood of data point $x_i$;
7) obtaining the dissimilarity $\delta(x_i)$: after the local densities of the r-neighborhood point sets obtained in step 6) are sorted, the corresponding dissimilarity $\delta(x_i)$ is calculated;
8) obtaining the cluster center factor $\tau(x_i)$ of each data point: combining step 6) and step 7) gives the cluster center factor $\tau(x_i)$, calculated by formula (5); the cluster center factor $\tau(x_i)$ measures the degree to which a data point lies at a cluster center;
9) obtaining the attribution matrix: the cluster center factors of all data points obtained in step 8) are sorted in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \ldots \ge \tau(p_n)$, where $p_i$ denotes the original index of the corresponding data point, giving the attribution matrix $F = [f_1, f_2, \ldots, f_n]$ used for clustering;
10) determining the cluster centers and clustering: the cluster center factor and the attribution matrix are used to determine the cluster centers of the data set S and to cluster it; all data points with the same class label form a set, i.e. a cluster, giving m ($m = C_{center\_id}$) clusters $C_1, C_2, \ldots, C_m$, where $C_{center\_id}$ is the serial number of the cluster center, which completes the clustering of the data set S;
11) performing anomaly detection on each cluster separately: for the clusters $C_i$ (i = 1, 2, …, m) obtained in step 10), anomaly detection is performed on each of the clusters $C_1, C_2, \ldots, C_m$ of the clustered data set S to obtain the anomaly set $O_i$ of each cluster, and finally the set of all anomalies in the data set S, $O = \{O_1, \ldots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is formula (7):
where A, B and C are data points in the data set, each data point in the i-th cluster has d-dimensional attributes, and the neighborhood size is the number of data points in the neighborhood of data point q; the local increment value $H(X_j)$ is formula (8):
the distance sum of the k nearest neighbors, $L(X_j)$, is formula (9):
where the k-neighborhood of data point $X_j$ consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is formula (10):
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance from $X_j$ to that cluster center, the angle factor is that of each data point in cluster $C_i$ (i = 1, 2, …, m) relative to the cluster, and $H(X_j)$ is the local increment value;
12) multiple verification: all candidate anomalies are verified multiple times; candidates that still appear anomalous after a limited number of verifications are judged to be confirmed anomalies and are output and stored, while candidates that appear as normal points during verification are discarded directly.
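Steps 4) and 5) can be illustrated with a minimal sketch. It assumes the distance in formula (1) is the Euclidean distance, that a point is not counted in its own r-neighborhood, and that points exactly at distance r are included; the function names `distance_matrix` and `r_neighborhoods` are hypothetical and not taken from the patent.

```python
import math


def distance_matrix(points):
    """Return the n x n matrix of pairwise Euclidean distances."""
    n = len(points)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])  # Euclidean distance
            dist[i][j] = dist[j][i] = d          # the matrix is symmetric
    return dist


def r_neighborhoods(dist, r):
    """For each point, list the indices of the other points within radius r."""
    n = len(dist)
    # Assumption: the boundary (distance == r) counts as inside the neighborhood.
    return [[j for j in range(n) if j != i and dist[i][j] <= r] for i in range(n)]


S = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]  # toy data set with d = 2 attributes
dist = distance_matrix(S)
neighbors = r_neighborhoods(dist, r=1.5)
```

With this toy data set, only the first and third points fall within each other's r-neighborhood; the second point has an empty neighborhood.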
The processing in step 1) means that the data acquired by the data acquisition terminal are cached in stream form and the cached data are divided into data blocks $E_0, E_1, E_2, \ldots$; each data block represents a basic window, and each sliding window W contains 2 basic windows. Insertion and deletion of data are realized by combining the basic window and the sliding window as follows: when time $T_i$ transitions to $T_{i+1}$, the sliding window slides from $W_i$ to $W_{i+1}$, the new basic window $E_{i+1}$ is merged in and the historical basic window $E_{i-1}$ is removed, and the candidate anomalies detected in $W_i$ at time $T_i$ are merged into $W_{i+1}$ for multiple verification.
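The window mechanics described above can be sketched as follows. This is an illustrative sketch only: the class name `SlidingWindow`, its methods and the way carried-over candidates are appended to the window data are assumptions, not details given in the patent.

```python
from collections import deque


class SlidingWindow:
    """Sliding window W made of a fixed number of basic windows (data blocks)."""

    def __init__(self, blocks_per_window=2):
        # A bounded deque drops the oldest basic window automatically.
        self.blocks = deque(maxlen=blocks_per_window)
        self.candidates = []  # candidate anomalies carried over for re-verification

    def slide(self, new_block, carried_candidates=()):
        """Merge the new basic window; the oldest one is removed when W is full."""
        self.blocks.append(list(new_block))
        self.candidates = list(carried_candidates)

    def data(self):
        """Data set S of the current window: all blocks plus carried candidates."""
        points = [p for block in self.blocks for p in block]
        return points + [c for c in self.candidates if c not in points]


window = SlidingWindow()
window.slide([(0, 0), (1, 1)])                        # E_0
window.slide([(2, 2), (3, 3)])                        # E_1: W covers E_0 and E_1
window.slide([(4, 4)], carried_candidates=[(1, 1)])   # E_2 merged, E_0 removed
current = window.data()
```

Only two basic windows are held in memory at any time, which is the source of the memory saving the text describes; the candidate `(1, 1)` survives the removal of its block so that it can be verified again.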
The angle factor calculation formula of the r neighborhood point set in the step 6) is a formula (2):
the local density calculation formula of the r neighborhood point set in the step 6) is a formula (3):
where $N_r(p)$ is the r-neighborhood of data point p and q is any data point in the r-neighborhood set of p. The local density depends on both the number and the positions of the neighborhood data points: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
For the dissimilarity $\delta(x_i)$ in step 7), the local densities of all data points are sorted in descending order, and $\delta(x_i)$ is calculated by formula (4):
where $p_i$ and $p_j$ are the indices of the corresponding data points; when i = 1, j ≥ 2, and when i ≥ 2, j < i.
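Formula (4) itself is not reproduced in this text, so the following is only a sketch under an assumed definition that is consistent with the index constraints above (j < i for i ≥ 2, j ≥ 2 for i = 1): for every point except the densest, the dissimilarity is the minimum distance to any point of higher density, and for the densest point it is the maximum distance to any other point. The function name `dissimilarity` and the toy inputs are hypothetical.

```python
def dissimilarity(density, dist):
    """Return (order, delta) under the assumed density-peak style definition."""
    n = len(density)
    # order[0] is p_1 (highest local density), order[-1] is p_n.
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    delta = [0.0] * n
    for rank, p in enumerate(order):
        if rank == 0:
            # i = 1: compare with all j >= 2 (assumed: take the maximum distance)
            delta[p] = max((dist[p][q] for q in order[1:]), default=0.0)
        else:
            # i >= 2: compare with all j < i (assumed: take the minimum distance)
            delta[p] = min(dist[p][q] for q in order[:rank])
    return order, delta


density = [3.0, 1.0, 2.0]                       # toy local densities
dist = [[0.0, 5.0, 1.0],
        [5.0, 0.0, 4.0],
        [1.0, 4.0, 0.0]]                        # toy symmetric distance matrix
order, delta = dissimilarity(density, dist)
```

Under this assumption a point that is far from every denser point receives a large dissimilarity, which is what makes it a plausible cluster center.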
The attribution matrix $F = [f_1, f_2, \ldots, f_n]$ in step 9) records the attribution relationship between data points; each element is given by formula (6):
where $\{p_i\}$ denotes the original indices of the data points after the cluster center factors $\tau(x_i)$ are sorted in descending order.
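Formula (6) is likewise not reproduced here. The sketch below assumes one plausible reading: each point is attributed to its nearest point among those with a larger cluster center factor, and the point with the largest factor is attributed to itself. The function name `attribution` and the toy data are hypothetical.

```python
def attribution(tau, dist):
    """Return (order, f) where f[p] is the point that p is attributed to."""
    n = len(tau)
    # Indices sorted by descending cluster center factor: p_1, p_2, ..., p_n.
    order = sorted(range(n), key=lambda i: tau[i], reverse=True)
    f = [None] * n
    f[order[0]] = order[0]  # assumption: the top-ranked point points to itself
    for rank in range(1, n):
        p = order[rank]
        # Assumption: attribute p to the nearest point with a larger factor.
        f[p] = min(order[:rank], key=lambda q: dist[p][q])
    return order, f


positions = [0.0, 10.0, 1.0, 9.0]               # toy one-dimensional points
dist = [[abs(a - b) for b in positions] for a in positions]
tau = [9.0, 2.0, 4.0, 1.0]                      # toy cluster center factors
order, f = attribution(tau, dist)
```

Recording only one pointer per data point keeps the structure linear in n, and labels can later be propagated along these pointers in descending order of the factor.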
The data stream anomaly detection method is divided into 2 processes: data stream processing and data stream anomaly detection. Data stream processing converts the dynamic data stream into static data blocks, which facilitates the subsequent anomaly detection and ensures that the whole detection is real-time and efficient. Data stream anomaly detection operates on the static data set produced by data stream processing and, to improve detection accuracy, clusters first and then detects anomalies. In this technical scheme, the real-time data stream processing method that combines the sliding window with the basic window is the core of data stream processing; it reduces memory occupancy and improves the quality of the subsequent anomaly detection. The cluster center factor and the attribution matrix are two parameters newly introduced in this technical scheme for determining cluster centers and clustering; they allow the cluster centers of a multi-dimensional data space to be determined quickly and effectively, and accurate clustering to be performed around the determined centers. The enhanced angle anomaly factor is another important parameter of this technical scheme; it compensates for some defects of traditional anomaly factors, retains the effectiveness of the angle-based measure in multi-dimensional spaces, and is the core of the anomaly detection part.
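The anomaly decision threshold from step 3), θ = μ + ξ·δ, can be sketched as below. The sketch assumes that a larger enhanced angle anomaly factor means a more anomalous point, so that scores above θ are flagged, and that δ is the population standard deviation; neither detail is stated in the text. The scores are made-up numbers, not EAOF values computed by formula (10).

```python
from statistics import mean, pstdev


def flag_candidates(scores, xi):
    """Return (theta, indices of scores above theta)."""
    mu = mean(scores)
    sigma = pstdev(scores)          # the patent's delta; population form assumed
    theta = mu + xi * sigma         # anomaly decision threshold
    flagged = [i for i, s in enumerate(scores) if s > theta]
    return theta, flagged


scores = [1.0, 1.0, 1.0, 1.0, 6.0]  # hypothetical anomaly factors of five points
theta, flagged = flag_candidates(scores, xi=1.5)
```

Here the mean is 2.0 and the standard deviation is 2.0, so θ = 5.0 and only the last point is flagged. A larger ξ makes the test stricter, which is why it is described as an adjustment coefficient.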
The method applies sliding window and basic window technologies, constructs an efficient data stream processing model, reduces the occupancy rate of the memory, and has good real-time performance, high accuracy of abnormal detection and low time complexity.
Drawings
FIG. 1 is a schematic flow chart of the method in the example;
FIG. 2 is a schematic diagram of the data point distribution in the sliding window at time $t_1$ in the embodiment;
FIG. 3 is a schematic diagram of the data point distribution in the sliding window at time $t_2$ in the embodiment;
FIG. 4 is a diagram illustrating the combination of the sliding window and the base window to process the real-time data stream and the multiple verification processes in one embodiment;
FIG. 5 is a graph illustrating an exemplary angular measure of data points;
FIG. 6 is a schematic diagram illustrating a data point distribution of the U-shaped cluster data based on the conventional angle measurement method in the embodiment;
FIG. 7 is a schematic diagram illustrating a data point distribution of multi-cluster data misjudged based on a conventional angle measurement method in an embodiment;
FIG. 8 is a diagram illustrating the distribution of original coordinates of a data set in an embodiment;
FIG. 9 is a schematic diagram showing a local density-degree of dissimilarity distribution in the example;
FIG. 10 is a diagram showing the distribution of the cluster center factors in the example;
FIG. 11a is a schematic diagram of the distribution of the data set 1 in the example;
FIG. 11b is a diagram showing the distribution of outliers in the data set 1 according to the example;
FIG. 11c is a schematic diagram showing the abnormal point identifiers detected by the abnormal detection of the data set 1 in the embodiment;
FIG. 11d is a schematic diagram illustrating the data set 1 in the embodiment where the abnormal detection is performed by using the normal point as the abnormal point identifier;
FIG. 12a is a schematic diagram of the distribution of the data set 2 in the example;
FIG. 12b is a diagram illustrating an actual abnormal distribution of the data set 2 in the embodiment;
FIG. 12c is a schematic diagram showing the abnormal point identifiers detected by the abnormal detection of the data set 2 in the embodiment;
FIG. 12d is a diagram illustrating the data set 2 with abnormal points detected as normal point identifiers by abnormal detection in the embodiment.
Detailed Description
The invention will be further illustrated, but not limited, by the following description of the embodiments with reference to the accompanying drawings.
Referring to fig. 1, a method for data stream anomaly detection and multi-verification based on enhanced angle anomaly factors includes the following steps:
1) processing the real-time data stream: the real-time data streams acquired by a data acquisition terminal are processed. Real-time data streams are dynamic and changeable: some data objects appear anomalous in the current sliding window but appear as normal points in the sliding window at the next moment, as shown in FIGS. 2 and 3. FIG. 2 shows the distribution of data points in the sliding window at time $t_1$, where point P' appears anomalous; as data continue to flow in, more and more data points accumulate around P'. FIG. 3 shows the distribution of data points in the sliding window at time $t_2$, where P' now appears normal;
2) setting the data set S in the sliding window: the processing in step 1) yields the data set S in the current sliding window, where $S = \{X_1, X_2, \ldots, X_n\}$ contains n data points and each data point is represented by its d attributes, which are used for subsequent clustering and anomaly detection;
3) setting the initialization parameters k, r and ξ, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point and ξ is the adjustment coefficient of the anomaly decision threshold; the anomaly decision threshold is $\theta = \mu + \xi \cdot \delta$, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) obtaining a distance matrix dist: using the data set S from step 2), the distances between all pairs of data points are calculated to obtain an n × n distance matrix $dist = [d_{ij}]_{n \times n}$, calculated by formula (1):
$$d_{ij} = \sqrt{\sum_{k=1}^{d} (x_{ik} - y_{jk})^2} \qquad (1)$$
where $X_i$ and $Y_j$ are data points in the set S, and $x_{ik}$ and $y_{jk}$ denote the k-th attribute of $X_i$ and $Y_j$, respectively;
5) obtaining the r-neighborhood point set: according to the spatial neighborhood radius r, the r-neighborhood point set of each data point is obtained, i.e. the set of all data points enclosed by the circle of radius r centered at that point;
6) obtaining the angle factor and the local density of the r-neighborhood point set: using the distance matrix dist, the angle factor and the local density ρ of the r-neighborhood point set are obtained, where $N_r(x_i)$ denotes the r-neighborhood of data point $x_i$. As shown in FIG. 5, the angle-based measure computes the angles formed by a data point with each pair of other data points and then takes their variance. For a core-region point $A_1$, the angles formed with pairs of other points vary over a wide range, so the variance is large; for an anomaly $A_3$, the angles formed with pairs of other points vary over a very narrow range, so the variance is small; and for a boundary point $A_2$, the range of the angles lies between those of $A_1$ and $A_3$, so its variance lies between those of the core-region point and the anomaly. This measure has some defects, however, as shown in FIGS. 6 and 7. In FIG. 6 the anomaly $B_1$ lies at the center of a U-shaped cluster, so the angles it forms with the surrounding point pairs vary over a wide range, i.e. the variance is large, whereas the angles formed by the edge point $B_2$ with other point pairs vary over a narrow range, i.e. the variance is small. Similarly, in FIG. 7 the anomaly $D_1$ lies between two clusters, so the angles it forms with point pairs from the two clusters vary over a wide range, whereas the angles formed by the edge point $D_2$ with other points vary over a narrow range. The results are exactly opposite to the actual situation, so anomalies are missed and normal points are misjudged;
7) obtaining the dissimilarity $\delta(x_i)$: after the local densities of the r-neighborhood point sets obtained in step 6) are sorted, the corresponding dissimilarity $\delta(x_i)$ is calculated;
8) obtaining the cluster center factor $\tau(x_i)$ of each data point: combining step 6) and step 7) gives the cluster center factor $\tau(x_i)$, which is calculated by formula (5) and measures the degree to which a data point lies at a cluster center. The cluster center factor is an improved parameter of the method of this embodiment for quickly and effectively determining cluster centers in a multi-dimensional data space, and computing it is a crucial step in clustering. The process is illustrated in FIGS. 8, 9 and 10. FIG. 8 shows that the data set consists of two clusters, with point 13 and point 25 as their respective cluster centers. FIG. 9 shows the ρ-δ (local density versus dissimilarity) distribution of the points in the data set obtained by formulas (3) and (4); the local densities and dissimilarities of points 13 and 25 are both large. FIG. 10 shows the cluster center factors of the points obtained by formula (5), sorted in descending order; points 13 and 25 have the largest cluster center factors and are therefore the most likely cluster centers;
9) obtaining the attribution matrix: the cluster center factors of all data points obtained in step 8) are sorted in descending order to obtain $\tau(p_1) \ge \tau(p_2) \ge \ldots \ge \tau(p_n)$, where $p_i$ denotes the original index of the corresponding data point, giving the attribution matrix $F = [f_1, f_2, \ldots, f_n]$ used for clustering;
10) determining the cluster centers and clustering: the cluster center factor and the attribution matrix are used to determine the cluster centers of the data set S and to cluster it; all data points with the same class label form a set, i.e. a cluster, giving m ($m = C_{center\_id}$) clusters $C_1, C_2, \ldots, C_m$ and completing the clustering of the data set S;
11) performing anomaly detection on each cluster separately: for the clusters $C_i$ (i = 1, 2, …, m) obtained in step 10), anomaly detection is performed on each of the clusters $C_1, C_2, \ldots, C_m$ of the clustered data set S to obtain the anomaly set $O_i$ of each cluster, and finally the set of all anomalies in the data set S, $O = \{O_1, \ldots, O_m\}$. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is formula (7):
where A, B and C are data points in the data set, each data point in the i-th cluster has d-dimensional attributes, and the neighborhood size is the number of data points in the neighborhood of data point q; the local increment value $H(X_j)$ is formula (8):
the distance sum of the k nearest neighbors, $L(X_j)$, is formula (9):
where the k-neighborhood of data point $X_j$ consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor $EAOF(X_j)$ is formula (10):
where o is the cluster center of the cluster to which data point $X_j$ belongs, $dist(o, X_j)$ is the distance from $X_j$ to that cluster center, the angle factor is that of each data point in cluster $C_i$ (i = 1, 2, …, m) relative to the cluster, and $H(X_j)$ is the local increment value;
12) multiple verification: all candidate anomalies are verified multiple times; candidates that still appear anomalous after a limited number of verifications are judged to be confirmed anomalies and are output and stored, while candidates that appear as normal points during verification are discarded directly, which improves the accuracy of anomaly detection.
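The conventional angle-based measure discussed in step 6) (FIG. 5) can be illustrated with a small sketch. It computes the plain, unweighted variance of the angles that a point forms with every pair of other points; this is a simplification for illustration and is not the intra-cluster angle factor of formula (7) or the enhanced factor of formula (10). The function names and the toy coordinates are hypothetical, and the points are assumed to be distinct.

```python
import math
from itertools import combinations
from statistics import pvariance


def angle(a, b, c):
    """Angle at vertex a (in radians) between the rays a->b and a->c."""
    ab = [x - y for x, y in zip(b, a)]
    ac = [x - y for x, y in zip(c, a)]
    dot = sum(u * v for u, v in zip(ab, ac))
    norm = math.hypot(*ab) * math.hypot(*ac)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_variance(a, others):
    """Variance of the angles formed by a with every pair of other points."""
    return pvariance([angle(a, b, c) for b, c in combinations(others, 2)])


square = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
core_var = angle_variance((0.0, 0.0), square)      # point surrounded by the others
outlier_var = angle_variance((10.0, 0.0), square)  # point far outside the others
```

The surrounded point sees the others under widely varying angles, so `core_var` is much larger than `outlier_var`, matching the behavior described for $A_1$ and $A_3$. The same computation also shows the weakness noted for FIGS. 6 and 7: an isolated point that happens to sit between clusters would also receive a large variance.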
The processing in step 1) means that the data acquired by the data acquisition terminal are cached in stream form and the cached data are divided into data blocks $E_0, E_1, E_2, \ldots$; each data block represents a basic window, and each sliding window W contains 2 basic windows. Insertion and deletion of data are realized by combining the basic window and the sliding window, as shown in FIG. 4: when time $T_i$ transitions to $T_{i+1}$, the sliding window slides from $W_i$ to $W_{i+1}$, the new basic window $E_{i+1}$ is merged in and the historical basic window $E_{i-1}$ is removed, and the candidate anomalies detected in $W_i$ at time $T_i$ are merged into $W_{i+1}$ for multiple verification.
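The multiple verification of candidates carried from one window to the next can be sketched as follows. The text says only that a candidate is confirmed after a limited number of verifications; the parameter `required` (the number of consecutive windows in which a point must be flagged) and the function name `verify` are assumptions made for this illustration.

```python
def verify(pending, flagged_now, required=3):
    """Update candidate counts for one window.

    pending: dict mapping a point to the number of consecutive windows in
    which it has been flagged so far. Returns (new_pending, confirmed).
    """
    confirmed = []
    updated = {}
    for point in flagged_now:
        count = pending.get(point, 0) + 1
        if count >= required:
            confirmed.append(point)      # still anomalous after enough checks
        else:
            updated[point] = count       # keep for the next window
    # Candidates that were pending but are not flagged now are dropped,
    # i.e. treated as normal points and discarded.
    return updated, confirmed


pending, confirmed_all = {}, []
for flagged in (["a", "b"], ["a"], ["a", "c"]):   # flags from three windows
    pending, confirmed = verify(pending, flagged)
    confirmed_all.extend(confirmed)
```

In this run, `b` is flagged once and then discarded, `a` is flagged in three consecutive windows and confirmed, and `c` remains pending for later windows.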
The angle factor calculation formula of the r neighborhood point set in the step 6) is a formula (2):
the local density calculation formula of the r neighborhood point set in the step 6) is a formula (3):
where $N_r(p)$ is the r-neighborhood of data point p and q is any data point in the r-neighborhood set of p. The local density depends on both the number and the positions of the neighborhood data points: the more neighborhood data points there are and the closer they lie to the center of the data set, the larger the local density.
The dissimilarity δ(x_i) in step 7) is computed by formula (4) after the local densities of all data points have been sorted in descending order. Dissimilarity measures how likely two data points are to belong to different clusters. For a given data set S, the local densities obtained in step 6) are sorted in descending order, where {p_i} denotes the original subscripts of the data points in descending order of local density and d(p_i, p_j) denotes the Euclidean distance between data points p_i and p_j. The dissimilarity of a data point p_i is then defined as follows:
where p_i and p_j are the serial numbers of the corresponding data points; when i = 1, j ≥ 2, and when i ≥ 2, j < i.
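Formula (4) itself is not reproduced in this text. The sketch below assumes the definition commonly used in density-peak clustering, which is consistent with the index constraints stated above: the densest point (i = 1) takes the maximum distance to any other point (j ≥ 2), and every other point takes the minimum distance to a point that precedes it in the descending-density order (j < i). The function name `dissimilarity` and its arguments are illustrative.

```python
import math
from typing import List, Sequence


def dissimilarity(points: Sequence[Sequence[float]],
                  density: Sequence[float]) -> List[float]:
    """Return delta for each point, indexed like `points`.

    Assumed definition (not confirmed by the source text):
      rank 1 point : max distance to all other points
      other points : min distance to any point of higher rank
    """
    n = len(points)
    # order[0] is the densest point; sorted() is stable, so ties keep index order.
    order = sorted(range(n), key=lambda i: density[i], reverse=True)
    delta = [0.0] * n
    for rank, p in enumerate(order):
        if rank == 0:
            delta[p] = max(
                (math.dist(points[p], points[q]) for q in order[1:]),
                default=0.0,
            )
        else:
            delta[p] = min(math.dist(points[p], points[q]) for q in order[:rank])
    return delta
```

Under this assumption, a point far from every denser point receives a large dissimilarity, which is what makes it a plausible cluster center when its local density is also high.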
The attribution matrix F = [f_1, f_2, ..., f_n] in step 9) records the attribution relationships between data points; each element is given by formula (6):
where {p_i} denotes the original subscripts of the data points sorted in descending order of the cluster center factor τ(x_i).
Determining cluster centers and clustering in step 10) proceeds as follows. First, the serial number of the cluster centers is defined as C_center_id and the label of a data point as C_cluster_label, and the cluster center serial number is initialized to 1, i.e., C_center_id = 1. The data point with the largest cluster center factor obtained in step 8) is labeled 1. The whole data set S is then traversed in the descending subscript order {p_i} obtained in step 8); if the distances between a point and all existing cluster centers satisfy the stated condition with respect to r (where r is the neighborhood radius set as an initial parameter), the point is defined as a new cluster center and the class label is increased by 1 for it. All cluster centers are obtained in this way. Next, using the cluster centers obtained and the attribution matrix F = [f_1, f_2, ..., f_n] from step 9), points that belong to the same cluster center are given the same label (i.e., class label), as follows: the whole data set S is traversed in the descending subscript order {p_i} obtained in step 9); if p_i is not a cluster center, the label of the point recorded for it in the attribution matrix is assigned to p_i; otherwise the label of p_i is its own. Finally, all data points with the same class label form a set, i.e., a cluster, giving m (m = C_center_id) clusters C_1, C_2, ..., C_m and completing the clustering of data set S.
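The label assignment in the second half of step 10) can be sketched as a single pass over the points in descending order of cluster center factor. The sketch takes the cluster centers as given, because the exact center-selection condition is not legible in this text. It also assumes that the attribution entry of a non-center point always refers to a point that comes earlier in the traversal order, as in density-peak clustering. The names `propagate_labels`, `group_clusters`, `center_label`, and `parent` are hypothetical.

```python
from typing import Dict, List, Sequence


def propagate_labels(order: Sequence[int],
                     center_label: Dict[int, int],
                     parent: Dict[int, int]) -> Dict[int, int]:
    """Assign a class label to every point.

    order        : point indices in descending order of cluster center factor
    center_label : label of each cluster center (1, 2, ..., m)
    parent       : attribution entry f for each non-center point (assumed to
                   point at a point that appears earlier in `order`)
    """
    labels: Dict[int, int] = {}
    for p in order:
        if p in center_label:
            labels[p] = center_label[p]      # a center keeps its own label
        else:
            labels[p] = labels[parent[p]]    # inherit from the attributed point
    return labels


def group_clusters(labels: Dict[int, int]) -> Dict[int, List[int]]:
    """Collect the points that share a class label into clusters C_1..C_m."""
    clusters: Dict[int, List[int]] = {}
    for point, label in labels.items():
        clusters.setdefault(label, []).append(point)
    return clusters
```

Because each point inherits its label from a point processed earlier, one traversal is enough and no iteration to convergence is needed.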
Step 11) performs anomaly detection on each cluster obtained by clustering, and specifically includes the following steps:
① For any cluster C_i (i = 1, 2, …, m), calculate the angle factor of each data point in the cluster relative to the cluster, using formula (7):
where C_i (i = 1, 2, …, m) denotes any cluster obtained by clustering;
② Calculate the local increment value H(X_j) of each data point in the cluster relative to its spatial r-neighborhood, using formula (8):
The local increment reflects how densely data points are packed within the spatial neighborhood of a point in the cluster to which it belongs, where the count term denotes the number of data points in the r-neighborhood of data point X_j within its cluster;
③ Calculate the distance dist(o, X_j) between each data point and the center of its cluster, using the cluster centers determined in step 10);
④ Calculate the sum L(X_j) of the distances from each data point to its k nearest neighbors, using formula (9):
where the k-neighborhood of data point X_j consists of its k nearest neighbors within the cluster to which it belongs. The distance sum L(X_j) reflects how far the data point is from the surrounding data points, which avoids the defect of angle-based anomaly factors illustrated by B_1 in fig. 6;
⑤ Calculate the enhanced angle anomaly factor EAOF(X_j) of each data point, using formula (10):
where o is the cluster center of the cluster to which data point X_j belongs, dist(o, X_j) is the distance from X_j to that cluster center, the angle-factor term is the angle factor of each data point in cluster C_i (i = 1, 2, …, m) relative to that cluster, and H(X_j) is the local increment value. The enhanced angle anomaly factor EAOF retains the good discriminative performance of angle-based measures in multi-dimensional space while also incorporating distance and density, which makes up for the shortcomings of the traditional angle-anomaly-factor method;
⑥ Calculate the mean μ and standard deviation δ of the enhanced angle anomaly factors of all data points obtained in ⑤, and use them to calculate the anomaly decision threshold θ = μ + ξ·δ, where ξ is the initially set adjustment coefficient of the anomaly decision threshold;
⑦ Compare the enhanced angle anomaly factor EAOF(X_j) obtained in ⑤ with the decision threshold θ obtained in ⑥; if EAOF(X_j) > θ, mark the point as a candidate anomaly in the cluster and store it in the candidate anomaly set O_i of that cluster.
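Two of the sub-steps above are fully specified by the text and can be sketched directly: the k-nearest-neighbor distance sum of ④ and the thresholding of ⑥ and ⑦. The EAOF values are taken as input, since formula (10) is not reproduced here. The function names are illustrative, and the use of the population standard deviation is an assumption (the text does not say whether the population or sample form is meant).

```python
import math
import statistics
from typing import List, Sequence, Tuple


def knn_distance_sum(cluster: Sequence[Sequence[float]], j: int, k: int) -> float:
    """L(X_j): sum of distances from point j to its k nearest neighbors
    within the same cluster (point j itself is excluded)."""
    dists = sorted(
        math.dist(cluster[j], q) for idx, q in enumerate(cluster) if idx != j
    )
    return sum(dists[:k])


def candidate_outliers(eaof: Sequence[float], xi: float) -> Tuple[float, List[int]]:
    """Return (theta, indices) where theta = mu + xi * delta and the indices
    are the points whose EAOF exceeds theta."""
    mu = statistics.fmean(eaof)
    delta = statistics.pstdev(eaof)   # assumption: population standard deviation
    theta = mu + xi * delta
    return theta, [j for j, score in enumerate(eaof) if score > theta]
```

A larger ξ raises the threshold and flags fewer candidates; the experiments described below use ξ = 2.5.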
This embodiment provides a method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor. By combining sliding windows with basic windows, it builds an efficient real-time data stream processing scheme, and by introducing the enhanced angle anomaly factor it addresses the high memory occupancy and low data processing efficiency of traditional methods while offering good real-time performance, high anomaly detection accuracy, and low time complexity.
To verify the effectiveness of the method of this embodiment, simulation results are compared below:
In this embodiment, verification is performed on both artificially generated data sets and a real data set, and the method is compared with the traditional I-IncLOF method and with the weighted-clustering-based unsupervised data stream anomaly detection method proposed by Thakran et al. (abbreviated as method I). Table 1 lists the experimental data sets; the three data sets differ in dimension, data volume, and data characteristics.
The data distribution of artificial data set 1 is shown in fig. 11a. It has 1615 data points in total and consists of 5 clusters and 15 discrete points. Cluster 1 consists of 500 data points generated by the Gaussian distribution N_1(μ_1, Σ_1), cluster 2 of 500 data points generated by N_2(μ_2, Σ_2), and cluster 3 of 500 data points generated by N_3(μ_3, Σ_3); clusters 4 and 5 consist of 50 data points each, generated by N_4(μ_4, Σ_4) and N_5(μ_5, Σ_5), respectively. Because N_4 and N_5 generate very few data points, clusters 4 and 5 are regarded as abnormal clusters. In addition, 15 discrete anomalies are randomly generated according to the distribution characteristics of the data set, so the data set contains 115 anomalies in total; their distribution is shown in fig. 11b, where anomalies are marked with circles. During the experiment, the abnormal clusters and discrete anomalies are randomly mixed into the normal clusters. The following parameters are used to generate data set 1 from the Gaussian distributions:
μ1=[+1 +1],μ2=[-1 -1],μ3=[+1 -1],μ4=[-1 +1],μ5=[0 0]
The data distribution of artificial data set 2 is shown in fig. 12a. It has 860 data points, consisting of 3 normal clusters, 1 abnormal cluster of 21 anomalies, and 48 discrete anomalies. The data set therefore has 69 anomalies, whose distribution is shown in fig. 12b.
The real data set Breast Cancer is listed in Table 1. It comes from the UCI machine learning repository, contains 699 data points, and consists of two normal clusters. To verify the validity of the method, 34 anomalies generated according to statistical characteristics such as the mean and variance are added to the real data set for comparison and verification of anomaly detection.
In the verification experiments, the length of a basic window is set to 20, two basic windows form a sliding window, the number of nearest neighbors k is 3, the spatial neighborhood radius is set to the mean of the first 20% of the distances between data points in the current sliding window when those distances are sorted in descending order, the adjustment coefficient of the anomaly decision threshold is 2.5, and the number of verifications is 3. The detection rate and the false positive rate, which best reflect the effectiveness of an anomaly detection method, are selected for comparison. Figs. 11a to 11d and figs. 12a to 12d show the visualized experimental results for data set 1 and data set 2.
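The rule for choosing the neighborhood radius can be sketched as follows. Two details are assumptions, since the text does not specify them: each pair of points is counted once, and the number of distances kept is rounded down with a minimum of one. The function name `neighborhood_radius` is illustrative.

```python
import itertools
import math
from typing import Sequence


def neighborhood_radius(points: Sequence[Sequence[float]],
                        fraction: float = 0.2) -> float:
    """Mean of the largest `fraction` of pairwise distances in the window."""
    dists = sorted(
        (math.dist(p, q) for p, q in itertools.combinations(points, 2)),
        reverse=True,
    )
    if not dists:
        raise ValueError("need at least two points to compute a radius")
    # Assumption: round down, but always keep at least one distance.
    top = max(1, int(len(dists) * fraction))
    return sum(dists[:top]) / top
```

Because the radius is recomputed from the points of the current sliding window, it adapts as the scale of the stream changes over time.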
For artificial data set 1, figs. 11a to 11d show that the method effectively detects the 2 abnormal clusters and the 15 discrete anomalies, with no missed detections. Fig. 11d shows that 3 normal points are falsely detected as anomalies: although generated by the normal Gaussian distributions, they lie slightly away from the normal clusters and appear abnormal in all 3 consecutive verifications, so they are judged to be anomalies.
For artificial data set 2, figs. 12a to 12d show that the method remains effective in three-dimensional data space. Figs. 12b, 12c, and 12d show that all points in the abnormal cluster are detected and that 47 of the 48 discrete anomalies are detected; one discrete anomaly is missed because it lies close to a normal cluster and appears normal in one of the verifications, so it is judged to be a normal point.
In addition to verifying its effectiveness, the method of this embodiment is compared with traditional methods to further confirm its advantages. Table 2 gives the detailed statistical results of the comparative experiments on the three data sets. As Table 2 shows, the proposed method has a high detection rate and a low false positive rate, its effectiveness is clearly better than that of the other two methods, and its advantage grows as the dimension of the data set increases. Method I combines the W-K-Means and DBSCAN methods and dynamically updates the parameters required by DBSCAN and the weight of each dimension, so it adapts well to dynamic data streams; however, because it uses a traditional distance- and density-based anomaly measure, its effectiveness drops as the dimension increases. The I-IncLOF method is based on local density and is likewise affected by the curse of dimensionality: it performs well when the data dimension is low but poorly when the dimension increases.
Verification on different data sets and comparative analysis with the traditional methods show that the method for data stream anomaly detection and multiple verification based on the enhanced angle anomaly factor provided by this embodiment is effective and feasible.
TABLE 1
TABLE 2
Claims (6)
1. A method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor, characterized by comprising the following steps:
1) Processing the real-time data stream: for the various real-time data streams acquired by a data acquisition terminal, the data obtained within a time period is formed into data blocks according to the minimum unit constituting the data and the time order of acquisition, and several data blocks form the data set S processed by a sliding window;
2) Letting the result of the processing in step 1) be the data set S in the current sliding window, where S = {X_1, X_2, ..., X_n} contains n data points; each data point x_i is represented by its d attributes, which are used for subsequent clustering and anomaly detection;
3) Setting the initialization parameters k, r, and ξ, where k is the number of nearest neighbors of a data point, r is the spatial neighborhood radius of a data point, and ξ is the adjustment coefficient of the anomaly decision threshold; the anomaly decision threshold is θ = μ + ξ·δ, where μ and δ are the mean and standard deviation of the enhanced angle anomaly factors of all data points;
4) Obtaining the distance matrix dist: using the data set S from step 2), the distances between all data points are calculated to obtain an n × n distance matrix dist = [d_ij]_{n×n}, calculated by formula (1):
where X_i and Y_j are data points in the set S, x_ik denotes the k-th attribute of data point x_i, and y_jk denotes the k-th attribute of data point y_j;
5) Obtaining the r-neighborhood point sets: according to the spatial neighborhood radius r, the r-neighborhood point set of each data point is obtained, i.e., the set of all data points enclosed within radius r of that point;
6) Obtaining the angle factor and local density of the r-neighborhood point set: using the distance matrix dist, the angle factor and the local density of the r-neighborhood point set are obtained, where N_r is the r-neighborhood of data point x_i;
7) Obtaining the dissimilarity δ(x_i): the local densities of the r-neighborhood point sets obtained in step 6) are sorted, and the corresponding dissimilarity δ(x_i) is then calculated;
8) Obtaining the cluster center factor τ(x_i) of each data point: combining step 6) and step 7), the cluster center factor τ(x_i) is given by formula (5); the cluster center factor τ(x_i) measures the degree to which a data point lies at a cluster center;
9) Obtaining the attribution matrix: the cluster center factors of all data points obtained in step 8) are sorted in descending order to obtain τ(p_1) ≥ τ(p_2) ≥ … ≥ τ(p_n), where p_n denotes the original serial number of the corresponding data point, yielding the attribution matrix F = [f_1, f_2, ..., f_n] used for clustering;
10) Determining cluster centers and clustering: cluster centers are determined and the data set S is clustered using the cluster center factors and the attribution matrix; all data points with the same class label form a set, i.e., a cluster, giving m = C_center_id clusters C_1, C_2, ..., C_m, where C_center_id is the serial number of the cluster centers, which completes the clustering of data set S;
11) Performing anomaly detection on each cluster obtained by clustering: for the clusters C_i (i = 1, 2, …, m) obtained in step 10), anomaly detection is performed on each cluster C_1, C_2, ..., C_m of the clustered data set S to obtain the anomaly set O_i of each cluster, and finally the set of all anomalies in data set S, O = {O_1, ..., O_m}, is obtained. The formulas involved in anomaly detection are as follows. The intra-cluster angle factor is given by formula (7):
where A, B, and C are data points in the data set, each data point in the i-th cluster has d-dimensional attributes, and the count term is the number of data points within the neighborhood of data point q;
the sum of the distances to the k nearest neighbors, L(X_j), is given by formula (9):
where the k-neighborhood of data point X_j consists of its k nearest neighbors within the cluster to which it belongs;
the enhanced angle anomaly factor EAOF(X_j) is given by formula (10):
where o is the cluster center of the cluster to which data point X_j belongs, dist(o, X_j) is the distance from X_j to that cluster center, the angle-factor term is the angle factor of each data point in cluster C_i (i = 1, 2, …, m) relative to that cluster, and H(X_j) is the local increment value;
12) Multiple verification: every candidate anomaly is verified several times; a candidate that still appears abnormal after a limited number of verifications is confirmed as an anomaly, output, and stored, and a candidate that appears normal at any point during verification is discarded immediately.
2. The method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor according to claim 1, characterized in that the processing in step 1) means that data acquired by the data acquisition terminal is cached as a stream and the cached data is divided into data blocks E_0, E_1, E_2, ..., each of which represents one basic window; each sliding window W contains 2 basic windows, and data is inserted and deleted by combining basic windows and sliding windows as follows: when time T_i advances to T_{i+1}, the sliding window slides from W_i to W_{i+1}, the new basic window E_{i+1} is merged in, and the historical basic window E_{i-1} is removed; the candidate anomalies detected in W_i at time T_i are merged into W_{i+1} for multiple verification.
4. The method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor according to claim 1, characterized in that the local density of the r-neighborhood point set in step 6) is calculated by formula (3):
where N_r(p) is the r-neighborhood of data point p and q is any data point in that neighborhood; the local density depends on both the number and the positions of the neighborhood data points: the more neighborhood points there are, and the closer they lie to the center of the data set, the larger the local density.
5. The method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor according to claim 1, characterized in that the dissimilarity δ(x_i) in step 7) is computed after the local densities of all data points have been sorted in descending order, and the dissimilarity δ(x_i) is calculated by formula (4):
where p_i and p_j are the serial numbers of the corresponding data points; when i = 1, j ≥ 2, and when i ≥ 2, j < i.
6. The method for data stream anomaly detection and multiple verification based on an enhanced angle anomaly factor according to claim 1, characterized in that the attribution matrix F = [f_1, f_2, ..., f_n] in step 9) records the attribution relationships between data points, and each element is given by formula (6):
where {p_i} denotes the original subscripts of the data points sorted in descending order of the cluster center factor τ(x_i).
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710823063.0A CN107682319B (en) | 2017-09-13 | 2017-09-13 | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107682319A CN107682319A (en) | 2018-02-09 |
CN107682319B true CN107682319B (en) | 2020-07-03 |
Family
ID=61136410
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710823063.0A Active CN107682319B (en) | 2017-09-13 | 2017-09-13 | Enhanced angle anomaly factor-based data flow anomaly detection and multi-verification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107682319B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||