CN110941542A

CN110941542A - Sequence integration high-dimensional data anomaly detection system and method based on elastic network

Info

Publication number: CN110941542A
Application number: CN201911076540.7A
Authority: CN
Inventors: 陈南; 钱偲书; 张晶; 张露维; 宋轶慧; 刘文意; 陈晨; 邵佳炜; 李科心; 李静
Original assignee: Nanjing University of Aeronautics and Astronautics; State Grid Corp of China SGCC; State Grid Shanghai Electric Power Co Ltd
Current assignee: Nanjing University of Aeronautics and Astronautics; State Grid Corp of China SGCC; State Grid Shanghai Electric Power Co Ltd
Priority date: 2019-11-06
Filing date: 2019-11-06
Publication date: 2020-03-31
Anticipated expiration: 2039-11-06
Also published as: CN110941542B

Abstract

The invention discloses an integrated high-dimensional data anomaly detection system based on an elastic network, comprising a single-layer system corresponding to each dimension of the high-dimensional data and an assembly integration module connected with the single-layer system of every dimension. Each single-layer system comprises: a data module; an anomaly scoring module whose first input end is connected with the data module; a selection module whose input end is connected with the first output end of the anomaly scoring module; an elastic network module whose input end is connected with the selection module and whose output end is connected with the second input end of the anomaly scoring module; and a single-layer integration module connected with the second output end of the anomaly scoring module. The assembly integration module is connected with the single-layer integration module of every dimension. The invention addresses the large individual prediction error, low detection accuracy and poor stability of high-dimensional data anomaly detection, achieves small error and high accuracy for the individual prediction models, and ensures stable anomaly detection.

Description

Sequence integration high-dimensional data anomaly detection system and method based on elastic network

Technical Field

The invention relates to the technical field of high-dimensional data anomaly detection, in particular to a sequence integration high-dimensional data anomaly detection system and method based on an elastic network.

Background

Anomaly detection typically identifies data objects that do not conform to the general data distribution or that deviate significantly from the majority of data objects. It provides an important reference in fields such as medical diagnosis, fraud detection and information security. The data generated in these fields are often high-dimensional numerical data, such as thousands of molecular or gene expression features in bioinformatics, thousands of data features in transaction fraud, and the many complex information features of network attacks.

High-dimensional data is data with a large number of dimensions, typically hundreds or thousands, or even more. There are two major difficulties in analyzing and processing high-dimensional numerical data. The first is that Euclidean distance loses its usefulness: in a low-dimensional space it is meaningful and can measure the similarity between data, but in a high-dimensional space distances carry little meaning. The second is the curse of dimensionality: as the number of dimensions grows, the computational load rises rapidly, and the complexity and cost of analyzing and processing high-dimensional data increase exponentially. Anomaly detection in high-dimensional numerical data therefore faces the following challenges:

(1) High-dimensional numerical data typically contains noise and features that are unrelated to the anomalous data. These irrelevant features and noisy data can interfere with anomaly detection in high-dimensional numerical data.

(2) As the dimensionality of the data increases, low-dimensional concepts such as neighborhood, distance and nearest neighbor become unusable, so conventional distance-based and density-based anomaly detection methods can no longer be applied.

(3) When feature extraction is used to reduce the dimensionality of high-dimensional data, how to measure the accuracy of the extracted features remains a problem.

Many methods exist for anomaly detection, such as distance-based, density-based and tree-based methods. However, because of their computational complexity and limited efficiency, these methods are costly to run on high-dimensional data and do not detect anomalies in it particularly well. They therefore cannot be applied to high-dimensional data directly; the data must first be processed before these methods are used for detection.

For anomaly detection in high-dimensional numerical data, the data is typically mapped into a low-dimensional space in a way that retains the information related to the anomalous data, and detection is then carried out in that low-dimensional space. Techniques based on unsupervised representation learning emerged later, such as subspace feature selection methods, neural networks and manifold learning methods.

The subspace-based feature selection method is to find feature subsets related to abnormal data to reduce the influence of irrelevant features, and then perform conventional abnormal data detection on the feature subsets. This approach typically separates subset selection from anomalous data detection, which may result in features unrelated to the anomalous data being used to perform the detection of the anomalous data. This method may therefore result in a reduced accuracy and a greater deviation in the detection of anomalous data.

Methods based on neural networks and manifold learning focus on preserving the regularity information of the data (e.g., data structure and neighborhood information), which is then used for learning tasks such as clustering and data compression. The information they retain therefore often contains redundant data.

To address the limitations of the above methods and the challenges of anomaly detection in high-dimensional numerical data, anomaly detection methods based on ensemble learning appeared later. These methods combine multiple prediction models to exploit "strength in numbers" when detecting anomalous data. Although ensemble methods can reduce the detection error of the overall prediction model to some extent, they do not reduce the error of each individual prediction model. The CARE method, which is based on reducing the error of individual prediction models, addresses that problem but performs poorly on high-dimensional data. The CINFO method, based on sequence integration, performs feature extraction and anomaly detection on high-dimensional data by constructing a sequence of anomaly detection models. However, it uses a fixed threshold when selecting anomalous data, which suits only data sets whose proportion of anomalous data matches that threshold. In addition, it extracts features with Lasso regression, which, when faced with multicollinear variables or features, arbitrarily selects only one of them; the selection is therefore too random and its stability cannot be guaranteed.

Disclosure of Invention

The invention aims to provide a sequence integration high-dimensional data anomaly detection system and method based on an elastic network. The system and the method aim to solve the problems of large individual prediction error, low detection precision and poor stability of high-dimensional data anomaly detection, realize small error and high precision of a high-dimensional data individual prediction model and ensure the stability of anomaly detection.

Because high-dimensional data has many dimensions and the amount of computation rises rapidly as the number of dimensions grows, anomaly detection is performed separately in each dimension of the high-dimensional data to reduce the computation. In order to achieve the above object, the present invention provides an integrated high-dimensional data anomaly detection system based on an elastic network, which includes a single-layer system corresponding to each dimension in the high-dimensional data and an assembly integration module connected to the single-layer system of each dimension;

the single layer system comprises:

the data module is used for receiving single-layer initial data of each dimension in the high-dimensional data;

the first input end of the anomaly scoring module is connected with the data module, and the anomaly scoring module is used for performing a first anomaly scoring on the single-layer initial data to obtain an anomaly score vector of the single-layer initial data;

the input end of the selection module is connected with the first output end of the anomaly scoring module, and the selection module is used for selecting from the single-layer initial data according to the anomaly score vector to obtain an abnormal data set;

the input end of the elastic network module is connected with the selection module, the output end of the elastic network module is connected with the second input end of the anomaly scoring module, and the elastic network module is used for extracting features of the abnormal data set according to the anomaly score vector to generate a feature vector and a mean square error;

the anomaly scoring module is further used for performing a second anomaly scoring according to the feature vector and the mean square error to obtain an abnormal feature vector carrying anomaly scores;

the single-layer integration module is connected with the second output end of the anomaly scoring module and is used for performing a first integration on the output mean square error and the abnormal feature vector carrying anomaly scores to obtain a single-layer anomaly result;

and the assembly integration module is connected with the single-layer integration module of each single-layer system, and is used for performing a second integration on the single-layer anomaly results output by each single-layer system to obtain a final anomaly result.

The invention also provides an integrated high-dimensional data anomaly detection method based on the elastic network, which comprises the following steps:

receiving single-layer initial data of each dimension in the high-dimensional data, and performing first anomaly scoring on the single-layer initial data to obtain an anomaly score vector in the single-layer initial data;

selecting single-layer initial data according to the abnormal score vector to obtain an abnormal data set;

extracting the features of the abnormal data set according to the abnormal score vector to generate a feature vector and a mean square error;

performing second anomaly scoring according to the feature vector and the mean square error to obtain an abnormal feature vector carrying anomaly scores;

comparing the mean square error with a mean square error initial value set by the elastic network module, and outputting the mean square error when it is greater than the initial value; when the mean square error is smaller than the initial value, the single-layer system repeats the above operations on the single-layer initial data of that dimension until the mean square error of the current cycle is greater than that of the previous cycle, and the mean square error of the current cycle is output;

performing first integration on the output mean square error and the abnormal feature vector carrying anomaly scores to obtain a single-layer anomaly result of each dimension;

and performing second integration on the single-layer abnormal result of each dimension in the high-dimensional data to obtain a final abnormal result.

Most preferably, the single-layer initial data is X_i, 1 ≤ i ≤ N, and satisfies:

X_i＝(x₁，x₂，…，x_M)

wherein M is the number of features in the single-layer initial data; the high-dimensional data is X and satisfies:

X＝{X₁，X₂，…，X_N}

wherein N is the number of dimensions in the high dimensional data.

Most preferably, the first and/or second anomaly scoring is based on the isolation forest method, which comprises sampling, building isolation trees, calculating path lengths, and normalizing the path lengths.
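The four steps named above (sampling, building isolation trees, calculating path lengths, normalizing) can be sketched in plain Python. This is a minimal illustration of the commonly published isolation forest scoring rule s = 2^(-E[h]/c(ψ)), not the patent's implementation; the function names and the defaults (100 trees, sample size 64) are assumptions made for the example.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # harmonic number approximation
    return 2.0 * harmonic - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Isolate points recursively with random axis-aligned splits."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return {"size": len(rows)}
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return {"feature": feature, "split": split,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Depth at which the row lands, plus a correction for unsplit leaves."""
    if "feature" not in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["feature"]] < node["split"] else node["right"]
    return path_length(row, child, depth + 1)

def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per row; higher means more anomalous."""
    rng = random.Random(seed)
    psi = min(sample_size, len(data))
    if psi < 2:
        raise ValueError("need at least two rows")
    max_depth = math.ceil(math.log2(psi))
    # Step 1 and 2: draw a subsample per tree and build the isolation tree.
    trees = [build_tree(rng.sample(data, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    scores = []
    for row in data:
        # Step 3: average path length; step 4: normalize into a score.
        mean_depth = sum(path_length(row, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores
```

Points that are separated after only a few random splits receive scores close to 1, which is why a short average path length is read as evidence of an anomaly.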

Most preferably, the selecting of the single-layer initial data according to the abnormal score vector to obtain the abnormal data set comprises the following steps:

calculating the expectation E(S_i) = μ and the variance D(S_i) = σ² of the anomaly score vector S_i;

calculating an outlier candidate function from the expectation E(S_i) and the variance D(S_i); the outlier candidate function is H(S_i, α) and satisfies:

H(S_i，α)＝S_i-μ-ασ

wherein α is the threshold set by the selection module in each dimension, and σ is the square root of the variance;

using the Chebyshev inequality with the expectation E(S_i) and the variance D(S_i) to make a selection judgment on the anomaly score vector S_i, the judgment result P(S ≥ μ + ασ) satisfying:

wherein ε is any sufficiently small positive number and satisfies ε = ασ;

partitioning the anomaly score vector S_i according to the judgment result P(S ≥ μ + ασ) to generate an abnormal data set C, which satisfies:
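The inequality and the set definition referred to in these steps are not reproduced in this text. As a hedged reading, assuming the standard two-sided Chebyshev inequality and assuming that candidates are the points whose candidate function is non-negative, they could take the following form. The element-wise symbols s_j and x_j are introduced here for illustration only.

```latex
% Standard Chebyshev inequality for a score S with mean \mu and variance \sigma^2
P\bigl(|S-\mu| \ge \varepsilon\bigr) \le \frac{\sigma^{2}}{\varepsilon^{2}}

% With \varepsilon = \alpha\sigma, the one-sided tail is bounded by the two-sided one
P(S \ge \mu + \alpha\sigma) \le P\bigl(|S-\mu| \ge \alpha\sigma\bigr) \le \frac{1}{\alpha^{2}}

% Assumed form of the candidate set: points whose candidate function is non-negative
C = \{\, x_j \mid H(s_j,\alpha) \ge 0 \,\}
```

Under this reading, α = 2 would bound the selected fraction by 1/4 and α = 3 by 1/9, regardless of how the scores are distributed.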

Most preferably, the feature extraction further comprises the steps of:

taking the anomaly score vector S_i as the target feature and the abnormal data set C as the predictors, constructing a sparse regression model, and solving for the regression coefficient ω; the sparse regression model is ElN(C, λ) and satisfies:

wherein λ is a non-negative regularization parameter, K is the number of data in the abnormal data set C, and T is the number of cycles when the cyclic operation ends;

extracting from the abnormal data set C the features most relevant according to the regression coefficient ω as a feature vector F, and obtaining a mean square error mse; the feature vector F satisfies:

F＝{X_i|ω_i≠0，1＜i＜K}

wherein ω_i is the regression coefficient of the i-th abnormal data in the abnormal data set C.
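The objective of ElN(C, λ) is not reproduced in this text. A commonly used elastic net form is given below as an assumption; the mixing parameter ρ and the per-row notation (c_k for the k-th row of C, s_k for its anomaly score) do not appear in the original.

```latex
% Assumed elastic net objective: squared error plus mixed L1 / L2 penalty
\hat{\omega} = \arg\min_{\omega}\;
  \frac{1}{2K}\sum_{k=1}^{K}\bigl(s_k - c_k^{\top}\omega\bigr)^{2}
  + \lambda\Bigl(\rho\,\lVert\omega\rVert_{1}
  + \frac{1-\rho}{2}\,\lVert\omega\rVert_{2}^{2}\Bigr),
  \qquad \lambda \ge 0,\; 0 \le \rho \le 1

% Mean square error of the fitted model on the K candidates
\mathrm{mse} = \frac{1}{K}\sum_{k=1}^{K}\bigl(s_k - c_k^{\top}\hat{\omega}\bigr)^{2}
```

With ρ = 1 this reduces to Lasso. The additional L2 term tends to spread weight across strongly correlated features instead of picking one of them arbitrarily, which is the instability of Lasso described in the Background.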

Most preferably, the calculation of the mean square error mse further comprises the following steps:

when the mean square error mse is smaller than a preset mean square error initial value mse⁰, the above operations are repeated on the single-layer initial data of the dimension until the mean square error mse^T of the T-th cycle is greater than the mean square error mse^(T-1) of the previous cycle, and the mean square error mse^T of the T-th cycle is output.

The feature vector F and the output mean square error mse^t (1 ≤ t ≤ T) undergo the second anomaly scoring, in which an abnormal feature vector Q carrying anomaly scores is obtained through the steps of sampling, building isolation trees, calculating path lengths and normalizing the path lengths.
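The stopping rule described above can be sketched as a small control loop. The callable `run_cycle` is a hypothetical stand-in for one pass of scoring, selection, feature extraction and rescoring; `max_cycles` is a safeguard added for the example and is not part of the original. How ties (equal mean square errors) are handled is not stated in the text; this sketch continues in that case.

```python
def iterate_until_mse_rises(run_cycle, mse_init, max_cycles=100):
    """Repeat cycles until the mean square error exceeds the previous value.

    run_cycle(t) is a hypothetical callable returning (mse_t, q_t) for cycle t.
    Returns the list of (mse_t, q_t) pairs for t = 1..T, where T is the first
    cycle whose mse is greater than the previous one (mse_init for t = 1).
    """
    history = []
    previous = mse_init
    for t in range(1, max_cycles + 1):
        mse, q = run_cycle(t)
        history.append((mse, q))
        if mse > previous:
            break  # the error went up: stop and keep this cycle as cycle T
        previous = mse
    return history
```

For example, with an initial value of 1.0 and per-cycle errors 0.5, 0.4, 0.3, 0.35, the loop stops after the fourth cycle, so T = 4.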

Most preferably, the first integration comprises the steps of:

summing the mean square errors mse^t of the cycles t, 1 ≤ t ≤ T, to obtain the mean square error sum SUM, which satisfies:

SUM＝mse^1+mse^2+…+mse^T

wherein T is the number of cycles when the cyclic operation ends;

subtracting the mean square error mse^t of the t-th cycle from the mean square error sum SUM to obtain an error term MSE^t, which satisfies:

MSE^t＝SUM-mse^t，1≤t≤T；

normalizing the error term MSE^t to obtain the weight γ^t for each cycle, which satisfies:

unitizing the abnormal feature vector Q^t of the t-th cycle to obtain a unit abnormal feature vector τ^t, which satisfies:

computing the single-layer anomaly result of the i-th dimension from the weight γ^t and the unit abnormal feature vector τ^t, which satisfies:

Most preferably, the second integration averages the single-layer anomaly results of the N dimensions; the final anomaly result is Z and satisfies:
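Several of the formulas for the two integrations are not reproduced in this text. The block below restates what the text does define and then gives one plausible completion. The normalization of the weights, the use of the Euclidean norm for unitization, and the symbol R_i for the single-layer anomaly result are assumptions made here for illustration.

```latex
% Stated in the text
\mathrm{SUM} = \sum_{t=1}^{T} \mathrm{mse}^{t}, \qquad
\mathrm{MSE}^{t} = \mathrm{SUM} - \mathrm{mse}^{t}, \quad 1 \le t \le T

% Assumed normalization: weights sum to one (requires T \ge 2)
\gamma^{t} = \frac{\mathrm{MSE}^{t}}{\sum_{k=1}^{T}\mathrm{MSE}^{k}}
           = \frac{\mathrm{SUM} - \mathrm{mse}^{t}}{(T-1)\,\mathrm{SUM}}

% Assumed unitization: Euclidean norm
\tau^{t} = \frac{Q^{t}}{\lVert Q^{t} \rVert_{2}}

% Assumed single-layer result R_i and final result Z (average over N dimensions)
R_{i} = \sum_{t=1}^{T} \gamma^{t}\,\tau^{t}, \qquad
Z = \frac{1}{N}\sum_{i=1}^{N} R_{i}
```

Under these assumptions a cycle with a smaller mean square error receives a larger weight. For T = 2 with mse^1 = 1 and mse^2 = 3, SUM = 4, so γ^1 = 3/4 and γ^2 = 1/4.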

The invention thereby addresses the large individual prediction error, low detection accuracy and poor stability of high-dimensional data anomaly detection, achieves small error and high accuracy for the individual prediction models, and ensures stable anomaly detection.

Compared with the prior art, the invention has the following beneficial effects:

1. The system uses an elastic-network-based multi-level sequence ensemble learning model (MRENSE) to detect data anomalies in each dimension, which reduces the amount of computation and realizes anomaly detection for high-dimensional numerical data.

2. The system extracts features of the data through the elastic network module and then performs anomaly scoring on the extracted feature vectors, which addresses the large individual prediction error of high-dimensional data anomaly detection and achieves a small error for the individual prediction models in the system.

3. The system obtains the abnormal feature vector by performing anomaly scoring twice on the data of each dimension, which addresses the low accuracy and poor stability of high-dimensional data anomaly detection and ensures the high accuracy and stability of the system.

Drawings

FIG. 1 is a schematic structural diagram of an integrated high-dimensional data anomaly detection system according to the present invention;

FIG. 2 is a flowchart of the integrated high-dimensional data anomaly detection method provided by the present invention.

Detailed Description

The invention will be further described by the following specific examples in conjunction with the drawings, which are provided for illustration only and are not intended to limit the scope of the invention.

Example 1

Because high-dimensional data has many dimensions and the amount of computation rises rapidly as the number of dimensions grows, anomaly detection is performed separately in each dimension of the high-dimensional data to reduce the computation.

The invention provides an integrated high-dimensional data anomaly detection system based on an elastic network, which comprises a single-layer system 1 corresponding to each dimension in high-dimensional data and an assembly integration module 2 connected with the single-layer system of each dimension, as shown in figure 1.

The single-layer system 1 comprises a data module 3, an anomaly scoring module 4, a selection module 5, an elastic network module 6 and a single-layer integration module 7. The data module 3 is used for receiving single-layer initial data of each dimension in the high-dimensional data. The first input end of the anomaly scoring module 4 is connected with the data module 3, and the anomaly scoring module 4 is used for performing a first anomaly scoring on the single-layer initial data to obtain an anomaly score vector of the single-layer initial data. The input end of the selection module 5 is connected with the first output end of the anomaly scoring module 4, and the selection module 5 is used for selecting from the single-layer initial data according to the anomaly score vector to obtain an abnormal data set. The input end of the elastic network module 6 is connected with the selection module 5 and its output end is connected with the second input end of the anomaly scoring module 4; the elastic network module 6 is used for extracting features of the abnormal data set according to the anomaly score vector to generate a feature vector and a mean square error. The anomaly scoring module 4 is further used for performing a second anomaly scoring according to the feature vector and the mean square error to obtain an abnormal feature vector carrying anomaly scores. The single-layer integration module 7 is connected with the second output end of the anomaly scoring module 4 and is used for performing a first integration on the output mean square error and the abnormal feature vector carrying anomaly scores to obtain a single-layer anomaly result.

The assembly integration module 2 is connected with the single-layer integration module 7 of each single-layer system, and is used for carrying out secondary integration on single-layer abnormal results output by each single-layer system to obtain final abnormal results.

Example 2

Based on the same inventive concept, the invention also provides an integrated high-dimensional data anomaly detection method based on the elastic network, as shown in fig. 2, the method comprises the following steps:

receiving single-layer initial data of each dimension in high-dimensional data X, wherein the single-layer initial data is X_i, 1 ≤ i ≤ N, and satisfies:

X_i＝(x₁，x₂，…，x_M)

X＝{X₁，X₂，…，X_N}

wherein N is the number of dimensions in the high-dimensional data. The single-layer initial data X_i is transmitted to the anomaly scoring module for the first anomaly scoring to obtain the anomaly score vector S_i of the single-layer initial data X_i. The first anomaly scoring is based on the isolation forest method, which comprises sampling, building isolation trees, calculating path lengths and normalizing the path lengths. The single-layer initial data X_i is then selected according to the anomaly score vector S_i to obtain an abnormal data set C, which comprises the following steps:

calculating the expectation E(S_i) = μ and the variance D(S_i) = σ² of the anomaly score vector S_i.

Calculating an outlier candidate function from the expectation E(S_i) and the variance D(S_i) of the anomaly score vector S_i; the outlier candidate function is H(S_i, α) and satisfies:

H(S_i，α)＝S_i-μ-ασ

wherein α is the threshold set by the selection module in each dimension; α takes a different value in each dimension and can be specified by the user; σ is the square root of the variance.

The abnormal data set C consists of data whose distribution differs from that of most of the high-dimensional data, or which deviates significantly from most of the high-dimensional data objects, and it makes up only a small part of the whole data set. The number of elements K in the abnormal data set C is therefore controlled by setting the threshold α of the selection module 5.

Because the threshold α of the selection module 5 differs in each dimension, the number of elements K in the abnormal data set C differs in each dimension; the single-layer anomaly results of the dimensions are then integrated, which makes the final anomaly result more reliable.

Using the Chebyshev inequality with the expectation E(S_i) and the variance D(S_i) of the anomaly score vector S_i to make a selection judgment on S_i; the judgment result is P(S ≥ μ + ασ) and satisfies:

wherein ε is any sufficiently small positive number, and ε = ασ.
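The selection step can be illustrated with a short Python sketch. It computes μ and σ from the score vector and keeps the rows whose candidate function H(s, α) = s - μ - ασ is non-negative. Treating "H ≥ 0" as the selection rule is an assumption, since the set definition is not reproduced in this text; the function name is also hypothetical.

```python
import statistics

def select_candidates(rows, scores, alpha):
    """Return the rows whose anomaly score s satisfies s - mu - alpha*sigma >= 0.

    Assumed selection rule; mu and sigma are the mean and the population
    standard deviation of the score vector.
    """
    mu = statistics.fmean(scores)
    sigma = statistics.pstdev(scores)
    threshold = mu + alpha * sigma
    return [row for row, s in zip(rows, scores) if s - threshold >= 0]
```

A larger α raises the threshold μ + ασ and therefore shrinks the candidate set, which is how α controls the number of elements K.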

Based on the anomaly score vector S_i, features of the abnormal data set C are extracted to generate a feature vector F and a mean square error mse; the feature extraction further comprises the following steps:

wherein λ is a non-negative regularization parameter, K is the number of data in the abnormal data set C, and T is the number of cycles when the cyclic operation ends.

As the regularization parameter λ gradually increases, the number of non-zero coefficients in the regression coefficient ω gradually decreases, which completes the sparse regression on the high-dimensional data.

The regularization parameter λ is selected in the elastic network module 6; an inappropriate λ may cause over-fitting or under-fitting. The optimal regularization parameter λ is selected on the abnormal data set C by 10-fold cross-validation so that the mean square error mse is minimized.

F＝{X_i|ω_i≠0，1＜i＜K}
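To make the sparse regression concrete, the sketch below fits an elastic net by coordinate descent in plain Python and keeps the features with non-zero coefficients. It is an illustration under stated assumptions, not the patent's procedure: the objective is the common form (1/2K)·Σ(y - Xω)² + λ(ρ‖ω‖₁ + (1-ρ)/2·‖ω‖₂²), the rows of X are the K candidates in C, the columns are features, y holds their anomaly scores, the data is assumed centered (no intercept), and ρ and the function names are introduced here. The 10-fold cross-validation used to choose λ is not shown.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net(X, y, lam, rho=0.5, n_iter=200):
    """Coordinate descent for the assumed elastic net objective.

    Returns (coefficients, mean square error of the fit).
    """
    K, M = len(X), len(X[0])
    w = [0.0] * M
    residual = list(y)  # y - Xw, with w starting at zero
    col_sq = [sum(X[k][j] ** 2 for k in range(K)) / K for j in range(M)]
    for _ in range(n_iter):
        for j in range(M):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes feature j.
            z = sum(X[k][j] * (residual[k] + X[k][j] * w[j]) for k in range(K)) / K
            new_w = soft_threshold(z, lam * rho) / (col_sq[j] + lam * (1.0 - rho))
            delta = new_w - w[j]
            if delta != 0.0:
                for k in range(K):
                    residual[k] -= X[k][j] * delta
                w[j] = new_w
    mse = sum(r * r for r in residual) / K
    return w, mse

def select_features(w):
    """Indices of the features whose regression coefficient is non-zero."""
    return [j for j, coef in enumerate(w) if coef != 0.0]
```

Raising `lam` drives more coefficients to exactly zero through the soft-thresholding step, which matches the statement that the number of non-zero coefficients decreases as λ grows.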

The feature vector F and the mean square error mse are transmitted back to the anomaly scoring module for the second anomaly scoring to obtain an abnormal feature vector Q carrying anomaly scores; in the second anomaly scoring, the abnormal feature vector Q is obtained from the feature vector F through the steps of sampling, building isolation trees, calculating path lengths and normalizing the path lengths.

Because feature extraction by the elastic network module 6 reduces the dimensionality of the high-dimensional data to some extent, the second isolation forest scoring is easier to perform than the first.

The calculation of the mean square error mse further comprises the following steps: when the mean square error mse is smaller than a preset mean square error initial value mse⁰, the above operations are repeated on the single-layer initial data of the dimension until the mean square error mse^T of the T-th cycle is greater than the mean square error mse^(T-1) of the previous cycle, and the mean square error mse^T of the T-th cycle is output.

The output mean square errors mse^t (1 ≤ t ≤ T) and the abnormal feature vectors Q carrying anomaly scores are integrated for the first time to obtain the single-layer anomaly result of each dimension.

The first integration comprises the following steps:

summing the mean square errors mse^t of the cycles t (1 ≤ t ≤ T) to obtain the mean square error sum SUM, which satisfies:

SUM＝mse^1+mse^2+…+mse^T

wherein T is the number of cycles when the cyclic operation ends;

MSE^t＝SUM-mse^t，1≤t≤T；

And satisfies the following conditions:

The single-layer anomaly result of each dimension in the high-dimensional data X is transmitted to the assembly integration module for the second integration to obtain the final anomaly result; the second integration averages the single-layer anomaly results of the N dimensions; the final anomaly result is Z and satisfies:
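The two integrations can be sketched in Python as follows. Only SUM and MSE^t = SUM - mse^t are defined in the text; dividing each MSE^t by their total, unitizing with the Euclidean norm, and falling back to uniform weights when all error terms are zero (for example T = 1) are assumptions made for this example, as are the function names.

```python
import math

def first_integration(mses, q_vectors):
    """Combine per-cycle score vectors Q^t into one single-layer result.

    mses[t] is the mean square error of cycle t; q_vectors[t] is the score
    vector of cycle t. All score vectors are assumed to have equal length.
    """
    total = sum(mses)                         # SUM
    errors = [total - m for m in mses]        # MSE^t = SUM - mse^t
    norm = sum(errors)
    if norm == 0:
        # Assumed fallback, e.g. for a single cycle where MSE^1 = 0.
        weights = [1.0 / len(mses)] * len(mses)
    else:
        weights = [e / norm for e in errors]  # assumed normalization
    result = [0.0] * len(q_vectors[0])
    for gamma, q in zip(weights, q_vectors):
        length = math.sqrt(sum(v * v for v in q)) or 1.0  # assumed L2 unitization
        for j, v in enumerate(q):
            result[j] += gamma * v / length
    return result

def second_integration(layer_results):
    """Average the single-layer results element-wise over the N dimensions."""
    n_layers = len(layer_results)
    return [sum(column) / n_layers for column in zip(*layer_results)]
```

With two cycles whose errors are 1.0 and 3.0, the weights become 0.75 and 0.25, so the cycle with the smaller error dominates the single-layer result.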

The working principle of the invention is as follows:

receiving single-layer initial data of each dimension in the high-dimensional data, and performing first anomaly scoring on the single-layer initial data to obtain an anomaly score vector of the single-layer initial data; selecting from the single-layer initial data according to the anomaly score vector to obtain an abnormal data set; extracting features of the abnormal data set according to the anomaly score vector to generate a feature vector and a mean square error; performing second anomaly scoring according to the feature vector and the mean square error to obtain an abnormal feature vector carrying anomaly scores; performing first integration on the output mean square error and the abnormal feature vector carrying anomaly scores to obtain a single-layer anomaly result; and performing second integration on the single-layer anomaly result of each dimension in the high-dimensional data to obtain a final anomaly result.

In conclusion, the method and the device solve the problems of large individual prediction error, low detection precision and poor stability of high-dimensional data anomaly detection, realize small error and high precision of the high-dimensional data individual prediction model, and ensure the stability of anomaly detection.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the present invention has been described in detail with reference to the preferred embodiments, it should be understood that the above description should not be taken as limiting the invention. Various modifications and alterations to this invention will become apparent to those skilled in the art upon reading the foregoing description. Accordingly, the scope of the invention should be determined from the following claims.

Claims

1. An integrated high-dimensional data anomaly detection system based on an elastic network is characterized by comprising a single-layer system corresponding to each dimension in high-dimensional data and an assembly integration module connected with the single-layer system of each dimension;

the single layer system comprises:

the input end of the elastic network module is connected with the selection module, the output end of the elastic network module is connected with the second input end of the abnormity scoring module, and the elastic network module is used for performing characteristic extraction on the abnormal data set according to the abnormity score vector to generate a characteristic vector and a mean square error;

the single-layer integration module is connected with the second output end of the abnormity scoring module and is used for performing first integration on the output mean square error and the abnormal characteristic vector to obtain a single-layer abnormity result;

2. An integrated high-dimensional data anomaly detection method based on an elastic network, characterized by comprising the following steps:

receiving single-layer initial data of each dimension in high-dimensional data, and performing first anomaly scoring on the single-layer initial data to obtain an anomaly score vector in the single-layer initial data;

selecting the single-layer initial data according to the abnormal score vector to obtain an abnormal data set;

performing first integration on the output mean square error and the abnormal feature vector to obtain a single-layer abnormal result of each dimension;

and performing second integration on the single-layer abnormal result of each dimension to obtain a final abnormal result.

3. The method for integrated anomaly detection of high-dimensional data based on elastic network as claimed in claim 2, wherein said single-layer initial data is X_i, and satisfies:

X_i＝(x₁，x₂，…，x_M)

X＝{X₁，X₂，…，X_N}

wherein N is the number of dimensions in the high dimensional data.

4. The integrated high-dimensional data anomaly detection method based on an elastic network according to claim 2, characterized in that said first anomaly scoring and/or said second anomaly scoring is based on an isolation forest method comprising: sampling, establishing an isolation tree, calculating the path length and normalizing the path length.
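By way of non-limiting illustration only, the four steps named in claim 4 (sampling, establishing isolation trees, calculating the path length, normalizing the path length) may be sketched as follows. The sketch follows the commonly published isolation forest formulation; the function names, the number of trees, the subsample size, the depth limit and the score formula 2^(-mean path length / c(n)) are assumptions made for the example and are not taken from the claims.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful binary-tree search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + 0.5772156649  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Establish one isolation tree with random axis-aligned splits."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point reaches a leaf, plus a correction for leaf size."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def anomaly_scores(data, n_trees=100, sample_size=64, seed=0):
    """Return one score in (0, 1] per point; requires at least two points."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(data))
    max_depth = math.ceil(math.log2(max(sample_size, 2)))
    # Sampling step: each tree is built on its own random subsample.
    trees = [
        build_tree(rng.sample(data, sample_size), 0, max_depth, rng)
        for _ in range(n_trees)
    ]
    norm = c_factor(sample_size)
    scores = []
    for point in data:
        mean_len = sum(path_length(point, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_len / norm))  # normalized path length
    return scores
```

In this sketch a point that is isolated after few splits has a short mean path length and therefore a score closer to 1.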

5. The method for detecting the anomaly of the integrated high-dimensional data based on the elastic network as claimed in claim 2, wherein the step of selecting the single-layer initial data according to the anomaly score vector to obtain an anomaly data set comprises the following steps:

calculating the expectation E(S_i)＝μ and the variance D(S_i)＝σ² of the anomaly score vector S_i;

calculating an outlier candidate function according to the expectation E(S_i) and the variance D(S_i); the outlier candidate function is H(S_i，α), and satisfies:

H(S_i，α)＝S_i-μ-ασ

wherein α is the threshold value set by the selection module in each layer, and σ is the square root of the variance;

performing selection judgment on the anomaly score vector S_i by using the Chebyshev inequality according to the expectation E(S_i) and the variance D(S_i), the judgment result P(S ≥ μ + ασ) satisfying:

selecting from the anomaly score vector S_i according to the judgment result P(S ≥ μ + ασ) to generate an abnormal data set C, which satisfies:
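By way of non-limiting illustration only, the selection of claim 5 may be sketched as follows using the outlier candidate function H(S_i，α)＝S_i-μ-ασ. The function name `select_anomalies`, the use of the population variance, and the rule that a sample enters the abnormal data set C when H is non-negative (that is, when its score is at least μ + ασ) are assumptions made for the example; the claim's own expressions for the judgment result and for C are not reproduced here.

```python
import statistics


def select_anomalies(samples, scores, alpha):
    """Keep the samples whose anomaly score is at least mu + alpha * sigma."""
    mu = statistics.mean(scores)       # expectation E(S_i)
    sigma = statistics.pstdev(scores)  # square root of the variance D(S_i)
    # Outlier candidate function H(S_i, alpha) = S_i - mu - alpha * sigma.
    h_values = [s - mu - alpha * sigma for s in scores]
    # Assumed rule: a sample is a candidate outlier when H is non-negative.
    return [x for x, h in zip(samples, h_values) if h >= 0]
```

For example, with scores 0.4, 0.4, 0.4, 0.4 and 0.9 and α＝1.5, the mean is 0.5 and σ is 0.2, so only the sample scored 0.9 exceeds the threshold of 0.8.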

6. The method for detecting the anomaly of the integrated high-dimensional data based on the elastic network as claimed in claim 2, wherein performing feature extraction on the abnormal data set according to the anomaly score vector to generate a feature vector and a mean square error comprises the following steps:

wherein λ is a nonnegative regularization parameter, and K is the number of data in the abnormal data set C; t is the cycle number when the cycle operation is finished;

extracting, from the abnormal data set C, the features most relevant to the regression coefficient ω to obtain the feature vector F and the mean square error mse; the feature vector F satisfies:

F＝{X_i|ω_i≠0，1＜i＜K}

wherein ω_i is the regression coefficient of the i-th abnormal data in the abnormal data set C.
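By way of non-limiting illustration only, an elastic network regression followed by selection of the entries with non-zero regression coefficient may be sketched as follows. The objective used here, (1/(2K))·||y − Xω||² + λ·ρ·||ω||₁ + (λ·(1 − ρ)/2)·||ω||², is a commonly used elastic net form and is an assumption, as are the mixing ratio `l1_ratio` (ρ), the coordinate descent solver, the iteration count, and the choice of one coefficient per column of the data matrix; the claim indexes ω_i over the entries of C, so the orientation of the matrix is likewise assumed.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam, l1_ratio=0.5, n_iter=200):
    """Fit coefficients by cyclic coordinate descent; return (omega, mse)."""
    k, m = len(X), len(X[0])
    omega = [0.0] * m
    resid = list(y)  # residual y - X @ omega, with omega initially zero
    for _ in range(n_iter):
        for j in range(m):
            col = [row[j] for row in X]
            norm_sq = sum(v * v for v in col) / k
            if norm_sq == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            z = sum(c * r for c, r in zip(col, resid)) / k + norm_sq * omega[j]
            new_w = soft_threshold(z, lam * l1_ratio) / (
                norm_sq + lam * (1.0 - l1_ratio)
            )
            delta = new_w - omega[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                omega[j] = new_w
    mse = sum(r * r for r in resid) / k
    return omega, mse


def nonzero_indices(omega):
    """Indices whose regression coefficient is non-zero (the retained features)."""
    return [j for j, w in enumerate(omega) if w != 0.0]
```

The L1 part of the penalty sets weakly related coefficients exactly to zero, which is what makes the condition ω_i≠0 usable as a selection rule.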

7. The method for integrated high-dimensional data anomaly detection based on elastic network according to claim 2, characterized in that said calculation of mean square error mse further comprises the following steps:

when the mean square error mse is smaller than a preset mean square error initial value mse⁰, repeatedly performing the above operations on the single-layer initial data of the dimension until the mean square error mse^T of the T-th cycle is greater than the mean square error mse^(T-1) of the previous cycle, and outputting the mean square error mse^T for the T cycles.
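By way of non-limiting illustration only, the cycle control of claim 7 may be sketched as follows. The callable `step` is hypothetical and stands for one round of scoring, selection and elastic network regression that returns that round's mean square error; the cap `max_cycles` and the treatment of an exactly equal error as "continue" are assumptions made for the example.

```python
def run_cycles(step, mse0, max_cycles=50):
    """Call step(t) for t = 1, 2, ... and return [mse^1, ..., mse^T].

    The loop stops at the first cycle whose error exceeds the previous one;
    the preset initial value mse0 plays the role of the "previous" error
    for the first cycle.
    """
    history = []
    previous = mse0
    for t in range(1, max_cycles + 1):
        mse_t = step(t)
        history.append(mse_t)
        if mse_t > previous:
            break  # error started to rise: T = t
        previous = mse_t
    return history
```

The returned list holds one error per cycle, so its length is the cycle number T at which the operation ended.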

8. The method of claim 2, wherein the feature vector F and the output mean square error mse^t (1 ≤ t ≤ T) are subjected to the second anomaly scoring, and an abnormal feature vector Q with anomaly scores is obtained through the steps of sampling, establishing an isolation tree, calculating the path length and normalizing the path length.

9. The method for integrated high-dimensional data anomaly detection based on elastic network according to claim 2, characterized in that said first integration comprises the following steps:

wherein T is the cycle number at the end of the cycle operation;

subtracting the mean square error mse^t of the t-th cycle from the sum SUM of the mean square errors to obtain an error term MSE^t, which satisfies:

MSE^t＝SUM-mse^t，1≤t≤T；

performing a normalization operation on the error term MSE^t to obtain the weights γ^t under different cycle numbers, which satisfy:

unitizing the abnormal feature vector Q^t of the t-th cycle to obtain a unit abnormal feature vector τ^t, which satisfies:

computing the single-layer anomaly result for the i-th dimension according to the weight γ^t and the unit abnormal feature vector τ^t, the single-layer anomaly result satisfying:
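By way of non-limiting illustration only, the first integration of claim 9 may be sketched as follows. Only the relation MSE^t＝SUM-mse^t is taken from the claim. The sketch assumes that SUM is the sum of mse^t over the T cycles, that the weights are γ^t = MSE^t / Σ MSE^t, that unitization divides Q^t by its Euclidean norm, and that the single-layer result is the weighted sum Σ γ^t·τ^t; the function name is likewise hypothetical.

```python
import math


def first_integration(mse_list, q_list):
    """Combine per-cycle vectors Q^t into one single-layer anomaly result."""
    total = sum(mse_list)                  # SUM (assumed: sum over all cycles)
    err = [total - m for m in mse_list]    # MSE^t = SUM - mse^t
    err_total = sum(err)
    if err_total == 0:
        # Single cycle (or all errors zero): fall back to equal weights.
        gamma = [1.0 / len(mse_list)] * len(mse_list)
    else:
        gamma = [e / err_total for e in err]  # weights sum to 1
    tau = []
    for q in q_list:
        norm = math.sqrt(sum(v * v for v in q))
        tau.append([v / norm if norm else 0.0 for v in q])  # unit vector
    length = len(q_list[0])  # all Q^t are assumed to have the same length
    return [sum(g * t[i] for g, t in zip(gamma, tau)) for i in range(length)]
```

Under these assumptions a cycle with a smaller mean square error receives a larger weight, since its error term SUM − mse^t is larger.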

10. The elastic network-based sequence integration high-dimensional data anomaly detection method according to claim 2, wherein the second integration is performed by averaging the single-layer abnormal results over the N dimensions; the final abnormal result is Z and satisfies:
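By way of non-limiting illustration only, the averaging described in claim 10 may be sketched as follows. The function name and the assumption that every single-layer abnormal result is a vector of the same length are made for the example only.

```python
def second_integration(layer_results):
    """Average the single-layer results element-wise over the N dimensions."""
    n = len(layer_results)
    length = len(layer_results[0])  # all results are assumed equally long
    return [sum(r[i] for r in layer_results) / n for i in range(length)]
```

For instance, averaging the two single-layer results (0.2, 0.8) and (0.4, 0.6) yields a final result Z of approximately (0.3, 0.7).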