CN112788038A - Method for distinguishing DDoS attack and elephant flow based on PCA and random forest - Google Patents
Method for distinguishing DDoS attack and elephant flow based on PCA and random forest
- Publication number
- CN112788038A (application CN202110051338.XA)
- Authority
- CN
- China
- Prior art keywords
- random forest
- data
- pca
- matrix
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2135—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
- H04L63/1458—Denial of Service
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- General Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a method for distinguishing DDoS attacks from elephant flows based on PCA and random forest, belonging to the technical field of network attack detection. First, a training set and a test set are selected from a DDoS data set, and an elephant flow data set is added to each of them. Next, PCA (principal component analysis) is applied to the training data to reduce its dimensionality, yielding a low-dimensional feature matrix. The low-dimensional feature matrix is then used to train a random forest model, yielding a random forest classifier. Finally, the test set samples are input into the trained random forest classifier to obtain the classification result. When a DDoS attack occurs, the random forest is used to distinguish legitimate elephant flows from DDoS attack flows.
Description
Technical Field
The invention relates to a method for distinguishing DDoS attacks and elephant flows based on PCA and random forest, belonging to the technical field of attack detection in networks.
Background
Distributed Denial of Service (DDoS) attacks are a growing problem on the Internet. An attacker targets certain servers (the victims) and uses multiple puppet hosts to launch an attack, thereby preventing normal use of the victims' services. There are many types of DDoS attack, but many legitimate flows have characteristics similar to those of DDoS flows, so many detection methods discard such legitimate flows. Elephant flows are one example: they generally carry large amounts of data and last for a long time. They are often used for bulk data transmission and are common in certain networks, such as data center networks. Approximately 90% of the data bytes in the network are contributed by elephant flows, although they account for only 1% of the total flows. Elephant flows can generate a large number of packets (over different time spans) and consume a large amount of server bandwidth, so they behave similarly to a DDoS attack. However, they are entirely legitimate, normal flows. Elephant flows and DDoS flows should therefore be distinguished, so that legitimate elephant flows are not blocked when a DDoS attack is being stopped.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the invention provides a method for distinguishing DDoS attacks from elephant flows based on PCA and random forest.
Principal component analysis can reduce the dimensionality of the data space under study, i.e., the p-dimensional X space is replaced by an m-dimensional Y space (m < p), and little information is lost by using the low-dimensional Y space in place of the high-dimensional X space. Even if there is only one principal component $Y_1$ (i.e., m = 1), this $Y_1$ is still obtained using all p of the X variables. The invention preprocesses the data, extracts its features, analyzes the data flows using principal component analysis, and then feeds the result into a random forest model for training, thereby distinguishing elephant flows from DDoS attack flows.
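The statement that a single principal component still uses all p of the X variables can be written out as follows. This is the standard textbook form of a principal component; the coefficient symbols $a_{jk}$ are introduced here only for illustration and do not appear in the original text.

$$
Y_j = a_{j1} X_1 + a_{j2} X_2 + \dots + a_{jp} X_p, \qquad j = 1, \dots, m, \quad m < p
$$

Each $Y_j$ is a weighted sum over every original variable, so reducing to m components discards directions of low variance rather than discarding individual input features.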
A method for distinguishing DDoS attacks from elephant flows based on PCA and random forest comprises the following steps:
Step one: select a training set and a test set from the DDoS data set, and add the elephant flow data set to both the training set and the test set.
Step two: apply PCA (principal component analysis) to the data in the training set to reduce its dimensionality, obtaining a low-dimensional feature matrix.
Step three: train a random forest model on the low-dimensional feature matrix to obtain a random forest classifier.
Step four: input the test set samples into the trained random forest classifier to obtain the classification result.
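Step one can be sketched as below. This is a minimal illustration, not the patent's own procedure: the function name `build_train_test`, the 25% test ratio, the string labels and the toy records are all assumptions made for the example. The point it shows is that each data set is split separately, so that both DDoS and elephant flow records end up in both the training set and the test set.

```python
import random

def build_train_test(ddos_flows, elephant_flows, test_ratio=0.25, seed=0):
    """Split each data set on its own, then merge, so both classes
    appear in both the training set and the test set.
    (Hypothetical helper; ratio and labels are illustrative choices.)"""
    rng = random.Random(seed)

    def split(records, label):
        shuffled = list(records)
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_ratio)
        labeled = [(features, label) for features in shuffled]
        # First n_test records go to the test set, the rest to training.
        return labeled[n_test:], labeled[:n_test]

    ddos_train, ddos_test = split(ddos_flows, "ddos")
    eleph_train, eleph_test = split(elephant_flows, "elephant")

    train = ddos_train + eleph_train
    test = ddos_test + eleph_test
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test

# Toy records: each flow is a list of numeric features (made-up values).
ddos = [[float(i), 1.0] for i in range(8)]
elephant = [[float(i), 2.0] for i in range(8)]
train, test = build_train_test(ddos, elephant)
```

Splitting per class before merging keeps the class proportions the same in both sets, which matters when one traffic type is much rarer than the other.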
Specifically, step two proceeds as follows:
(1) Compute the mean value of each sample feature (the sample mean of each column).
(2) After the sample means are obtained, subtract the column's sample mean from each dimension to obtain a new feature matrix.
(3) After the new feature matrix is obtained, compute its covariance matrix to obtain the low-dimensional feature matrix.
Specifically, step three proceeds as follows:
(1) Input the eigenvector matrix obtained in step two into the random forest model for training; after dimensionality reduction, each piece of data still retains k features.
(2) If the k features of a piece of data are the same, label them with the corresponding category of the data; if the k features differ, proceed to step (3).
(3) Select a splitting criterion, partition the data, and identify the features that determine whether the data is a DDoS attack flow or an elephant flow, thereby obtaining the random forest classifier.
The eigenvector matrix is the set of the most salient traffic features. Using a large amount of data, the method extracts the salient feature differences between DDoS attack flows and legitimate elephant flows, and then feeds them into a random forest for training and classification, so that DDoS attack flows and legitimate elephant flows can be distinguished.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a flow chart of a method for processing data by PCA in the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, embodiments of the present invention will now be described with reference to the accompanying drawings.
Example 1: as shown in FIG. 1 and FIG. 2, a method for distinguishing DDoS attacks from elephant flows based on PCA and random forest includes the following steps:
Step one: select a training set and a test set from the DDoS data set, and add the elephant flow data set to both the training set and the test set.
Step two: apply PCA (principal component analysis) to the data in the training set to reduce its dimensionality, obtaining a low-dimensional feature matrix.
Step three: train a random forest model on the low-dimensional feature matrix to obtain a random forest classifier.
Step four: input the test set samples into the trained random forest classifier to obtain the classification result.
Further, as shown in FIG. 2, the specific process of step two, in which PCA is applied to the training data to reduce its dimensionality and obtain the low-dimensional feature matrix, is as follows:
The first step: extract the sample features, namely the basic feature data of the traffic.
The second step: organize the sample features and generate a feature matrix from them. Suppose there are n samples (x, y, z, w denote sample features; four features are used here as an example, although in practice there are more than four).
Each column holds data of the same feature type, and each row holds the different features of one piece of data at the same time. $X_1$ denotes the first generated feature matrix (the feature matrices generated later are distinguished by their subscripts).
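For illustration, with n samples and the four example features x, y, z and w, a feature matrix laid out as described (one row per sample, one column per feature type) would take the following shape. This layout is inferred from the surrounding description and is a sketch rather than a reproduction of the original figure.

$$
X_1 =
\begin{pmatrix}
x_1 & y_1 & z_1 & w_1 \\
x_2 & y_2 & z_2 & w_2 \\
\vdots & \vdots & \vdots & \vdots \\
x_n & y_n & z_n & w_n
\end{pmatrix}
$$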
The third step: the mean of the samples for each column is calculated.
First, the average value of each column of samples needs to be calculated:
after the mean value is calculated, the mean value of the samples in each column is calculated:
The fourth step: subtract the column's feature sample mean from each dimension to obtain a new feature matrix $X_2$.
$x_{1i} = x_i - \sigma_{x_i},\quad y_{1i} = y_i - \sigma_{y_i},\quad z_{1i} = z_i - \sigma_{z_i},\quad w_{1i} = w_i - \sigma_{w_i}$
(The subscript "1" in $x_{1i}$, $y_{1i}$, $z_{1i}$ and $w_{1i}$ indicates that the column's feature mean, written here with $\sigma$, has been subtracted from each dimension, giving the corresponding elements of the new feature matrix.)
The fifth step: compute the covariance matrix of the feature matrix $X_2$:
($X^T$ denotes the transpose of the matrix.)
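A standard form of the covariance matrix that is consistent with the transpose note above is sketched below. The symbol $C$ and the $1/(n-1)$ normalization are assumptions; the text does not state which normalization is used, and $1/n$ is also common.

$$
C = \frac{1}{n-1}\, X_2^{T} X_2
$$

Because $X_2$ is already mean-centered, entry $(j, k)$ of $C$ is the sample covariance between feature j and feature k, and $C$ is a symmetric p-by-p matrix for p features.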
The sixth step: after the covariance matrix is obtained, compute its eigenvalues and eigenvectors, and sort the eigenvalues in descending order.
The seventh step: select the k largest eigenvalues, and take the k eigenvectors corresponding to them as column vectors to form an eigenvector matrix, thereby obtaining the low-dimensional feature matrix.
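The third to seventh steps above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the function name `pca_reduce` is hypothetical, the covariance uses a 1/(n-1) normalization that the text does not specify, the final projection `X2 @ W` is the usual way of turning the eigenvector matrix into low-dimensional features (the text does not spell this out), and the input is random toy data rather than real traffic features.

```python
import numpy as np

def pca_reduce(X1, k):
    """Reduce an (n_samples, n_features) matrix to k dimensions.
    Hypothetical helper following the steps described in the text."""
    X1 = np.asarray(X1, dtype=float)
    col_mean = X1.mean(axis=0)            # third step: mean of each column
    X2 = X1 - col_mean                    # fourth step: center each column
    n = X2.shape[0]
    cov = X2.T @ X2 / (n - 1)             # fifth step (normalization assumed)
    eigvals, eigvecs = np.linalg.eigh(cov)  # sixth step: eigh returns ascending order
    order = np.argsort(eigvals)[::-1]     # sort eigenvalues in descending order
    W = eigvecs[:, order[:k]]             # seventh step: top-k eigenvectors as columns
    Z = X2 @ W                            # assumed projection onto the k components
    return Z, W, col_mean

# Toy data: 100 samples, 4 features, with feature 3 nearly a copy of feature 0.
rng = np.random.default_rng(0)
X1 = rng.normal(size=(100, 4))
X1[:, 3] = 2.0 * X1[:, 0] + 0.01 * rng.normal(size=100)
Z, W, col_mean = pca_reduce(X1, k=2)
```

When the test set is classified later, it would normally be centered with the training `col_mean` and projected with the same `W`, i.e. `(X_test - col_mean) @ W`, so that training and test samples live in the same reduced space.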
Further, the specific process of step three, in which the low-dimensional feature matrix is used to train a random forest model to obtain a random forest classifier, is as follows:
(1) Input the eigenvector matrix obtained in step two into the random forest model for training; after dimensionality reduction, each piece of data still retains k features.
(2) If the k features of a piece of data are the same, label them with the corresponding category of the data; if the k features differ, proceed to step (3).
(3) Select a splitting criterion, partition the data, and identify the features that determine whether the data is a DDoS attack flow or an elephant flow, thereby obtaining the random forest classifier.
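The idea of selecting a splitting criterion and partitioning the data can be illustrated with a single decision-tree node. The sketch below uses Gini impurity, which is one common criterion and an assumption here, since the text does not name the criterion it uses. Item (2) is read as the usual stopping rule (a node whose samples all share one class becomes a leaf), which is also an interpretation. The function names and toy data are hypothetical, and the bootstrap sampling and random feature subsets that make a forest out of many such trees are not shown.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) for the best split,
    or None when the node is pure or no valid split exists."""
    if len(set(labels)) <= 1:
        return None  # pure node: label it with its class instead of splitting
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best

# Toy reduced features (k = 2) with made-up values and labels.
rows = [[0.1, 5.0], [0.2, 1.0], [0.9, 4.0], [0.8, 2.0]]
labels = ["elephant", "elephant", "ddos", "ddos"]
split = best_split(rows, labels)
```

On this toy data the first feature separates the two classes cleanly, so the chosen split is on feature 0 with zero remaining impurity; a full tree would apply the same search recursively to each side.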
The eigenvector matrix is the set of the most salient traffic features. Using a large amount of data, the method extracts the salient feature differences between DDoS attack flows and legitimate elephant flows, feeds them into a random forest model for training, and finally feeds the test data set into the trained random forest classifier to classify it. When a DDoS attack occurs, the random forest is used to distinguish legitimate elephant flows from DDoS attack flows; the method is simple and efficient.
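The four steps can be put together in a short end-to-end sketch. The example below uses scikit-learn, which the text does not mention and which is swapped in here purely for convenience. The synthetic Gaussian data stands in for real DDoS and elephant flow features, and the choices of 8 input features, 3 components, 50 trees and a 30% test split are arbitrary assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
n, p = 300, 8

# Synthetic stand-ins for flow features (not real traffic data).
ddos = rng.normal(loc=0.0, scale=1.0, size=(n, p))
elephant = rng.normal(loc=3.0, scale=1.0, size=(n, p))
X = np.vstack([ddos, elephant])
y = np.array([1] * n + [0] * n)  # 1 = DDoS attack flow, 0 = elephant flow

# Step one: training and test sets that both contain both classes.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Steps two and three: PCA fitted on the training data, then a random forest.
model = make_pipeline(
    PCA(n_components=3),
    RandomForestClassifier(n_estimators=50, random_state=0),
)
model.fit(X_train, y_train)

# Step four: classify the test set with the trained classifier.
accuracy = model.score(X_test, y_test)
predictions = model.predict(X_test)
```

Wrapping PCA and the classifier in one pipeline means the projection learned from the training set is reused unchanged on the test set, which avoids leaking test data into the dimensionality reduction.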
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, and various changes can be made without departing from the spirit and scope of the present invention.
Claims (3)
1. A method for distinguishing DDoS attacks from elephant flows based on PCA and random forest, characterized in that the method comprises the following steps:
step one: selecting a training set and a test set from a DDoS data set, and adding an elephant flow data set to both the training set and the test set;
step two: applying PCA (principal component analysis) to the data in the training set to reduce its dimensionality, obtaining a low-dimensional feature matrix;
step three: training a random forest model on the low-dimensional feature matrix to obtain a random forest classifier;
step four: inputting the test set samples into the trained random forest classifier to obtain a classification result.
2. The method for distinguishing DDoS attacks from elephant flows based on PCA and random forest as claimed in claim 1, characterized in that the specific process of step two, applying PCA to the data in the training set to reduce its dimensionality and obtain the low-dimensional feature matrix, is as follows:
(1) computing the mean value of each sample feature (the sample mean of each column);
(2) after the sample means are obtained, subtracting the column's sample mean from each dimension to obtain a new feature matrix;
(3) after the new feature matrix is obtained, computing its covariance matrix to obtain the low-dimensional feature matrix.
3. The method for distinguishing DDoS attacks from elephant flows based on PCA and random forest as claimed in claim 2, characterized in that the specific process of step three, training a random forest model on the low-dimensional feature matrix to obtain the random forest classifier, is as follows:
(1) inputting the eigenvector matrix obtained in step two into the random forest model for training, each piece of data still retaining k features after dimensionality reduction;
(2) if the k features of a piece of data are the same, labeling them with the corresponding category of the data, and if the k features differ, proceeding to step (3);
(3) selecting a splitting criterion, partitioning the data, and identifying the features that determine whether the data is a DDoS attack flow or an elephant flow, thereby obtaining the random forest classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110051338.XA CN112788038A (en) | 2021-01-15 | 2021-01-15 | Method for distinguishing DDoS attack and elephant flow based on PCA and random forest |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110051338.XA CN112788038A (en) | 2021-01-15 | 2021-01-15 | Method for distinguishing DDoS attack and elephant flow based on PCA and random forest |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112788038A (en) | 2021-05-11 |
Family
ID=75756725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110051338.XA Pending CN112788038A (en) | 2021-01-15 | 2021-01-15 | Method for distinguishing DDoS attack and elephant flow based on PCA and random forest |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112788038A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113422766A (en) * | 2021-06-18 | 2021-09-21 | 北京理工大学 | Network system security risk assessment method under DDoS attack |
CN113645182A (en) * | 2021-06-21 | 2021-11-12 | 上海电力大学 | Random forest detection method for denial of service attack based on secondary feature screening |
CN113746700A (en) * | 2021-09-02 | 2021-12-03 | 中国人民解放军国防科技大学 | Elephant flow rapid detection method and system based on probability sampling |
CN114726653A (en) * | 2022-05-24 | 2022-07-08 | 深圳市永达电子信息股份有限公司 | Abnormal flow detection method and system based on distributed random forest |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107395590A (en) * | 2017-07-19 | 2017-11-24 | 福州大学 | A kind of intrusion detection method classified based on PCA and random forest |
CN107872460A (en) * | 2017-11-10 | 2018-04-03 | 重庆邮电大学 | A kind of wireless sense network dos attack lightweight detection method based on random forest |
CN108632279A (en) * | 2018-05-08 | 2018-10-09 | 北京理工大学 | A kind of multilayer method for detecting abnormality based on network flow |
US20190253442A1 (en) * | 2018-02-13 | 2019-08-15 | Cisco Technology, Inc. | Assessing detectability of malware related traffic |
Non-Patent Citations (2)
Title |
---|
RAZAN ABDULHAMMED ET AL.: "Efficient Network Intrusion Detection Using PCA-Based Dimensionality Reduction of Features", 《2019 INTERNATIONAL SYMPOSIUM ON NETWORKS, COMPUTERS AND COMMUNICATIONS (ISNCC)》 * |
S. REVATHI ET AL.: "Detecting Denial of Service Attack Using Principal Component Analysis with Random Forest Classifier", 《INTERNATIONAL JOURNAL OF COMPUTER SCIENCE & ENGINEERING TECHNOLOGY (IJCSET)》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113422766A (en) * | 2021-06-18 | 2021-09-21 | 北京理工大学 | Network system security risk assessment method under DDoS attack |
CN113422766B (en) * | 2021-06-18 | 2022-08-23 | 北京理工大学 | Network system security risk assessment method under DDoS attack |
CN113645182A (en) * | 2021-06-21 | 2021-11-12 | 上海电力大学 | Random forest detection method for denial of service attack based on secondary feature screening |
CN113645182B (en) * | 2021-06-21 | 2023-07-14 | 上海电力大学 | Denial of service attack random forest detection method based on secondary feature screening |
CN113746700A (en) * | 2021-09-02 | 2021-12-03 | 中国人民解放军国防科技大学 | Elephant flow rapid detection method and system based on probability sampling |
CN113746700B (en) * | 2021-09-02 | 2023-04-07 | 中国人民解放军国防科技大学 | Elephant flow rapid detection method and system based on probability sampling |
CN114726653A (en) * | 2022-05-24 | 2022-07-08 | 深圳市永达电子信息股份有限公司 | Abnormal flow detection method and system based on distributed random forest |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112788038A (en) | Method for distinguishing DDoS attack and elephant flow based on PCA and random forest | |
CN111340191B (en) | Bot network malicious traffic classification method and system based on ensemble learning | |
CN110311829B (en) | Network traffic classification method based on machine learning acceleration | |
CN110391958B (en) | Method for automatically extracting and identifying characteristics of network encrypted flow | |
CN110796196A (en) | Network traffic classification system and method based on depth discrimination characteristics | |
CN111740971A (en) | Network intrusion detection model SGM-CNN based on class imbalance processing | |
CN110808971B (en) | Deep embedding-based unknown malicious traffic active detection system and method | |
US10187412B2 (en) | Robust representation of network traffic for detecting malware variations | |
CN110751222A (en) | Online encrypted traffic classification method based on CNN and LSTM | |
CN111885059A (en) | Method for detecting and positioning abnormal industrial network flow | |
CN113489685B (en) | Secondary feature extraction and malicious attack identification method based on kernel principal component analysis | |
CN111786951B (en) | Traffic data feature extraction method, malicious traffic identification method and network system | |
CN111817971B (en) | Data center network flow splicing method based on deep learning | |
CN112597141B (en) | Network flow detection method based on public opinion analysis | |
CN108629183A (en) | Multi-model malicious code detecting method based on Credibility probability section | |
Sarraf | Analysis and detection of ddos attacks using machine learning techniques | |
Guo et al. | A Black‐Box Attack Method against Machine‐Learning‐Based Anomaly Network Flow Detection Models | |
CN116192523A (en) | Industrial control abnormal flow monitoring method and system based on neural network | |
Wu et al. | Bottrinet: A unified and efficient embedding for social bots detection via metric learning | |
McCarthy et al. | Feature vulnerability and robustness assessment against adversarial machine learning attacks | |
Jia et al. | MMF: A loss extension for feature learning in open set recognition | |
CN111224998A (en) | Botnet identification method based on extreme learning machine | |
Kim et al. | High‐Performance Internet Traffic Classification Using a Markov Model and Kullback‐Leibler Divergence | |
CN108494620A (en) | Network service flow feature selecting and sorting technique based on multiple target Adaptive evolvement arithmetic | |
CN113128626A (en) | Multimedia stream fine classification method based on one-dimensional convolutional neural network model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20210511 |