CN110719279A - Network anomaly detection system and method based on neural network - Google Patents

Network anomaly detection system and method based on neural network

Info

Publication number
CN110719279A
Authority
CN
China
Prior art keywords
data
kddcup99
neural network
module
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910953413.4A
Other languages
Chinese (zh)
Inventor
张钧桓
任涛
刘子瑜
杨可舟
丁匀泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northeastern University China
Original Assignee
Northeastern University China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northeastern University China filed Critical Northeastern University China
Priority to CN201910953413.4A priority Critical patent/CN110719279A/en
Publication of CN110719279A publication Critical patent/CN110719279A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425Traffic logging, e.g. anomaly detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/14Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416Event detection, e.g. attack signature detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a neural-network-based network anomaly detection system and detection method. The detection system comprises an encoding processing module, a data normalization module, a feature selection module, an accuracy rate module and a receiver operating characteristic (ROC) curve drawing module. The detection method first applies one-hot encoding to the discrete features in the KDDCUP99 data set to convert them into numerical values, then scales the features with the Min-Max method and performs dimensionality reduction. The reduced data is input into an MLPClassifier multilayer perceptron classifier to obtain a prediction result, which is finally input into the ROC curve drawing module to draw an ROC curve. The method uses a multilayer perceptron neural network, prevents overfitting with L2 regularization, adjusts the hidden nodes, and is trained and tuned iteratively with cross-validation. A comparison with KNN and SVM verifies the superiority of the invention in running time and accuracy.

Description

Network anomaly detection system and method based on neural network
Technical Field
The invention relates to the field of neural networks, and in particular to a neural-network-based network anomaly detection system and network anomaly detection method.
Background
With the wide application of computer networks, detecting network attacks and protecting information security have become unavoidable tasks. The massive use of computer systems exposes them to a large number of threats, and the spread of networks has given rise to many types of attack, such as zero-day exploits. The growth of computer networks has greatly aggravated computer security problems. The Internet protocol suite was not designed with security in mind, yet in today's network environment, with its advanced computing devices, network administrators must deal with large-scale intrusions by malicious individuals and large botnets. According to the Symantec Internet Security Threat Report, more than 3 billion malware attacks were reported in 2010, and the number of denial-of-service attacks rose markedly in 2013. According to Verizon's 2014 Data Breach Investigations Report, 63,437 security incidents were recorded. The 2015 global information security survey likewise reports a rising number of incidents. Security incidents have increased year by year and their scale has grown sharply in recent years, so detecting network attacks is now a serious concern. In addition, as cracking tools become easier to use, the expertise required to commit cybercrime decreases.
Anomaly detection is an important data analysis task that detects anomalous data in a given data set. It is an active area of data mining research because it involves discovering interesting and rare patterns in data. The ever-changing nature of network attacks calls for a flexible defense system, and as the technology matures, the ways of implementing network intrusion detection systems are becoming increasingly diverse.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a neural-network-based network anomaly detection system and detection method. The invention performs analysis and data processing on the KDDCup99 data set, implements and optimizes neural-network-based network intrusion detection, and compares it with network intrusion detection methods based on SVM and KNN.
In order to achieve the technical purpose, the technical scheme of the invention is as follows:
A neural-network-based network anomaly detection system comprises a data preprocessing unit and a data unit for identifying network attack anomalies. The data preprocessing unit performs encoding, data normalization and feature selection on the large volume of unordered raw data in the KDDCup99 data set in order to remove redundancy and noise. The data unit for identifying network attack anomalies identifies the anomalous attack categories in the KDDCup99 data set, namely DOS, R2L, U2R and PROBING.
The data preprocessing unit comprises an encoding processing module, a data normalization module and a feature selection module. The encoding processing module encodes the non-numerical data in the raw KDDCup99 data, converting it into numerical data; the normalization module then compresses all the numerical data into values in the interval [0, 1], giving a KDDCup99 data set with an output range of [0, 1]; finally, the feature selection module reduces the dimensionality of the normalized KDDCup99 data set and selects the principal component data;
the coding processing module is used for converting non-numerical data in the KDDCup99 data set into numerical data;
the data normalization module is used for performing normalization processing on the numerical data by adopting a MinMax method to obtain a KDDCup99 data set with an output range of [0,1] interval;
the feature selection module is used for reducing the dimensionality of the normalized data and reducing data redundancy and noise, so that the principal components of the processed data are independent of each other.
The data unit for identifying network attack anomalies comprises an accuracy rate module and a receiver operating characteristic (ROC) curve drawing module. The dimensionality-reduced KDDCup99 data set output by the data preprocessing unit is input into an MLPClassifier multilayer perceptron classifier, and a prediction result for each record in the KDDCup99 data set is obtained by iteratively adjusting the hyperparameters. The precision and recall of the neural-network-based network anomaly detection system are obtained from the prediction result. Finally, the prediction result is input into the ROC curve drawing module, an ROC curve is drawn according to the obtained precision and recall, and the ROC curve shows visually the difference between the predicted and true values of the data in the KDDCup99 data set;
the accuracy rate module is used for comparing the prediction result with the original data, calculating the precision and recall of the neural-network-based network anomaly detection system, and obtaining the detection result of the system;
the ROC curve drawing module is used for drawing an ROC curve for the detection result of the neural-network-based network anomaly detection system, visualizing the data in the KDDCup99 data set, and showing visually the difference between the values predicted by the system and the true values.
A detection method of a network anomaly detection system based on a neural network comprises the following steps:
Step 1: inputting the raw data of the KDDCup99 data set into the encoding processing module and one-hot encoding all non-numerical data in the raw data, converting it into numerical data;
Step 2: inputting the numerical data of the raw data together with the numerical data obtained by encoding into the data normalization module and normalizing it with the MinMax method to obtain a KDDCup99 data set with an output range of [0, 1];
Step 3: inputting the normalized KDDCup99 data set into the feature selection module and performing dimensionality reduction on it: first calculating the variance values of the normalized data under a specified dimensionality reduction threshold, then defining the data exceeding the threshold as principal component data, and finally adding one feature to each item of principal component data, in which normal principal component data is marked as 1 and abnormal principal component data is marked as -1;
Step 4: inputting the dimensionality-reduced KDDCup99 data set into the MLPClassifier multilayer perceptron classifier and iteratively adjusting the hyperparameters to obtain a prediction result for each record in the KDDCup99 data set, the prediction result comprising normal data and abnormal data in the anomalous attack categories; obtaining the correct rate of the prediction result from the ratio of the number of normal records to the total number of original records in the KDDCup99 data set, and obtaining the error rate from the ratio of the number of abnormal records to the total number of original records;
Step 5: inputting the correct rate and the error rate into the accuracy rate module to obtain the precision and recall of the neural-network-based network anomaly detection system;
Step 6: inputting the prediction result obtained by running the neural-network-based network anomaly detection system on the raw KDDCup99 data into the ROC curve drawing module, drawing an ROC curve according to the obtained precision and recall, and reading from the ROC curve the difference between the predicted and true values of the KDDCup99 data.
The MLPClassifier multilayer perceptron classifier in step 4 is designed based on a neural network algorithm, specifically as follows:
the MLPClassifier multilayer perceptron classifier based on the neural network algorithm is designed with five hidden layers. The input of the first hidden layer is the dimensionality-reduced KDDCup99 data set, and the input of each of the second to fourth hidden layers is the output of the preceding hidden layer. Each hidden layer uses the ReLU activation function, and L2 regularization with the regularization parameter set to 0.0001 is used to prevent overfitting. The output layer uses a linear function and takes the output of the last hidden layer as its input, giving for each record the probability of each neural network state; the states comprise the normal state and the four abnormal states DOS, R2L, U2R and PROBING. The state with the highest probability is output as the prediction result of the classifier.
In step 5, the correct rate and the error rate are input into the accuracy rate module to obtain the precision and recall of the neural-network-based network anomaly detection system, specifically as follows:
the correct rate and the error rate are input into the accuracy rate module; the number of records predicted as normal is counted as m, the number of those records that are normal in the original data is counted as n, and
$$P = \frac{n}{m}$$
gives the precision of the neural-network-based network anomaly detection system; the number of normal samples in the original data is counted as s, and
$$R = \frac{n}{s}$$
gives the recall of the neural-network-based network anomaly detection system.
The invention has the beneficial effects that:
Compared with the traditional KNN (K-nearest neighbor) and SVM (support vector machine) algorithms, the neural-network-based network anomaly detection algorithm predicts anomalous network data with higher accuracy and a shorter running time, greatly improving detection efficiency.
Drawings
Fig. 1 is a flowchart of a detection method of the neural network based network anomaly detection system in this embodiment.
Fig. 2 shows some of the values of the flag feature in the KDDCUP99 data set and their counts in this embodiment.
Fig. 3 shows the result of processing all the features of one record in the KDDCUP99 data set in this embodiment.
Fig. 4 shows the ratio of the variance of each dimension to the total variance after projecting the KDDCUP99 data without dimensionality reduction in this embodiment.
Fig. 5 shows the ratio of the variance of each dimension to the total variance after projecting the KDDCUP99 data with the dimensionality reduction threshold set to 0.999 in this embodiment.
Fig. 6 is a ROC graph of the network anomaly detection system based on the neural network in the present embodiment.
Fig. 7 is a ROC graph based on KNN and SVM in the present embodiment.
Detailed Description
The technical solution of the invention is described in detail below with reference to the drawings and specific embodiments. First, the KDDCup99 data set is preprocessed, handling both the discrete and the continuous features; PCA dimensionality reduction is then applied; a neural network then performs anomaly detection on the data, and the result is compared with KNN (K-nearest neighbor algorithm) and SVM (support vector machine algorithm). Several common evaluation metrics are used, mainly accuracy, precision, recall, F-score and the ROC curve.
A neural-network-based network anomaly detection system comprises a data preprocessing unit and a data unit for identifying network attack anomalies. The data preprocessing unit performs encoding, data normalization and feature selection on the large volume of unordered raw data in the KDDCup99 data set in order to remove redundancy and noise. The data unit for identifying network attack anomalies identifies the anomalous attack categories in the KDDCup99 data set, which comprise four categories: DOS (denial-of-service attack), R2L (unauthorized access from a remote machine), U2R (unauthorized access by an ordinary user to local superuser privileges) and PROBING (surveillance and other probing activity).
The data preprocessing unit comprises an encoding processing module, a data normalization module and a feature selection module. The encoding processing module encodes the non-numerical data in the raw KDDCup99 data, converting it into numerical data; the normalization module then compresses all the numerical data into values in the interval [0, 1], giving a KDDCup99 data set with an output range of [0, 1]; finally, the feature selection module reduces the dimensionality of the normalized KDDCup99 data set and selects the principal component data;
the coding processing module is used for converting non-numerical data in the KDDCup99 data set into numerical data;
the data normalization module is used for performing normalization processing on the numerical data by adopting a MinMax method to obtain a KDDCup99 data set with an output range of [0,1] interval;
the feature selection module is used for reducing the dimensionality of the normalized data and reducing data redundancy and noise, so that the principal components of the processed data are independent of each other.
The data unit for identifying network attack anomalies comprises an accuracy rate module and a receiver operating characteristic (ROC) curve drawing module. The dimensionality-reduced KDDCup99 data set output by the data preprocessing unit is input into an MLPClassifier multilayer perceptron classifier, and a prediction result for each record in the KDDCup99 data set is obtained by iteratively adjusting the hyperparameters. The precision and recall of the neural-network-based network anomaly detection system are obtained from the prediction result. Finally, the prediction result is input into the ROC curve drawing module, an ROC curve is drawn according to the obtained precision and recall, and the ROC curve shows visually the difference between the predicted and true values of the data in the KDDCup99 data set;
the accuracy rate module is used for comparing the prediction result with the original data, calculating the precision and recall of the neural-network-based network anomaly detection system, and obtaining the detection result of the system;
the ROC curve drawing module is used for drawing an ROC curve for the detection result of the neural-network-based network anomaly detection system, visualizing the data in the KDDCup99 data set, and showing visually the difference between the values predicted by the system and the true values.
A detection method of the neural-network-based network anomaly detection system, whose flowchart is shown in Fig. 1, comprises the following steps:
Step 1: inputting the raw data of the KDDCup99 data set into the encoding processing module and one-hot encoding all non-numerical data in the raw data, converting it into numerical data. In this embodiment, the discrete features of the KDDCup99 data set comprise 6 numerical features and 3 non-numerical features. The numerical discrete features take only the values 0 or 1 and are one-hot encoded directly, giving 12 features after encoding. The values of the protocol feature in the KDDCUP99 data set and their counts were obtained by text processing of the file: 283602 icmp records, 190065 tcp records and 20354 udp records.
The service feature has 66 distinct values, for example exec (99 records), name (98), kshell (98), ctf (97), netstat (95), Z39-50 (92) and IRC (43). The flag feature has 11 distinct values; some of them and their counts are shown in Fig. 2.
The three non-numerical discrete features have 3, 66 and 11 distinct values respectively, so after encoding they become 3 + 66 + 11 = 80 features.
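The encoding in step 1 can be illustrated with a minimal sketch. The helper `one_hot` and the three sample records are hypothetical and are not part of the embodiment; in practice a library encoder would normally be used. The count of 32 continuous features is inferred from the figures stated in the description (41 original features, of which 9 are discrete).

```python
# Minimal one-hot encoding sketch; helper name and sample values are illustrative.

def one_hot(values, categories):
    """Map each value to a 0/1 vector with a single 1 at its category index."""
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        rows.append(row)
    return rows

protocols = ["icmp", "tcp", "udp"]            # the 3 protocol values named in the text
encoded = one_hot(["tcp", "udp", "icmp"], protocols)

# Feature-count bookkeeping based on the numbers in the description:
non_numeric_cols = 3 + 66 + 11                # protocol, service and flag values
binary_cols = 6 * 2                           # 6 binary features, 2 columns each
continuous_cols = 41 - (6 + 3)                # inferred: features that are not discrete
total_cols = non_numeric_cols + binary_cols + continuous_cols
print(encoded, total_cols)
```

The total agrees with the 124 features reported after step 2.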
Step 2: inputting the numerical data of the raw data together with the numerical data obtained by encoding into the data normalization module and normalizing it with preprocessing.MinMaxScaler from the sklearn library to obtain a KDDCup99 data set with an output range of [0, 1]. At this point the original 41 features of the data have become 124 features; the processed result is shown in Fig. 3.
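As a small illustration of the normalization in step 2, the sketch below applies sklearn's MinMaxScaler to a made-up matrix. The array values are assumptions chosen only to make the per-column scaling easy to follow; they are not KDDCup99 data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy matrix: rows are records, columns are features (values are made up).
X = np.array([[0.0, 200.0],
              [5.0, 400.0],
              [10.0, 1000.0]])

scaler = MinMaxScaler()              # default feature_range is (0, 1)
X_scaled = scaler.fit_transform(X)   # per column: (x - min) / (max - min)
print(X_scaled)
```

Each column is scaled independently, so features with very different ranges end up on the same [0, 1] scale.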
Step 3: inputting the normalized KDDCup99 data set into the feature selection module and performing dimensionality reduction on it: first calculating the variance values of the normalized data under a specified dimensionality reduction threshold, then defining the data exceeding the threshold as principal component data, and finally adding one feature to each item of principal component data, in which normal principal component data is marked as 1 and abnormal principal component data is marked as -1;
First, principal component processing is applied to the data with projection only and no dimensionality reduction; the variance distribution of each dimension after projection is shown in Fig. 4.
After SVD decomposition, setting the specified dimensionality reduction threshold n_components to 0.9999 yields 72 features whose combined share of the variance exceeds 0.9999 of the total; the data containing these 72 features is the dimensionality-reduced data. When n_components is 0.99, that is, when the principal components are required to account for 99% of the variance, computing the variance of each principal component and its share of the total variance shows that 17 features remain after dimensionality reduction, together accounting for more than 99%.
With n_components set to 0.9999 there are 72 features; the ratio of each principal component's variance to the total variance after dimensionality reduction is shown in Fig. 5. Reducing the data with n_components of 0.9999 gives better classification.
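The effect of the threshold can be sketched with sklearn's PCA, which accepts a float n_components between 0 and 1 and keeps the smallest number of components whose cumulative explained variance ratio exceeds it. The synthetic matrix below is an assumption standing in for the normalized 124-feature data, so the component counts it prints are not the 17 and 72 reported in the embodiment.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic stand-in for the normalized data: 500 records, 20 features
# whose standard deviations halve from one feature to the next.
scales = 0.5 ** np.arange(20)
X = rng.normal(size=(500, 20)) * scales

kept = {}
for threshold in (0.99, 0.9999):
    # A float threshold requires the full SVD solver.
    pca = PCA(n_components=threshold, svd_solver="full")
    X_reduced = pca.fit_transform(X)
    kept[threshold] = (X_reduced.shape[1], pca.explained_variance_ratio_.sum())
    print(threshold, kept[threshold])
```

A stricter threshold can only keep the same number of components or more, which matches the move from 17 to 72 features described above.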
Step 4: inputting the dimensionality-reduced KDDCup99 data set into the MLPClassifier multilayer perceptron classifier and iteratively adjusting the hyperparameters to obtain a prediction result for each record in the KDDCup99 data set, the prediction result comprising normal data and abnormal data in the anomalous attack categories; obtaining the correct rate of the prediction result from the ratio of the number of normal records to the total number of original records in the KDDCup99 data set, and obtaining the error rate from the ratio of the number of abnormal records to the total number of original records;
the MLPClassifier multilayer perceptron classifier is designed based on a neural network algorithm and optimizes the loss function with LBFGS or stochastic gradient descent, specifically as follows:
the MLPClassifier multilayer perceptron classifier based on the neural network algorithm is designed with five hidden layers. The input of the first hidden layer is the dimensionality-reduced KDDCup99 data set, and the input of each of the second to fourth hidden layers is the output of the preceding hidden layer. Each hidden layer uses the ReLU activation function, and L2 regularization with the regularization parameter set to 0.0001 is used to prevent overfitting. The output layer uses a linear function and takes the output of the last hidden layer as its input, giving for each record the probability of each neural network state; the states comprise the normal state and the four abnormal states DOS, R2L, U2R and PROBING. The state with the highest probability is output as the prediction result of the classifier.
In the experiment, a satisfactory model is obtained by tuning a series of hyperparameters of the MLPClassifier multilayer perceptron classifier. The MLP supports regularization: alpha is a float parameter that sets the regularization term, that is, the L2 penalty, and its default value is 0.0001.
With solver set to stochastic gradient descent, hidden_layer_sizes set to 5, alpha set to 1e-5, random_state set to 1, early_stopping set to True and all other parameters left at their defaults, the commonly used evaluation metrics of the model are obtained.
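The hyperparameter settings above can be sketched as follows. The data is synthetic (generated with make_classification) and merely stands in for the five states; it is not the KDDCup99 data. Reading "hidden_layer_sizes set to 5" as a single hidden layer of 5 units, written (5,), is an assumption; the five-hidden-layer design described earlier would instead need a 5-element tuple whose layer widths the description does not give.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 5-class data standing in for normal, DOS, R2L, U2R and PROBING.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=10,
                           n_classes=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

clf = MLPClassifier(
    solver="sgd",              # stochastic gradient descent
    hidden_layer_sizes=(5,),   # assumption: "5" read as one hidden layer of 5 units
    alpha=1e-5,                # L2 regularization strength
    random_state=1,
    early_stopping=True,       # holds out part of the training data for validation
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)          # most probable class per record
proba = clf.predict_proba(X_test)     # one probability per class and record
print(clf.score(X_test, y_test))
```

The activation function is left at its default, which in sklearn is ReLU, in line with the description of the hidden layers.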
Step 5: inputting the correct rate and the error rate into the accuracy rate module to obtain the precision and recall of the neural-network-based network anomaly detection system, specifically as follows:
the correct rate and the error rate are input into the accuracy rate module; the number of records predicted as normal is counted as m, the number of those records that are normal in the original data is counted as n, and
$$P = \frac{n}{m}$$
gives the precision of the neural-network-based network anomaly detection system; the number of normal samples in the original data is counted as s, and
$$R = \frac{n}{s}$$
gives the recall of the neural-network-based network anomaly detection system.
Step 6: inputting the prediction result obtained by running the neural-network-based network anomaly detection system on the raw KDDCup99 data into the ROC curve drawing module, drawing an ROC curve according to the obtained precision and recall, and reading from the ROC curve the difference between the predicted and true values of the KDDCup99 data. The detection result obtained by the neural-network-based network anomaly detection system in this embodiment is as follows: Acc (accuracy) is 0.920152140154, the micro-averaged precision is 0.920152140154, the macro-averaged precision is 0.854771813775, the micro-averaged recall is 0.920152140154, the macro-averaged recall is 0.947451480342, F1 is 0.924726460069 and AUC (Area Under Curve) is 0.979521219622. The ROC curve is shown in Fig. 6.
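The difference between the micro and macro averages reported above can be sketched with sklearn's metric functions. The labels below are made up for a 3-class toy problem and are unrelated to the embodiment's results. For single-label multi-class predictions, micro-averaged precision and recall both equal the accuracy, which is consistent with the three identical values of 0.920152140154 listed above.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Made-up labels for a 3-class problem (0 = normal, 1 and 2 = attack types).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

acc = accuracy_score(y_true, y_pred)
# Micro average: pool the counts of all classes, then compute one ratio.
micro_p = precision_score(y_true, y_pred, average="micro")
micro_r = recall_score(y_true, y_pred, average="micro")
# Macro average: compute the ratio per class, then take the unweighted mean.
macro_p = precision_score(y_true, y_pred, average="macro")
macro_r = recall_score(y_true, y_pred, average="macro")
print(acc, micro_p, micro_r, macro_p, macro_r)
```

Macro averages weight every class equally, so rare attack classes influence them as much as the frequent ones.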
And then, carrying out anomaly detection on the data by using the KNN, the SVM and the neural network respectively, wherein a plurality of commonly used evaluation indexes are adopted, and the evaluation indexes mainly comprise accuracy, precision, recall rate, F-score and ROC-AUC curves.
Positive and negative both refer to class labels. FN denotes a sample that is actually positive but predicted negative; TP denotes a sample that is actually positive and predicted positive; TN denotes a sample that is actually negative and predicted negative; FP denotes a sample that is actually negative but predicted positive.
Precision, also called the precision ratio, is defined as:
$$P = \frac{TP}{TP + FP}$$
where P denotes precision, TP denotes samples that are actually positive and predicted positive, and FP denotes samples that are predicted positive but actually negative;
recall, also called the recall ratio, is defined as:
$$R = \frac{TP}{TP + FN}$$
where R denotes recall, TP denotes samples that are actually positive and predicted positive, and FN denotes samples that are actually positive but predicted negative;
because recall and precision are a pair of conflicting metrics, the F1 score is introduced to balance them, and is defined as:
$$F_1 = \frac{2\,TP}{2\,TP + FN + FP}$$
where F1 is a metric for the classification problem, TP denotes samples that are actually positive and predicted positive, FN denotes samples that are actually positive but predicted negative, and FP denotes samples that are predicted positive but actually negative.
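The three definitions can be checked on a small worked example. The function name and the counts below are hypothetical and chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, f1

# Made-up counts: 80 true positives, 20 false positives, 10 false negatives.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(p, r, f1)
```

The F1 value obtained this way equals the harmonic mean of precision and recall, 2PR / (P + R).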
The ROC is called Receiver Operating characterization, the curve is usually used by people for comparison among different binary classifiers, firstly, a P-R curve is introduced, the P-R curve takes the recall ratio as the horizontal axis and the precision ratio as the vertical axis, and the method comprises the following steps: firstly, expected results of all test samples are obtained through a learner, all result sets are sorted in a descending order, so that the situation that the samples are positive in front and negative in back can exist, then the samples are processed according to the positive class according to the order, and the current two rates, namely the recall ratio and the precision ratio, are calculated.
The vertical axis of the ROC curve is the TPR (true positive rate); a larger TPR means that a larger share of the actual positive samples is predicted as positive. The horizontal axis is the FPR (false positive rate); a larger FPR means that a larger share of the actual negative samples is wrongly predicted as positive. The ideal is therefore a TPR as close to 1 as possible with an FPR close to 0: the closer the curve comes to the point (0, 1) on the vertical axis and the further it departs from the diagonal, the better the classifier.
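The construction of the ROC curve can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code used in the experiment: labels are 1 (positive) and 0 (negative), both classes are present, the scores and labels below are made up, and the function names `roc_points` and `auc` are hypothetical.

```python
def roc_points(y_true, scores):
    """Sweep the threshold down the sorted scores and return (FPR, TPR) points."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and classifier scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
pts = roc_points(y_true, scores)
print(pts, auc(pts))
```

An AUC of 1 corresponds to a curve that passes through (0, 1); an AUC of 0.5 corresponds to the diagonal.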
In the experiment, a satisfactory model was obtained by tuning a series of hyperparameters of the MLPClassifier multi-layer perceptron classifier. Among them, alpha is a float parameter: MLPClassifier supports regularization, and alpha is the coefficient of the L2 regularization term, with a default value of 0.0001.
With solver set to stochastic gradient descent ('sgd'), hidden_layer_sizes set to 5, alpha set to 1e-5, random_state set to 1, early_stopping set to True, and all other parameters left at their default values, the model's results on the common evaluation metrics were obtained.
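The hyperparameter settings listed above map onto scikit-learn's `MLPClassifier` roughly as follows. This is a hedged sketch only: the data are a synthetic stand-in generated with `make_classification` rather than the preprocessed KDDCup99 records, and reading hidden_layer_sizes = 5 as a single hidden layer of 5 units, written `(5,)`, is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (assumption); the source uses preprocessed KDDCup99.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

clf = MLPClassifier(
    solver="sgd",              # stochastic gradient descent
    hidden_layer_sizes=(5,),   # assumed reading of "hidden_layer_sizes = 5"
    alpha=1e-5,                # L2 regularization coefficient
    random_state=1,
    early_stopping=True,       # hold out part of the training data for validation
)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

With `early_stopping=True`, scikit-learn sets aside a fraction of the training data as a validation set and stops training when the validation score stops improving.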
Fig. 7 compares the anomaly detection results of the neural-network-based network anomaly detection system with those of the KNN and SVM algorithms.
Network intrusion detection based on KNN:
Using the KNeighborsClassifier class in sklearn, the value of k (n_neighbors) is set to 5; that is, the sample under test is classified according to whether normal or abnormal points are in the majority among its 5 nearest neighbors. The parameter p selects the distance metric: p = 2 is the Euclidean distance and p = 1 is the Manhattan distance. The Euclidean distance is used here, and all other parameters are left at their default values. The results for the common evaluation metrics (accuracy, precision, recall, and F1) are: accuracy (Acc) 0.921520787357, micro-average precision 0.921521787357, macro-average precision 0.856794414575, micro-average recall 0.921521787357, macro-average recall 0.944986709142, and F1 0.925806386238. The ROC curve is shown in Fig. 7(a).
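The voting rule described above can be illustrated without any library. The sketch below is a plain-Python stand-in for what a k-nearest-neighbors classifier does, not the sklearn implementation; the function names and the toy points are hypothetical.

```python
from collections import Counter


def minkowski(a, b, p):
    """Minkowski distance: p=2 is Euclidean, p=1 is Manhattan."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_X, train_y, query, k=5, p=2):
    """Label the query by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda item: minkowski(item[0], query, p),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy two-feature records, made up for illustration.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (0.9, 1.0), (1.0, 0.9), (0.8, 0.8), (1.0, 1.0)]
train_y = ["normal", "normal", "normal",
           "attack", "attack", "attack", "attack"]

prediction = knn_predict(train_X, train_y, (0.85, 0.9), k=5, p=2)
print(prediction)
```

Because every prediction sorts the whole training set by distance, the cost of this brute-force approach grows with the size of the training data.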
Network intrusion detection based on SVM:
In the sklearn library the SVM has three main parameters. C is the penalty term, i.e. the degree of tolerance for errors. kernel is the kernel function, which may be linear, poly, or RBF; RBF is the most commonly used. gamma is a parameter of the RBF function once it has been chosen as the kernel: the smaller gamma is, the more support vectors there are, and the larger gamma is, the fewer. gamma therefore determines how the data are distributed after being mapped into the new space, and the number of support vectors affects the speed of training and prediction. The experimental parameters were: {'C': 1.0, 'cache_size': 200, 'class_weight': None, 'coef0': 0.0, 'decision_function_shape': 'ovr', 'degree': 3, 'gamma': 'auto', 'kernel': 'rbf', 'max_iter': 1, 'probability': False, 'random_state': None, 'shrinking': True, 'tol': 0.001, 'verbose': False}. The model was trained on the training data set and then used to predict the KDDCup99 data set, giving the following results for the common evaluation metrics (accuracy, precision, recall, and F1): accuracy (Acc) 0.921521787357, micro-average precision 0.921521787375, macro-average precision 0.856794414575, micro-average recall 0.921521787357, macro-average recall 0.944986709142, and F1 0.925806386238. The ROC curve is shown in Fig. 7(b).
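To make the role of gamma concrete, the sketch below evaluates the RBF kernel in its usual form, K(x, y) = exp(-gamma * ||x - y||^2), for a fixed pair of points. The function name and the points are hypothetical; the sketch only shows how gamma changes the similarity assigned to two samples and does not train an SVM.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


x, y = (0.0, 0.0), (1.0, 1.0)
for gamma in (0.1, 1.0, 10.0):
    # A larger gamma makes the kernel fall off faster with distance.
    print(gamma, rbf_kernel(x, y, gamma))
```

A larger gamma gives each training sample a narrower region of influence, which is why it changes how the data are distributed in the mapped space.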
Comparing Fig. 6 with Fig. 7: in terms of the AUC of the ROC curve, the neural network reaches 0.9795, the SVM 0.9669, and KNN 0.9581, so the neural-network-based network anomaly detection performs best. In terms of detection time, KNN-based and SVM-based detection both run longer than the neural network model. The SVM in particular is slow to converge, and its very long running time was clearly observable in the experiment, so it is not suited to processing very large data sets.

Claims (6)

1. A neural-network-based network anomaly detection system, characterized by comprising a data preprocessing unit and a data unit for identifying network attack anomalies, wherein the data preprocessing unit performs encoding, data normalization, and feature selection on the large volume of unordered raw data in the KDDCup99 data set so as to remove redundancy and noise, and the data unit for identifying network attack anomalies identifies the abnormal attack categories in the KDDCup99 data set, including DOS, R2L, U2R, and PROBING.
2. The system of claim 1, wherein the data preprocessing unit comprises an encoding processing module, a data normalization module, and a feature selection module; the encoding processing module encodes the non-numerical data in the unordered raw data of the KDDCup99 data set, converting it into numerical data; the data normalization module compresses each row of the numerical data into values in the interval [0, 1], giving a KDDCup99 data set whose output range is [0, 1]; and the feature selection module reduces the dimensionality of the data, screening the normalized KDDCup99 data set to obtain principal component data;
the encoding processing module is used for converting non-numerical data in the KDDCup99 data set into numerical data;
the data normalization module is used for normalizing the numerical data by the MinMax method to obtain a KDDCup99 data set whose output range is the interval [0, 1];
the feature selection module is used for reducing the dimensionality of the normalized data and reducing data redundancy and noise, so that the principal components of the processed data are independent of each other.
3. The system of claim 1, wherein the data unit for identifying network attack anomalies comprises an accuracy module and a receiver operating characteristic (ROC) curve drawing module; the dimensionality-reduced KDDCup99 data set output by the data preprocessing unit is input into the MLPClassifier multi-layer perceptron classifier, the hyperparameters are adjusted repeatedly to obtain a prediction result for each record in the KDDCup99 data set, and the precision and recall of the neural-network-based network anomaly detection system are obtained from the prediction results; finally, the prediction results are input into the ROC curve drawing module, an ROC curve is drawn according to the obtained precision and recall, and the difference between the predicted and true values of the data in the KDDCup99 data set is read off visually from the ROC curve;
the accuracy module is used for comparing the prediction results with the raw data, calculating the precision and recall of the neural-network-based network anomaly detection system, and obtaining the detection result of the neural-network-based network anomaly detection system;
the ROC curve drawing module is used for drawing an ROC curve of the detection result of the neural-network-based network anomaly detection system, visualizing the data in the KDDCup99 data set, and visually displaying the difference between the predicted values obtained by the neural-network-based network anomaly detection system and the true values.
4. A detection method using the neural-network-based network anomaly detection system according to any one of claims 1 to 3, comprising the following steps:
step 1: inputting the raw data of the KDDCup99 data set into the encoding processing module and one-hot encoding all non-numerical data in the raw data, thereby converting all non-numerical data in the raw data into numerical data;
step 2: inputting the numerical data in the raw data, together with the numerical data obtained by encoding, into the data normalization module and normalizing it by the MinMax method to obtain a KDDCup99 data set whose output range is [0, 1];
step 3: inputting the normalized KDDCup99 data set into the feature selection module and reducing the dimensionality of its data: first calculating the variance of the data in the normalized KDDCup99 data set under a specified dimensionality reduction threshold, then defining the data whose variance exceeds the dimensionality reduction threshold as principal component data, and finally adding a label feature to each item of principal component data, in which normal principal component data is marked 1 and abnormal principal component data is marked -1;
step 4: inputting the dimensionality-reduced KDDCup99 data set into the MLPClassifier multi-layer perceptron classifier and adjusting the hyperparameters repeatedly to obtain a prediction result for each record in the KDDCup99 data set, the prediction result being either normal data or abnormal data in one of the abnormal attack categories; obtaining the correct rate of the prediction results from the ratio of the number of normal data to the total number of raw data in the KDDCup99 data set, and obtaining the error rate of the prediction results from the ratio of the number of abnormal data to the total number of raw data in the KDDCup99 data set;
step 5: inputting the correct rate and the error rate into the accuracy module to obtain the precision and recall of the neural-network-based network anomaly detection system;
step 6: inputting the prediction results obtained by running the raw data of the KDDCup99 data set through the neural-network-based network anomaly detection system into the receiver operating characteristic (ROC) curve drawing module, drawing an ROC curve according to the obtained precision and recall, and reading off visually from the ROC curve the difference between the predicted and actual values of the data in the KDDCup99 data set.
5. The method of claim 4, wherein the MLPClassifier multi-layer perceptron classifier in step 4 is designed using a neural network algorithm, outlined as follows: the MLPClassifier multi-layer perceptron classifier based on the neural network algorithm is designed with five hidden layers; the input of the first hidden layer is the dimensionality-reduced KDDCup99 data set, and the input of each of the second to fourth hidden layers is the output of the preceding hidden layer; each hidden layer uses the ReLU activation function and L2 regularization to prevent overfitting, with the regularization parameter set to 0.0001; the output layer uses a linear function and takes the output of the last hidden layer as its input to obtain, for each record, the probability of each neural network state, the neural network states comprising one normal state and the four abnormal states DOS, R2L, U2R, and PROBING; the neural network state with the maximum probability is output as the prediction result of the classifier.
6. The method of claim 4, wherein inputting the correct rate and the error rate into the accuracy module in step 5 to obtain the precision and recall of the neural-network-based network anomaly detection system is specifically expressed as:
inputting the correct rate and the error rate into the accuracy module, counting the number of data predicted as normal as m, counting the number of those data that are normal in the raw data as n, and calculating
$$\frac{n}{m}$$
to obtain the precision of the neural-network-based network anomaly detection system; and counting the number of normal data samples in the raw data as s, and calculating
$$\frac{n}{s}$$
to obtain the recall of the neural-network-based network anomaly detection system.
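The preprocessing in steps 1 to 3 of claim 4 can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes that the variance is computed per feature column and that columns whose variance exceeds the threshold are kept; the helper names, the toy records, and the threshold value 0.01 are all hypothetical.

```python
def one_hot(values):
    """Step 1: turn a categorical column into 0/1 indicator rows."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def min_max_scale(column):
    """Step 2: MinMax normalization of one column into [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in column]


def variance(column):
    mean = sum(column) / len(column)
    return sum((v - mean) ** 2 for v in column) / len(column)


def select_by_variance(columns, threshold):
    """Step 3 (assumed reading): keep columns whose variance exceeds the threshold."""
    return [i for i, col in enumerate(columns) if variance(col) > threshold]


# Toy records: a categorical protocol, a numeric duration, a constant flag.
protocol = ["tcp", "udp", "tcp", "icmp"]
duration = [0.0, 5.0, 10.0, 20.0]
constant = [1.0, 1.0, 1.0, 1.0]

encoded = one_hot(protocol)
columns = [list(col) for col in zip(*encoded)] + [duration, constant]
scaled = [min_max_scale(col) for col in columns]
kept = select_by_variance(scaled, threshold=0.01)
print(kept)
```

In this toy run the constant column has zero variance after scaling and is dropped, while the three indicator columns and the duration column are kept.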
CN201910953413.4A 2019-10-09 2019-10-09 Network anomaly detection system and method based on neural network Pending CN110719279A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910953413.4A CN110719279A (en) 2019-10-09 2019-10-09 Network anomaly detection system and method based on neural network

Publications (1)

Publication Number Publication Date
CN110719279A true CN110719279A (en) 2020-01-21

Family

ID=69212290

Country Status (1)

Country Link
CN (1) CN110719279A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111182001A (en) * 2020-02-11 2020-05-19 深圳大学 Distributed network malicious attack detection system and method based on convolutional neural network
CN112749739A (en) * 2020-12-31 2021-05-04 天博电子信息科技有限公司 Network intrusion detection method
CN115409104A (en) * 2022-08-25 2022-11-29 贝壳找房(北京)科技有限公司 Method, apparatus, device, medium and program product for identifying object type

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090276391A1 (en) * 2003-12-22 2009-11-05 Dintecom, Inc. Creation of neuro-fuzzy expert system from online analytical processing (olap) tools
CN106250442A (en) * 2016-07-26 2016-12-21 新疆大学 The feature selection approach of a kind of network security data and system
CN108540451A (en) * 2018-03-13 2018-09-14 北京理工大学 A method of classification and Detection being carried out to attack with machine learning techniques
CN108712404A (en) * 2018-05-04 2018-10-26 重庆邮电大学 A kind of Internet of Things intrusion detection method based on machine learning
CN108881196A (en) * 2018-06-07 2018-11-23 中国民航大学 The semi-supervised intrusion detection method of model is generated based on depth
CN109309675A (en) * 2018-09-21 2019-02-05 华南理工大学 A kind of network inbreak detection method based on convolutional neural networks
CN109962909A (en) * 2019-01-30 2019-07-02 大连理工大学 A kind of network intrusions method for detecting abnormality based on machine learning
CN109977094A (en) * 2019-01-30 2019-07-05 中南大学 A method of the semi-supervised learning for structural data
CN110070141A (en) * 2019-04-28 2019-07-30 上海海事大学 A kind of network inbreak detection method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YANQING LI,ET.AL: "《An electrode misalignment inspection system based on image processing technology for use in resistance spot welding》", 《MEASUREMENT SCIENCE AND TECHNOLOGY》 *
李勤朴等: "《一种改进的BP神经网络入侵检测方法的设计与实现》", 《湖南电力》 *
李志鹏: "《基于神经网络的多源数据攻击检测研究与应用》", 《中国优秀硕士学位论文全文数据库(电子期刊)》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200121