CN112836215A - Artificial intelligence active intrusion detection method based on voting mechanism

Info

Publication number: CN112836215A
Application number: CN202110053636.2A
Authority: CN
Inventors: 林德秀; 叶睿; 吴家庆; 冯煜濠
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2021-01-15
Filing date: 2021-01-15
Publication date: 2021-05-25

Abstract

The invention discloses an artificial intelligence active intrusion detection method based on a voting mechanism. Four models are compared: the k-nearest neighbor algorithm (KNN), Gaussian Bayes, the BP neural network and the decision tree. Three of them (Gaussian Bayes, the BP neural network and the decision tree) are then selected to vote on the detection result for each sample, which achieves a good result in a short time. The invention not only verifies the performance of each single classifier but also combines the advantages of multiple classifiers: samples are detected by a voting mechanism, an intrusion detection model with a higher detection rate and stronger adaptability to system state changes is constructed, and a simulation environment is built to verify the detection rate and accuracy.

Description

Artificial intelligence active intrusion detection method based on voting mechanism

Technical Field

The invention belongs to the technical field of information security, and particularly relates to an artificial intelligence active intrusion detection method based on a voting mechanism.

Background

Intrusion refers to any act that compromises the confidentiality, integrity or availability of network resources, such as unauthorized attempts to access or tamper with data, or bypassing the security mechanisms of a computer or network. Intrusion detection is an active network security defense technology. It makes up for the shortcomings of static security defense technologies and can provide comprehensive protection for a network system. Research on intrusion detection is therefore necessary, and intelligent intrusion detection is one of its important directions. Networks currently face a growing number of viruses, vulnerabilities and hacker attacks, and intrusion detection, as one of the core technologies of network security defense, can effectively detect attack behavior.

Introducing machine learning methods into intrusion detection systems has been a trend in recent years. Many intrusion detection systems based on machine learning methods such as neural networks, support vector machines, naive Bayes and decision trees have appeared. The main functions of these systems are to monitor the network and computer systems in real time, to discover and identify intrusions or intrusion attempts, and to raise intrusion alarms. However, current network data contains a large number of redundant and noisy variables (features), which lowers the accuracy of the detection model and makes its training time too long. In addition, the value ranges of the features differ: if some features take very large values, they can bias the final classification result and slow the convergence of the parameters, which increases the training time of the classifier.

To address these problems, feature selection and normalization are introduced, and a network intrusion detection algorithm based on feature selection is proposed. Feature selection reduces the dimension of the feature space while keeping classification accuracy as high as possible, and removes redundant and noisy features; that is, it selects from the original feature set, according to some evaluation function, the subset of features that is relevant or important to the output. Normalization is a means of removing dimensions: a dimensional quantity is transformed into a dimensionless scalar, so that features measured in different units can be compared numerically and features with very large values no longer bias the classification result.
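The normalization step described above can be illustrated with a minimal sketch. It assumes min-max scaling to the range [0, 1]; the source does not state which normalization formula is used, and the function name and sample records are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale every feature column to the range [0, 1] (assumed scheme)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.0.
            (value - low) / (high - low) if high > low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical records: (duration in seconds, bytes sent).
records = [[0, 181], [2, 5450], [10, 1024000]]
scaled = min_max_normalize(records)
```

After scaling, the byte count no longer dominates the duration merely because its raw values are larger.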

Combining feature selection with normalization can improve the accuracy of the classification model and shorten its training time. The method is validated on both binary classification and multi-class classification, the latter of which was rarely addressed in earlier work. Experiments show that the method effectively improves classification accuracy. Except for the support vector machine, the other two multi-class models train markedly faster after feature extraction, which reduces system overhead. Owing to the nature of the support vector machine, its training time does not change much.

However, in the multi-class problem some classes occur very rarely in the KDDCup99 data set, so how to train a model with high accuracy and strong generalization ability from labels with very few training samples remains a problem to be explored.

Disclosure of Invention

The invention aims to provide an artificial intelligence active intrusion detection method based on a voting mechanism, which detects samples by voting and constructs an intrusion detection model with a higher detection rate and stronger adaptability to system state changes.

In order to achieve the purpose, the invention provides the following technical scheme:

An artificial intelligence active intrusion detection method based on a voting mechanism comprises the following steps:

(1) training process:

1.1) first discretizing the original data, performing feature selection on the discretized data, normalizing the extracted feature subset, and finally feeding the processed data into the classifiers for training;

1.2) respectively establishing models through training, and testing and comparing the accuracy and the training duration of each model;

(2) detection process:

2.1) inputting original computer data into the established models, and selecting a two-class or five-class output mode;

2.2) voting with the trained models to obtain the likelihood that the new input data belongs to each of the two or five classes;

2.3) selecting the class with the highest likelihood and returning it as the prediction result for the new data.

Further, the discretization methods in step 1.1) include: entropy minimization discretization (EMD) and proportional k-interval discretization (PKID).
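As an illustration of the discretization step, the following is a minimal sketch in the spirit of PKID. It assumes the usual description of PKID, in which both the number of intervals and the number of values per interval are about the square root of the sample count, and it uses plain equal-frequency binning; the function names are hypothetical and the source does not give implementation details.

```python
import math


def pkid_cut_points(values):
    """Equal-frequency cut points with about sqrt(n) intervals (assumed PKID rule)."""
    ordered = sorted(values)
    n = len(ordered)
    n_intervals = max(1, math.isqrt(n))
    cuts = []
    for b in range(1, n_intervals):
        # Index in the sorted values where the b-th interval ends.
        idx = b * n // n_intervals
        # Place the cut halfway between the two neighbouring values.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def discretize(value, cuts):
    """Index of the interval that `value` falls into."""
    return sum(value > cut for cut in cuts)


# Sixteen made-up values give four intervals of four values each.
cuts = pkid_cut_points(list(range(16)))
bins = [discretize(v, cuts) for v in (0, 5, 15)]
```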

Further, the feature selection methods in step 1.1) include: the correlation-based feature selection method (CFS), the consistency-based filter method (CONS) and the INTERACT method.

Further, the classifiers in step 1.1) include: the k-nearest neighbor algorithm (KNN), Gaussian Bayes, the BP neural network and the decision tree.

Further, the two-class output in step 2.1) comprises: normal data (normal) and attack data (attack).

Further, the five-class output in step 2.1) comprises: normal data (normal), denial-of-service attack (DoS), probing attack (probe), Remote-to-Local attack (R2L) and User-to-Root privilege escalation attack (U2R).

Further, the voting method in step 2.2) is tested experimentally by cross-validation, and comprises the following steps:

2.2.1) parameter setting of the k-nearest neighbor algorithm (KNN), wherein the parameters comprise K, the number of nearest neighbors considered;

2.2.2) parameter setting of the decision tree, wherein the parameters comprise the maximum depth (max_depth) and the minimum number of samples required to split an internal node (min_samples_split);

2.2.3) comprehensive prediction: comparing the test results of the models, combining the prediction models according to these results, detecting the sample by weighted (ratio) voting, and obtaining the best detection result for the sample through repeated experiments with different ratios.

Further, the prediction models include: Gaussian Bayes, the back propagation (BP) neural network and the decision tree.

Drawings

FIG. 1 is a flow chart of intrusion detection according to the present invention;

FIG. 2 shows the average accuracy for the KNN parameter K under binary classification;

FIG. 3 shows the average accuracy for the decision tree parameter max_depth under binary classification;

FIG. 4 shows the average accuracy for the decision tree parameter min_samples_split under binary classification;

FIG. 5 shows the confusion matrices of the detection results of each model with initial-ratio voting under binary classification;

FIG. 6 shows the performance evaluation of the detection results of each model with initial-ratio voting under binary classification;

FIG. 7 shows the confusion matrices of the detection results of each model with best-ratio voting under binary classification;

FIG. 8 shows the performance evaluation of the detection results of each model with best-ratio voting under binary classification;

FIG. 9 shows the average accuracy for the KNN parameter K under five-class classification;

FIG. 10 shows the average accuracy for the decision tree parameter max_depth under five-class classification;

FIG. 11 shows the average accuracy for the decision tree parameter min_samples_split under five-class classification;

FIG. 12 shows the confusion matrices of the detection results of each model with initial-ratio voting under five-class classification;

FIG. 13 shows the performance evaluation of the detection results of each model with initial-ratio voting under five-class classification;

FIG. 14 shows the confusion matrices of the detection results of each model with best-ratio voting under five-class classification;

FIG. 15 shows the performance evaluation of the detection results of each model with best-ratio voting under five-class classification.

Advantageous effects

Among the four classification models compared, KNN performs outstandingly under both binary and five-class classification, but its drawback is an excessively long running time. Prediction is therefore carried out by ratio voting among the other three prediction models (decision tree, Bayes and neural network). The invention combines the advantages of multiple classifiers, detects samples by a voting mechanism, and constructs an intrusion detection model with a higher detection rate and stronger adaptability to system state changes.

Detailed Description

The following detailed description of the embodiments of the present invention will be made with reference to the accompanying drawings.

The invention provides an artificial intelligence active intrusion detection method based on a voting mechanism, and a flow chart of the intrusion detection method is shown in figure 1, and the method comprises the following steps:

(1) training process:

The discretization methods include entropy minimization discretization (EMD) and proportional k-interval discretization (PKID). The feature selection methods include the correlation-based feature selection method (CFS), the consistency-based filter method (CONS) and the INTERACT method. The classifiers include the k-nearest neighbor algorithm (KNN), Gaussian Bayes, the BP neural network and the decision tree.

① K-nearest neighbor algorithm (KNN, K-Nearest Neighbor)

The idea of this method is simple and intuitive: if most of the K samples most similar to a given sample (i.e., its nearest neighbors in the feature space) belong to a certain class, then the sample also belongs to that class. In the classification decision, the class of the sample to be classified is determined only by the class of its nearest sample or nearest few samples.
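The neighbor rule just described can be sketched as follows. This is a minimal illustration that assumes Euclidean distance and a simple majority vote; the source names neither the distance measure nor a tie-breaking rule, and the training points are made up.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest_labels = [train_y[i] for i in order[:k]]
    # The most common label among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]


# Made-up, already normalized two-feature samples.
train_x = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.2], [0.9, 1.0], [1.0, 0.8]]
train_y = ["normal", "normal", "normal", "attack", "attack"]
label_a = knn_predict(train_x, train_y, [0.15, 0.1], k=3)
label_b = knn_predict(train_x, train_y, [0.95, 0.9], k=3)
```

Every prediction scans the whole training set, which is consistent with the long running time reported for KNN in this document.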

② Gaussian Bayes

Gaussian Bayes assumes that each attribute follows a Gaussian distribution within each class. Maximum likelihood estimation is used to obtain the parameters of the Gaussian distribution (mean and variance), and the probability density is then used to obtain the "probability" that a sample belongs to each class.

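Bayes' formula, for a sample x and a class c:

$$P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)}$$

Gaussian distribution of attribute x_i in class c, with mean \mu_{c,i} and variance \sigma_{c,i}^{2}:

$$p(x_i \mid c) = \frac{1}{\sqrt{2\pi\sigma_{c,i}^{2}}}\exp\left(-\frac{(x_i-\mu_{c,i})^{2}}{2\sigma_{c,i}^{2}}\right)$$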
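A minimal sketch of the Gaussian Bayes classifier described above is given below. It assumes that attributes are conditionally independent given the class (the usual naive Bayes assumption) and works with log-probabilities for numerical stability; function names and data are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate the class prior and the per-feature mean/variance of each class."""
    grouped = defaultdict(list)
    for row, label in zip(rows, labels):
        grouped[label].append(row)
    model = {}
    for label, members in grouped.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            # Maximum-likelihood variance; the small constant avoids division by zero.
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mean, var))
        model[label] = (len(members) / len(rows), stats)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = math.log(prior)
        for value, (mean, var) in zip(row, stats):
            # Log of the Gaussian density of this attribute within the class.
            score += -0.5 * math.log(2 * math.pi * var) - (value - mean) ** 2 / (2 * var)
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up, already normalized samples.
rows = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.3], [0.9, 0.8], [0.8, 0.95], [0.85, 0.7]]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
model = fit_gaussian_nb(rows, labels)
prediction = predict_gaussian_nb(model, [0.9, 0.9])
```

The shared denominator P(x) of Bayes' formula is omitted because it does not change which class scores highest.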

③ BP (back propagation) neural network

The BP neural network is a multi-layer feedforward network trained by error back propagation; the training algorithm is called the BP algorithm. Its basic idea is gradient descent: a gradient search is used to minimize the mean square error between the actual output and the expected output of the network. The basic BP algorithm consists of two processes, forward propagation of the signal and back propagation of the error. That is, the output error is computed in the direction from input to output, while the weights and thresholds are adjusted in the direction from output to input. During forward propagation, the input signal acts on the output nodes through the hidden layer and produces the output signal after a nonlinear transformation. If the actual output does not match the expected output, the back propagation of the error begins. In back propagation, the output error is passed back layer by layer through the hidden layer toward the input layer and is distributed to all units of each layer; the error signal obtained at each layer serves as the basis for adjusting the weights of its units. By adjusting the connection strengths between input and hidden nodes, the connection strengths between hidden and output nodes, and the thresholds, the error is reduced along the gradient direction. Through repeated learning and training, the network parameters (weights and thresholds) corresponding to the minimum error are determined, and training then stops. The trained neural network can then process new inputs similar to the training samples and output the nonlinearly transformed result with minimum error.
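The forward and backward passes described above can be sketched with a very small network. This is an illustrative sketch only: one hidden layer, sigmoid activations, squared error, and per-sample gradient descent are all assumptions, as are the layer size, learning rate, epoch count and toy data; the source does not disclose the network architecture.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> hidden layer -> single sigmoid output."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
              for ws, b in zip(w_hidden, b_hidden)]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)
    return hidden, output


def mean_squared_error(data, params):
    return sum((forward(x, *params)[1] - t) ** 2 for x, t in data) / len(data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b_hidden = [0.0] * n_hidden
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b_out = 0.0
    for _ in range(epochs):
        for x, target in data:
            hidden, output = forward(x, w_hidden, b_hidden, w_out, b_out)
            # Backward pass: error signal at the output, then at each hidden unit.
            delta_out = (output - target) * output * (1 - output)
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1 - hidden[j])
                            for j in range(n_hidden)]
            # Gradient-descent update of the weights and thresholds (biases).
            for j in range(n_hidden):
                w_out[j] -= lr * delta_out * hidden[j]
                b_hidden[j] -= lr * delta_hidden[j]
                for i in range(n_in):
                    w_hidden[j][i] -= lr * delta_hidden[j] * x[i]
            b_out -= lr * delta_out
    return w_hidden, b_hidden, w_out, b_out


# Toy data: the target simply equals the first input feature.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
initial_loss = mean_squared_error(data, train(data, epochs=0))
trained_loss = mean_squared_error(data, train(data))
```

Calling `train` with `epochs=0` returns the randomly initialized parameters, which makes it possible to compare the error before and after training.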

④ Decision tree

A decision tree is a predictive model that represents a mapping between object attributes and object values. Each node in the tree represents an object, each branch represents a possible attribute value, and each leaf node corresponds to the value of the object represented by the path from the root node to that leaf. A decision tree has only a single output; if multiple outputs are desired, independent decision trees can be built to handle the different outputs.
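The root-to-leaf mapping described above can be illustrated with a hand-written tree. This is a sketch only: the feature names and thresholds are invented for illustration and are not taken from a trained model or from the source.

```python
# Internal nodes test one feature against a threshold; leaves hold a class label.
# The features and thresholds below are made up for illustration only.
tree = {
    "feature": "src_bytes", "threshold": 0.5,
    "left": {"label": "normal"},
    "right": {
        "feature": "count", "threshold": 0.8,
        "left": {"label": "normal"},
        "right": {"label": "attack"},
    },
}


def predict(node, sample, depth=0):
    """Follow one root-to-leaf path and return (label, path length)."""
    if "label" in node:
        return node["label"], depth
    branch = "left" if sample[node["feature"]] <= node["threshold"] else "right"
    return predict(node[branch], sample, depth + 1)


result = predict(tree, {"src_bytes": 0.9, "count": 0.9})
```

The path length returned here is the quantity that a maximum-depth setting bounds when a tree is learned from data.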

(2) Detection process:

2.1) inputting original computer data into the established models, and selecting a two-class or five-class output mode; the two-class output comprises: normal data (normal) and attack data (attack); the five-class output comprises: normal data (normal), denial-of-service attack (DoS), probing attack (probe), Remote-to-Local attack (R2L) and User-to-Root privilege escalation attack (U2R).

2.2) voting with the three trained models (Gaussian Bayes, BP neural network and decision tree) according to three input weight ratios, and obtaining the likelihood that the new input data belongs to each class as the weighted average of the three models' outputs;
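The weighted vote of step 2.2) can be sketched as follows. This is a minimal illustration that assumes each model outputs a probability per class (soft voting); the source does not state whether the models vote with probabilities or hard labels, and the per-model probabilities below are made up. The ratios 1:1:1 and 35:8:7 are the initial and tuned binary-classification ratios reported in this document.

```python
def weighted_vote(probabilities, weights):
    """Combine per-model class probabilities by a weighted average.

    probabilities: {model_name: {class_label: probability}}
    weights:       {model_name: non-negative voting weight}
    """
    total = sum(weights[name] for name in probabilities)
    combined = {}
    for name, class_probs in probabilities.items():
        for label, p in class_probs.items():
            combined[label] = combined.get(label, 0.0) + weights[name] * p / total
    # The class with the highest combined likelihood is the prediction.
    return max(combined, key=combined.get), combined


# Made-up model outputs for a single sample.
probabilities = {
    "bayes": {"normal": 0.30, "attack": 0.70},
    "bp_network": {"normal": 0.90, "attack": 0.10},
    "decision_tree": {"normal": 1.00, "attack": 0.00},
}
equal = weighted_vote(probabilities, {"bayes": 1, "bp_network": 1, "decision_tree": 1})
tuned = weighted_vote(probabilities, {"bayes": 35, "bp_network": 8, "decision_tree": 7})
```

For this made-up sample the two ratios give different predictions, which shows why the ratio itself has to be tuned experimentally.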

The voting method is tested experimentally by cross-validation, and comprises the following steps:

2.2.1) KNN parameter setting, wherein the parameters comprise K, the number of nearest neighbors considered;

2.2.2) decision tree parameter setting, wherein the parameters comprise the maximum depth (max_depth) and the minimum number of samples required to split an internal node (min_samples_split);

2.2.3) comprehensive prediction: comparing the test results of the models, combining the prediction models according to these results, detecting the sample by weighted (ratio) voting, and obtaining the best detection result for the sample through repeated experiments with different ratios. The prediction models include: Gaussian Bayes, the BP neural network and the decision tree.

The embodiments of the invention test the models of the different learning methods under binary and five-class classification respectively, and analyze and compare the results obtained.

Example 1:

Under binary classification, cross-validation experiments were performed:

① KNN parameter setting (K on the horizontal axis and accuracy on the vertical axis):

Seven values of K were selected (3, 5, 7, 9, 11, 13 and 15), and the average accuracy was tested for each. As can be seen from FIG. 2, the average accuracy is highest at K = 3 and K = 5, and K = 3 is finally chosen as the K value of the KNN algorithm under binary classification.
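The selection of K by cross-validation can be sketched as follows. This is an illustrative sketch: the number of folds (5), the interleaved fold assignment, the Euclidean distance and the synthetic data are all assumptions, since the source does not state them. Ties are resolved in favor of the smallest K, which matches the choice of K = 3 above.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority label among the k training samples closest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_validated_accuracy(samples, k, folds=5):
    """Average accuracy of KNN over `folds` train/validation splits."""
    accuracies = []
    for fold in range(folds):
        # Every `folds`-th sample goes to the validation split.
        valid = samples[fold::folds]
        train = [s for i, s in enumerate(samples) if i % folds != fold]
        hits = sum(knn_predict(train, x, k) == y for x, y in valid)
        accuracies.append(hits / len(valid))
    return sum(accuracies) / folds


def select_k(samples, candidates=(3, 5, 7, 9, 11, 13, 15), folds=5):
    """Return the candidate K with the best average accuracy (smallest K on ties)."""
    scores = {k: cross_validated_accuracy(samples, k, folds) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


# Synthetic, well-separated clusters standing in for normal and attack records.
samples = ([([i * 0.01, 0.0], "normal") for i in range(20)]
           + [([1 + i * 0.01, 1.0], "attack") for i in range(20)])
best_k, scores = select_k(samples)
```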

② Decision tree parameter setting (maximum depth on the horizontal axis and accuracy on the vertical axis):

max_depth was tested over the range 10 to 30, and the accuracy for each setting is shown in FIG. 3. The accuracy is highest at 26 and 30, and 26 is chosen here. With this setting, min_samples_split was tested over the range 2 to 20, and the accuracy is shown in FIG. 4.

Finally, min_samples_split is set to 2.

③ Comprehensive prediction:

After the prediction results of the models have been compared, the Bayes model, the decision tree model and the neural network model are combined, and the samples are detected by voting with a ratio of 1:1:1, giving the following results:

The confusion matrix is shown in FIG. 5. The left and right columns represent the predicted classes, normal (normal data) and attack (attack data), and the total of each column is the number of samples predicted to be of that class. The upper and lower rows represent the true classes of the data, normal data and attack data, and the total of each row is the number of samples that actually belong to that class. Each cell gives the number of samples of the row's true class that were predicted as the column's class. For example, in the Bayes confusion matrix, of the data whose true class is normal, 3874 are predicted as normal data and 2 are incorrectly predicted as attack data.

The performance evaluation is shown in FIG. 6: the abscissa lists accuracy, precision, recall and F1-Score (all values are multiplied by 100), and the ordinate is the performance in percent.

After repeated ratio experiments, the voting ratio was finally changed to Bayes : neural network : decision tree = 35:8:7, which gives the best result:

The confusion matrix is shown in FIG. 7. In the combined confusion matrix obtained with the voting mechanism, of the data whose true class is normal, 3866 are predicted as normal data and 10 are mispredicted as attack data; of the data whose true class is attack, only 30 are predicted as normal data and 15855 are correctly predicted as attack data. This outperforms Bayes and the BP neural network, and is better than the decision tree at classifying normal data.
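The four evaluation indexes can be recomputed from the counts reported above for the 35:8:7 vote. This is a minimal sketch that assumes attack is treated as the positive class; the source does not state which class is taken as positive, so the precision and recall values depend on that assumption.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts reported for the 35:8:7 vote, with "attack" assumed to be the positive class:
# 15855 attacks detected, 10 normal records flagged, 30 attacks missed, 3866 normal kept.
metrics = binary_metrics(tp=15855, fp=10, fn=30, tn=3866)
for name, value in metrics.items():
    # The figures in this document plot the same indexes multiplied by 100.
    print(f"{name}: {100 * value:.2f}%")
```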

The performance evaluation is shown in FIG. 8: with the voting mechanism added, the combined model exceeds Bayes and the BP neural network on all four indexes (accuracy, precision, recall and F1-Score) and differs only slightly from the decision tree.

Example 2:

Under five-class classification, cross-validation experiments were performed:

① KNN parameter setting: as shown in FIG. 9, the tested K values and the number of folds are the same as in the binary case, and the average accuracy for each K value is obtained. Based on the result, the K value of the KNN algorithm under five-class classification is also set to 3.

② Decision tree parameter setting: max_depth was tested over the range 10 to 30, and the accuracy for each setting is shown in FIG. 10. Once the parameter reaches 27, the accuracy stays at about 99.84, so max_depth is set to 27.

On this basis, min_samples_split was tested over the range 2 to 20, and the accuracy is shown in FIG. 11. The accuracy tends to decrease as the parameter increases, so the parameter is finally set to 2.

③ Comprehensive prediction:

After the prediction results of the models have been compared (see the drawings), the Bayes model, the decision tree model and the neural network model are combined, and the samples are detected by voting with a ratio of 1:1:1, giving the following results:

The confusion matrix is shown in FIG. 12. In the combined confusion matrix obtained with the voting mechanism, of the data whose true class is normal, 3838 are predicted as normal data, 20 are mispredicted as dos attack data, 10 as probe attack data and 8 as R2L attack data. Of the data whose true class is dos, 15655 are predicted as dos attack data, 12 are mispredicted as normal data and 1 as probe attack data. Of the data whose true class is probe, 162 are predicted as probe attack data, 1 is mispredicted as normal data and 5 as dos attack data. Of the data whose true class is R2L, 43 are predicted as R2L attack data, 5 are mispredicted as normal data and 1 as probe attack data. This outperforms Bayes and the BP neural network.

The performance evaluation is shown in FIG. 13: with the voting mechanism added, the combined model exceeds Bayes and the BP neural network on all four indexes (accuracy, precision, recall and F1-Score) and differs only slightly from the decision tree.

After repeated ratio experiments, the voting ratio of Bayes : neural network : decision tree was adjusted again, and the setting that performed best in comparison with the three single classifiers gave the following results:

The confusion matrix is shown in FIG. 14. In the combined confusion matrix obtained with the voting mechanism, of the data whose true class is normal, 3858 are predicted as normal data, 10 are mispredicted as probe attack data and 8 as R2L attack data. Of the data whose true class is dos, 15665 are predicted as dos attack data, 1 is mispredicted as normal data and 2 as probe attack data. Of the data whose true class is probe, 163 are predicted as probe attack data, 4 are mispredicted as normal data and 1 as dos attack data. Of the data whose true class is R2L, 44 are predicted as R2L attack data, 4 are mispredicted as normal data and 1 as probe attack data. Compared with the individual Bayes, BP neural network and decision tree models, the method obtains a better classification result.
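The counts reported above can be arranged as a multi-class confusion matrix to read off the overall accuracy and the recall of each class. This is a sketch under stated assumptions: rows are taken as true classes and columns as predicted classes, cells not mentioned in the text are taken to be zero, and the U2R class is omitted because the text lists no counts for it.

```python
# Rows are true classes, columns are predicted classes, as reported for the
# tuned vote. The text lists no U2R row, so that class is left out here.
labels = ["normal", "dos", "probe", "r2l"]
matrix = [
    [3858, 0, 10, 8],
    [1, 15665, 2, 0],
    [4, 1, 163, 0],
    [4, 0, 1, 44],
]


def per_class_recall(matrix, labels):
    """Fraction of each true class that was predicted correctly."""
    return {label: row[i] / sum(row)
            for i, (label, row) in enumerate(zip(labels, matrix))}


def overall_accuracy(matrix):
    """Share of all samples that lie on the diagonal of the matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


recall = per_class_recall(matrix, labels)
accuracy = overall_accuracy(matrix)
```

With these counts, the rare R2L class has a lower recall than the frequent dos class even though the overall accuracy is high, which illustrates the class imbalance issue raised in the Background.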

The performance evaluation is shown in FIG. 15: with the voting mechanism added, the combined model exceeds Bayes, the BP neural network and the decision tree on all four indexes (accuracy, precision, recall and F1-Score).

After the task is expanded from two classes to five, the differences in classification performance between the models become evident. In binary classification all abnormal data packets are merged into one class, so the imbalance between samples is much less pronounced than in the five-class case.

Claims

1. An artificial intelligence active intrusion detection method based on a voting mechanism is characterized by comprising the following steps:

(1) training process:

(2) detection process:

2. An artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the discretization methods of step 1.1) comprise: an entropy minimization discretization method and a proportional k-interval discretization method.

3. An artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the feature selection methods of step 1.1) comprise: a correlation-based feature selection method, a consistency-based filter method and the INTERACT method.

4. An artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the classifiers of step 1.1) comprise: the k-nearest neighbor algorithm, Gaussian Bayes, the back propagation neural network and the decision tree.

5. An artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the two-class output of step 2.1) comprises: normal data and attack data.

6. An artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the five-class output of step 2.1) comprises: normal data, denial-of-service attacks, probing attacks, remote-to-local attacks and user-to-root privilege escalation attacks.

7. The artificial intelligence active intrusion detection method based on a voting mechanism according to claim 1, wherein the voting method in step 2.2) is tested experimentally by cross-validation, and comprises the following steps:

2.2.1) setting the parameters of the k-nearest neighbor algorithm, wherein the parameters comprise K, the number of nearest neighbors considered;

2.2.2) setting the parameters of the decision tree, wherein the parameters comprise the maximum depth and the minimum number of samples required to split an internal node;

8. The artificial intelligence active intrusion detection method based on a voting mechanism according to claim 7, wherein the prediction models comprise: Gaussian Bayes, the back propagation neural network and the decision tree.