CN115563610B - Training method, recognition method and device for intrusion detection model
- Publication number: CN115563610B
- Application number: CN202211546247.4A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/55—Detecting local intrusion or implementing counter-measures
- G06F21/554—Detecting local intrusion or implementing counter-measures involving event detection and direct action
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention relates to a training method, a recognition method and a device for an intrusion detection model. The training method comprises: acquiring a sample data set, establishing a classification model, and training the classification model using a MAML-based meta-training method. The classification model is a multi-channel CNN model comprising an input layer, a plurality of channels, a splicing layer, a fully connected layer and an output layer. Each channel defines a Block, and each Block comprises a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer. The splicing layer concatenates the local features extracted from the different channels into a new feature vector, and the fully connected layer and the output layer follow the splicing layer in sequence. By combining a deep neural network with meta-learning training, the detection method better addresses the problem that a model cannot be trained when attack sample data are insufficient.
Description
Technical Field
The present invention relates to the field of intrusion detection, and in particular, to a training method, an identification method, and an apparatus for an intrusion detection model.
Background
Given massive amounts of data and sufficient computing resources, most deep learning methods can accurately identify the types of network attack on which they were previously trained. However, the Internet environment changes constantly, and new attack modes keep emerging. One example is the zero-day attack, which exploits an unpatched security vulnerability immediately after it is discovered and can cause severe damage to systems or software applications. Deep models must be retrained to detect new attacks, which requires many samples and much time. Yet security organizations often find it difficult to obtain enough attack instances for model training in a short period. As a result, the model cannot be trained because the number of samples is insufficient.
Disclosure of Invention
In view of the above problems, it is necessary to provide a training method for an intrusion detection model. The method better addresses the problem that a model cannot be trained because attack sample data are insufficient. It uses a limited number of samples to train a classifier with good generalization ability, enabling rapid learning and detection of new attack samples.
A method of training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model,
and training the classification model by a meta-training method based on MAML (Model-Agnostic Meta-Learning).
This detection method, which combines a deep neural network with a meta-learning training approach, better addresses the problem that a model cannot be trained when attack sample data are insufficient.
In one embodiment, the classification model is a multi-channel CNN model.
In one embodiment, the multi-channel CNN model comprises:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer,
a splicing layer for concatenating the local features extracted from the different channels into a new feature vector,
and a fully connected layer and an output layer arranged in sequence after the splicing layer.
In one embodiment,
the probability distribution of the label y in the output layer is calculated by a Softmax activation function.
In one embodiment, the sample data set comprises a meta-training set D_meta-train comprising a sample set and a query set, and a meta-test set D_meta-test comprising a support set and a test set,
after the classification model is trained, a meta-test stage is entered, the meta-test stage comprising a fine-tuning stage and a verification stage,
the fine-tuning stage includes: when the model needs to be adapted to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula,
$$\theta_i' = \theta^{*} - \alpha \nabla_{\theta^{*}} \mathcal{L}_{T_i}\left(f_{\theta^{*}}; P_i\right)$$
where $P_i$ represents the support set of the i-th task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $\mathcal{L}_{T_i}(f_{\theta^{*}}; P_i)$ represents the training loss value of the model with initial parameters $\theta^{*}$ on task $T_i$,
the verification stage includes: after the fine-tuning stage, a new model $f_{\theta_i'}$ parameterized by $\theta_i'$ is obtained; the new model is evaluated on the test set, and the results are averaged to avoid accidental results.
In one embodiment, training the classification model by the MAML-based meta-training method specifically includes: training based on dual gradient updates, comprising an internal update and an external update,
in the internal update stage, the training loss value $\mathcal{L}_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameters of each task $T_i$ are optimized along the gradient descent direction according to the formula
$$\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$$
where $\alpha$ is the learning rate shared between different tasks in the internal update step and $\mathcal{L}_{T_i}(f_\theta)$ is the training loss value of the model with initial parameters $\theta$ on task $T_i$; the loss value is used to perform a gradient update of the initial parameters $\theta$ of the internal model corresponding to task $T_i$, and the model with the updated parameters $\theta_i'$ is a weakly supervised model with a preference for that task,
in the external update stage, a weighted gradient update mechanism is adopted to minimize the bias of the initial model toward any specific task; specifically, a gradient update weight $w_i$ is set for each task $T_i$, and the weight is updated from the total loss value after one iteration and a learning rate for the weights, where t denotes the number of iterations,
the obtained weights are then further normalized,
then, using the locally updated parameters $\theta_i'$, a loss value is obtained on the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameters $\theta$ of the global network are updated,
after multiple iterations, the value of the loss function decreases continuously, the network model gradually converges, and a trained model is finally obtained.
An intrusion detection identification method, comprising:
acquiring intrusion data to be identified;
and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
An intrusion detection identification device comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method.
A computer apparatus, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method.
Drawings
Fig. 1 is a flowchart of a training method of an intrusion detection model according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a multi-channel CNN model according to an embodiment of the present application.
FIG. 3 is a meta-training phase flow diagram of MAML-based network anomaly detection in an embodiment of the present application.
FIG. 4 is a graph of Loss during training of a model of an embodiment of the present application.
FIG. 5 is a graph comparing the run times of different models.
Detailed Description
In order that the above objects, features and advantages of the invention will be readily understood, a more particular description of the invention will be rendered by reference to the appended drawings. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be embodied in many other forms than described herein and similarly modified by those skilled in the art without departing from the spirit of the invention, whereby the invention is not limited to the specific embodiments disclosed below.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
In view of the above problems, it is necessary to provide a training method for an intrusion detection model. The method better addresses the problem that a model cannot be trained because attack sample data are insufficient. It uses a limited number of samples to train a classifier with good generalization ability, enabling rapid learning and detection of new attack samples.
As shown in Fig. 1, an embodiment of the present application provides a training method for an intrusion detection model. The method includes: acquiring a sample data set, establishing a classification model, and training the classification model using a MAML-based meta-training method.
In one embodiment, the classification model is a multi-channel CNN model, which the application optimizes. Specifically, as shown in Fig. 2, the multi-channel CNN model includes an input layer and a plurality of channels. Each channel defines a Block, and each Block comprises a two-dimensional convolution layer (Conv2D), a LeakyReLU activation function, a two-dimensional max-pooling layer (MaxPooling2D) and a Dropout layer whose dropout rate is set to 0.2. A splicing layer concatenates the local features extracted from the different channels into a new feature vector, and fully connected layers and an output layer are arranged in sequence after the splicing layer. In Fig. 2 there are two fully connected layers, FC (32,8) and FC (1); the parameters in brackets are the output dimensions, and the splicing layer corresponds to the connection operation in Fig. 2.
The following examples describe the optimized multi-channel CNN model of the present application in detail.
First, let sample x be a one-dimensional vector containing d features, defined as follows:
$$x = (c_1, c_2, \dots, c_d)$$
where $c_i$ represents the i-th feature of the sample. To match the input format of two-dimensional convolution, all samples are reshaped to 1×d×1, representing height, width and number of channels, respectively.
A Block is defined for each channel, and each Block contains a convolution layer with a different convolution kernel size. The network comprises several parallel Blocks; the input data are fed into each Block, feature detection is performed at different positions of the input, and local features are extracted from the different channels. Based on experimental study, three parallel convolution layers are used, with convolution kernel window sizes of 1×3, 1×4 and 1×5 and a stride of 1×1, as shown in the following formula:
$$h_j^{(i)} = \sigma\left(w_j \cdot c_{i:i+k_j-1} + b_j\right), \quad i = 1, \dots, d - k_j + 1$$
where d is the dimension of the input x, c denotes the features of x, $w_j$ and $b_j$ represent the weight matrix and the bias of the convolution operation in the j-th channel, and $k_j$ represents the convolution kernel size. σ is the activation function; LeakyReLU is selected, which accelerates learning convergence by introducing nonlinearity. Unlike ReLU, LeakyReLU assigns a non-zero slope to all negative values, which helps avoid overfitting and solves the dead ReLU problem, in which some neurons in the network may never be updated. After the three separate convolution operations, a max-pooling layer is applied to the output of each convolution layer to reduce the complexity of the network:
$$p_j = \mathrm{MaxPool}\left(h_j\right)$$
By downsampling the feature map of the previous layer, the max-pooling layer filters out weakly correlated features and passes the most strongly correlated information to the next layer, which effectively reduces overfitting.
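The per-channel processing described above can be illustrated with a small pure-Python sketch. It is only an illustration under stated assumptions: the kernel weights, the pooling window of 2, the LeakyReLU slope of 0.01 and the toy input are hypothetical values, since the parameter table is not reproduced in this text, and Dropout is omitted.

```python
def leaky_relu(v, slope=0.01):
    """LeakyReLU: keep positive values, scale negative values by a small slope."""
    return v if v > 0 else slope * v


def conv1xk(features, weights, bias):
    """Valid 1 x k convolution with stride 1 over a 1 x d feature row."""
    k = len(weights)
    return [
        leaky_relu(sum(w * c for w, c in zip(weights, features[i:i + k])) + bias)
        for i in range(len(features) - k + 1)
    ]


def max_pool(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input with d = 8 features; one all-ones kernel per channel (hypothetical values).
x = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
channels = {k: max_pool(conv1xk(x, [1.0] * k, 0.0)) for k in (3, 4, 5)}
for k, pooled in channels.items():
    print(k, pooled)
```

Each kernel width k yields d - k + 1 convolution outputs, so the three channels produce feature maps of different lengths before pooling.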
Next, in the splicing layer, the local features extracted from the three different channels are concatenated to form a new feature vector, as shown in the following formula:
$$z = F\left(C\left(p_1, p_2, p_3\right)\right)$$
where $p_1, p_2, p_3$ denote the pooled outputs of the three channels, C represents the splicing (concatenation) operation and F represents the flatten operation, which reshapes the data to one dimension to match the input of the fully connected layer. The fully connected layer combines the extracted features to make the final decision. It comprises two hidden layers with 32 and 8 neurons, respectively, and uses the LeakyReLU activation function to enhance the learning ability of the network, so that the model learns globally from the feature map space. The probability distribution of the label y in the output layer is calculated by the Softmax activation function:
$$P(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{k} e^{y_j}}$$
where $y_i$ represents the i-th output label value and k is the number of classes. In the experimental setting, k=2. The details of the parameter settings of the network model are shown in Table 1.
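As a minimal sketch of the splicing, flatten and Softmax steps, the snippet below concatenates per-channel feature lists and converts k = 2 output values into probabilities. The input numbers and the helper names `concat_flatten` and `softmax` are illustrative assumptions; the fully connected layers between the two steps are left out.

```python
import math


def concat_flatten(*channel_outputs):
    """Splice per-channel feature lists into one flat feature vector."""
    return [v for channel in channel_outputs for v in channel]


def softmax(logits):
    """Numerically stable softmax over a list of k output values."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pooled outputs of three channels and hypothetical output-layer values.
features = concat_flatten([1.5, 3.5], [2.5], [3.0, 6.0])
probs = softmax([0.2, 1.3])  # k = 2: normal vs. attack
print(features, probs)
```

Subtracting the maximum before exponentiating does not change the result but avoids overflow for large output values.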
Table 1 parameter setting table
The multi-channel CNN model is a neural network specially designed for small-sample learning, and its training mode differs from traditional supervised learning. Instead of simply dividing the entire data set into a training set and a test set, a meta-training set comprising a plurality of tasks is generated from the source data set. Each task comprises a sample set and a query set, which simulate the support set and the test set of the meta-test set. The following illustrates how small-sample tasks are generated from the original data set.
Given a data set comprising normal samples and samples of N attack types:
$$D = \{(x_i, y_i)\}, \quad x_i \in \mathbb{R}^{d}, \quad y_i \in \{0, 1, \dots, N\}$$
where 0 represents the normal type and the other labels represent N different types of attack. The data set is therefore divided into N+1 subsets by label class:
$$D = D_0 \cup D_1 \cup \dots \cup D_N$$
where $D_t$ refers to the set of all samples $(x_i, y_i)$ with $y_i = t$. Meta-learning requires the neural network to handle tasks it has never seen, so one attack class must be chosen to simulate the small-sample scenario of a new attack in real life. For convenience of description, attack class N is selected here as the new network attack class and is excluded during training; the remaining N-1 attack classes are known attacks used for training.
First, an attack sample set $D_r$ is randomly selected as the attack data source of the task, where $r \in \{1, \dots, N-1\}$. Next, K samples are randomly drawn from each of the normal sample set $D_0$ and the attack sample set $D_r$ to form the sample set of the task:
$$r = \mathrm{Rand}(1, N-1), \qquad S = \mathrm{Sample}(D_0, K) \cup \mathrm{Sample}(D_r, K)$$
where $\mathrm{Rand}(\cdot)$ generates a random value and $\mathrm{Sample}(D, K)$ draws K samples at random from the set D. The query set Q is sampled in the same way as the sample set and contains H normal samples and H samples of attack class r:
$$Q = \mathrm{Sample}(D_0, H) \cup \mathrm{Sample}(D_r, H)$$
Here S and Q denote the sample set and the query set of each task, and they must not contain duplicate samples, i.e. $S \cap Q = \varnothing$. Each generated task therefore includes 2K samples for training and 2H samples for verification. This process is repeated n times to construct n tasks for training. The tasks of the meta-test set are sampled in the same way and consist of a support set P and a test set T; the difference is that the attack samples come from the specific subset $D_N$, and the process is repeated m times. In total, n+m tasks are generated, of which n form the meta-training set and m form the meta-test set. Across all tasks, the sample sets contain 2K×n samples, the query sets contain 2H×n samples, the support sets contain 2K×m samples and the test sets contain 2H×m samples.
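The task construction described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the helper names `split_by_label` and `make_task`, the toy data set and the choice to draw K + H samples without replacement and then split them (which guarantees that the sample set and query set are disjoint) are assumptions.

```python
import random


def split_by_label(dataset):
    """Group (x, y) pairs into subsets D_0, ..., D_N keyed by label."""
    subsets = {}
    for x, y in dataset:
        subsets.setdefault(y, []).append((x, y))
    return subsets


def make_task(subsets, attack_labels, K, H, rng):
    """Build one task: 2K sample-set items and 2H query-set items, disjoint."""
    r = rng.choice(attack_labels)            # attack class used in this task
    normal = rng.sample(subsets[0], K + H)   # drawn without replacement
    attack = rng.sample(subsets[r], K + H)
    sample_set = normal[:K] + attack[:K]
    query_set = normal[K:] + attack[K:]
    return r, sample_set, query_set


# Toy data: label 0 is normal, labels 1..3 are attacks, class N = 3 is held out.
N, K, H, n = 3, 5, 15, 4
dataset = [((label, idx), label) for label in range(N + 1) for idx in range(40)]
subsets = split_by_label(dataset)
rng = random.Random(0)
train_tasks = [make_task(subsets, list(range(1, N)), K, H, rng) for _ in range(n)]
test_task = make_task(subsets, [N], K, H, rng)  # support set and test set
```

The same function serves for the meta-test set by restricting the attack labels to the held-out class N.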
After the task set is generated, it is input into the optimized multi-channel CNN network for training. Unlike the training approach of traditional supervised learning, the small-sample classification process performed with the MAML framework of the present application requires two phases: a meta-training phase and a meta-test phase. The basic idea is to start from a randomly initialized multi-channel model parameterized by θ and to find, over the distribution of specific tasks, a set of parameters that does not necessarily give the best performance on the different categories of data provided in the meta-training phase, but that can be quickly adapted to new tasks containing unknown attacks.
The meta-training phase is based on dual gradient update training, which comprises two modules, an internal update module and an external update module; the specific implementation is shown in Fig. 3.
In the internal update phase, the training loss value $\mathcal{L}_{T_i}(f_\theta)$ on each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameters of each task $T_i$ are optimized along the gradient descent direction according to the formula
$$\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$$
where $\alpha$ is the learning rate shared between different tasks in the internal update step and $\mathcal{L}_{T_i}(f_\theta)$ is the training loss value of the model with initial parameters $\theta$ on task $T_i$. The loss value is used to perform a gradient update of the initial parameters $\theta$ of the internal model corresponding to task $T_i$. The model with the updated parameters $\theta_i'$ is a weakly supervised model with a preference for that task, and it has good detection performance on the specific attack in the corresponding task.
In the external update phase, a weighted gradient update mechanism is used to minimize the bias of the initial model toward any specific task. Specifically, a gradient update weight $w_i$ is set for each task $T_i$, and the goal of the weight update is to set $w_i$ to the value that minimizes the objective in the next iteration t. Learning the weights automatically keeps the model from settling into a local optimum, which alleviates overfitting and makes convergence more stable. The weight update uses the total loss value after one iteration and a learning rate for the weights, where t denotes the number of iterations.
The obtained weights are then further normalized.
Then, using the locally updated parameters $\theta_i'$, a loss value is obtained on the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameters $\theta$ of the global network are updated.
After multiple iterations, the value of the loss function decreases continuously, the network model gradually converges, and a trained model is finally obtained.
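To make the interplay of the internal and external updates concrete, here is a toy sketch under several loudly stated assumptions: the model is a scalar linear function f(x) = theta * x with a squared-error loss instead of the multi-channel CNN, the external update uses the first-order approximation (second derivatives are ignored), and the task weights are kept fixed and normalized by their sum because the weight update rule is not reproduced in this text. The names `loss_and_grad`, `normalize` and `meta_step` are hypothetical.

```python
def loss_and_grad(theta, data):
    """Mean squared error of f(x) = theta * x and its derivative w.r.t. theta."""
    n = len(data)
    loss = sum((theta * x - y) ** 2 for x, y in data) / n
    grad = sum(2 * (theta * x - y) * x for x, y in data) / n
    return loss, grad


def normalize(weights):
    """Rescale task weights so they sum to 1 (assumed normalization)."""
    total = sum(weights)
    return [w / total for w in weights]


def meta_step(theta, tasks, weights, alpha, beta):
    """One internal update per task, then one weighted external update."""
    w = normalize(weights)
    meta_grad, total_loss = 0.0, 0.0
    for w_i, (sample_set, query_set) in zip(w, tasks):
        _, g_s = loss_and_grad(theta, sample_set)
        theta_i = theta - alpha * g_s                 # internal update on S_i
        l_q, g_q = loss_and_grad(theta_i, query_set)  # loss on the query set
        meta_grad += w_i * g_q                        # first-order approximation
        total_loss += w_i * l_q
    return theta - beta * meta_grad, total_loss


# Two toy tasks (sample set, query set) generated by slopes 1 and 3.
tasks = [
    ([(1.0, 1.0), (2.0, 2.0)], [(3.0, 3.0)]),
    ([(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]),
]
theta, losses = 0.0, []
for _ in range(20):
    theta, loss = meta_step(theta, tasks, weights=[1.0, 1.0], alpha=0.01, beta=0.05)
    losses.append(loss)
print(round(theta, 3), round(losses[0], 3), round(losses[-1], 3))
```

With equal weights the shared initialization settles between the two task optima, a starting point from which either task can be reached in a few internal steps.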
In the meta-test stage, m tasks are randomly sampled to verify the generalization capability of the model and to avoid chance results.
Specifically, the sample data set comprises a meta-training set D_meta-train and a meta-test set D_meta-test. The meta-training set D_meta-train comprises a sample set and a query set, and the meta-test set D_meta-test comprises a support set and a test set. After the classification model is trained, the meta-test stage is entered, which comprises a fine-tuning stage and a verification stage.
The fine-tuning stage includes: when the model needs to be adapted to a new specific task, the pre-trained model parameters $\theta^{*}$ and the sample data on the support set are used to fine-tune the model parameters. The aim of fine-tuning is to ensure the detection performance of the model on a previously unseen attack type, and to adapt quickly to the new task, using only a few iteration steps and a small number of samples of that attack type. The specific implementation is shown in the following formula,
$$\theta_i' = \theta^{*} - \alpha \nabla_{\theta^{*}} \mathcal{L}_{T_i}\left(f_{\theta^{*}}; P_i\right)$$
where $P_i$ represents the support set of the i-th task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $\mathcal{L}_{T_i}(f_{\theta^{*}}; P_i)$ represents the training loss value of the model with initial parameters $\theta^{*}$ on task $T_i$.
The verification stage includes: after the fine-tuning stage, a new model $f_{\theta_i'}$ parameterized by $\theta_i'$ is obtained. The new model is evaluated on the test set, and the results are averaged to avoid accidental results.
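The meta-test procedure (fine-tune on each support set, evaluate on the matching test set, then average over the m tasks) can be sketched with a toy model. As an assumption for illustration only, the model is again a scalar linear function with a squared-error score rather than the CNN with classification metrics, and the function names and the number of fine-tuning steps are hypothetical.

```python
def grad_mse(theta, data):
    """Derivative of the mean squared error of f(x) = theta * x."""
    return sum(2 * (theta * x - y) * x for x, y in data) / len(data)


def mse(theta, data):
    """Mean squared error of f(x) = theta * x on a data set."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)


def fine_tune(theta, support_set, alpha, steps=3):
    """A few gradient steps on the support set, starting from the trained parameters."""
    for _ in range(steps):
        theta = theta - alpha * grad_mse(theta, support_set)
    return theta


def meta_test(theta, tasks, alpha):
    """Fine-tune per task, score on its test set, and average over all tasks."""
    scores = [mse(fine_tune(theta, p, alpha), t) for p, t in tasks]
    return sum(scores) / len(scores)


# m = 2 toy tasks (support set, test set) from unseen slopes 4 and 5.
meta_test_tasks = [
    ([(1.0, 4.0), (2.0, 8.0)], [(3.0, 12.0)]),
    ([(1.0, 5.0), (2.0, 10.0)], [(3.0, 15.0)]),
]
trained_theta = 2.0
before = sum(mse(trained_theta, t) for _, t in meta_test_tasks) / len(meta_test_tasks)
after = meta_test(trained_theta, meta_test_tasks, alpha=0.05)
print(before, after)
```

Averaging over several sampled tasks is what keeps a single lucky or unlucky task from dominating the reported result.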
The above-described methods of the present application are specifically evaluated by experiments as follows.
The first part covers the experimental setup, including the hyperparameters, performance metrics and simulation environment. The second part is the experimental performance evaluation and analysis: the method, abbreviated as MCCML, is compared with baseline methods, the effectiveness of each component is verified, and the experimental results are analyzed in detail to verify the performance of the method. The code for the experiments is implemented in PyTorch.
In particular, experimental setup and hyper-parameters include the following.
Based on rules of thumb and a large number of experiments, the optimal hyperparameters used by the model of the present application are listed in Table 2. The external update performs a global optimization of the model, so the β value is set greater than the α value in the experiments. In the training phase, the number of attack samples K in each task is set to 5. To avoid chance results in the model test stage, the number of attack samples H in each task is set to 15. After forward propagation is completed, the backpropagation process of small-sample training is similar to that of conventional supervised learning. Because each small-sample anomaly detection task is constructed as a supervised binary classification problem with equal numbers of normal and attack samples, there is no data imbalance problem. The loss function used in the training process is therefore the binary cross-entropy function. To better train the proposed model, the network parameters are updated with the Adam optimization method, a variant of stochastic gradient descent (SGD).
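For reference, the binary cross-entropy loss named above can be written in a few lines. This is a generic sketch of the standard definition, not code from the patent; the clipping constant `eps` is an assumption added to keep the logarithm finite.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted attack probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss grows sharply when the model assigns a high probability to the wrong class, which is the behaviour wanted for a balanced normal-versus-attack task.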
Table 2 Hyperparameter settings
Existing public data sets are generated manually in specific environments and contain many normal and abnormal samples, so they are not directly applicable to small-sample problems. For small-sample learning in network intrusion detection, the task set must be reconstructed according to the attack type labels. The existing public data set CICIDS2017 is therefore used as the data source: a small portion of its samples is extracted and packaged into tasks, and multiple task sets are reconstructed that contain the normal samples and the specific attack samples required for the experiments. The five most typical attacks in the CICIDS2017 data set (DDoS, bruteForce, portscan, bot, web) are selected for the experiments. Data preprocessing is an essential step before training the model, so preprocessing operations are applied to these data. As shown in Table 3, there are 5 groups of experiments in total. Each group selects one attack type to simulate the detection of a truly unknown attack, and three of the remaining four attack types are selected for training, so each group contains 4 parallel experiments. Each group of experiments is repeated several times, and the average is taken as the final evaluation result so that the model evaluation is as accurate as possible.
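The grouping described above (one held-out attack per group, three of the remaining four used for training) can be enumerated directly. The sketch below only illustrates the counting; the function name `experiment_plan` is hypothetical, and the attack names are copied as written in the text.

```python
from itertools import combinations

ATTACKS = ["DDoS", "bruteForce", "portscan", "bot", "web"]


def experiment_plan(attacks):
    """List (held-out attack, training attacks) pairs for every group."""
    plan = []
    for held_out in attacks:
        known = [a for a in attacks if a != held_out]
        for train in combinations(known, 3):  # choose 3 of the remaining 4
            plan.append((held_out, train))
    return plan


plan = experiment_plan(ATTACKS)
for held_out, train in plan[:4]:
    print(held_out, train)
```

Choosing 3 of 4 training attacks gives 4 combinations per held-out attack, hence 5 groups of 4 parallel experiments.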
TABLE 3 experimental grouping situation
The experimental performance evaluation and analysis are as follows.
This part verifies the performance of the proposed MAML-based intrusion detection method for new attacks. The number of iterations can be chosen by observing how the training loss changes. Fig. 4 shows the loss curve of the model over 100 iterations. As the figure shows, as training of the neural network proceeds, the loss function converges rapidly in the first few iterations and stays at a relatively stable level after 60 iterations, with slight oscillations. The number of iterations (Episode) is therefore set to 100.
To evaluate the performance of the proposed method MCCML and its fitting and generalization ability, it is compared with widely used baseline classifiers. These include traditional machine learning algorithms, namely K-Nearest Neighbor (KNN) and Random Forest (RF), and ensemble learning algorithms, namely AdaBoost, Bagging (bootstrap aggregating) and the Gradient Boosting Decision Tree (GBDT). The baselines also include some classical deep learning algorithms: MLP and a multi-channel CNN that has the same infrastructure as MCCML but is trained with conventional supervised learning. All models are evaluated on the same reference data set so that their detection performance on new tasks can be compared fairly.
Table 4 lists the performance of the proposed method and the baseline methods in identifying the various unknown attack categories, in terms of accuracy, recall, and F1. Bold entries mark the best detection result for each test attack category. The last three columns of Table 4 can be regarded as a set of ablation experiments, which demonstrate the effectiveness of each component of the model through comparative experiments on three components: the multichannel CNN, the meta-learning framework, and the weighted gradient update. Table 4 shows the following: (1) compared with the fully connected layer method, the multichannel convolution method improves each metric by 3% on average; (2) compared with the conventional way of training a network model, the meta-learning training for small sample learning proposed in this application improves overall performance by 6% to 7%; (3) in small sample scenarios, some shallow learning methods even outperform deep learning, because deep learning relies on large sample sets and too little training data leads to over-fitting and poor performance; (4) the average gradient update rule of MAML may bias the initial model too strongly toward certain existing tasks, so that it does not adapt to new tasks, whereas weighted gradient updates make the model more general and reduce its tendency to favor certain specific tasks. In summary, compared with traditional machine learning or deep neural networks, the proposed MCCML method provides better detection: it is generally superior to the baseline methods on all metrics, and its worst detection result is comparable to the best result of the baseline methods.
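For reference, the metrics named above can be computed from the confusion counts of a binary normal-versus-attack classifier. The following is a minimal sketch, assuming label 1 denotes attack traffic and label 0 denotes normal traffic; precision is included because F1 is defined from it. The function name and the example labels are hypothetical and are not results from Table 4.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall falls with every attack sample that is missed (false negative).
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Made-up labels for illustration: 4 attack samples followed by 4 normal samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```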
TABLE 4
To highlight the training efficiency of the proposed model, Fig. 5 compares the run time per iteration of the different models. The experimental results show that the proposed method computes significantly faster than the pure deep learning methods. Time consumption is one of the disadvantages of deep learning, and training with the meta-learning approach achieves both faster detection and higher performance. The run time per iteration of the proposed method is 0.652 s, which is comparable to the training efficiency of the machine learning methods. Because small sample learning is a relatively new topic in network intrusion detection, little related work is available for comparison, and there is no benchmark sample set suitable for testing. Therefore, this application reconstructs a detection task set dedicated to small sample learning from the open-source CICIDS2017 dataset, and selects several related studies that use the CICIDS2017 dataset for baseline comparison experiments. Misclassifying abnormal traffic as normal is far more dangerous than misclassifying normal traffic as abnormal. The proposed MCCML algorithm is therefore compared with the Siamese, AE-CGAN-RF and ANID methods on recall, the metric of greatest interest for network intrusion prevention systems, as shown in Table 5.
TABLE 5
It should be noted that not all baseline models use the same dataset size. AE-CGAN-RF and ANID are not small sample detection methods, and both require a large number of training samples. The experimental results show that the MCCML method achieves competitive performance on new tasks containing unknown attacks and has a high detection rate on new attack samples, outperforming all the other baseline detection methods with an average detection rate of 95.22%. In addition, the comparison with Siamese, a similar small sample method, shows that MAML is superior to the Siamese network model in the field of network anomaly detection.
On the basis of the above, the embodiment of the application also provides an intrusion detection and identification method, which comprises the following steps:
acquiring intrusion data to be identified; and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
On the basis of the above, the embodiment of the present application further provides an intrusion detection and identification device, including:
the data processing module is used for calling an intrusion detection model obtained by adopting a training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
On the basis of the above, the embodiment of the application also provides a computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the executable instruction causes a processor to execute the operation corresponding to the method.
On the basis of the above, embodiments of the present application further provide a computer apparatus, including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the invention. They are described specifically and in detail, but are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.
Claims (4)
1. An intrusion detection recognition method is characterized by comprising a training method of an intrusion detection model, wherein the training method of the intrusion detection model comprises the following steps:
a sample dataset is acquired and a classification model is built, the sample dataset being from the CICIDS2017 dataset,
training a classification model by using a meta-training method based on MAML, wherein the classification model is a multichannel CNN model,
the multi-channel CNN model includes:
an input layer and a plurality of channels, each channel defining a Block, each Block comprising a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer,
and a splicing layer for connecting the local features extracted from the different channels to form a new feature vector,
a full connection layer and an output layer are sequentially arranged behind the splicing layer, the probability distribution of the label y in the output layer is calculated through a Softmax activation function,
the sample data set comprises a meta-training set $D_{\text{meta-train}}$ and a meta-test set $D_{\text{meta-test}}$, the meta-training set $D_{\text{meta-train}}$ comprises a sample set and a query set, the meta-test set $D_{\text{meta-test}}$ comprises a support set and a test set,
after training the classification model, entering a meta-test stage, wherein the meta-test stage comprises a fine tuning stage and a verification stage,
the fine tuning stage includes: when the model needs to be adapted to a new specific task, the pre-trained model parameters $\theta$ and the sample data on the support set are used to fine tune the model parameters, as shown in the following formula: $\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$, with the loss computed on $P_i$,
where $P_i$ represents the support set of the i-th task, $\alpha$ is the learning rate shared between the different tasks in the internal update step, and $\mathcal{L}_{T_i}(f_\theta)$ represents the training loss value of the model with the initial parameter $\theta$ on task $T_i$, the verification phase comprising: after the fine tuning phase, a new model $f_{\theta_i'}$ parameterized by $\theta_i'$ is obtained, the new model is evaluated on the test set, and an average is taken to avoid accidental results,
the training of the classification model by the meta training method based on MAML specifically comprises the following steps: training is based on dual gradient updates, including internal and external updates,
in the internal update phase, the training loss value $\mathcal{L}_{T_i}(f_\theta)$ at each task $T_i$ is first calculated using the sample set data $S_i$, and the local parameter $\theta$ of each task $T_i$ is optimally updated along the gradient descent direction, with the formula: $\theta_i' = \theta - \alpha \nabla_\theta \mathcal{L}_{T_i}(f_\theta)$, wherein $\alpha$ is the learning rate shared between different tasks in the internal update step, $\mathcal{L}_{T_i}(f_\theta)$ is the training loss value of the model with initial parameter $\theta$ on task $T_i$, and the initial parameter $\theta$ of the internal model corresponding to task $T_i$ is gradient-updated using the loss value to obtain the updated parameter $\theta_i'$, the model $f_{\theta_i'}$ being a weakly supervised model with a preference,
in the external updating stage, a weighted gradient updating mechanism is adopted to minimize the bias of the initial model toward each specific task, specifically, a gradient updating weight $w_i$ is set for each task $T_i$, and the updating operation of the weight value is as follows: $w_i^{t+1} = w_i^{t} - \eta \, \partial \mathcal{L}_{total} / \partial w_i^{t}$,
wherein $\mathcal{L}_{total}$ represents the total loss value after one iteration, $\eta$ represents the weighted learning rate, and $t$ represents the number of iterations,
Therefore, the obtained weights need to be further normalized, which is specifically shown in the following formula:
then, the model with the locally updated parameters $\theta_i'$ is trained on the query set, a loss value $\mathcal{L}_{T_i}(f_{\theta_i'})$ is obtained using the query set corresponding to each task $T_i$, the total loss of each batch is calculated, and the parameter $\theta$ of the global network is updated, the specific implementation being: $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{T_i} w_i \mathcal{L}_{T_i}(f_{\theta_i'})$, where $\beta$ represents the learning rate of the external update,
after multiple iterations, the value of the loss function is continuously reduced, the network model gradually converges, and finally a trained model can be obtained,
Acquiring intrusion data to be identified;
and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
2. An intrusion detection and identification device, comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the method of claim 1 and processing the intrusion data to be identified to obtain a processing result.
3. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method of claim 1.
4. A computer apparatus, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, and the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method as claimed in claim 1.
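The dual gradient update and the fine tuning stage recited in claim 1 can be illustrated on a toy problem. The following is a minimal sketch under strong assumptions, not the claimed implementation: a one-parameter linear model `f(x) = theta * x` with mean squared error stands in for the multichannel CNN so that gradients can be written by hand, the tasks are hypothetical slope-fitting problems, and the task weights are taken as given and already normalized (the weight update and normalization steps are not reproduced). All function names are illustrative.

```python
def loss(theta, data):
    """Mean squared error of f(x) = theta * x on a list of (x, y) pairs."""
    return sum((theta * x - y) ** 2 for x, y in data) / len(data)

def grad(theta, data):
    """Derivative of the loss with respect to theta."""
    return sum(2 * x * (theta * x - y) for x, y in data) / len(data)

def hessian(data):
    """Second derivative of the loss with respect to theta (constant here)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def inner_update(theta, support, alpha):
    """Internal update: one gradient step on the task's sample (support) set."""
    return theta - alpha * grad(theta, support)

def outer_update(theta, tasks, weights, alpha, beta):
    """External update: weighted query-set gradients, differentiated w.r.t. theta."""
    meta_grad = 0.0
    for (support, query), w in zip(tasks, weights):
        theta_i = inner_update(theta, support, alpha)
        # Chain rule through the inner step: d(theta_i)/d(theta) = 1 - alpha * hessian.
        meta_grad += w * grad(theta_i, query) * (1 - alpha * hessian(support))
    return theta - beta * meta_grad

def make_task(slope):
    """Hypothetical task: fit y = slope * x; returns (support set, query set)."""
    support = [(x, slope * x) for x in (1.0, 2.0)]
    query = [(x, slope * x) for x in (3.0, 4.0)]
    return support, query

tasks = [make_task(s) for s in (1.0, 2.0, 3.0)]
weights = [1 / 3, 1 / 3, 1 / 3]  # assumption: given and normalized to sum to 1
theta = 0.0
for _ in range(50):
    theta = outer_update(theta, tasks, weights, alpha=0.05, beta=0.05)

# Fine tuning stage: adapt the meta-trained theta to a new task's support set,
# then evaluate on that task's held-out data.
new_support, new_test = make_task(2.5)
theta_new = inner_update(theta, new_support, alpha=0.05)
print(theta, loss(theta, new_test), loss(theta_new, new_test))
```

In this toy setting all tasks share the same inputs, so the meta-trained parameter settles at the weighted mean of the task slopes, and a single fine tuning step on the new task's support set lowers the loss on its held-out data. A practical implementation would replace the hand-written derivatives with automatic differentiation over the CNN parameters.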
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546247.4A CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211546247.4A CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115563610A (en) | 2023-01-03 |
CN115563610B (en) | 2023-05-30 |
Family
ID=84770287
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211546247.4A Active CN115563610B (en) | 2022-12-05 | 2022-12-05 | Training method, recognition method and device for intrusion detection model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563610B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115618353B (en) * | 2022-10-21 | 2024-01-23 | 北京珞安科技有限责任公司 | Industrial production safety identification system and method |
CN116389175B (en) * | 2023-06-07 | 2023-08-22 | 鹏城实验室 | Flow data detection method, training method, system, equipment and medium |
CN116821907B (en) * | 2023-06-29 | 2024-02-02 | 哈尔滨工业大学 | Drop-MAML-based small sample learning intrusion detection method |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110365659B (en) * | 2019-06-26 | 2020-08-04 | 浙江大学 | Construction method of network intrusion detection data set in small sample scene |
CN110808945B (en) * | 2019-09-11 | 2020-07-28 | 浙江大学 | Network intrusion detection method in small sample scene based on meta-learning |
CN113037730B (en) * | 2021-02-27 | 2023-06-20 | 中国人民解放军战略支援部队信息工程大学 | Network encryption traffic classification method and system based on multi-feature learning |
Also Published As
Publication number | Publication date |
---|---|
CN115563610A (en) | 2023-01-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||