CN115563610B - Training method, recognition method and device for intrusion detection model - Google Patents

Training method, recognition method and device for intrusion detection model

Info

Publication number
CN115563610B
CN115563610B CN202211546247.4A
Authority
CN
China
Prior art keywords
model
training
layer
task
meta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211546247.4A
Other languages
Chinese (zh)
Other versions
CN115563610A (en)
Inventor
左严
杨萍萍
王正荣
王祥伟
汤斌
包寅杰
贾俊铖
胡梦娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu New Hope Technology Co ltd
Original Assignee
Jiangsu New Hope Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu New Hope Technology Co ltd filed Critical Jiangsu New Hope Technology Co ltd
Priority to CN202211546247.4A priority Critical patent/CN115563610B/en
Publication of CN115563610A publication Critical patent/CN115563610A/en
Application granted granted Critical
Publication of CN115563610B publication Critical patent/CN115563610B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/554Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a training method, an identification method and a device for an intrusion detection model. The method comprises the following steps: acquiring a sample data set, establishing a classification model, and training the classification model by a meta-training method based on MAML. The classification model is a multi-channel CNN model comprising: an input layer; a plurality of channels, each defining a Block, where each Block comprises a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer; a splicing layer that concatenates the local features extracted by the different channels into a new feature vector; and a fully connected layer and an output layer arranged in sequence after the splicing layer. This detection method, based on a deep neural network and the meta-learning training idea, better solves the problem that a model cannot be trained when attack sample data are insufficient.

Description

Training method, recognition method and device for intrusion detection model
Technical Field
The present invention relates to the field of intrusion detection, and in particular, to a training method, an identification method, and an apparatus for an intrusion detection model.
Background
For certain specific types of attacks, most deep learning methods can accurately identify a network attack type on which they were previously trained, provided that massive amounts of data and sufficient computing resources are available. However, the current Internet environment keeps changing, and new attack modes emerge endlessly. For example, a zero-day attack exploits an unpatched security hole immediately after it is discovered, mounting highly damaging network attacks on systems or software applications. Deep models must be retrained to detect a new attack, which requires many samples and is time-consuming. However, it is often difficult for security authorities to obtain enough attack instances for model training in a short period of time. This leads to the problem that the model cannot be trained because the number of samples is insufficient.
Disclosure of Invention
Based on this, it is necessary to provide a training method for an intrusion detection model to address the existing problems. The method better solves the problem that a model cannot be trained when attack sample data are insufficient: it uses limited samples to train a classifier with good generalization capability, enabling rapid learning and detection of new attack samples.
A method of training an intrusion detection model, comprising:
acquiring a sample data set,
establishing a classification model, and
training the classification model by a meta-training method based on MAML (Model-Agnostic Meta-Learning).
This detection method, based on a deep neural network and the meta-learning training idea, better solves the problem that a model cannot be trained when attack sample data are insufficient.
In one embodiment, the classification model is a multi-channel CNN model.
In one embodiment, the multi-channel CNN model comprises:
an input layer and a plurality of channels, each channel defining a Block, where each Block comprises a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels to form a new feature vector,
and a fully connected layer and an output layer arranged in sequence after the splicing layer.
In one embodiment, the probability distribution of the label y in the output layer is calculated by a Softmax activation function.
In one embodiment, the sample data set comprises a meta-training set D_meta-train comprising a sample set and a query set, and a meta-test set D_meta-test comprising a support set and a test set.

After the classification model is trained, a meta-test stage is entered, which comprises a fine-tuning stage and a verification stage.

The fine-tuning stage includes: when the model needs to adapt to a new specific task, the pre-trained model parameters θ* and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

θ_i' = θ* - α ∇_{θ*} L_{T_i}(f_{θ*}; P_i)

where P_i represents the support set of the i-th task, α is the learning rate shared between the different tasks in the internal update step, and L_{T_i}(f_{θ*}) represents the training loss value on task T_i of the model with initial parameters θ*.

The verification stage includes: after the fine-tuning stage, a new model f_{θ_i'} parameterized by θ_i' is obtained; the new model f_{θ_i'} is evaluated on the test set of each task, and the results are averaged to avoid accidental results.
In one embodiment, training the classification model by the MAML-based meta-training method specifically includes: training based on a dual gradient update, comprising internal updates and external updates.

In the internal update stage, the training loss value L_{T_i}(f_θ) on each task T_i is first calculated using the sample set data S_i, and the local parameter θ of each task T_i is optimally updated along the gradient descent direction, as follows:

θ_i' = θ - α ∇_θ L_{T_i}(f_θ)

where α is the learning rate shared between different tasks in the internal update step and L_{T_i}(f_θ) is the training loss value on task T_i of the model with initial parameter θ. The initial parameter θ of the internal model corresponding to task T_i is gradient-updated with this loss value; the resulting model f_{θ_i'} is a weakly supervised model with a preference.

In the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model. Specifically, a gradient update weight w_i is set for each task T_i, and the weight value is updated as follows:

w_i^{t+1} = w_i^t - η ∇_{w_i} L_total^t

where L_total^t represents the total loss value after one iteration, η represents the weighted learning rate, and t represents the number of iterations.

Furthermore, these weights need to satisfy the weight normalization condition, i.e.

Σ_{i=1}^{k} w_i = 1

so the obtained weights are further normalized, as shown in the following formula:

w_i ← w_i / Σ_{j=1}^{k} w_j

Then, the locally updated parameters θ_i' are obtained through query set training, the loss value L_{T_i}(f_{θ_i'}) is obtained using the query set corresponding to each task T_i, the total loss of each batch is calculated, and the parameter θ of the global network is updated as follows:

θ ← θ - β ∇_θ Σ_{T_i} w_i L_{T_i}(f_{θ_i'})

where β represents the learning rate of the external update.

After multiple iterations, the value of the loss function continuously decreases, the network model gradually converges, and finally a trained model f_{θ*} is obtained.
An intrusion detection identification method, comprising:
acquiring intrusion data to be identified;
and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
An intrusion detection identification device comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the training method of the intrusion detection model and processing the intrusion data to be identified so as to obtain a processing result.
A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method.
A computer apparatus, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method.
Drawings
Fig. 1 is a flowchart of a training method of an intrusion detection model according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a multi-channel CNN model according to an embodiment of the present application.
FIG. 3 is a meta-training phase flow diagram of MAML-based network anomaly detection in an embodiment of the present application.
FIG. 4 is a graph of Loss during training of a model of an embodiment of the present application.
FIG. 5 is a graph comparing the run times of different models.
Detailed Description
In order that the above objects, features and advantages of the invention may be readily understood, specific embodiments of the invention are described in detail below with reference to the accompanying drawings. Numerous specific details are set forth in the following description to facilitate a thorough understanding of the invention. However, the invention can be implemented in many forms other than those described herein, and those skilled in the art can make similar modifications without departing from the spirit of the invention; the invention is therefore not limited to the specific embodiments disclosed below.
It will be understood that when an element is referred to as being "fixed to" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Based on this, it is necessary to provide a training method for an intrusion detection model to address the existing problems. The method better solves the problem that a model cannot be trained when attack sample data are insufficient: it uses limited samples to train a classifier with good generalization capability, enabling rapid learning and detection of new attack samples.
As shown in fig. 1, an embodiment of the present application provides a training method of an intrusion detection model, where the method includes: and acquiring a sample data set, establishing a classification model, and training the classification model by using a meta-training method based on MAML.
In one embodiment, the classification model is a multi-channel CNN model, which the present application optimizes. Specifically, as shown in fig. 2, the multi-channel CNN model includes: an input layer and a plurality of channels, each channel defining a Block, where each Block comprises a two-dimensional convolution layer (Conv2D), a LeakyReLU activation function, a two-dimensional max-pooling layer (MaxPooling2D) and a Dropout layer with the dropout rate set to 0.2; a splicing layer that concatenates the local features extracted by the different channels into a new feature vector; and a fully connected layer and an output layer arranged in sequence after the splicing layer. In fig. 2 there are two fully connected layers, FC(32, 8) and FC(1), where the parameters in brackets are the output dimensions, and the splicing layer corresponds to the Concatenate operation in fig. 2.
The following examples describe the optimized multi-channel CNN model of the present application in detail.
First, let a sample x be a one-dimensional vector containing d features, defined as follows:

x = [c_1, c_2, …, c_d]

where c_i represents the i-th feature of the sample. To fit the input rules of two-dimensional convolution, all samples are reshaped to dimensions 1 × d × 1, representing height, width and number of channels, respectively, as in the sketch below.
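As a small sketch (the function name and NumPy usage are illustrative, not from the patent):

```python
import numpy as np

def reshape_for_conv2d(X):
    """Reshape flat feature vectors of shape (n, d) for two-dimensional convolution.

    The text specifies the layout height x width x channels = 1 x d x 1; in the
    channels-first convention used by PyTorch below this becomes (batch, 1, 1, d).
    """
    n, d = X.shape
    return X.reshape(n, 1, 1, d).astype(np.float32)
```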
A Block is defined for each channel, and each Block comprises a convolution layer with a different convolution kernel size. The network comprises a plurality of parallel Blocks; the input data are fed into each Block, feature detection is carried out at different positions of the input data, and local features are extracted from the different spatial channels of the multi-channel vector. Based on experimental study, three parallel convolution layers are used for training, with convolution kernel window sizes set to 1×3, 1×4 and 1×5 and a stride of 1×1, as shown in the following formula:

h_j = σ(w_j ⊛ x + b_j), j = 1, 2, 3

where x is the input of dimension d with features c, ⊛ denotes convolution with kernel size k_j, and w_j and b_j represent the weight matrix and the bias in the j-th channel convolution operation, respectively. σ is the activation function, for which LeakyReLU is selected to accelerate learning convergence by mapping nonlinearities into the data. Unlike ReLU, LeakyReLU assigns a non-zero slope to negative values, which avoids the "dying ReLU" problem in which some neurons of the network are never updated, and helps avoid overfitting. After the three separate convolution operations, to reduce the complexity of the network, a max-pooling layer is applied to the output of each convolution layer, as in the following formula:

p_j = MaxPooling(h_j), j = 1, 2, 3

The max-pooling layer downsamples the feature map of the previous layer, filtering out weakly correlated features and retaining the most strongly correlated information for the next layer, which effectively reduces overfitting.
Next, in the splicing layer, the local features extracted from the three different channels are concatenated to form a new feature vector, as shown in the following formula:

z = F(C(p_1, p_2, p_3))

where C represents the concatenation operation and F represents the flatten operation, which adjusts the data to one dimension to fit the input of the fully connected layer. The fully connected layer makes the final decision by combining the extracted features; it comprises two hidden layers with 32 and 8 neurons respectively, using the LeakyReLU activation function to enhance the learning ability of the network so that the model learns globally from the feature-map space. The probability distribution of the label y in the output layer is calculated by the Softmax activation function:

P(y_i) = exp(y_i) / Σ_{j=1}^{K} exp(y_j)
where y_i represents the i-th output label value. In the experimental setting, K = 2. The details of the parameter settings of the network model are shown in Table 1.
Table 1 Parameter settings
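As a concrete illustration, the following is a minimal PyTorch sketch of the architecture described above (three parallel Blocks with 1×3, 1×4 and 1×5 kernels, LeakyReLU, max pooling and Dropout 0.2, then concatenation and fully connected layers of 32 and 8 neurons). The per-channel filter count and the pooling window are not legible from Table 1 and are assumptions, as is rendering the two-class Softmax output as a two-logit layer.

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """One channel: Conv2D -> LeakyReLU -> MaxPooling2D -> Dropout(0.2)."""
    def __init__(self, kernel_w, filters=16):           # filters=16 is an assumed value
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, filters, kernel_size=(1, kernel_w), stride=(1, 1)),
            nn.LeakyReLU(),
            nn.MaxPool2d(kernel_size=(1, 2)),           # pooling window assumed
            nn.Dropout(0.2),
        )

    def forward(self, x):
        return self.net(x)

class MultiChannelCNN(nn.Module):
    """Parallel Blocks (kernels 1x3, 1x4, 1x5) -> splice (concat) -> FC(32) -> FC(8) -> 2 logits."""
    def __init__(self, d, num_classes=2):
        super().__init__()
        self.blocks = nn.ModuleList(Block(k) for k in (3, 4, 5))
        with torch.no_grad():                            # dry run to size the flattened features
            feat = self._splice(torch.zeros(1, 1, 1, d)).shape[1]
        self.fc = nn.Sequential(
            nn.Linear(feat, 32), nn.LeakyReLU(),
            nn.Linear(32, 8), nn.LeakyReLU(),
            nn.Linear(8, num_classes),                   # Softmax over K=2 applied at loss/inference
        )

    def _splice(self, x):                                # the splicing layer: concat + flatten
        return torch.cat([b(x).flatten(1) for b in self.blocks], dim=1)

    def forward(self, x):                                # x: (batch, 1, 1, d), see reshape above
        return self.fc(self._splice(x))
```

For example, `model = MultiChannelCNN(d=78)` would fit preprocessed flow records with 78 features; the exact feature count after preprocessing is not stated in the text.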
The multi-channel CNN model is a neural network specially designed for small-sample learning, and its training mode differs from traditional supervised learning. Instead of simply dividing the entire dataset into a training set and a test set, a meta-training set comprising a plurality of tasks is generated from the source dataset, such that each task comprises a sample set and a query set; a meta-test set comprising a support set and a test set is constructed in the same way. The following illustrates how small-sample tasks are generated from the original dataset.
Given a data set comprising normal samples and N attack-type samples:

D = {(x_1, y_1), (x_2, y_2), …}, y_i ∈ {0, 1, …, N}

where 0 represents the normal type and the others represent N different types of attacks. The given dataset is thus divided into N+1 subsets by label class:

D = {D_0, D_1, …, D_N}

where D_t refers to the set of samples (x_i, y_i) with y_i = t. Meta-learning means that the neural network has to handle tasks it has never seen, so an attack class must be selected to simulate the small-sample scenario of a new real-life attack. For convenience of description, attack class N is selected here as the new network attack class and is excluded during training; the remaining N-1 attack categories are known attacks used for training.
First, an attack sample set D_r is randomly selected as the attack data source in the task set, where r belongs to {1, …, N-1}. Next, K samples are randomly drawn from the normal sample set D_0 and from the attack sample set D_r, respectively, to form the sample set of the task. The specific formulas are as follows:

r = Random(1, N-1)
S_i^0 = Sample(D_0, K)
S_i^r = Sample(D_r, K)
S_i = S_i^0 ∪ S_i^r
where Random(1, N-1) represents generating a random value in {1, …, N-1} and Sample(D_t, K) represents randomly drawing K samples from the subset D_t. The query set Q_i is sampled in the same way as the sample set, and comprises H normal samples and H samples of attack class r, as in the following formulas:

Q_i^0 = Sample(D_0, H)
Q_i^r = Sample(D_r, H)
Q_i = Q_i^0 ∪ Q_i^r

where S_i and Q_i represent the sample set and the query set in each task, respectively, and it is ensured that they contain no duplicate samples, i.e. S_i ∩ Q_i = ∅. Each generated task thus includes 2K samples for training and 2H samples for verification. This process is repeated n times to construct n task sets for training. The task sampling steps of the meta-test set are the same as those of the meta-training set, represented by a support set P and a test set T respectively, with the difference that the attack samples come from the held-out subset D_N and the sampling is repeated m times. Thus, a total of n+m tasks are ultimately generated, where n tasks form the meta-training set and m tasks form the meta-test set. Across tasks, the sample sets contain 2K×n samples in total, the query sets 2H×n samples, the support sets 2K×m samples, and the test sets 2H×m samples. A sketch of this sampling procedure follows.
After the task set is generated, it is input into the optimized multi-channel CNN network for training. Unlike the training approach of traditional supervised learning, the small-sample classification process performed with the MAML framework of the present application requires two stages: a meta-training stage and a meta-test stage. The basic idea is to start from a randomly initialized parameterized multi-channel model f_θ and find, within the distribution of specific tasks, an initialization that does not necessarily achieve the best performance on the different categories of data provided in the meta-training stage, but can quickly adapt to new tasks that contain unknown attacks.
In the meta-training stage, training is based on a dual gradient update, which comprises two modules: an internal update module and an external update module, implemented as shown in fig. 3.

In the internal update stage, the training loss value L_{T_i}(f_θ) on each task T_i is first calculated using the sample set data S_i, and the local parameter θ of each task T_i is optimally updated along the gradient descent direction, as follows:

θ_i' = θ - α ∇_θ L_{T_i}(f_θ)

where α is the learning rate shared between different tasks in the internal update step and L_{T_i}(f_θ) is the training loss value on task T_i of the model with initial parameter θ. The initial parameter θ of the internal model corresponding to task T_i is gradient-updated using this loss value; the resulting model f_{θ_i'} is a weakly supervised model with a preference, and has good detection performance on the specific attack in the corresponding task.
In the external update stage, a weighted gradient update mechanism is used to minimize the deviation of each specific task from the initial model. Specifically, a gradient update weight w_i is set for each task T_i, and the goal of the weight update is to set w_i to the value that minimizes the total loss at the next iteration t+1. Automatically learning these weights keeps the model away from local optima, which alleviates overfitting and makes convergence more stable. The weight value is updated as follows:

w_i^{t+1} = w_i^t - η ∇_{w_i} L_total^t

where L_total^t represents the total loss value after one iteration, η represents the weighted learning rate, and t represents the number of iterations.

Furthermore, these weights need to satisfy the weight normalization condition, i.e.

Σ_{i=1}^{k} w_i = 1

so the obtained weights are further normalized, as shown in the following formula:

w_i ← w_i / Σ_{j=1}^{k} w_j

where k is the number of tasks.
Then, the locally updated parameters θ_i' are obtained through query set training, the loss value L_{T_i}(f_{θ_i'}) is obtained using the query set corresponding to each task T_i, the total loss of each batch is calculated, and the parameter θ of the global network is updated as follows:

θ ← θ - β ∇_θ Σ_{T_i} w_i L_{T_i}(f_{θ_i'})

where β represents the learning rate of the external update.

After multiple iterations, the value of the loss function continuously decreases, the network model gradually converges, and finally a trained model f_{θ*} is obtained. A sketch of this dual-update loop follows.
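The dual gradient update can be sketched as below, using a first-order approximation of MAML for brevity (the patent does not state whether second-order gradients are used) and `torch.func.functional_call` (PyTorch ≥ 2.0). Each task is assumed to arrive as tensor pairs ((Sx, Sy), (Qx, Qy)), e.g. after stacking the (x, y) pairs produced by the sampling sketch; the learning rates are illustrative, chosen only so that β > α as the experiments require.

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def task_loss(model, params, x, y):
    """Cross-entropy of f_params on one batch (y: int64 labels); with 2 logits this matches binary cross-entropy."""
    return F.cross_entropy(functional_call(model, params, (x,)), y)

def meta_train(model, tasks, episodes=100, alpha=1e-3, beta=1e-2, eta=1e-3):
    """Dual gradient update: per-task internal adaptation, weighted external update of θ."""
    theta = dict(model.named_parameters())
    w = torch.full((len(tasks),), 1.0 / len(tasks))        # task weights w_i, summing to 1
    outer_opt = torch.optim.Adam(theta.values(), lr=beta)  # Adam, as in the experiments

    for _ in range(episodes):
        query_losses = []
        for (Sx, Sy), (Qx, Qy) in tasks:
            # Internal update: θ_i' = θ - α ∇_θ L_Ti(f_θ) on the sample set S_i.
            grads = torch.autograd.grad(task_loss(model, theta, Sx, Sy),
                                        list(theta.values()))
            theta_i = {n: p - alpha * g
                       for (n, p), g in zip(theta.items(), grads)}
            # Loss of the adapted model f_θ_i' on the query set Q_i.
            query_losses.append(task_loss(model, theta_i, Qx, Qy))

        # External update: θ <- θ - β ∇_θ Σ_i w_i L_Ti(f_θ_i').
        total = sum(wi * li for wi, li in zip(w, query_losses))
        outer_opt.zero_grad()
        total.backward()
        outer_opt.step()

        # Weight update: since ∂L_total/∂w_i = L_i, set w_i <- w_i - η L_i, then renormalize.
        with torch.no_grad():
            w = (w - eta * torch.stack([l.detach() for l in query_losses])).clamp_min(1e-8)
            w = w / w.sum()
    return theta
```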
In the meta-test stage, in order to avoid accidental situations, m tasks are randomly sampled for verifying the generalization capability of the model.
Specifically, the sample data set comprises a meta-training set D_meta-train and a meta-test set D_meta-test; the meta-training set comprises a sample set and a query set, and the meta-test set comprises a support set and a test set. After the classification model is trained, the meta-test stage is entered, which comprises a fine-tuning stage and a verification stage.
The fine-tuning stage includes: when the model needs to adapt to a new specific task, the pre-trained model parameters θ* and the sample data on the support set are used to fine-tune the model parameters. The aim of fine-tuning is to guarantee the detection performance of the model on an attack type it has never seen by executing only a few iteration steps on a small number of samples of that type, so as to adapt rapidly to the new task. This is implemented as shown in the following formula:

θ_i' = θ* - α ∇_{θ*} L_{T_i}(f_{θ*}; P_i)

where P_i represents the support set of the i-th task, α is the learning rate shared between the different tasks in the internal update step, and L_{T_i}(f_{θ*}) represents the training loss value on task T_i of the model with initial parameters θ*.

The verification stage includes: after the fine-tuning stage, a new model f_{θ_i'} parameterized by θ_i' is obtained. The new model f_{θ_i'} is evaluated on the test set of each task, and the results are averaged to avoid accidental results. A sketch of this stage follows.
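The meta-test stage could look like the sketch below, reusing task_loss() and functional_call from the meta-training sketch above; the number of fine-tuning steps and the accuracy metric are illustrative (the paper also reports recall and F1).

```python
import torch
from torch.func import functional_call

def fine_tune_and_evaluate(model, theta_star, meta_test_tasks, alpha=1e-3, steps=5):
    """Fine-tune θ* on each support set P_i, then evaluate the adapted model on the test set T_i."""
    accuracies = []
    for (Px, Py), (Tx, Ty) in meta_test_tasks:
        theta_i = dict(theta_star)
        for _ in range(steps):                   # θ_i' = θ* - α ∇_θ* L_Ti(f_θ*; P_i)
            grads = torch.autograd.grad(task_loss(model, theta_i, Px, Py),
                                        list(theta_i.values()))
            theta_i = {n: p - alpha * g
                       for (n, p), g in zip(theta_i.items(), grads)}
        with torch.no_grad():                    # evaluate the new model f_θ_i'
            preds = functional_call(model, theta_i, (Tx,)).argmax(dim=1)
            accuracies.append((preds == Ty).float().mean().item())
    return sum(accuracies) / len(accuracies)     # average over m tasks to avoid accidental results
```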
The above-described methods of the present application are specifically evaluated by experiments as follows.
The first part presents the experimental setup: hyper-parameters, performance indices, and the simulation environment. The second part is the experimental performance evaluation and analysis; the method is abbreviated MCCML, the effectiveness of each component is verified by comparison with reference methods, and the experimental results are analyzed in detail to verify the performance of the method. The code implementation framework used for the experiments is PyTorch.
The experimental setup and hyper-parameters are as follows.
Through rules of thumb and a number of experiments, the optimal hyper-parameters used by the models of the present application are listed in Table 2. The external update performs a global optimization of the model, so the experimental β value is set greater than the α value. In the training stage, the number of attack samples K in each task is set to 5; to avoid chance in the meta-test stage results, the number of attack samples H in each task is set to 15. Furthermore, after the forward propagation process is completed, the backward propagation of small-sample training is similar to conventional supervised learning. Since the small-sample anomaly detection task is set up as a two-class supervised learning problem, there is no data imbalance problem; thus the loss function used during training is the binary cross-entropy function. To better train the proposed model, the experiments update the network parameters with the Adam optimization method based on stochastic gradient descent (SGD).
Table 2 Hyper-parameter settings
Existing public datasets are manually generated in a specific environment containing many normal and abnormal samples, and are not directly applicable to the small-sample problem. For small-sample learning in network intrusion detection, the task sets need to be reconstructed according to the attack-type labels. Therefore, the existing public dataset CICIDS2017 is used as the data source: a small portion of samples is extracted from it, packaged into tasks, and multiple task sets are reconstructed, containing the normal and specific attack samples required by the experiments. Finally, the five most typical attacks in the CICIDS2017 dataset (DDoS, BruteForce, PortScan, Bot, Web) are selected for the experiments. Furthermore, data preprocessing is an essential step before training the model, so preprocessing operations are performed on these data. As shown in Table 3, a total of 5 groups of experiments are included; each group selects one attack type to simulate the detection of a truly unknown sample attack and selects three of the remaining four attack types for training, so each group consists of 4 parallel experiments. Each group of experiments is repeated several times and the average is taken as the final evaluation result, so that the model evaluation is as accurate as possible.
Table 3 Experimental grouping
The experimental performance evaluation and analysis are as follows.
The performance of the proposed MAML-based new-attack intrusion detection method is verified below. The number of iterations is set by observing the change in training loss. Fig. 4 shows the Loss curve of the model over 100 iterations. As can be seen from the figure, as the neural network trains, the loss function converges rapidly in the first few iterations and remains at a relatively stable level after 60 iterations, with slight oscillations. Therefore, the number of iterations (episodes) is set to 100.
To evaluate the performance of the proposed method MCCML, together with its fitting and generalization ability, it is compared with widely used reference classifiers, including traditional machine-learning algorithms: K-Nearest Neighbors (KNN) and Random Forest (RF); ensemble-learning algorithms: AdaBoost, Bagging (Bootstrap aggregating) and Gradient Boosting Decision Tree (GBDT); and some classical deep-learning algorithms: MLP and the multi-channel CNN (the same architecture as in MCCML, trained with the conventional supervised-learning procedure). All methods are evaluated on the same reference dataset to allow a fair comparison of detection performance on new tasks.
Table 4 lists the performance of the method presented herein and the benchmark methods in identifying the various unknown attack categories, including accuracy, recall and the F1 metric; the bold entries are the best detection results for each tested attack category. The last three columns of Table 4 can be viewed as a set of ablation experiments demonstrating the effectiveness of each component, obtained by comparative experiments on the three components: the multi-channel CNN, the meta-learning framework and the weighted gradient update. From Table 4 it can be seen that: (1) compared with a fully connected method, the multi-channel convolution improves each index by 3% on average; (2) compared with the traditional network training mode, the meta-learning training for small-sample learning proposed in this application improves overall performance by 6% to 7%; (3) for small-sample scenarios, some shallow learning methods even outperform deep learning, because deep learning relies on large sample sets and too little training data leads to overfitting and poor performance; (4) the averaged gradient update rule of plain MAML may bias the initial model too strongly towards certain existing tasks, preventing it from accommodating new tasks, whereas the weighted gradient update makes the model more general and reduces over-specialization to particular tasks. In summary, compared with traditional machine learning or deep neural networks, the proposed MCCML achieves a better detection effect: it is generally superior to the reference methods in all indexes, and its worst detection result is comparable to the best result among the reference methods.
Table 4
To highlight the training efficiency of the proposed model, fig. 5 compares the run time of each iteration of the different models. The experimental results show that the computation speed of the proposed method is significantly faster than that of a pure deep-learning method. Time consumption is one of the disadvantages of deep learning; training with the meta-learning idea achieves faster detection efficiency together with higher performance. The run time of each iteration of the proposed method reaches 0.652 s, which is comparable to the training efficiency of machine learning. Since small-sample learning is a relatively new topic in the field of network intrusion detection, little related work is available for comparison, nor is there a reference sample set suitable for testing. Therefore, this application reconstructs a detection task set dedicated to small-sample learning from the CICIDS2017 open-source dataset, and selects several related studies that use the CICIDS2017 dataset for reference comparison experiments. Judging abnormal traffic as normal is much more dangerous than judging normal traffic as abnormal, so recall is the metric of most interest for network intrusion prevention systems; the proposed algorithm MCCML is therefore compared with the Siamese, AE-CGAN-RF and ANID methods on recall, as shown in Table 5.
Table 5
It should be noted that not all reference models use the same dataset size. Among them, AE-CGAN-RF and ANID are not small-sample detection methods; both require a large number of samples for training. The experimental results show that the MCCML method achieves competitive performance on new tasks containing unknown attacks, with a high detection rate on new attack samples, outperforming all other reference detection methods with an average detection rate of 95.22%. In addition, the comparison with the similar small-sample method Siamese shows that MAML outperforms the Siamese network model in the field of network anomaly detection.
On the basis of the above, the embodiment of the application also provides an intrusion detection and identification method, which comprises the following steps:
acquiring intrusion data to be identified; and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
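For illustration, a minimal inference sketch wiring together the pieces above; identify() and its signature are our assumptions, not the patent's.

```python
import torch
from torch.func import functional_call

def identify(model, theta_trained, flow_features):
    """Classify one preprocessed flow record (d features): 0 = normal, 1 = attack."""
    x = torch.as_tensor(flow_features, dtype=torch.float32).reshape(1, 1, 1, -1)
    with torch.no_grad():
        probs = torch.softmax(functional_call(model, theta_trained, (x,)), dim=1)
    return int(probs.argmax(dim=1))              # the processing result: the predicted label
```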
On the basis of the above, the embodiment of the present application further provides an intrusion detection and identification device, comprising: a data acquisition module and a data processing module. The data acquisition module is configured to acquire the intrusion data to be identified, and the data processing module is configured to call an intrusion detection model obtained by the above training method and process the intrusion data to be identified to obtain a processing result.
On the basis of the above, the embodiment of the application also provides a computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the executable instruction causes a processor to execute the operation corresponding to the method.
On the basis of the above, embodiments of the present application further provide a computer apparatus, including: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.
The above examples illustrate only a few embodiments of the invention; although they are described in detail, they are not to be construed as limiting the scope of the invention. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the protection scope of the invention. Accordingly, the scope of protection of the present invention is determined by the appended claims.

Claims (4)

1. An intrusion detection recognition method is characterized by comprising a training method of an intrusion detection model, wherein the training method of the intrusion detection model comprises the following steps:
a sample data set is acquired and a classification model is built, the sample data set being from the CICIDS2017 dataset,
training a classification model by using a meta-training method based on MAML, wherein the classification model is a multichannel CNN model,
the multi-channel CNN model includes:
an input layer and a plurality of channels, each channel defining a Block, where each Block comprises a two-dimensional convolution layer, a LeakyReLU activation function, a two-dimensional max-pooling layer and a Dropout layer,
a splicing layer for concatenating the local features extracted from the plurality of different channels to form a new feature vector,
and a fully connected layer and an output layer arranged in sequence after the splicing layer, the probability distribution of the label y in the output layer being calculated by a Softmax activation function,
the sample data set comprises a meta-training set D_meta-train and a meta-test set D_meta-test, the meta-training set D_meta-train comprises a sample set and a query set, and the meta-test set D_meta-test comprises a support set and a test set,
after training the classification model, entering a meta-test stage, wherein the meta-test stage comprises a fine tuning stage and a verification stage,
the fine-tuning stage includes: when the model needs to adapt to a new specific task, the pre-trained model parameters θ* and the sample data on the support set are used to fine-tune the model parameters, as shown in the following formula:

θ_i' = θ* - α ∇_{θ*} L_{T_i}(f_{θ*}; P_i)

where P_i represents the support set of the i-th task, α is the learning rate shared between the different tasks in the internal update step, and L_{T_i}(f_{θ*}) represents the training loss value on task T_i of the model with initial parameters θ*; the verification stage includes: after the fine-tuning stage, a new model f_{θ_i'} parameterized by θ_i' is obtained, the new model f_{θ_i'} is evaluated, and the results are averaged to avoid accidental results,
the training of the classification model by the MAML-based meta-training method specifically comprises: training based on a dual gradient update, including internal updates and external updates,

in the internal update stage, the training loss value L_{T_i}(f_θ) on each task T_i is first calculated using the sample set data S_i, and the local parameter θ of each task T_i is optimally updated along the gradient descent direction, as follows:

θ_i' = θ - α ∇_θ L_{T_i}(f_θ)

where α is the learning rate shared between different tasks in the internal update step, L_{T_i}(f_θ) is the training loss value on task T_i of the model with initial parameter θ, the initial parameter θ of the internal model corresponding to task T_i is gradient-updated using this loss value, and the resulting model f_{θ_i'} is a weakly supervised model with a preference,

in the external update stage, a weighted gradient update mechanism is adopted to minimize the deviation of each specific task from the initial model; specifically, a gradient update weight w_i is set for each task T_i, and the weight value is updated as follows:

w_i^{t+1} = w_i^t - η ∇_{w_i} L_total^t

where L_total^t represents the total loss value after one iteration, η represents the weighted learning rate, and t represents the number of iterations,

furthermore, these weights need to satisfy the weight normalization condition, i.e.

Σ_{i=1}^{k} w_i = 1

so the obtained weights are further normalized, as shown in the following formula:

w_i ← w_i / Σ_{j=1}^{k} w_j

then, the locally updated parameters θ_i' are obtained through query set training, the loss value L_{T_i}(f_{θ_i'}) is obtained using the query set corresponding to each task T_i, the total loss of each batch is calculated, and the parameter θ of the global network is updated as follows:

θ ← θ - β ∇_θ Σ_{T_i} w_i L_{T_i}(f_{θ_i'})

where β represents the learning rate of the external update,

after multiple iterations, the value of the loss function continuously decreases, the network model gradually converges, and finally a trained model f_{θ*} is obtained;
Acquiring intrusion data to be identified;
and calling an intrusion detection model obtained by adopting the training method of the intrusion detection model, and processing the intrusion data to be identified to obtain a processing result.
2. An intrusion detection and identification device, comprising:
a data acquisition module and a data processing module,
the data acquisition module is used for acquiring intrusion data to be identified,
the data processing module is used for calling an intrusion detection model obtained by adopting the method of claim 1 and processing the intrusion data to be identified to obtain a processing result.
3. A computer storage medium having stored therein at least one executable instruction for causing a processor to perform operations corresponding to the method of claim 1.
4. A computer apparatus, comprising: the device comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface are communicated with each other through the communication bus, and the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method as claimed in claim 1.
CN202211546247.4A 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model Active CN115563610B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211546247.4A CN115563610B (en) 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211546247.4A CN115563610B (en) 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model

Publications (2)

Publication Number Publication Date
CN115563610A CN115563610A (en) 2023-01-03
CN115563610B true CN115563610B (en) 2023-05-30

Family

ID=84770287

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211546247.4A Active CN115563610B (en) 2022-12-05 2022-12-05 Training method, recognition method and device for intrusion detection model

Country Status (1)

Country Link
CN (1) CN115563610B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115618353B (en) * 2022-10-21 2024-01-23 北京珞安科技有限责任公司 Industrial production safety identification system and method
CN116389175B (en) * 2023-06-07 2023-08-22 鹏城实验室 Flow data detection method, training method, system, equipment and medium
CN116821907B (en) * 2023-06-29 2024-02-02 哈尔滨工业大学 Drop-MAML-based small sample learning intrusion detection method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110365659B (en) * 2019-06-26 2020-08-04 浙江大学 Construction method of network intrusion detection data set in small sample scene
CN110808945B (en) * 2019-09-11 2020-07-28 浙江大学 Network intrusion detection method in small sample scene based on meta-learning
CN113037730B (en) * 2021-02-27 2023-06-20 中国人民解放军战略支援部队信息工程大学 Network encryption traffic classification method and system based on multi-feature learning

Also Published As

Publication number Publication date
CN115563610A (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN115563610B (en) Training method, recognition method and device for intrusion detection model
CN113282759A (en) Network security knowledge graph generation method based on threat information
CN106648654A (en) Data sensing-based Spark configuration parameter automatic optimization method
CN113435509B (en) Small sample scene classification and identification method and system based on meta-learning
Wang et al. Filter pruning with a feature map entropy importance criterion for convolution neural networks compressing
Costa et al. Ida 2016 industrial challenge: Using machine learning for predicting failures
Usman et al. Filter-based multi-objective feature selection using NSGA III and cuckoo optimization algorithm
US20220100867A1 (en) Automated evaluation of machine learning models
CN113541985B (en) Internet of things fault diagnosis method, model training method and related devices
Jia et al. An effective imbalanced JPEG steganalysis scheme based on adaptive cost-sensitive feature learning
Almazini et al. Heuristic Initialization Using Grey Wolf Optimizer Algorithm for Feature Selection in Intrusion Detection
KR20190105147A (en) Data clustering method using firefly algorithm and the system thereof
US11295229B1 (en) Scalable generation of multidimensional features for machine learning
Letteri et al. Dataset Optimization Strategies for MalwareTraffic Detection
CN117134958A (en) Information processing method and system for network technology service
Ding et al. Efficient model-based collaborative filtering with fast adaptive PCA
US20230041338A1 (en) Graph data processing method, device, and computer program product
CN115758462A (en) Method, device, processor and computer readable storage medium for realizing sensitive data identification in trusted environment
US20220172105A1 (en) Efficient and scalable computation of global feature importance explanations
CN114528906A (en) Fault diagnosis method, device, equipment and medium for rotary machine
CN113934813A (en) Method, system and equipment for dividing sample data and readable storage medium
Spasov et al. Dynamic neural network channel execution for efficient training
CN113822317A (en) Post-processing output data of a classifier
CN113162914B (en) Intrusion detection method and system based on Taylor neural network
US20220405599A1 (en) Automated design of architectures of artificial neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant