CN117081831A

CN117081831A - Network intrusion detection method and system based on data generation and attention mechanism

Info

Publication number: CN117081831A
Application number: CN202311150569.1A
Authority: CN
Inventors: 行鸿彦; 倪志伟
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2023-09-07
Filing date: 2023-09-07
Publication date: 2023-11-17

Abstract

The invention discloses a network intrusion detection method and a system based on a data generation and attention mechanism, which relate to the technical field of networks and comprise the following steps: receiving a network traffic data set, and preprocessing the network traffic data set to obtain a preprocessed network traffic data set; inputting minority attack class samples in the attack class samples into a pre-established ECGAN model to obtain minority attack class samples of a preset type and a preset number, and merging the minority attack class samples with the network traffic samples to obtain a final training data set; inputting the final training data set into a pre-established neural network model to obtain high-dimensional space-time flow characteristics, and inputting the high-dimensional space-time flow characteristics into a preset classification network to obtain a classification result; and training a pre-established network intrusion detection model by taking a loss function as an evaluation standard and an optimization algorithm as an optimizer in combination with the classification result to obtain a trained network intrusion detection model, thereby realizing a network intrusion detection function.

Description

Network intrusion detection method and system based on data generation and attention mechanism

Technical Field

The invention relates to the technical field of networks, in particular to a network intrusion detection method and system based on a data generation and attention mechanism.

Background

With the rapid development of technologies such as 4G, 5G, the Internet of Things and cloud computing, daily life has become far more convenient, but the number of network attacks has also grown, posing serious threats and challenges to social security. For example, in April 2020, the Echeveria organization launched COVID-19-themed network attacks against the Chinese healthcare industry and related institutions in order to steal their secrets. Timely detection and blocking of network attacks is therefore critical to network security. As an important component of network security, network intrusion detection can actively detect abnormal traffic data that may exist in a network, identify network attacks and provide real-time network protection, and is therefore of significant research value.

In recent years, traditional machine learning methods have been widely applied to network intrusion detection. However, traditional shallow machine learning cannot learn features automatically and requires manual feature extraction; its detection performance is poor on high-dimensional, large-scale network traffic, and it cannot meet the requirements of new network environments. Deep learning, by contrast, can learn features autonomously and therefore achieves better intrusion detection performance. In addition, traffic in a real network environment is unevenly distributed: normal traffic far outnumbers abnormal traffic, so a model trained on a class-imbalanced data set detects abnormal samples poorly. Traditional methods address class imbalance with undersampling, oversampling, mixed sampling and similar techniques, but these methods carry the risks of overfitting and of losing useful information.

Existing network intrusion detection research addresses either the poor performance of the classification model or the class imbalance of network traffic data, but does not handle the two problems together. As a result, the generalization capability of the model is insufficient and unknown network attacks cannot be detected effectively.

Disclosure of Invention

To solve the above-mentioned deficiencies of the prior art, an object of the present invention is to provide a method and a system for network intrusion detection based on data generation and attention mechanisms.

The aim of the invention can be achieved by the following technical scheme: a network intrusion detection method based on data generation and attention mechanism includes the following steps:

receiving a network traffic data set, and preprocessing the network traffic data set to obtain a preprocessed network traffic data set, wherein the preprocessed network traffic data set contains network traffic samples, and the network traffic samples comprise normal traffic samples and attack class samples;

inputting minority attack class samples in the attack class samples into a pre-established ECGAN model to obtain minority attack class samples of a preset type and a preset number, and combining the obtained minority attack class samples of the preset type and the preset number with the network traffic samples to obtain a final training data set;

inputting the final training data set into a pre-established CNN-BiLSTM-Attention neural network model to obtain high-dimensional space-time flow characteristics, and inputting the high-dimensional space-time flow characteristics into a preset classification network to obtain a classification result;

and, in combination with the classification result, training a pre-established network intrusion detection model with the binary cross-entropy loss function (Binary crossentropy) as the evaluation standard and the Adam optimization algorithm as the optimizer to obtain a trained network intrusion detection model, thereby realizing the function of network intrusion detection.
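The final training step pairs a binary cross-entropy loss with the Adam optimizer. The following is a minimal sketch of that pairing on a one-feature logistic detector, not the CNN-BiLSTM-Attention model itself; the function names, the toy data and the hyperparameters (`lr`, `beta1`, `beta2`, `eps`, `steps`) are assumptions chosen for illustration.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

def adam_step(params, grads, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update in place and return the parameter list."""
    state["t"] += 1
    t = state["t"]
    for i, g in enumerate(grads):
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g      # first moment
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g  # second moment
        m_hat = state["m"][i] / (1 - beta1 ** t)                     # bias correction
        v_hat = state["v"][i] / (1 - beta2 ** t)
        params[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_toy_detector(xs, ys, steps=500, lr=0.05):
    """Fit p = sigmoid(w * x + b) on 1-D toy data with BCE loss and Adam."""
    params = [0.0, 0.0]  # [w, b]
    state = {"t": 0, "m": [0.0, 0.0], "v": [0.0, 0.0]}
    history = []
    n = len(xs)
    for _ in range(steps):
        probs = [sigmoid(params[0] * x + params[1]) for x in xs]
        history.append(binary_cross_entropy(ys, probs))
        # Gradient of the mean BCE with respect to w and b for a sigmoid output.
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        adam_step(params, [grad_w, grad_b], state, lr=lr)
    return params, history
```

In a deep learning framework the same roles would normally be filled by the framework's built-in loss and optimizer objects; the hand-written update is shown only to make the two components explicit.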

Preferably, the process of preprocessing the network traffic data set includes:

performing standardization and outlier processing on the network traffic data set, wherein the symbolic features are converted into numerical feature representations by one-hot encoding; and applying Min-Max Scaling to the standardized and outlier-processed network traffic data set to normalize the values to between 0 and 1.

Preferably, the network traffic data set includes a training data set and a test data set, wherein the network traffic samples in the training data set and the test data set include normal traffic samples and attack class samples, and the minority attack class samples that are input into the pre-established ECGAN model are the minority attack class samples in the training data set.

Preferably, the feature types included in the network traffic data set are numerical features and symbolic features, the network traffic data set is subjected to standardization processing, and the symbolic features are converted into numerical feature representations based on the one-hot encoding method;

based on the Min-Max Scaling method, the standardized network traffic data set is normalized according to the following formula:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

wherein $x$ is the data corresponding to one of the numerical features in the network traffic data set, $x_{\max}$ is the maximum value in the data corresponding to that numerical feature, $x_{\min}$ is the minimum value in the data corresponding to that numerical feature, and $x^{*}$ is the normalized data corresponding to that numerical feature.
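The Min-Max Scaling step can be sketched per feature column as follows. This is an illustrative sketch only: the function names are hypothetical, and mapping a constant column to 0.0 is an assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_scale(column):
    """Scale a list of numbers to [0, 1] via (x - x_min) / (x_max - x_min)."""
    x_min, x_max = min(column), max(column)
    span = x_max - x_min
    if span == 0:
        # Constant feature: the formula is undefined, so map every value to 0.0 (assumption).
        return [0.0 for _ in column]
    return [(x - x_min) / span for x in column]

def min_max_scale_rows(rows):
    """Apply min-max scaling independently to every feature column of a row-major table."""
    columns = [min_max_scale(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]

scaled = min_max_scale([10, 30, 50])  # the middle value sits halfway between min and max
```

When a separate test set is scaled, a common practice is to reuse the minimum and maximum found on the training set rather than recomputing them; the text does not state which variant is used.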

Preferably, the ECGAN model includes a generator G, a discriminator D, and a classifier C.

Preferably, the loss function of the discriminator is defined as:

$$L_D(x, z) = \mathrm{BCE}(D(x), 1) + \mathrm{BCE}(D(G(z)), 0)$$

wherein BCE is the binary cross-entropy, D is the discriminator, G is the generator, $x$ is the real labeled data, and $z$ is a random vector;

the loss function of the generator is defined as:

$$L_G(z) = \mathrm{BCE}(D(G(z)), 1)$$

for the generator, only the discriminator's predictions on the generated data need to be input, with the label 1 as the target, i.e., the generator is trained so that the discriminator judges its output to be real;

for the classifier, the loss function is defined as:

$$L_C(x, y, z) = \mathrm{CE}(C(x), y) + \lambda\,\mathrm{CE}\big(C(G(z)),\ \arg\max(C(G(z))) > t\big)$$

where $y$ is the pseudo-label, $\lambda$ is the unsupervised loss weight (adversarial weight), CE is the cross-entropy loss, C is the classifier, and $t$ is the pseudo-label threshold.
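The three loss functions can be written out directly once the network outputs are available as probabilities. The sketch below assumes that D outputs a probability of "real", that C outputs a class-probability vector, and that the condition "> t" means generated samples contribute only when the classifier's top probability exceeds the threshold; the function names and the default values of `lam` and `t` are illustrative assumptions.

```python
import math

EPS = 1e-12

def bce(prob, target):
    """Binary cross-entropy for one probability and a 0/1 target."""
    p = min(max(prob, EPS), 1.0 - EPS)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def ce(probs, label):
    """Cross-entropy for one class-probability vector and an integer label."""
    return -math.log(max(probs[label], EPS))

def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0

def discriminator_loss(d_real, d_fake):
    """L_D = BCE(D(x), 1) + BCE(D(G(z)), 0), averaged over each batch."""
    return mean(bce(p, 1) for p in d_real) + mean(bce(p, 0) for p in d_fake)

def generator_loss(d_fake):
    """L_G = BCE(D(G(z)), 1)."""
    return mean(bce(p, 1) for p in d_fake)

def classifier_loss(c_real, labels, c_fake, lam=0.1, t=0.7):
    """L_C = CE(C(x), y) + lam * CE(C(G(z)), pseudo-label), keeping only
    generated samples whose top class probability exceeds the threshold t."""
    supervised = mean(ce(p, y) for p, y in zip(c_real, labels))
    kept = [p for p in c_fake if max(p) > t]
    # The pseudo-label is the classifier's own most probable class.
    unsupervised = mean(ce(p, p.index(max(p))) for p in kept)
    return supervised + lam * unsupervised
```

Low-confidence generated samples are simply dropped from the second term, so early in training, when the classifier is uncertain, the loss is dominated by the supervised term.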

Preferably, the process of inputting the minority attack class samples in the attack class samples into the pre-established ECGAN model to obtain the minority attack class samples with the preset type and the preset number is as follows:

iteratively training the ECGAN network model for a preset number of iterations using the generator G, the discriminator D and the classifier C, and saving the network parameters corresponding to the minimum value of the loss function during the iterations as the optimal data generation network model;

and generating data of minority attack class samples in the attack class samples by using the optimal data generation network model to obtain minority attack class samples of a preset type and a preset number.
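Keeping the parameters that gave the lowest loss over a fixed number of iterations can be sketched as follows. The helper names are hypothetical, and `step_fn` stands in for one full ECGAN training iteration; the text does not specify which loss (generator, discriminator, classifier, or a combination) is monitored, so the sketch treats it as a single number returned by `step_fn`.

```python
import copy

def train_and_keep_best(params, step_fn, num_iterations):
    """Run step_fn for num_iterations and return the parameters with the lowest loss.

    step_fn(params) must update params in place and return the loss
    measured after that update.
    """
    best_loss = float("inf")
    best_params = copy.deepcopy(params)
    for _ in range(num_iterations):
        loss = step_fn(params)
        if loss < best_loss:
            best_loss = loss
            best_params = copy.deepcopy(params)  # snapshot, not a reference
    return best_params, best_loss

def make_scripted_step(losses):
    """Hypothetical stand-in for a training iteration: it only replays given losses."""
    remaining = iter(losses)
    def step(params):
        params["iteration"] += 1
        return next(remaining)
    return step

best, loss = train_and_keep_best({"iteration": 0}, make_scripted_step([0.9, 0.4, 0.6]), 3)
```

The deep copy matters: storing a reference to the live parameters would silently track the last iteration instead of the best one.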

Preferably, the one-dimensional numerical value characteristics of the network traffic samples in the final training data set are converted into two-dimensional numerical value characteristics and input into a CNN neural network, the spatial characteristics of the network traffic samples are extracted, then the spatial characteristics of the network traffic samples are integrated and input into a BiLSTM network through a full connection layer to extract the time characteristics of the network traffic samples, and finally the extracted high-dimensional space-time traffic characteristics are output;

and inputting the features of the network traffic samples in the final training data set into an Attention network, obtaining the spatio-temporal fusion features of the network traffic samples, applying attention weighting to obtain the weighted output, and finally performing binary classification output using a sigmoid function as the activation function, so as to output the classification result corresponding to each network traffic data sample.

In a second aspect, to achieve the above object, the present invention discloses a network intrusion detection system based on a data generation and attention mechanism, comprising:

a data preprocessing module, configured to receive a network traffic data set and preprocess the network traffic data set to obtain a preprocessed network traffic data set, wherein the preprocessed network traffic data set contains network traffic samples, and the network traffic samples comprise normal traffic samples and attack class samples;

a sample processing module, configured to input the minority attack class samples among the attack class samples into a pre-established ECGAN model to obtain minority attack class samples of a preset type and a preset number, and to merge the obtained minority attack class samples of the preset type and the preset number with the network traffic samples to obtain a final training data set;

a feature classification module, configured to input the final training data set into a pre-established CNN-BiLSTM-Attention neural network model to obtain high-dimensional spatio-temporal traffic features, and to input the high-dimensional spatio-temporal traffic features into a preset classification network to obtain a classification result;

a model training module, configured to train, in combination with the classification result, a pre-established network intrusion detection model with the binary cross-entropy loss function (Binary crossentropy) as the evaluation standard and the Adam optimization algorithm as the optimizer to obtain a trained network intrusion detection model, thereby realizing the function of network intrusion detection.

In another aspect of the present invention, in order to achieve the above object, there is disclosed an apparatus comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the network intrusion detection method based on data generation and attention mechanism as described above.

The invention has the beneficial effects that:

the invention uses the ECGAN model to generate data for attack samples and uses the CNN-BiLSTM-Attention classification model to classify network traffic, thereby improving network intrusion detection accuracy, improving the detection of unknown attacks and reducing the false alarm rate.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to those skilled in the art that other drawings can be obtained according to these drawings without inventive effort;

FIG. 1 is a schematic flow chart of the method of the present invention;

FIG. 2 is a schematic diagram of a network intrusion detection model according to the present invention;

FIG. 3 is a schematic diagram of the training process based on the ECGAN network model according to the present invention;

FIG. 4 is a schematic block diagram of the CNN-BiLSTM-Attention neural network of the present invention;

FIG. 5 is a schematic diagram of the system architecture of the present invention;

FIG. 6 is a verification graph of an ablation experiment of the present invention;

FIG. 7 is a graph comparing the present invention with other models.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in fig. 1, the network intrusion detection method based on the data generation and attention mechanism includes the following steps:

The data preprocessing specifically comprises the following steps: performing standardization processing on the network traffic data set, namely using one-hot encoding to convert the symbolic features into numerical feature representations; and applying Min-Max Scaling to the standardized and outlier-processed data set to normalize the values to between 0 and 1. A network traffic data preprocessing module is constructed with the network traffic samples in the network traffic data set as input and the network traffic samples in the preprocessed network traffic data set as output;

the network flow data set adopted by the embodiment of the invention is a UNSW-NB15 data set, wherein the UNSW-NB15 data set is characterized in that:

The UNSW-NB15 data set comprises a training data set and a test data set, both of which carry multi-class labels. This method only requires a binary classification task, namely judging whether a network traffic data sample is normal traffic or abnormal traffic, where abnormal traffic indicates that the sample belongs to an attack; the data set labels are replaced accordingly. The training set contains 175,341 records and the test set contains 82,332 records. The data set comprises 43 features, a multi-class label and a binary label, wherein the multi-class label has 10 classes in total: Normal and 9 attack types.
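Replacing the multi-class labels with binary labels amounts to mapping Normal to one value and every attack type to the other. The following is a small sketch; the function names and the 0/1 convention (0 for normal, 1 for attack) are assumptions for illustration.

```python
def to_binary_label(attack_category):
    """Map a multi-class label to 0 (normal traffic) or 1 (any attack type)."""
    return 0 if attack_category.strip().lower() == "normal" else 1

def binarize_labels(categories):
    """Convert a list of multi-class labels into binary labels."""
    return [to_binary_label(c) for c in categories]

binary = binarize_labels(["Normal", "DoS", "Exploits", "normal "])
```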

The feature types contained in the network traffic data set are numerical features and symbolic features; the network traffic data set is subjected to standardization processing, and the symbolic features are converted into numerical feature representations based on the one-hot encoding method.

The UNSW-NB15 data set comprises 40 numerical features and 3 symbolic features, the 3 symbolic features being the protocol_type feature, the service feature and the flag feature. Based on one-hot encoding, these 3 symbolic features are converted into corresponding numerical features, so that the original 43-dimensional network traffic data set is converted into a 196-dimensional network traffic data set.
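The growth in dimensionality comes from each symbolic feature being replaced by one indicator column per distinct value. The sketch below illustrates this on toy rows; the helper names, the column positions and the toy values are hypothetical and are not taken from the data set.

```python
def fit_vocabularies(rows, symbolic_columns):
    """Collect the sorted set of values seen in each symbolic column."""
    return {c: sorted({row[c] for row in rows}) for c in symbolic_columns}

def one_hot_encode(row, vocabularies):
    """Replace each symbolic value by a 0/1 indicator vector; keep numeric values as floats."""
    encoded = []
    for column, value in enumerate(row):
        if column in vocabularies:
            encoded.extend(1.0 if value == v else 0.0 for v in vocabularies[column])
        else:
            encoded.append(float(value))
    return encoded

def encoded_width(num_columns, vocabularies):
    """Output dimension = number of numeric columns + sum of vocabulary sizes."""
    return num_columns - len(vocabularies) + sum(len(v) for v in vocabularies.values())

toy_rows = [[0.5, "tcp", "http"], [1.2, "udp", "dns"], [0.1, "tcp", "dns"]]
vocabs = fit_vocabularies(toy_rows, symbolic_columns=[1, 2])
first = one_hot_encode(toy_rows[0], vocabs)
```

With vocabularies fitted on the training set only, a value first seen at test time encodes to all zeros in this sketch; how unseen values are handled is not described in the text.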

Based on the Min-Max Scaling method, the standardized network traffic data set is normalized according to the formula:

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

wherein $x$ is the data corresponding to one of the numerical features in the network traffic data set, $x_{\max}$ is the maximum value in the data corresponding to that numerical feature, $x_{\min}$ is the minimum value in the data corresponding to that numerical feature, and $x^{*}$ is the normalized data corresponding to that numerical feature.

In this embodiment, the network traffic data set includes a training data set and a test data set, where the network traffic samples in the training data set and the test data set include attack class samples, and the minority attack class samples of the training data set in the preprocessed network traffic data set are used as input.

In this embodiment, as shown in FIG. 3, the minority attack class samples $x_i$ in the preprocessed training data set are input into the ECGAN model, which consists of three parts: a generator G, a discriminator D and a classifier C. In the first stage of each training epoch, the generator G and the discriminator D are trained: the generator G receives random noise and random labels and generates the corresponding artificial data and artificial labels, and the discriminator D is then trained on the real data and the artificial samples generated by the generator G and updated to better distinguish real samples from artificial samples. In the second stage of each training epoch, the classifier is trained: the real labels and the generated labels are converted into one-hot form, and the classifier loss is calculated on the real data and on the artificial samples generated by the generator, respectively. ECGAN uses a pseudo-labeling method that assigns each generated sample the class that the classifier C, in its current state, considers most likely. A generated sample and its label are retained only if the model predicts the class of the sample with a probability above a certain threshold. The loss function of the discriminator is defined as:

$$L_D(x, z) = \mathrm{BCE}(D(x), 1) + \mathrm{BCE}(D(G(z)), 0)$$

where BCE is the binary cross-entropy, D is the discriminator, G is the generator, $x$ is the real labeled data, and $z$ is a random vector. The loss function of the generator is defined as:

$$L_G(z) = \mathrm{BCE}(D(G(z)), 1)$$

For the generator, only the discriminator's predictions on the generated data need to be input, with the label 1 as the target, i.e., the generator is trained so that the discriminator judges its output to be real. Training in this way, the generator learns to generate more realistic data. For the classifier, the loss function is defined as:

$$L_C(x, y, z) = \mathrm{CE}(C(x), y) + \lambda\,\mathrm{CE}\big(C(G(z)),\ \arg\max(C(G(z))) > t\big)$$
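As a worked illustration of the discriminator and generator losses, assume natural logarithms, a single real sample scored $D(x) = 0.9$ and a single generated sample scored $D(G(z)) = 0.2$ (both scores are hypothetical values chosen for the arithmetic). Using $\mathrm{BCE}(p, 1) = -\ln p$ and $\mathrm{BCE}(p, 0) = -\ln(1 - p)$:

$$L_D = -\ln(0.9) - \ln(1 - 0.2) \approx 0.1054 + 0.2231 = 0.3285$$

$$L_G = -\ln(0.2) \approx 1.6094$$

The discriminator loss is small because it already separates the two samples well, while the generator loss is large for the same reason: the generated sample is not yet convincing.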

Taking the minority attack class samples as input and the generated samples of a specified type and number as output, the ECGAN network model is iteratively trained for a preset number of iterations using the constructed loss functions of the generator G, the discriminator D and the classifier C, and the network parameters corresponding to the minimum value of the loss function during the iterations are saved as the optimal data generation network model.

Loading the obtained optimal data generation network model to generate data of minority attack class samples in the training data set, and generating a preset number of minority attack class samples;

and merging the obtained attack samples with preset types and numbers with the training data set in the network traffic data set subjected to data preprocessing to construct a final training data set.
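Deciding how many samples to generate per class and merging them into the training set can be sketched as follows. The text only says that the type and number are preset, so the rule used here (top every class up to a hypothetical `target_count`) and the function names are assumptions.

```python
from collections import Counter

def generation_plan(labels, target_count):
    """For each class with fewer than target_count samples, return how many to generate."""
    counts = Counter(labels)
    return {cls: target_count - n for cls, n in counts.items() if n < target_count}

def merge_datasets(real_samples, real_labels, generated):
    """Append generated (features, label) pairs to the real training data."""
    samples = list(real_samples) + [features for features, _ in generated]
    labels = list(real_labels) + [label for _, label in generated]
    return samples, labels

toy_labels = ["Normal"] * 5 + ["Worms"] * 1 + ["DoS"] * 3
plan = generation_plan(toy_labels, target_count=4)
```

In practice the plan would be restricted to attack classes, and the merged set would typically be shuffled before training so that generated samples are not grouped at the end.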

In one embodiment, the preset number of iterations is 9000 rounds.

In this embodiment, as shown in FIG. 4, the one-dimensional numerical features of the network traffic samples in the final training data set are converted into two-dimensional numerical features and input into the CNN neural network to extract the spatial features of the network traffic samples; the spatial features are then integrated through a fully connected layer and input into the BiLSTM network to extract the temporal features of the network traffic samples; finally, the extracted high-dimensional spatio-temporal traffic features are output. The CNN neural network consists of an input layer, two-dimensional convolution layers, pooling layers, a fully connected layer and an output layer, wherein a deep network in which convolution layers and pooling layers are stacked alternately can iteratively extract more complex spatial features of the traffic. The BiLSTM neural network is a bidirectional long short-term memory neural network, a special LSTM network formed by combining a forward LSTM and a backward LSTM; it captures bidirectional dependencies better and is therefore used to extract the temporal features of the network traffic samples in the final training data set.
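The conversion from a one-dimensional feature vector to a two-dimensional input can be sketched as a reshape into a square grid. The target shape is not stated in the text; a 14 x 14 grid is an assumption that happens to fit the 196 features exactly, and the function name and zero padding for other lengths are likewise illustrative.

```python
import math

def to_square_grid(features, pad_value=0.0):
    """Reshape a flat feature vector into the smallest square grid that holds it,
    padding the tail with pad_value when the length is not a perfect square."""
    side = math.isqrt(len(features))
    if side * side < len(features):
        side += 1
    padded = list(features) + [pad_value] * (side * side - len(features))
    return [padded[r * side:(r + 1) * side] for r in range(side)]

grid = to_square_grid(list(range(196)))  # 196 features fill a 14 x 14 grid with no padding
```

The grid rows follow the original feature order, so neighbouring cells in a row correspond to adjacent features; any spatial structure the CNN finds therefore depends on how the features are ordered.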

The spatio-temporal fusion features extracted by the CNN-BiLSTM network are used as the input of the Attention mechanism, whose calculation can be divided into two processes: first, calculating the weight coefficients; second, computing the weighted sum. In process 1, a similarity function F(Q, K) is computed between the query vector Query and each input Key vector to obtain the attention score S, which is then normalized by the softmax function into a probability distribution to obtain the weight coefficients A. In process 2, the normalized attention weights A are multiplied by the corresponding Value vectors and summed to obtain the final attention vector, Attention Value. Through these two processes, the Attention Value for the Query vector is obtained, so that targeted attention to the input sequence is realized. Finally, a sigmoid function is used as the activation function to perform binary classification output, the classification result corresponding to each network traffic data sample is output, and the performance of the proposed network intrusion detection model is checked.
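The two attention processes can be sketched in a few lines. The similarity function F(Q, K) is not specified in the text, so a dot product is assumed here; the function names and toy vectors are illustrative.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Process 1: scores S = F(Q, K) and weights A = softmax(S).
    Process 2: weighted sum of the Value vectors."""
    scores = [dot(query, k) for k in keys]  # F(Q, K) taken to be a dot product (assumption)
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights

def sigmoid(z):
    """Squash a score into (0, 1) for the binary normal/attack output."""
    return 1.0 / (1.0 + math.exp(-z))

context, weights = attention([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 2.0]])
```

Because both keys match the query equally in this toy call, the weights are equal and the attention vector is the plain average of the two Value vectors.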

By adopting the network intrusion detection model, quick, efficient and accurate network intrusion detection is realized.

In one embodiment, each model uses the Adam optimizer; the generator G and the discriminator D of the ECGAN network model use the Wasserstein loss function, and the classifier C uses the cross-entropy loss function; the neuron drop rate of the Dropout layer of the ECGAN network model is set to 0.3, and the ECGAN network model uses the tanh activation function; the CNN layer and the BiLSTM layer in the CNN-BiLSTM-Attention neural network each use the ReLU function as the activation function, and the output layer uses the sigmoid function as the activation function.

In another aspect, as shown in fig. 5, in order to achieve the above object, the present invention discloses a network intrusion detection system based on a data generation and attention mechanism, comprising:

In addition, in order to verify the effectiveness of the ECGAN model and the Attention mechanism for network intrusion detection, an ablation experiment was carried out with each model on the UNSW-NB15 data set; the results are shown in FIG. 6. As can be seen from the figure, the classification performance of the proposed model is better than that of the CNN model and the CNN-BiLSTM model. After the Attention mechanism is added, the accuracy of the CNN-BiLSTM model improves to 90.19%. After the artificial abnormal traffic samples generated by the ECGAN model are added to the training of the CNN-BiLSTM-Attention model, the accuracy further improves to 92.16%. The experiments show that the ECGAN model and the Attention mechanism improve the ability of the CNN-BiLSTM model to detect abnormal traffic.

The model was also compared with naive Bayes, logistic regression, decision tree, random forest and other models; the results are shown in FIG. 7. Among these baseline models, random forest performs best, with an accuracy of 86.16%, which is still lower than the 92.16% of the present invention. The CNN-BiLSTM model based on the ECGAN model and the Attention mechanism therefore has an advantage over the other models in network intrusion detection.
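Accuracy, and the false alarm rate mentioned among the beneficial effects, can both be computed from a confusion matrix. The following is a minimal sketch using the standard definitions; the text does not give its metric formulas, and the function names and toy labels are illustrative and are not experimental results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 = attack and 0 = normal traffic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    """(TP + TN) / all samples."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)

def false_alarm_rate(y_true, y_pred):
    """FP / (FP + TN): the share of normal traffic flagged as an attack."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0

toy_true = [1, 1, 0, 0, 0]
toy_pred = [1, 0, 0, 1, 0]
toy_accuracy = accuracy(toy_true, toy_pred)
toy_far = false_alarm_rate(toy_true, toy_pred)
```

On an imbalanced data set, accuracy alone can look good while minority attacks are missed, which is why the false alarm rate and per-class measures are usually reported alongside it.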

Based on the same inventive concept, the present invention also provides a computer apparatus comprising: one or more processors, and a memory for storing one or more computer programs; the programs include program instructions, and the processor is configured to execute the program instructions stored in the memory. The processor may be a Central Processing Unit (CPU), or another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.; it is the computational core and control core of the terminal and is adapted to implement one or more instructions, in particular to load and execute one or more instructions in a computer storage medium so as to implement the method described above.

It should be further noted that, based on the same inventive concept, the present invention also provides a computer storage medium having a computer program stored thereon, which, when executed by a processor, performs the above method. The storage medium may take the form of any combination of one or more computer-readable media. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. The computer-readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

The foregoing has shown and described the basic principles, principal features, and advantages of the present disclosure. It will be understood by those skilled in the art that the present disclosure is not limited to the embodiments described above; the embodiments and the description merely illustrate the principles of the disclosure, and various changes and modifications may be made without departing from the spirit and scope of the disclosure, which is defined in the appended claims.

Claims

1. A network intrusion detection method based on data generation and attention mechanisms, the method comprising the steps of:

2. The method for data generation and attention mechanism based network intrusion detection according to claim 1, wherein the process of preprocessing the network traffic data set comprises:

3. The method for network intrusion detection based on data generation and attention mechanisms according to claim 2, wherein the network traffic data set comprises a training data set and a test data set, wherein the network traffic samples in the training data set and the test data set comprise normal traffic samples and attack class samples, and wherein the inputting of minority class attack class samples in the attack class samples into the pre-established ECGAN model is performed by using minority class attack class samples in the training data set.

4. The network intrusion detection method based on data generation and attention mechanisms according to claim 3, wherein the feature types contained in the network traffic data set are numerical features and symbolic features, the network traffic data set is subjected to standardization processing, and the symbolic features are converted into numerical feature representations based on the one-hot encoding method:

5. The method of claim 1, wherein the ECGAN model comprises a generator G, a discriminator D, and a classifier C.

6. The data generation and attention mechanism based network intrusion detection method of claim 5, wherein the discriminator's loss function is defined as:

$$L_D(x, z) = \mathrm{BCE}(D(x), 1) + \mathrm{BCE}(D(G(z)), 0)$$

the loss function of the generator is defined as:

$$L_G(z) = \mathrm{BCE}(D(G(z)), 1)$$

for the classifier, the loss function is defined as:

$$L_C(x, y, z) = \mathrm{CE}(C(x), y) + \lambda\,\mathrm{CE}\big(C(G(z)),\ \arg\max(C(G(z))) > t\big)$$

wherein $y$ is the pseudo-label, $\lambda$ is the unsupervised loss weight, CE is the cross-entropy loss, C is the classifier, and $t$ is the pseudo-label threshold.

7. The network intrusion detection method based on the data generation and attention mechanism according to claim 1, wherein the process of inputting the minority attack class samples in the attack class samples into the pre-established ECGAN model to obtain the minority attack class samples of the preset type and the preset number is as follows:

8. The network intrusion detection method based on the data generation and attention mechanism according to claim 1, wherein the one-dimensional numerical features of the network traffic samples in the final training data set are converted into two-dimensional numerical features and input into a CNN neural network to extract the spatial features of the network traffic samples; the spatial features are then integrated through a fully connected layer and input into a BiLSTM network to extract the temporal features of the network traffic samples; and the extracted high-dimensional spatio-temporal traffic features are finally output;

9. A network intrusion detection system based on data generation and attention mechanisms, comprising:

10. An apparatus, comprising:

one or more processors;

a memory for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the network intrusion detection method based on data generation and attention mechanism according to any one of claims 1 to 8.