CN113542271A

CN113542271A - Network background traffic generation method based on generative adversarial network (GAN)

Info

Publication number: CN113542271A
Application number: CN202110796467.1A
Authority: CN
Inventors: 董庆宽; 任晓龙; 陈原; 赵晓倩; 杨福兴; 穆涛
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2021-07-14
Filing date: 2021-07-14
Publication date: 2021-10-22
Anticipated expiration: 2041-07-14
Also published as: CN113542271B

Abstract

The invention provides a network background traffic generation method based on a generative adversarial network (GAN), comprising the following steps: 1) acquiring a training sample set; 2) constructing a generative adversarial network model library; 3) iteratively training the generative adversarial network model library; 4) acquiring the traffic packet features predicted by the trained generator networks; 5) obtaining the network traffic generation result. Using a training sample set that contains the traffic packet features of multiple network applications, the invention iteratively trains a model library containing as many generative adversarial networks as there are network application types. This accelerates the convergence of the model library and effectively improves the efficiency of network background traffic generation while ensuring communication security.

Description

Network background traffic generation method based on generative adversarial network (GAN)

Technical Field

The invention belongs to the technical field of network security and relates to a network background traffic generation method, in particular to a network background traffic generation method based on a generative adversarial network (GAN), which can be used for generating network background traffic.

Background

When communication nodes on the Internet communicate through a network application, they exchange traffic packets. One network flow sent by a communication node contains a sequence of packets, in which the $i$-th element denotes the $i$-th traffic packet that the communication node needs to send.

Operators that provide network application services need large numbers of network traffic packet samples for network security analysis, network stress testing and similar tasks, so network traffic generation technology has developed continuously. Network traffic generation methods fall mainly into two groups: methods based on statistical models and methods based on traffic features.

Methods based on statistical models generate traffic mainly by means of probability models, such as Markov models and Poisson distribution models, together with matching traffic generation tools. They are mainly used to generate background network traffic during Internet stress testing.
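
As an illustration of the statistical approach, the sketch below draws packet send times from a Poisson process. It is a minimal example and not taken from the patent: the function name, the rate and the duration are all assumptions chosen for the illustration.

```python
import random

def poisson_arrival_times(rate_per_s, duration_s, seed=0):
    """Return packet send times (seconds) of a Poisson process with the given rate."""
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        # Inter-arrival gaps of a Poisson process are exponentially distributed.
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)

# Hypothetical parameters: about 100 packets per second for one second.
arrivals = poisson_arrival_times(rate_per_s=100.0, duration_s=1.0)
```

A traffic generation tool would then emit one packet at each of these times; that step is tool-specific and is not shown here.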

Methods based on traffic packet features mainly use machine learning to extract features from traffic packets as the training sample set of a neural network. The neural network is then built and trained iteratively, and finally predicts and outputs network traffic features. A traffic generation tool generates an initial packet sequence according to the predicted features, and the data that the user needs to send is encrypted and embedded into the initial packet sequence to form the network traffic.

A generative adversarial network can predict traffic packet features such that the probability distribution of the features predicted by the generator network is statistically very similar to that of the training sample set, so applying generative adversarial networks to network traffic generation is of considerable value. For example, the patent application with publication number CN109889452A, entitled "Network background traffic generation method and system based on conditional generative adversarial network", discloses a background traffic generation method based on a conditional generative adversarial network (CGAN). That method pads every traffic packet sample collected in advance into a fixed M-dimensional vector, builds a conditional generative adversarial network, trains it iteratively, generates simulated background traffic with the generator network of the trained network, and then sends the traffic. Its drawbacks are that every traffic packet sample in the training sample set is padded to a fixed 1518 features after vectorization, and that traffic packets of all types are used together as the training sample set of a single conditional generative adversarial network. Because that one network is trained iteratively on many different types of traffic packets, convergence is slow and the efficiency of generating network background traffic is low.

Disclosure of Invention

The invention aims to overcome the above defects of the prior art by providing a network background traffic generation method based on a generative adversarial network (GAN) that improves the efficiency of network background traffic generation while ensuring communication security.

To achieve this purpose, the technical scheme adopted by the invention comprises the following steps:

(1) Obtaining a training sample set $X_{train}$:

(1a) using the Wireshark tool, capturing $S$ traffic packets $B = \{B_1, B_2, \ldots, B_s, \ldots, B_S\}$ covering $M$ network applications while a communication node communicates over the Internet, where each network application corresponds to at least one traffic packet, each traffic packet corresponds to one network application, and each traffic packet $B_s$ includes $W$ features; and labeling each network application category to obtain the category label set $R_{class} = \{R_1, R_2, \ldots, R_m, \ldots, R_M\}$ of the $M$ network applications, where $M \ge 2$, $S \ge 5000$, $B_s$ denotes the $s$-th traffic packet, $1 \le s \le S$, $W \ge 2$, $R_m$ denotes the category label of the $m$-th network application, and $1 \le m \le M$;

(1b) one-hot encoding the non-numerical features of each traffic packet $B_s$, and normalizing each one-hot-encoded traffic packet to obtain the preprocessed traffic packet set, whose $s$-th element is the preprocessing result of $B_s$;

(1c) labeling each preprocessed traffic packet with a network application class label from $R_{class}$ to obtain the corresponding network application class label set $y = \{y_1, y_2, \ldots, y_s, \ldots, y_S\}$, and combining the preprocessed traffic packet set and its label set $y$ into the training sample set $X_{train}$, where $y_s$ denotes the network application class label of the $s$-th preprocessed traffic packet, $X_{R_m}$ denotes the set of samples in $X_{train}$ whose network application class label is $R_m$, $v$ indexes the samples of $X_{R_m}$, $V$ denotes the number of samples in $X_{train}$ whose network application class label is $R_m$, $0 < V < S$, and $0 \le v \le V$;

(2) Constructing a generative adversarial network model library:

constructing a model library comprising $M$ generative adversarial networks, one for each network application type. Each generative adversarial network comprises a generator network and a discriminator network cascaded in sequence, where the generator network comprises an input layer, a first fully connected module and an output layer, the discriminator network comprises an input layer, a second fully connected module and an output layer, and the $m$-th generative adversarial network corresponds to the $m$-th network application;

(3) Iteratively training the generative adversarial network model library:

(3a) initializing the parameters of the generator network and of the discriminator network contained in the $m$-th generative adversarial network, and letting the iteration count be $q_1$ and the maximum number of iterations be $Q_1$, with $Q_1 \ge 2000$ and $q_1 = 0$;

(3b) randomly selecting $K$ samples from the sample set $X_{R_m}$ whose network application class label is $R_m$ as the input of the $m$-th generative adversarial network. The generator network predicts preprocessed traffic packet features for each selected sample, giving a predicted traffic packet feature set. The discriminator network then computes, for each selected sample and for each predicted traffic packet feature, the probability that it originates from the sample set $X_{R_m}$, giving two probability sets, one for each kind of input, the second of which is written $D_2 = \{d_1, d_2, \ldots, d_k, \ldots, d_K\}$, where $1 \le K \le 50$, $1 \le k \le K$, and the $k$-th element of each set refers to the $k$-th randomly selected sample or to the traffic packet feature predicted from it by the generator network;

(3c) using the cross-entropy loss function, computing the loss of the generator network and, at the same time, the loss of the discriminator network from the probabilities obtained in step (3b); using back-propagation, computing the parameter gradient of the generator network from the generator loss and the parameter gradient of the discriminator network from the discriminator loss; then, using a gradient descent algorithm, updating the parameters of the generator network with the generator's parameter gradient and the parameters of the discriminator network with the discriminator's parameter gradient;

(3d) judging whether $q_1 = Q_1$ holds; if so, obtaining $M$ trained generative adversarial networks; otherwise, letting $q_1 = q_1 + 1$ and returning to step (3b);

(4) Obtaining the traffic packet features predicted by the trained generator networks:

taking the training sample set $X_{train}$ as the input of the trained generator networks. For each application category, the corresponding trained generator network predicts preprocessed traffic packet features for every sample of the sample set $X_{R_m}$ labeled $R_m$, giving the predicted traffic packet feature set $A$. For each label $R_m$, $A$ contains the predicted traffic packet feature set obtained from the $V$ samples of $X_{R_m}$, whose $v$-th element is the traffic packet feature predicted by the trained generator network from the $v$-th sample of $X_{R_m}$;

(5) Obtaining the network traffic generation result:

randomly selecting from the predicted traffic packet feature set $A$ the predicted traffic packet feature set of one application class label $R_w$, and randomly selecting $L$ predicted traffic packet features from it. A traffic generator generates an initial traffic packet sequence $c = \{c_1, c_2, \ldots, c_l, \ldots, c_L\}$ according to the selected features. The data to be sent by the communication node is encrypted and embedded into each initial traffic packet $c_l$, giving the network traffic $c' = \{c'_1, c'_2, \ldots, c'_l, \ldots, c'_L\}$ consisting of $L$ traffic packets with embedded encrypted data, where $c_l$ denotes the initial traffic packet generated by the traffic generator from the $l$-th selected predicted traffic packet feature, $c'_l$ denotes the $l$-th initial traffic packet after the encrypted data has been embedded, $1 \le l \le L$, $L \le V$, and $R_1 \le R_w \le R_M$.

Compared with the prior art, the invention has the following advantages:

1. Using a training sample set that contains the traffic packet features of multiple network applications, the invention iteratively trains a model library consisting of as many generative adversarial networks as there are network application types, with one generative adversarial network per network application. This avoids the slow convergence caused in the prior art by iteratively training one conditional generative adversarial network on many different types of traffic packets, and effectively improves the efficiency of network background traffic generation while ensuring communication security.

2. In the model library of the invention, each generative adversarial network comprises a generator network and a discriminator network cascaded in sequence. The structure is simple, which speeds up training convergence and further improves the efficiency of network background traffic generation.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention.

Detailed Description

The invention is described in further detail below with reference to the figures and specific examples.

Referring to FIG. 1, the invention includes the following steps:

Step 1) obtaining a training sample set $X_{train}$:

Step 1a) in this embodiment, the Wireshark tool is used to capture $S$ traffic packets $B = \{B_1, B_2, \ldots, B_s, \ldots, B_S\}$ covering $M$ network applications while a communication node communicates over the Internet. Each network application corresponds to at least one traffic packet, each traffic packet corresponds to one network application, and each traffic packet $B_s$ includes $W$ features. Each network application category is labeled to obtain the category label set $R_{class} = \{R_1, R_2, \ldots, R_m, \ldots, R_M\}$ of the $M$ network applications, where $B_s$ denotes the $s$-th traffic packet, $R_m$ denotes the category label of the $m$-th network application, and $1 \le m \le M$. In this embodiment, $S = 7553$, $M = 5$ and $W = 8$. The 5 network applications are HTTP web page requests, WeChat, OneNote, 163 Mail and Youdao Dictionary. Each traffic packet includes 8 features, among them the source port, the destination port, the network service type of the target host, the protocol type, the packet length, the packet arrival time interval and the sliding window length.

Step 1b) one-hot encoding the non-numerical features of each traffic packet $B_s$, and normalizing each one-hot-encoded traffic packet to obtain the preprocessed traffic packet set, whose $s$-th element is the preprocessing result of $B_s$. In this embodiment, two features are one-hot encoded: the protocol type and the network service type of the target host.

One-hot encoding uses a state register to encode each state. It converts non-numerical features, which are difficult for the generative adversarial network model to learn, into numerical features that are easy to learn, which reduces the difficulty of training the model.

Data normalization simplifies computation and mitigates gradient explosion when the generative adversarial network adjusts its network parameters by gradient descent, which speeds up the convergence of the model.
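
The preprocessing of step 1b can be sketched as follows. This is a minimal illustration under stated assumptions: the three-feature packets, the helper names and the choice of min-max scaling to [0, 1] are hypothetical, since the text does not specify which normalization is used.

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a 0/1 vector over a fixed vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]

def min_max_normalize(column):
    """Scale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]

# Hypothetical packets: (source port, protocol type, packet length).
packets = [(443, "TCP", 1500), (53, "UDP", 80), (80, "TCP", 600)]
protocols = sorted({p[1] for p in packets})  # fixed category order
ports = min_max_normalize([p[0] for p in packets])
lengths = min_max_normalize([p[2] for p in packets])

# One preprocessed vector per packet: numeric features scaled, protocol one-hot.
preprocessed = [
    [ports[i], *one_hot(p[1], protocols), lengths[i]]
    for i, p in enumerate(packets)
]
```

The one-hot step widens each packet vector by the number of categories minus one, so the length of the preprocessed vector depends on the vocabularies observed in the capture.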

Step 1c) labeling each preprocessed traffic packet with a network application class label from $R_{class}$ to obtain the corresponding network application class label set, and combining the preprocessed traffic packet set and its label set into the training sample set $X_{train}$, where $y_s$ denotes the network application class label of the $s$-th preprocessed traffic packet, $X_{R_m}$ denotes the set of samples in $X_{train}$ whose network application class label is $R_m$, $v$ indexes the samples of $X_{R_m}$, $V$ denotes the number of samples in $X_{train}$ whose network application class label is $R_m$, $0 < V < S$, and $0 \le v \le V$. In this embodiment, the total number of samples is 3067 in $X_{R_1}$, 2368 in $X_{R_2}$, 903 in $X_{R_3}$, 453 in $X_{R_4}$ and 762 in $X_{R_5}$.
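
Grouping the labeled packets into per-application sample sets can be sketched as below. The function name, the dictionary representation and the toy data are assumptions for illustration; only the five per-class sample counts come from the embodiment.

```python
from collections import defaultdict

def build_training_set(preprocessed_packets, labels):
    """Group preprocessed packets by application label: {R_m: X_Rm}."""
    if len(preprocessed_packets) != len(labels):
        raise ValueError("each packet needs exactly one label")
    x_train = defaultdict(list)
    for features, label in zip(preprocessed_packets, labels):
        x_train[label].append(features)
    return dict(x_train)

# Toy data with hypothetical labels and two-feature packets.
x_train = build_training_set(
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],
    ["R1", "R2", "R1"],
)

# Per-class sample counts reported in the embodiment; they add up to S = 7553.
embodiment_counts = [3067, 2368, 903, 453, 762]
```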

Step 2) constructing a generative adversarial network model library:

Each generative adversarial network comprises a generator network and a discriminator network cascaded in sequence, where the generator network comprises an input layer, a first fully connected module and an output layer, and the discriminator network comprises an input layer, a second fully connected module and an output layer.

The first fully connected module of the generator network comprises three fully connected layers stacked in sequence, all with the leaky-ReLU activation function and with 50, 30 and 30 neurons respectively; the output layer contains 8 neurons with the tanh activation function.

The second fully connected module of the discriminator network comprises fully connected layers, all with the leaky-ReLU activation function and with 100, 60 and 30 neurons respectively; the output layer contains 5 neurons with the softmax activation function.
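
The layer sizes above can be checked with a small forward-pass sketch. It uses plain Python rather than a deep learning framework and is not the patent's implementation: the input dimension, the weight initialization and the leaky-ReLU slope are assumptions, and only the neuron counts and activation functions come from the text.

```python
import math
import random

def leaky_relu(x, slope=0.01):  # slope is an assumed value
    return x if x > 0 else slope * x

def dense(inputs, weights, biases):
    """Fully connected layer: one weight row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def init_mlp(sizes, rng):
    """Random small weights for consecutive layer sizes, e.g. [8, 50, 30, 30, 8]."""
    return [([[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out)
            for n_in, n_out in zip(sizes, sizes[1:])]

def forward(x, layers, output_activation):
    # Hidden layers use leaky-ReLU; the last layer uses the given output activation.
    for weights, biases in layers[:-1]:
        x = [leaky_relu(v) for v in dense(x, weights, biases)]
    weights, biases = layers[-1]
    return output_activation(dense(x, weights, biases))

rng = random.Random(0)
INPUT_DIM = 8  # assumption: length of the preprocessed feature vector
generator = init_mlp([INPUT_DIM, 50, 30, 30, 8], rng)
discriminator = init_mlp([8, 100, 60, 30, 5], rng)

sample = [rng.random() for _ in range(INPUT_DIM)]
predicted = forward(sample, generator, lambda z: [math.tanh(v) for v in z])
scores = forward(predicted, discriminator, softmax)
```

The tanh output keeps every predicted feature in [-1, 1], and the softmax output sums to 1 over its 5 neurons.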

Using the leaky-ReLU activation function alleviates the vanishing of the network gradient during back-propagation. For positive inputs the function is $f(x) = x$, whose derivative is constantly 1, and for negative inputs it keeps a small non-zero slope, so the gradient does not vanish during back-propagation. Using leaky-ReLU in the generative adversarial network therefore speeds up learning and shortens the training time.
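
For reference, the usual definition of leaky-ReLU and its derivative is given below. The slope $\alpha$ is a small positive constant; its value is not stated in the text, and a value such as 0.01 is a common assumption.

```latex
f(x) =
\begin{cases}
x, & x > 0 \\
\alpha x, & x \le 0
\end{cases}
\qquad
f'(x) =
\begin{cases}
1, & x > 0 \\
\alpha, & x < 0
\end{cases}
```

Because $f'(x)$ is never zero, a unit with negative pre-activation still passes a scaled gradient backwards.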

Step 3) iteratively training the generative adversarial network model library:

Step 3a) initializing the parameters of the generator network and of the discriminator network contained in the $m$-th generative adversarial network, and letting the iteration count be $q_1$ and the maximum number of iterations be $Q_1$, with $Q_1 = 12000$ and $q_1 = 0$;

Step 3b) randomly selecting $K$ samples from the sample set $X_{R_m}$ whose network application class label is $R_m$ as the input of the $m$-th generative adversarial network. The generator network predicts preprocessed traffic packet features for each selected sample, giving a predicted traffic packet feature set. The discriminator network then computes, for each selected sample and for each predicted traffic packet feature, the probability that it originates from the sample set $X_{R_m}$, giving two probability sets, one for each kind of input, the second of which is written $D_2 = \{d_1, d_2, \ldots, d_k, \ldots, d_K\}$, where $K = 10$ in this embodiment, $1 \le k \le K$, and the $k$-th element of each set refers to the $k$-th randomly selected sample or to the traffic packet feature predicted from it by the generator network;

Step 3c) using the cross-entropy loss function, computing the loss of the generator network and, at the same time, the loss of the discriminator network from the probabilities obtained in step 3b); using the back-propagation built into the Adam optimizer, computing the parameter gradient of the generator network from the generator loss and the parameter gradient of the discriminator network from the discriminator loss; the Adam optimizer then uses a gradient descent algorithm to update the parameters of the generator network with the generator's parameter gradient and the parameters of the discriminator network with the discriminator's parameter gradient;

[the calculation formulas of the generator loss and the discriminator loss are missing from the source text]
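
The patent's own loss formulas are not available in this text. For orientation only, a common cross-entropy form of the two GAN losses over a batch of $K$ samples is shown below. The symbols $p_k^{\mathrm{real}}$ and $p_k^{\mathrm{fake}}$ are hypothetical names for the discriminator's probability that the $k$-th real sample, or the $k$-th predicted feature, comes from the sample set; which of them corresponds to $d_k$ is not recoverable from the text.

```latex
\mathcal{L}_D = -\frac{1}{K}\sum_{k=1}^{K}
  \left[\log p_k^{\mathrm{real}} + \log\left(1 - p_k^{\mathrm{fake}}\right)\right],
\qquad
\mathcal{L}_G = -\frac{1}{K}\sum_{k=1}^{K} \log p_k^{\mathrm{fake}}
```

This is the widely used non-saturating variant; the patent may use a different but related expression.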

Step 3d) judging whether $q_1 = Q_1$ holds; if so, obtaining $M$ trained generative adversarial networks; otherwise, letting $q_1 = q_1 + 1$ and returning to step 3b).

The invention builds a model library consisting of as many generative adversarial networks as there are network application types, and trains the models in the library in parallel with a training sample set that contains the traffic packet features of multiple network applications. This overcomes the slow convergence caused in the prior art by iteratively training one conditional generative adversarial network on many different types of traffic packets.
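
The control flow of steps 3a to 3d, with one model per application class, can be sketched as follows. This is an outline under assumptions rather than the patent's code: `train_step` is a hypothetical callback standing in for one generator and discriminator update, the loss function uses a common non-saturating cross-entropy form, and the classes are processed one after another here although the text describes parallel training.

```python
import math
import random

def gan_losses(real_probs, fake_probs, eps=1e-12):
    """Cross-entropy GAN losses (common non-saturating form, an assumption)."""
    k = len(real_probs)
    d_loss = -sum(math.log(p + eps) + math.log(1.0 - d + eps)
                  for p, d in zip(real_probs, fake_probs)) / k
    g_loss = -sum(math.log(d + eps) for d in fake_probs) / k
    return g_loss, d_loss

def train_model_library(x_train, train_step, max_iters, batch_size, seed=0):
    """Run the batch-sampling and update loop for every class-specific GAN.

    `train_step(label, batch)` is a hypothetical callback that performs one
    generator/discriminator update and returns (g_loss, d_loss).
    """
    rng = random.Random(seed)
    history = {}
    for label, samples in x_train.items():      # one GAN per network application
        losses = []
        for _ in range(max_iters):              # iteration counter q1 up to Q1
            batch = rng.sample(samples, min(batch_size, len(samples)))
            losses.append(train_step(label, batch))
        history[label] = losses
    return history

# Toy run with a stub step that pretends the discriminator is undecided.
toy = {"R1": [[0.1], [0.2], [0.3]], "R2": [[0.9], [0.8]]}
stub = lambda label, batch: gan_losses([0.5] * len(batch), [0.5] * len(batch))
history = train_model_library(toy, stub, max_iters=3, batch_size=2)
```

Because each class has its own model and its own sample set, the outer loop has no shared state and could be distributed across workers.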

Step 4) obtaining the traffic packet features predicted by the trained generator networks:

The training sample set $X_{train}$ is taken as the input of the trained generator networks. For each application category, the corresponding trained generator network predicts preprocessed traffic packet features for every sample of the sample set $X_{R_m}$ labeled $R_m$, giving the predicted traffic packet feature set $A$. For each label $R_m$, $A$ contains the predicted traffic packet feature set obtained from the samples of $X_{R_m}$, whose $v$-th element is the traffic packet feature predicted by the trained generator network from the $v$-th sample of $X_{R_m}$;
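
Step 4 amounts to applying each class's trained generator to every sample of that class. A minimal sketch is given below; the dictionary layout and the stub generators are assumptions, and a real generator would be the trained network.

```python
def predict_feature_sets(x_train, generators):
    """Apply each class's trained generator to every sample of that class.

    Returns {R_m: [predicted features for sample 1, ..., sample V]}.
    """
    return {
        label: [generators[label](sample) for sample in samples]
        for label, samples in x_train.items()
    }

# Stub generators (hypothetical): each simply reverses its input vector.
x_train = {"R1": [[0.1, 0.2], [0.3, 0.4]], "R2": [[0.5, 0.6]]}
generators = {
    "R1": lambda x: list(reversed(x)),
    "R2": lambda x: list(reversed(x)),
}
predicted_sets = predict_feature_sets(x_train, generators)
```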

Step 5) obtaining the network traffic generation result:

The predicted traffic packet feature set of one application class label $R_w$ is randomly selected from the predicted traffic packet feature set $A$, and $L$ predicted traffic packet features are randomly selected from it. A configuration file is written for a traffic generator script, such as trafgen, according to the selected predicted traffic packet features, and the traffic generator generates the initial traffic packet sequence $c = \{c_1, c_2, \ldots, c_l, \ldots, c_L\}$ according to the configuration file. The data to be sent by the communication node is encrypted and then embedded sequentially into each initial traffic packet $c_l$, giving the network traffic $c' = \{c'_1, c'_2, \ldots, c'_l, \ldots, c'_L\}$ consisting of $L$ traffic packets with embedded encrypted data, where $c_l$ denotes the initial traffic packet generated by the traffic generator from the $l$-th selected predicted traffic packet feature, $c'_l$ denotes the $l$-th initial traffic packet after the encrypted data has been embedded, $1 \le l \le L$, $L \le V$, and $R_1 \le R_w \le R_M$. In this embodiment, $L = 10$.
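
The selection and embedding logic of step 5 can be sketched as below. This is a simplified illustration under assumptions: the payload is taken as already encrypted bytes, packets are represented as dictionaries rather than real frames, the equal-size chunking is a hypothetical embedding scheme, and the call to an actual traffic generator is omitted.

```python
import random

def build_flow(predicted_sets, encrypted_payload, num_packets, seed=0):
    """Pick one class, draw L predicted feature vectors, and embed the payload."""
    rng = random.Random(seed)
    label = rng.choice(sorted(predicted_sets))            # application class R_w
    chosen = rng.sample(predicted_sets[label], num_packets)
    # Split the already-encrypted payload into L consecutive chunks.
    size = -(-len(encrypted_payload) // num_packets)      # ceiling division
    chunks = [encrypted_payload[i * size:(i + 1) * size]
              for i in range(num_packets)]
    # Each packet pairs the features used to shape it with its payload chunk.
    return label, [{"features": f, "payload": c} for f, c in zip(chosen, chunks)]

# Hypothetical predicted features for a single class and a placeholder ciphertext.
predicted_sets = {"R1": [[0.1 * i, 0.5] for i in range(20)]}
payload = b"ciphertext-bytes-from-the-sender"
label, flow = build_flow(predicted_sets, payload, num_packets=10)
```

Concatenating the payload chunks in packet order recovers the original ciphertext, which is what a receiver would need to do before decrypting.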

The foregoing description is only one example of the invention and does not limit it in any way. After understanding the content and principle of the invention, those skilled in the art may make various modifications and variations in form and detail without departing from that principle, and such modifications and variations remain within the scope of the claims of the invention.

Claims

1. A network background traffic generation method based on a generative adversarial network (GAN), characterized by comprising the following steps:

(1) obtaining a training sample set $X_{train}$:

[...] where the $s$-th preprocessed traffic packet is the preprocessing result of $B_s$;

[...] each preprocessed traffic packet is labeled [...], where $y_s$ denotes the network application class label of the $s$-th preprocessed traffic packet;

(2) constructing a generative adversarial network model library:

constructing a model library comprising $M$ generative adversarial networks, one for each network application type, each generative adversarial network comprising a generator network and a discriminator network cascaded in sequence, where the generator network comprises an input layer, a first fully connected module and an output layer, and the discriminator network comprises an input layer, a second fully connected module and an output layer;

(3a) initializing the parameters of the generator network and of the discriminator network contained in the $m$-th generative adversarial network [...];

(3b) randomly selecting $K$ samples from the sample set $X_{R_m}$ whose network application class label is $R_m$ as the input of the $m$-th generative adversarial network; the generator network predicts traffic packet features for each selected sample [...]; the discriminator network computes, for each selected sample and for each predicted traffic packet feature, the probability that it originates from the sample set $X_{R_m}$, giving two probability sets [...], where the $k$-th element of each set refers to the $k$-th randomly selected sample or to the traffic packet feature predicted from it by the generator network, and $d_k$ denotes one of the probabilities computed by the discriminator network;

(3c) using the cross-entropy loss function, computing the loss of the generator network and, at the same time, the loss of the discriminator network from the probabilities obtained in step (3b); using back-propagation, computing the parameter gradient of the generator network from the generator loss and the parameter gradient of the discriminator network from the discriminator loss; then, using a gradient descent algorithm, updating the parameters of the generator network with the generator's parameter gradient and the parameters of the discriminator network with the discriminator's parameter gradient;

[...] otherwise, letting $q_1 = q_1 + 1$ and returning to step (3b);

(4) obtaining the traffic packet features predicted by the trained generator networks:

taking the training sample set $X_{train}$ as the input of the trained generator networks; for each application category, the corresponding trained generator network predicts preprocessed traffic packet features for every sample of the sample set labeled $R_m$, giving the predicted traffic packet feature set, whose $v$-th element for label $R_m$ is the traffic packet feature predicted by the trained generator network from the $v$-th sample of that sample set;

(5) obtaining the network traffic generation result:

[...] randomly selecting $L$ predicted traffic packet features from the selected predicted traffic packet feature set [...].

2. The network background traffic generation method based on a generative adversarial network (GAN) according to claim 1, wherein in the generative adversarial network of step (2):

the first fully connected module of the generator network comprises [...] layers that all use the leaky-ReLU activation function, with 50, 30 and 30 neurons respectively; the output layer contains 8 neurons with the tanh activation function;

the discriminator network [...]

3. The network background traffic generation method based on a generative adversarial network (GAN) according to claim 1, wherein in step (3c) the loss of the generator network and the loss of the discriminator network are calculated by the following formulas, respectively: [...]