CN113556328B - Encryption traffic classification method based on deep learning - Google Patents

Encryption traffic classification method based on deep learning

Info

Publication number
CN113556328B
CN113556328B (application CN202110736516.2A)
Authority
CN
China
Prior art keywords
layer
model
neurons
data
layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110736516.2A
Other languages
Chinese (zh)
Other versions
CN113556328A (en)
Inventor
付兴兵
余志鹏
陈媛芳
林菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110736516.2A priority Critical patent/CN113556328B/en
Publication of CN113556328A publication Critical patent/CN113556328A/en
Application granted granted Critical
Publication of CN113556328B publication Critical patent/CN113556328B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00Network architectures or network communication protocols for network security
    • H04L63/04Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/10Flow control; Congestion control
    • H04L47/24Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/50Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Molecular Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Hardware Design (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an encrypted traffic classification method based on deep learning and relates to the fields of traffic classification and deep learning. The invention aims to study the potential relation between encrypted traffic and its spatio-temporal sequence, and designs four RNN models that perform well on sequential data, namely GRU, LSTM, BiGRU and BiLSTM, to classify encrypted traffic. To further improve the reliability of the experimental conclusions, a CNN model is also designed for comparison. After comparing the models, among the four RNN models the GRU performs slightly worse while the other three achieve nearly identical classification results; the CNN model performs very well in this experiment and has the shortest running time of the five models.

Description

Encryption traffic classification method based on deep learning
Technical Field
The invention belongs to the fields of traffic classification and deep learning, and aims to study the potential relation between encrypted traffic and its spatio-temporal sequence.
Background
With the continuous development of science and technology and the expansion of the internet, network applications have diversified: more and more of people's activities take place on computers, and daily life can hardly do without the network, an inevitable result of an increasingly digital society. With the arrival of the "big data" era, however, some latent problems have been exposed: a large amount of unknown internet traffic fills the network, saturating bandwidth, greatly increasing the pressure and burden on the internet, disrupting important data transfers and reducing transmission efficiency. In addition, different network applications have different requirements for network resources. For example, voice or video calls require continuous and stable bandwidth; small, delay-sensitive traffic such as telnet places high demands on connectivity and must be guaranteed to pass through quickly; and critical, sensitive data requires that network technologies effectively guarantee security and confidentiality during transmission, which calls for a complete, interference-free dedicated tunnel. It is therefore very important for network managers to classify application traffic accurately and to understand the specific factors affecting network transmission efficiency.
At present, however, with growing awareness of information security, encrypting internet traffic and data has become an important way for people to protect their rights and their information privacy. One report showed that by February 2017 half of network traffic was already encrypted, and for certain traffic encryption is even mandated by law. Network managers can no longer classify network traffic as they did in the last decade, when traffic was mostly plaintext; much more effort must instead be put into research on classifying encrypted traffic.
With the vigorous development of artificial intelligence, the underlying fields of machine learning and deep learning have become the most active areas of current academic research. Countless young researchers have devoted themselves to this emerging field and injected a great deal of fresh energy into it. Knowledge from computational statistics, applied mathematics and foundational mathematics is widely used in artificial intelligence and machine learning research, which has allowed these fields to evolve rapidly over the past decades. Breakthroughs in network technology in particular have made people aware of the enormous possibilities of artificial intelligence, and more and more new products and resources are being invested in deep learning, giving it unprecedented momentum. It is believed that in the coming years deep learning will break through existing technical barriers and play a central role in a new technological revolution. Many industries are striving to keep up, and traffic classification is no exception. However, although traffic classification methods combining machine learning or deep learning have appeared in recent years, the technology is still at an early stage, research in many aspects is not yet mature, and classification results are often poor; as a result, most mainstream traffic classification methods on the market are still traditional port-number-based analysis or analysis based on DPI (Deep Packet Inspection).
Given this situation, the invention aims to improve existing traffic classification algorithms by combining deep learning, exploring through experiments the internal logical relations among different kinds of traffic, improving classification efficiency and building a better traffic classification model.
At present, the mainstream traffic classification methods mainly include the following: port-number-based traffic analysis, DPI technology, machine learning traffic analysis based on flow timing and statistical characteristics, and deep learning traffic analysis based on the spatio-temporal characteristics of traffic.
Network traffic analysis based on default port numbers was the earliest widely used approach. Its main principle is to distinguish traffic according to the different default port numbers used by different network services, and one of its main advantages is that it is simple and fast. The port-number identification approach first proposed by IANA still had great practical value before 2010, when the volume of network traffic and the number of network services were relatively small. However, the number of internet applications has grown year by year and the conventional port range is far from sufficient, forcing applications to use random port numbers as temporary ports. As a result, the accuracy and precision of traffic analysis based on port numbers have dropped significantly. Furthermore, some technicians deliberately modify or disguise port numbers for security or privacy reasons, which further degrades the accuracy of port-number-based traffic analysis.
Traffic analysis based on DPI is currently the most common classification method. DPI is a deep packet inspection technique that examines the payloads captured at the network and application layers (such as http, ftp and https) and classifies them against protocol and application signatures. Techniques such as pattern matching and packet-by-packet analysis allow DPI to accurately identify the specific protocol and application type of some traffic. However, DPI performance is relatively low: protocol parsing, reconstruction and signature matching impose a huge computational and storage overhead, and DPI scales poorly, since new application traffic cannot be identified until the signature library is updated. Even more fatally, DPI identification is based on plaintext, so the probability that encrypted traffic is correctly identified by DPI is low; for the HTTPS protocol, DPI can only rely on features of the server hello, and its accuracy in classifying encrypted traffic is poor. With the proportion of encrypted traffic rising, a traffic analysis method based on DPI clearly no longer keeps pace.
Traffic classification based on machine learning is a newer approach that has risen in recent years and is a research hotspot. Exploiting the fact that conventional encryption does not alter flow-level statistical characteristics, it collects a large number of flows and packet-level features as a feature set and applies machine learning algorithms to these statistics in order to classify the traffic.
Deep learning is an advanced machine learning technique that does not require complex feature engineering: thanks to automatic feature selection, good performance can usually be achieved by feeding data directly into the network. This makes deep learning an attractive approach to traffic classification, especially as new types of traffic keep appearing, where models based on deep learning tend to work better. Compared with traditional machine learning, deep learning has greater learning capacity and can learn highly complex patterns. Spatio-temporal features are mapped into tensors and fed into the deep learning model, and a predicted class is obtained after passing through a series of layers; this often gives good results on traffic data that is continuous in space and time.
Disclosure of Invention
Aiming at the shortcomings of the prior art, the invention provides an encrypted traffic classification method based on deep learning.
Inspired by Draper-Gil, Gerard et al., the invention conjectures a strong connection between encrypted traffic and time and, exploiting the fact that recurrent neural networks handle time-related data well, designs four RNN models, namely LSTM, GRU, bidirectional LSTM and bidirectional GRU, to compare their classification performance on encrypted traffic. To make the experimental conclusions more persuasive, a CNN model is also designed for comparison. The most important part of the experiment is the construction and optimization of the four RNN models and the CNN model: after the models are instantiated, the parameters of each model are repeatedly adjusted on the experimental data until the best version is obtained, which is taken as the final model. Six performance indicators of each final model (Accuracy, Precision, Recall, Loss, F1-Measure, FPR) are listed in the same table for comparison; finally, the potential relation between time and traffic is explored through the experiments.
The technical problem to be solved by the invention is realized by the following technical method:
the invention adopts a deep learning-based mode to carry out classification research on encrypted flow, and mainly comprises the following steps:
step 1: preprocessing the network traffic data;
step 2: designing and building the deep learning models;
step 2.1: building a GRU model;
step 2.2: building an LSTM model;
step 2.3: building a BiGRU model;
step 2.4: building a BiLSTM model;
step 2.5: building a CNN model;
step 3: training the models with the training data set;
step 4: evaluating the models and recording the corresponding data;
step 5: comparing and analyzing the experimental data and drawing conclusions.
Drawings
FIG. 1 is a general experimental flow chart of the present invention;
FIG. 2 is a block diagram of a portion of the data preprocessing of the present invention;
FIG. 3 is a schematic diagram of a GRU model structure;
FIG. 4 shows six performance indicators and operating times of a GRU model;
FIG. 5 is a graph of GRU model Loss;
FIG. 6 is a graph of the GRU model ROC;
FIG. 7 is a schematic structural diagram of an LSTM model;
FIG. 8 shows six performance indexes and runtime of the LSTM model;
FIG. 9 is a plot of the LSTM model Loss;
FIG. 10 is a graph of the LSTM model ROC;
FIG. 11 is a schematic structural diagram of a BiGRU model;
FIG. 12 shows six performance indicators and operating times of the BiGRU model;
FIG. 13 is a graph of the BiGRU model Loss;
FIG. 14 is a BiGRU model ROC plot;
FIG. 15 is a schematic structural diagram of a BiLSTM model;
FIG. 16 shows six performance indicators and running time of the BiLSTM model;
FIG. 17 is a BiLSTM model Loss plot;
FIG. 18 is a BiLSTM model ROC plot;
FIG. 19 is a schematic diagram of a CNN model structure;
FIG. 20 shows six performance indicators and operating times of the CNN model;
FIG. 21 is a graph of the CNN model Loss;
FIG. 22 is a CNN model ROC graph.
Detailed Description
The invention is described in further detail below with reference to specific embodiments:
Today the internet is at a highly developed stage, and traffic classification is widely applied in several fields: QoS provisioning, ISP billing statistics, security-related applications, intrusion detection systems, and so on. However, as internet traffic changes rapidly, and especially as the proportion of encrypted traffic keeps increasing, traditional traffic analysis based on DPI and on port numbers falls short. Traditional machine learning, in turn, depends too heavily on hand-selected features: if the feature set is too small, the generalization of the learned result is severely limited, and feature engineering is cumbersome, since an optimal distribution of the data has to be found by transforming coordinate axes, or mutually independent feature attributes have to be extracted by feature transformation, in order to reduce data dimensionality and redundancy. As traffic classes keep increasing, the learning capacity of traditional machine-learning-based methods is also somewhat insufficient. We therefore decided to use a deep-learning-based method for classifying encrypted traffic. Network traffic is highly continuous in time and space: for example, accessing a web page or uploading or downloading a file requires a TCP three-way handshake to establish a stable connection, which means that a large amount of traffic of the same type floods the network within a short period. This spatio-temporal continuity of traffic suggested to us that it may be analyzed as time-related data.
The overall flow of the encrypted traffic classification experiment is shown in Figure 1. It is divided into three parts: data preprocessing, model building and training, and model prediction. The details are as follows:
step 1, preprocessing network flow data; as shown in fig. 2:
step 1.1, importing data in the ISCX VPN 2016 data set by using a scipy.io library of Python.
And 1.2, dividing the data into a sample part and a label part after the data is successfully imported.
And 1.3, centralizing and normalizing the sample. Centering means that the sample data is subtracted by the average value of the sample, so that the purpose that the sample overall presents normal distribution with 0 as the center is achieved; normalization is to divide sample data by the standard deviation of a sample, so that the absolute value of the sample data is finally changed into a decimal between 0 and 1, and the data is calculated more quickly, thereby accelerating the convergence of the model.
And step 1.4, converting the label into a one-hot code format.
And 1.5, dividing the processed labels and samples into a training set and a test set (the proportion of the test set is 30 percent of the total data set), and randomly disordering the data in the set.
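By way of illustration only, a minimal Python sketch of steps 1.1-1.5 could look as follows; the .mat file name, the array keys "samples" and "labels", and the random seed are assumptions for this example, not part of the claimed method:

```python
import numpy as np
from scipy.io import loadmat
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Step 1.1: import the data set (file name and array keys are assumed)
data = loadmat("iscx_vpn2016.mat")

# Step 1.2: split into samples and labels
X = data["samples"].astype("float32")
y = data["labels"].ravel()

# Step 1.3: center and standardize the samples
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# Step 1.4: one-hot encode the labels (7 traffic classes)
y = to_categorical(y, num_classes=7)

# Step 1.5: 70/30 train/test split with random shuffling
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, shuffle=True, random_state=42)
```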
Step 2: designing and building a deep learning model;
Step 2.1: building the GRU model. The constructed GRU model has two GRU layers, one dropout layer and one Dense (fully connected) layer. The first and second layers are GRU layers with 50 neurons each. A dropout layer with an inactivation factor of 0.2 follows, i.e. 20% of the neurons are randomly silenced when passing through the dropout layer, which helps prevent overfitting. Finally a Dense layer fully connects the output of the previous layer and outputs 7 numbers between 0 and 1 as the predicted traffic-class scores; the activation function is sigmoid. See Figures 3-6.
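For illustration, a minimal Keras-style sketch of such a GRU model is given below; the input dimensions, optimizer and loss function are assumptions not specified above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dropout, Dense

TIME_STEPS, FEATURES = 28, 28  # assumed input dimensions

gru_model = Sequential([
    GRU(50, return_sequences=True, input_shape=(TIME_STEPS, FEATURES)),  # 1st GRU layer
    GRU(50),                          # 2nd GRU layer, returns the last hidden state
    Dropout(0.2),                     # randomly silence 20% of neurons
    Dense(7, activation="sigmoid"),   # 7 traffic-class scores between 0 and 1
])
gru_model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
```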
Step 2.2: building the LSTM model. The constructed LSTM model has 4 LSTM layers, 4 dropout layers, 1 fully connected layer and 1 activation layer. There are four LSTM layers with 50 neurons each, and a dropout layer follows each LSTM layer so that 20% of the neurons are randomly deactivated, preventing overfitting. In the first three LSTM layers the parameter return_sequences is set to True so that the LSTM layers return the hidden state of every time step, which helps in studying the spatio-temporal relations among the input data. Finally a fully connected layer outputs the 7-class prediction, with softmax as the activation function. See Figures 7-10.
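A corresponding minimal Keras-style sketch of the LSTM model follows; as before, the input dimensions, optimizer and loss are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense, Activation

TIME_STEPS, FEATURES = 28, 28  # assumed input dimensions

lstm_model = Sequential()
lstm_model.add(LSTM(50, return_sequences=True, input_shape=(TIME_STEPS, FEATURES)))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(50, return_sequences=True))
lstm_model.add(Dropout(0.2))
lstm_model.add(LSTM(50))               # last LSTM layer returns only the final state
lstm_model.add(Dropout(0.2))
lstm_model.add(Dense(7))               # fully connected layer
lstm_model.add(Activation("softmax"))  # separate activation layer, 7-class prediction
lstm_model.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
```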
Step 2.3: building the BiGRU model. The constructed BiGRU model uses 2 BiGRU layers, 1 TimeDistributed layer, 1 fully connected layer and 1 GlobalAveragePooling1D layer. First comes a BiGRU layer with 80 neurons in each GRU direction, 160 in total; the activation function is ReLU and the random inactivation factor is set to 0.3, i.e. 30% of the neurons are randomly deactivated, which speeds up model convergence. The merge_mode parameter specifies how the forward and backward GRUs are combined and is set to concat (concatenation) in the experiment. The bidirectional GRU layer is followed by a batch normalization layer, which prevents vanishing gradients and accelerates convergence. A TimeDistributed layer is then added to perform a full connection along the time dimension, outputting one step for every input time step; the number of fully connected neurons is 80 and the activation function is ReLU. A second BiGRU layer and batch normalization layer are then added with the same parameter configuration as the first. Next comes a fully connected layer with 7 neurons that fully connects the output of the previous layer and outputs 7 numbers between 0 and 1, with softmax as the activation function. Finally a GlobalAveragePooling1D layer applies a global average over the time dimension and produces the final prediction after pooling. See Figures 11-14.
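A minimal Keras-style sketch of the BiGRU structure described above; the input dimensions, optimizer and loss are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (GRU, Bidirectional, BatchNormalization,
                                     TimeDistributed, Dense, GlobalAveragePooling1D)

TIME_STEPS, FEATURES = 28, 28  # assumed input dimensions

bigru_model = Sequential([
    Bidirectional(GRU(80, activation="relu", dropout=0.3, return_sequences=True),
                  merge_mode="concat", input_shape=(TIME_STEPS, FEATURES)),
    BatchNormalization(),                           # stabilize gradients, speed convergence
    TimeDistributed(Dense(80, activation="relu")),  # full connection at every time step
    Bidirectional(GRU(80, activation="relu", dropout=0.3, return_sequences=True),
                  merge_mode="concat"),
    BatchNormalization(),
    Dense(7, activation="softmax"),                 # per-time-step class scores
    GlobalAveragePooling1D(),                       # average over time -> final prediction
])
bigru_model.compile(optimizer="adam", loss="categorical_crossentropy",
                    metrics=["accuracy"])
```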
Step 2.4: building the BiLSTM model. The constructed BiLSTM model has 2 BiLSTM layers, 1 LSTM layer and 1 fully connected layer. First a BiLSTM layer is introduced with 80 neurons in each LSTM direction, 160 in total, and the dropout value is set to 0.3 so that 30% of the neurons are randomly deactivated, avoiding overfitting. An LSTM layer is then introduced with dropout 0.3, again randomly deactivating 30% of the neurons and speeding up convergence. A second BiLSTM layer is added with the same parameter settings as the first. Finally a fully connected layer with 7 neurons serves as the model's prediction output, producing 7 numbers between 0 and 1 that represent the 7 traffic classes; the activation function is softmax. See Figures 15-18.
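A minimal Keras-style sketch of the BiLSTM structure; the width of the middle LSTM layer, the input dimensions, optimizer and loss are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Bidirectional, Dense

TIME_STEPS, FEATURES = 28, 28  # assumed input dimensions

bilstm_model = Sequential([
    Bidirectional(LSTM(80, dropout=0.3, return_sequences=True),
                  input_shape=(TIME_STEPS, FEATURES)),   # 2 x 80 = 160 units
    LSTM(80, dropout=0.3, return_sequences=True),        # width assumed, not given above
    Bidirectional(LSTM(80, dropout=0.3)),                # second BiLSTM layer
    Dense(7, activation="softmax"),                      # 7 traffic classes
])
bilstm_model.compile(optimizer="adam", loss="categorical_crossentropy",
                     metrics=["accuracy"])
```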
Step 2.5: building the CNN model. The constructed CNN model has two 1D CNN (Conv1D) layers, 1 dropout layer, 1 MaxPooling1D layer, 1 flattening layer and 2 fully connected layers. Two Conv1D layers are added, each with 64 filters, a convolution kernel size of 3 and ReLU as the activation function; the dropout value is set to 0.1, i.e. 10% of the neurons are randomly deactivated, which speeds up convergence and prevents overfitting. A MaxPooling1D layer with pool size 2 reduces the dimensionality of the data and simplifies the computation. A flattening layer converts the multi-dimensional output of the previous layer into one-dimensional data so that the model can transition directly from the convolutional layers to the fully connected layers. A fully connected layer then fully connects the output of the previous layer and passes the result on; this first fully connected layer has 100 neurons with ReLU as the activation function. Finally, a fully connected layer outputs the model's final prediction: it has 7 neurons, one for each of the 7 traffic classes, and uses softmax as the activation function. See Figures 19-22.
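A minimal Keras-style sketch of the 1D-CNN structure; the input dimensions, optimizer and loss are assumptions:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, Dropout, MaxPooling1D, Flatten, Dense

TIME_STEPS, FEATURES = 28, 28  # assumed input dimensions

cnn_model = Sequential([
    Conv1D(64, kernel_size=3, activation="relu", input_shape=(TIME_STEPS, FEATURES)),
    Conv1D(64, kernel_size=3, activation="relu"),
    Dropout(0.1),                     # randomly drop 10% of activations
    MaxPooling1D(pool_size=2),        # halve the temporal dimension
    Flatten(),                        # convolutional -> fully connected transition
    Dense(100, activation="relu"),
    Dense(7, activation="softmax"),   # 7 traffic-class probabilities
])
cnn_model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
```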
Step 3: training the models with the training data set. The mini-batch size is 628 and 5 training rounds are run.
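A sketch of the training call, under the assumption that "mini-batch size 628, 5 rounds" corresponds to batch_size=628 and epochs=5 in Keras terms; the model and data names refer to the sketches above:

```python
def train(model, X_train, y_train, X_test, y_test):
    """Train a compiled model as described in step 3 (interpretation assumed)."""
    return model.fit(X_train, y_train,
                     batch_size=628,                    # mini-batch size from step 3
                     epochs=5,                          # assumed: 5 training rounds
                     validation_data=(X_test, y_test),  # evaluated on the 30% test split
                     shuffle=True)

# e.g. history = train(cnn_model, X_train, y_train, X_test, y_test)
```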
Step 4: evaluating the models. The performance evaluation criteria mainly comprise the following seven items: accuracy (ACC), false alarm rate (FPR), loss (Loss), precision (Precision), recall (Recall), F1 value (F1-Measure) and running time. The experiment also gives each model's ROC curves in the multi-class setting. The ROC curve is commonly used with binary classifiers to measure classification quality; the area enclosed by the ROC curve and the horizontal axis is called the AUC (Area Under Curve), and the larger the AUC, the better the classification. The deep learning models used in the experiment are all multi-class models, so the ROC curves show the classification quality of each model on each class of encrypted traffic. The encrypted traffic classified in this experiment has 7 classes.
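For illustration, a small sketch of how the per-class ROC/AUC values behind the ROC figures could be computed with scikit-learn; the function and variable names are assumptions:

```python
from sklearn.metrics import roc_curve, auc

def per_class_auc(model, X_test, y_test, n_classes=7):
    """One-vs-rest ROC/AUC for each of the 7 traffic classes (y_test is one-hot)."""
    y_score = model.predict(X_test)          # shape (n_samples, n_classes)
    for k in range(n_classes):
        fpr, tpr, _ = roc_curve(y_test[:, k], y_score[:, k])
        print(f"class {k}: AUC = {auc(fpr, tpr):.3f}")
```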
True Positives (TP): the number of samples that are positive examples and are correctly classified as positive by the model.
False Positives (FP): the number of samples that are in fact negative examples but are wrongly classified as positive by the model.
False Negatives (FN): the number of samples that are in fact positive examples but are wrongly classified as negative by the model.
True Negatives (TN): the number of samples that are negative examples and are correctly classified as negative by the model.
Accuracy (ACC) calculation formula:
$\mathrm{ACC} = \dfrac{TP + TN}{TP + TN + FP + FN}$
False alarm rate (FPR) calculation formula:
$\mathrm{FPR} = \dfrac{FP}{FP + TN}$
Precision calculation formula:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
Recall calculation formula:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
F1 value (F1-Measure) calculation formula:
$\mathrm{F1} = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
Loss in the multi-class setting (cross-entropy) calculation formula:
$\mathrm{Loss} = -\dfrac{1}{n}\sum_{i}\left[\, y_i \ln a_i + (1 - y_i)\ln(1 - a_i) \,\right]$
(where i indexes the samples, y is the actual label, a is the predicted output, and n is the total number of samples)
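As a worked illustration of the formulas above, the following sketch computes the indicators directly from the TP/FP/FN/TN counts and the cross-entropy loss; the example counts are purely illustrative:

```python
import numpy as np

def metrics_from_counts(tp, fp, fn, tn):
    """The indicators computed from the TP/FP/FN/TN counts defined above."""
    acc = (tp + tn) / (tp + tn + fp + fn)               # accuracy
    fpr = fp / (fp + tn)                                # false alarm rate
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # F1-Measure
    return acc, fpr, precision, recall, f1

def cross_entropy_loss(y, a, eps=1e-12):
    """Mean cross-entropy over n samples, matching the Loss formula above."""
    a = np.clip(np.asarray(a, dtype=float), eps, 1 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(a) + (1 - y) * np.log(1 - a)))

print(metrics_from_counts(tp=80, fp=10, fn=20, tn=90))  # illustrative counts only
```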
Step 5: analyzing the experimental data. The results of the five models are summarized below:
Model    Loss   ACC   Precision  Recall  F1-score  FPR    Run time
GRU      0.84   0.71  0.25       0.94    0.39      0.46   70.14s
LSTM     0.059  0.71  0.79       0.59    0.67      0.026  136.55s
BiGRU    0.056  0.75  0.81       0.62    0.70      0.024  108.26s
BiLSTM   0.059  0.71  0.79       0.61    0.69      0.025  356.97s
CNN      0.74   0.73  0.76       0.70    0.73      0.035  8.63s
Judging from the results after running the models, the BiGRU model achieves the best classification results: most of its performance indicators are the best among the five models, and only its running time is slower than that of the single-direction models. The GRU model performs slightly worse in this experiment, with a larger Loss value and false alarm rate and poorer accuracy. The LSTM and BiLSTM models sit in the middle; apart from the long running time of the BiLSTM model, the differences between their performance indicators are small. The biggest surprise of the experiment is the CNN model: all of its indicators are remarkable, its running time is short, its accuracy is high, and several indicators exceed those of the RNN models that are specialized for time sequences. Comparing the different RNN variants, apart from the weaker GRU model the remaining three differ little. All of these observations indicate that the classification of encrypted traffic is influenced not only by time but also by many factors that are, for now, unknown.
The CNN model, known for its strength in image processing, shows excellent classification performance on encrypted traffic, and in some respects even surpasses RNN models that are well suited to sequence classification. The results of the model comparison indeed exceeded our expectations and also clearly show that the factors influencing encrypted traffic classification are not limited to time: many unknown factors exist and act together on the classification result. Classifying encrypted traffic is not a simple matter, and continued learning and research are needed to keep mining its latent connections.
The problem of classifying encrypted traffic is like an opaque sandbox whose internal structure and working principles are hard to see through completely. Only by continuous experimentation, forming conjectures, checking them against experimental results and resolving the internal details, can further breakthroughs be made. The prospects for encrypted traffic classification are clear, and with the continued effort of later researchers its underlying principles will surely be uncovered in the near future. Once the difficult problem of encrypted traffic classification is solved, the era of "big data" awaits.

Claims (2)

1. An encrypted traffic classification method based on deep learning, characterized by comprising the following steps:
step 1: preprocessing network flow data;
step 1.1, importing data in an ISCX VPN 2016 data set by using a scipy.io library of Python;
step 1.2, dividing the data into a sample and a label after the data is successfully imported;
step 1.3, centering and standardizing the samples; centering means subtracting the sample mean from the sample data so that the samples are distributed around 0 overall; standardization means dividing the sample data by the sample standard deviation so that the values are scaled to small decimals between 0 and 1 in absolute value, which speeds up computation and accelerates model convergence;
step 1.4, converting the label into a one-hot code format;
step 1.5, dividing the processed labels and samples into a training set and a testing set, and randomly disordering data in the sets;
step 2: designing and building the deep learning models;
step 2.1, building a GRU model; the GRU model comprises two GRU layers, a dropout layer and a Dense (fully connected) layer; the first and second layers are GRU layers with 50 neurons each; in the dropout layer the inactivation factor is set to 0.2, i.e. 20% of the neurons are randomly silenced when passing through the dropout layer, which helps prevent model overfitting; finally a Dense layer is added that fully connects the output of the previous layer and outputs 7 numbers between 0 and 1 as the predicted traffic-class scores, with sigmoid as the activation function;
step 2.2, building an LSTM model; the constructed LSTM model has 4 LSTM layers, 4 dropout layers, 1 fully connected layer and 1 activation layer; there are four LSTM layers with 50 neurons each; a dropout layer follows each LSTM layer so that 20% of the neurons are randomly deactivated, preventing model overfitting; in the first three LSTM layers the parameter return_sequences is set to True so that the LSTM layers return the hidden state of every time step, which helps in studying the spatio-temporal relations among the input data; finally a fully connected layer outputs the 7-class prediction with softmax as the activation function;
step 2.3, building a BiGRU model; the constructed BiGRU model uses 2 BiGRU layers, 1 TimeDistributed layer, 1 fully connected layer and 1 GlobalAveragePooling1D layer; first comes a BiGRU layer with 80 neurons in each GRU direction, 160 in total, ReLU as the activation function and a random inactivation factor of 0.3, i.e. 30% of the neurons are randomly deactivated, which speeds up model convergence; the merge_mode parameter specifies how the forward and backward GRUs are combined and is set to concat, i.e. concatenation, in the experiment; the BiGRU layer is immediately followed by a batch normalization layer, the purpose of which is to prevent vanishing gradients and accelerate convergence; a TimeDistributed layer is then added to perform a full connection along the time dimension, outputting one step for every input time step, with 80 fully connected neurons and ReLU as the activation function; a second BiGRU layer and batch normalization layer are added to the model with the same parameter configuration as the first; next comes a fully connected layer with 7 neurons that fully connects the output of the previous layer and outputs 7 numbers between 0 and 1, with softmax as the activation function; finally a GlobalAveragePooling1D layer applies a global average over the time dimension and produces the final prediction after pooling;
step 2.4, building a BiLSTM model; the constructed BiLSTM model has 2 BiLSTM layers, 1 LSTM layer and 1 fully connected layer; first a BiLSTM layer is introduced with 80 neurons in each LSTM direction, 160 in total, and a dropout value of 0.3 so that 30% of the neurons are randomly deactivated, avoiding overfitting; an LSTM layer is then introduced with dropout 0.3, again randomly deactivating 30% of the neurons and speeding up convergence; a second BiLSTM layer is added with the same parameter settings as the first; finally a fully connected layer with 7 neurons serves as the model's prediction output, producing 7 numbers between 0 and 1 that represent the 7 traffic classes; the activation function is softmax;
step 2.5, building a CNN model; the constructed CNN model has two 1D CNN (Conv1D) layers, 1 dropout layer, 1 MaxPooling1D layer, 1 flattening layer and 2 fully connected layers; two Conv1D layers are added to the model, each with 64 filters, a convolution kernel size of 3 and ReLU as the activation function, and the dropout value is set to 0.1, i.e. 10% of the neurons are randomly deactivated, which speeds up convergence and prevents overfitting; a MaxPooling1D layer with pool size 2 is added to reduce the dimensionality of the data and simplify the computation; a flattening layer is added to convert the multi-dimensional output of the previous layer into one-dimensional data so that the model can transition directly from the convolutional layers to the fully connected layers; a fully connected layer then fully connects the output of the previous layer and passes the result to the next layer; the first fully connected layer has 100 neurons with ReLU as the activation function; finally a fully connected layer outputs the model's final prediction; this layer has 7 neurons, one for each of the 7 traffic classes, and uses softmax as the activation function;
step 3, training the models with the training data set, where the mini-batch size is 628 and 5 training rounds are carried out;
step 4, evaluating the model; the performance evaluation criteria of the model are the following seven categories, respectively: the accuracy ACC, the false alarm rate FPR, the Loss rate Loss, the accuracy Precision, the Recall rate Recall, the F1-Measure of the model and the running time of the model; the area enclosed by the ROC curve and the lower coordinate axis is called AUC, and the larger the area of AUC is, the better the classification effect is; the adopted deep learning models are multi-classification models, and the ROC curve represents the classification effect of the models on each type of encrypted flow; the encrypted traffic used for classification has 7 classes;
TP: the number of samples which are used as positive examples and are successfully divided into the positive examples after model prediction;
FP: the number of samples which are negative examples in the samples, but are mistakenly considered as positive examples after model prediction;
FN: the number of samples which are positive examples in nature but are mistakenly considered as negative examples after model prediction;
TN: the number of samples which are negative examples and are successfully divided into the negative examples after model prediction is carried out on the samples;
the accuracy ACC calculation formula is as follows:
$\mathrm{ACC} = \dfrac{TP + TN}{TP + TN + FP + FN}$
the false alarm rate FPR calculation formula:
$\mathrm{FPR} = \dfrac{FP}{FP + TN}$
the precision Precision calculation formula:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
the recall Recall calculation formula:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
the F1-Measure calculation formula:
$\mathrm{F1} = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
the loss Loss calculation formula:
$\mathrm{Loss} = -\dfrac{1}{n}\sum_{i}\left[\, y_i \ln a_i + (1 - y_i)\ln(1 - a_i) \,\right]$
where i represents the sample, y represents the actual label, a represents the predicted output, and n represents the total number of samples.
2. The encrypted traffic classification method based on deep learning according to claim 1, characterized in that: the proportion of the test set is 30% of the total data set.
CN202110736516.2A 2021-06-30 2021-06-30 Encryption traffic classification method based on deep learning Active CN113556328B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110736516.2A CN113556328B (en) 2021-06-30 2021-06-30 Encryption traffic classification method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110736516.2A CN113556328B (en) 2021-06-30 2021-06-30 Encryption traffic classification method based on deep learning

Publications (2)

Publication Number Publication Date
CN113556328A CN113556328A (en) 2021-10-26
CN113556328B true CN113556328B (en) 2022-09-30

Family

ID=78131185

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110736516.2A Active CN113556328B (en) 2021-06-30 2021-06-30 Encryption traffic classification method based on deep learning

Country Status (1)

Country Link
CN (1) CN113556328B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114640520B (en) * 2022-03-18 2024-05-17 哈尔滨工业大学 User privacy protection method and system based on space-time information in zero-contact network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101953672B1 (en) * 2017-04-18 2019-03-04 한국기술교육대학교 산학협력단 System for packet payload-based network traffic classification using convolutional neural network
CN109167680A (en) * 2018-08-06 2019-01-08 浙江工商大学 A kind of traffic classification method based on deep learning
CN111428789A (en) * 2020-03-25 2020-07-17 广东技术师范大学 Network traffic anomaly detection method based on deep learning
CN112019500B (en) * 2020-07-15 2021-11-23 中国科学院信息工程研究所 Encrypted traffic identification method based on deep learning and electronic device
CN112839034B (en) * 2020-12-29 2022-08-05 湖北大学 Network intrusion detection method based on CNN-GRU hierarchical neural network
CN113037730B (en) * 2021-02-27 2023-06-20 中国人民解放军战略支援部队信息工程大学 Network encryption traffic classification method and system based on multi-feature learning

Also Published As

Publication number Publication date
CN113556328A (en) 2021-10-26

Similar Documents

Publication Publication Date Title
Chouhan et al. Network anomaly detection using channel boosted and residual learning based deep convolutional neural network
Wu et al. A network intrusion detection method based on semantic Re-encoding and deep learning
CN112839034A (en) Network intrusion detection method based on CNN-GRU hierarchical neural network
CN109639734B (en) Abnormal flow detection method with computing resource adaptivity
CN115348074B (en) Cloud data center network flow real-time detection method for deep space-time mixing
CN112134862B (en) Coarse-fine granularity hybrid network anomaly detection method and device based on machine learning
CN114553983B (en) Deep learning-based high-efficiency industrial control protocol analysis method
TWI715457B (en) Unsupervised malicious flow detection system and method
CN112738014A (en) Industrial control flow abnormity detection method and system based on convolution time sequence network
CN111565156A (en) Method for identifying and classifying network traffic
CN114330469B (en) Quick and accurate encryption traffic classification method and system
CN111556016A (en) Network flow abnormal behavior identification method based on automatic encoder
Shettar et al. Intrusion detection system using MLP and chaotic neural networks
CN113554094A (en) Network anomaly detection method and device, electronic equipment and storage medium
CN113556328B (en) Encryption traffic classification method based on deep learning
CN112884121A (en) Traffic identification method based on generation of confrontation deep convolutional network
CN116827873A (en) Encryption application flow classification method and system based on local-global feature attention
Yan et al. TL-CNN-IDS: transfer learning-based intrusion detection system using convolutional neural network
CN111130942A (en) Application flow identification method based on message size analysis
Shao et al. Deep learning hierarchical representation from heterogeneous flow-level communication data
Chen et al. An efficient network intrusion detection model based on temporal convolutional networks
CN117633627A (en) Deep learning unknown network traffic classification method and system based on evidence uncertainty evaluation
Tran et al. DeepInsight-convolutional neural network for intrusion detection systems
Zhang et al. Network traffic classification method based on improved capsule neural network
CN115842645A (en) UMAP-RF-based network attack traffic detection method and device and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant