CN112839051B - Encryption flow real-time classification method and device based on convolutional neural network - Google Patents


Info

Publication number
CN112839051B
Authority
CN
China
Prior art keywords
byte
neural network
convolutional neural
layer
encrypted
Legal status
Active
Application number
CN202110081372.1A
Other languages
Chinese (zh)
Other versions
CN112839051A (en)
Inventor
张建标
赵宝霖
公备
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN202110081372.1A
Publication of CN112839051A
Application granted
Publication of CN112839051B
Current legal status: Active


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/04 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a device for classifying encrypted traffic in real time based on a convolutional neural network. The method comprises the following steps: sampling a preset number of data packets from each encrypted flow; treating the sampled data packets as a byte stream, taking every two adjacent bytes as a byte pair, and determining the frequency features of all byte pairs; and inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model, which outputs the traffic type of each encrypted flow. The method represents the raw byte information of encrypted traffic by frequency features instead of constructing the input features directly from the raw bytes, which enhances the learning effect of the convolutional neural network and yields higher classification accuracy. In addition, the number of sampled data packets can be adjusted according to actual traffic-capture conditions without redesigning the network model structure, which gives the method better applicability. Because byte-pair frequency features are used, fewer data packets are required for classification, which favors real-time classification.

Description

Encryption flow real-time classification method and device based on convolutional neural network
Technical Field
The invention relates to the technical field of computer network security, and in particular to a method and device for real-time classification of encrypted traffic based on a convolutional neural network.
Background
With the adoption of Virtual Private Networks (VPNs) in campus and enterprise networks, users can rely on encryption protocols to keep their information from being snooped on. As a result, a large amount of encrypted traffic is transmitted over the network, and it has become a non-negligible share of network traffic. However, encrypted traffic makes traffic control difficult for the egress routers of these networks. On the one hand, encrypted P2P transfers are hard for routers to detect, occupy a large amount of bandwidth, and make targeted control policies hard to implement; on the other hand, the privacy of encrypted communication also shields malware and criminals, whose malicious activity can bypass the security detection of campus and enterprise networks and poses serious security risks. Therefore, classifying the encrypted traffic of virtual private networks has become a key issue in the field of network technology.
The prior art has attempted encrypted traffic classification using manually extracted features and machine learning, but encrypted traffic offers few usable features, and manually extracted features cannot achieve high classification accuracy. Methods that classify by timing features are susceptible to interference traffic, which leads to classification errors. Against this background, classification methods based on deep learning have begun to emerge: deep learning can learn representations automatically, discover features directly from encrypted data, and generalizes to inputs of a similar kind.
Most current deep-learning-based encrypted traffic classification techniques aim at improving classification accuracy and neglect whether they are suitable for real-time classification, which is an important application scenario of traffic classification for QoS. First, real-time classification requires accurate classification from only a small amount of data sampled in the initial stage of an encrypted transmission. Second, in the prior art, sampling of encrypted traffic is constrained by the trained convolutional neural network model: fixed-length sampling is used, so the sampling length cannot be adjusted in real time to actual capture conditions, which results in poor applicability.
Current methods mainly use a fixed-length sampling strategy whose sampling range cannot be adjusted after the model is trained; they lack flexibility and their classification accuracy is low.
Disclosure of Invention
To address these problems in the prior art, the invention provides a method and device for real-time classification of encrypted traffic based on a convolutional neural network.
The invention provides a method for real-time classification of encrypted traffic based on a convolutional neural network, comprising the following steps: sampling a preset number of data packets from each encrypted flow; treating the sampled data packets as a byte stream, taking every two adjacent bytes as a byte pair, and determining the frequency features of all byte pairs; and inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model, which outputs the traffic type of each encrypted flow. The pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with known traffic types, extracting their frequency features, and training on them.
According to an embodiment of the invention, determining the frequency features of all byte pairs comprises: determining the universality weight of each byte pair according to the number of sampled data packets that contain the byte pair and the total number of sampled data packets; and weighting the frequency of each byte pair by its universality weight to obtain the frequency feature of that byte pair.
According to an embodiment of the invention, determining the universality weight of each byte pair according to the number of sampled data packets that contain the byte pair and the total number of sampled data packets comprises computing the universality weight $U_b$ of byte pair $b$ from $p_b$ and $n$, where $p_b$ is the number of sampled data packets that contain byte pair $b$ and $n$ is the total number of sampled data packets.
According to an embodiment of the invention, before a preset number of data packets is sampled from each encrypted flow, the method further comprises: identifying each encrypted flow by its source IP address, source port, destination IP address, destination port, and transport-layer protocol.
According to an embodiment of the invention, inputting the frequency features of all byte pairs into the pre-trained convolutional neural network model comprises: normalizing the frequency features of all byte pairs and arranging them in a 256 × 256 feature matrix, where the first and second bytes of each pair give the row and column indices, respectively; and inputting the feature matrix into the pre-trained convolutional neural network model.
According to an embodiment of the invention, inputting the feature matrix into the pre-trained convolutional neural network model comprises: passing the feature matrix through the model's four feature-extraction units for feature extraction, and then through a fully connected layer and an output layer to obtain the classification prediction; each feature-extraction unit comprises a convolution layer, a batch normalization layer, and a pooling layer.
According to an embodiment of the invention, the traffic types include: chat, video, voice, P2P, file transfer, email, VPN chat, VPN video, VPN voice, VPN P2P, VPN file transfer, and VPN email.
The invention also provides a device for real-time classification of encrypted traffic based on a convolutional neural network, comprising: an acquisition module, configured to sample a preset number of data packets from each encrypted flow; an extraction module, configured to treat the sampled data packets as a byte stream, take every two adjacent bytes as a byte pair, and determine the frequency features of all byte pairs; and a processing module, configured to input the frequency features of all byte pairs into a pre-trained convolutional neural network model and output the traffic type of each encrypted flow. The pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with known traffic types, extracting their frequency features, and training on them.
The invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the steps of the convolutional-neural-network-based method for real-time classification of encrypted traffic described above.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the convolutional neural network-based encrypted traffic real-time classification method as described in any one of the above.
According to the method and device provided by the invention, the raw byte information of encrypted traffic is represented by frequency features instead of constructing the input features directly from the raw bytes, which enhances the learning effect of the convolutional neural network and yields higher classification accuracy. In addition, the number of sampled data packets can be adjusted according to actual traffic-capture conditions without redesigning the network model structure, which gives the method better applicability. Moreover, because byte-pair frequency features are used, fewer data packets are needed for classification, which favors real-time classification.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a first schematic flow chart of the convolutional-neural-network-based method for real-time classification of encrypted traffic provided by the invention;
FIG. 2 is a schematic diagram of the frequency feature matrix representation provided by the invention;
FIG. 3 is a schematic diagram of the convolutional neural network provided by the invention;
FIG. 4 is a second schematic flow chart of the convolutional-neural-network-based method for real-time classification of encrypted traffic provided by the invention;
FIG. 5 is a schematic structural diagram of the convolutional-neural-network-based device for real-time classification of encrypted traffic provided by the invention;
FIG. 6 is a schematic structural diagram of the electronic device provided by the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
At the egress routers of campus and enterprise networks, intercepted traffic is classified in real time by application-type label to identify ordinary traffic and VPN traffic, together with their service types, and thereby assist the routers in real-time traffic control. In a real-time classification scenario only a small portion of the encrypted traffic data is accessible. The invention therefore provides an encrypted traffic representation based on frequency features and classifies it with a convolutional neural network (CNN). The method can use any number n (n = 1, 2, …) of data packets of an encrypted session, achieves high classification accuracy, and provides a complete solution for lightweight real-time classification.
The following describes a method and a device for classifying encrypted traffic in real time based on a convolutional neural network with reference to fig. 1 to 6. Fig. 1 is a schematic flow chart of an encryption traffic real-time classification method based on a convolutional neural network, and as shown in fig. 1, the invention provides an encryption traffic real-time classification method based on a convolutional neural network, which comprises the following steps:
101. Sample a preset number of data packets from each encrypted flow.
From each encrypted flow, a preset number n of data packets, for example n consecutive data packets, is sampled. If the flow contains fewer than n data packets, all k of them (1 ≤ k < n) can be sampled directly.
102. Treat the sampled data packets as a byte stream, take every two adjacent bytes as a byte pair, and determine the frequency features of all byte pairs.
The sampled data packets carry data encrypted by a transport protocol or tunneling technique and are transmitted in binary form; a byte is a data unit of 8 binary bits. The sampled data packets are represented as a byte stream. Denote two consecutive bytes (a byte pair) as $b_i b_{i+1}$. Because binary and decimal numbers correspond one to one, $b_i b_{i+1}$, read as a 16-bit binary number, corresponds to a decimal number $b = 256\,b_i + b_{i+1} \in [0, 65535]$. The number of occurrences of each byte pair in the byte stream is then counted; each distinct decimal value yields one frequency, giving 65536 frequencies in total. The frequency $F_b$ of byte pair $b$ is computed from $m$, the total number of bytes in the byte stream, and $c_b$, the number of times the byte pair $b_i b_{i+1}$ with value $b$ occurs in the byte stream; both values are obtained by traversing the byte stream.
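A minimal Python sketch of this sampling step, assuming each flow is already available as a list of packet payloads in capture order and that the "n consecutive packets" are taken from the start of the flow (the text also allows other consecutive windows). The function name `sample_packets` is hypothetical.

```python
from typing import List, Sequence


def sample_packets(flow_packets: Sequence[bytes], n: int) -> List[bytes]:
    """Return the first n consecutive packets of a flow, or all k < n packets if the flow is shorter."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slicing handles the short-flow case: if only k < n packets exist, all k are returned.
    return list(flow_packets[:n])
```

Because the feature representation has a fixed size regardless of how many packets are sampled, n can be changed at deployment time without touching the model.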
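The byte-pair mapping and counting can be sketched in Python as follows. This is an illustration, not the patented implementation: it assumes pairs are counted within each packet (so no pair straddles a packet boundary), and it assumes the frequency takes the simple form $F_b = c_b / m$, since the exact formula is not given here. Because the features are later min-max normalized, any denominator that is the same for every pair in a sample cancels out. The helper names are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

NUM_PAIRS = 256 * 256  # 65536 possible byte-pair values


def pair_value(first: int, second: int) -> int:
    """Map the byte pair (b_i, b_{i+1}) to its decimal value b = 256*b_i + b_{i+1}."""
    return (first << 8) | second


def pair_counts(packets: Sequence[bytes]) -> Counter:
    """Count c_b for each pair value b, using adjacent bytes within each packet."""
    counts: Counter = Counter()
    for pkt in packets:
        # Iterating bytes yields ints; zip(pkt, pkt[1:]) gives every adjacent pair.
        counts.update(pair_value(a, b) for a, b in zip(pkt, pkt[1:]))
    return counts


def pair_frequencies(packets: Sequence[bytes]) -> List[float]:
    """Return F_b for all 65536 pair values, ASSUMING F_b = c_b / m."""
    m = sum(len(pkt) for pkt in packets)  # total number of bytes in the sampled stream
    freqs = [0.0] * NUM_PAIRS
    if m == 0:
        return freqs
    for b, c_b in pair_counts(packets).items():
        freqs[b] = c_b / m
    return freqs
```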
The frequencies can then be further processed to obtain the final frequency feature of each byte pair.
103. Input the frequency features of all byte pairs into a pre-trained convolutional neural network model, which outputs the traffic type of each encrypted flow. The pre-trained model is obtained by sampling encrypted flows labeled with known traffic types, extracting their frequency features, and training on them.
After the data preprocessing stage, the raw data has been converted into per-session frequency features, which can be arranged as a feature matrix and used as the input of the convolutional neural network in the next stage to predict the traffic type. Correspondingly, the convolutional neural network model is obtained by labeling samples of known traffic types, determining their frequency features, and training over multiple iterations, so that it can output the type of a data stream from the input frequency features.
As an optional embodiment, the traffic types include: chat, video, voice, P2P, file transfer, email, VPN chat, VPN video, VPN voice, VPN P2P, VPN file transfer, and VPN email. The classes can of course be defined according to specific requirements, and the invention is not limited in this respect.
According to the method provided by the invention, the raw byte information of encrypted traffic is represented by frequency features instead of constructing the input features directly from the raw bytes, which enhances the learning effect of the convolutional neural network and yields higher classification accuracy. In addition, the number of sampled data packets can be adjusted according to actual traffic-capture conditions without redesigning the network model structure, which gives the method better applicability. Moreover, because byte-pair frequency features are used, fewer data packets are needed for classification, which favors real-time classification.
In one embodiment, determining the frequency features of all byte pairs comprises: determining the universality weight of each byte pair according to the number of sampled data packets that contain the byte pair and the total number of sampled data packets; and weighting the frequency of each byte pair by its universality weight to obtain its frequency feature.
The universality weight of $b_i b_{i+1}$ is computed over the sampled data packets and reflects how widely $b_i b_{i+1}$ is distributed among them; 65536 universality weights are obtained in total. The frequency of each byte pair is then weighted by its universality weight to give its frequency feature. Weighting by the universality weight quantifies the frequency feature of each byte pair more accurately and improves recognition accuracy.
In one embodiment, determining the universality weight of each byte pair according to the number of sampled data packets that contain the byte pair and the total number of sampled data packets comprises computing the universality weight $U_b$ from $p_b$ and $n$, where $p_b$ is the number of sampled data packets that contain byte pair $b$ and $n$ is the total number of sampled data packets.
On this basis, the frequency feature $D_b$ of byte pair $b_i b_{i+1}$ is determined by:
$D_b = F_b \cdot U_b$
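The weighting step can be sketched in Python as below. The formula for $U_b$ is not reproduced in this text, so the sketch assumes the smoothed inverse document frequency used by scikit-learn's `TfidfTransformer` with `smooth_idf=True`, $U_b = \ln\frac{1+n}{1+p_b} + 1$. This swapped-in form was chosen because the unsmoothed $\ln(n/p_b)$ would give every pair a weight of 0 when only one packet is sampled (n = 1), which the method is meant to support. The function names are hypothetical.

```python
import math
from typing import List, Sequence

NUM_PAIRS = 256 * 256  # one weight per possible byte-pair value b


def universality_weights(packets: Sequence[bytes]) -> List[float]:
    """U_b for every pair value b, ASSUMING the smoothed IDF form ln((1 + n) / (1 + p_b)) + 1."""
    n = len(packets)  # total number of sampled packets
    p = [0] * NUM_PAIRS  # p_b: number of sampled packets containing pair b at least once
    for pkt in packets:
        # A set, so a pair repeated inside one packet counts that packet only once.
        for b in {(x << 8) | y for x, y in zip(pkt, pkt[1:])}:
            p[b] += 1
    return [math.log((1 + n) / (1 + p_b)) + 1.0 for p_b in p]


def weighted_features(freqs: Sequence[float], weights: Sequence[float]) -> List[float]:
    """D_b = F_b * U_b, element-wise over all pair values."""
    if len(freqs) != len(weights):
        raise ValueError("freqs and weights must have the same length")
    return [f * u for f, u in zip(freqs, weights)]
```

Pairs that never occur have $F_b = 0$, so their $D_b$ is 0 whatever weight they receive.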
In one embodiment, before a preset number of data packets is sampled from each encrypted flow, the method further comprises: identifying each encrypted flow by its source IP address, source port, destination IP address, destination port, and transport-layer protocol.
That is, the encrypted traffic is split into flows, each represented by its five-tuple (source IP address, source port, destination IP address, destination port, transport-layer protocol).
In one embodiment, inputting the frequency features of all byte pairs into the pre-trained convolutional neural network model further comprises: normalizing the frequency features of all byte pairs and arranging them in a 256 × 256 feature matrix, where the first and second bytes of each pair give the row and column indices, respectively; and inputting the feature matrix into the pre-trained convolutional neural network model.
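Grouping packets into flows by five-tuple can be sketched in Python as follows. The sketch assumes packets have already been parsed into a hypothetical `PacketInfo` record, and it treats both directions of a session as one flow (matching the session splitting in Step 1.1, where source and destination may be swapped) by ordering the two endpoints before building the key.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class PacketInfo(NamedTuple):
    """Hypothetical parsed-packet record."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str  # transport-layer protocol, e.g. "TCP" or "UDP"
    payload: bytes


FlowKey = Tuple[Tuple[str, int], Tuple[str, int], str]


def flow_key(p: PacketInfo) -> FlowKey:
    """Bidirectional five-tuple key: both directions of a session map to the same key."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    lo, hi = (a, b) if a <= b else (b, a)  # canonical endpoint order
    return (lo, hi, p.protocol)


def group_flows(packets: Iterable[PacketInfo]) -> Dict[FlowKey, List[bytes]]:
    """Group packet payloads into flows, preserving capture order within each flow."""
    flows: Dict[FlowKey, List[bytes]] = defaultdict(list)
    for p in packets:
        flows[flow_key(p)].append(p.payload)
    return dict(flows)
```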
The frequency feature of each byte pair b is normalized to the interval [0, 1] and stored in a 256 × 256 matrix, with the first and second bytes of the pair giving the row and column indices of the storage location, respectively. FIG. 2 is a schematic diagram of the frequency feature matrix representation provided by the invention and shows the frequency feature matrix of one sample. One implementation of the normalization is min-max scaling:
$D'_b = \dfrac{D_b - D_{\min}}{D_{\max} - D_{\min}}$
where $D_{\min}$ and $D_{\max}$ are the smallest and largest frequency features in the matrix.
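A minimal Python sketch of the normalization and matrix layout, assuming the 65536 features are indexed by pair value $b = 256\,b_i + b_{i+1}$, so that row $b_i$ and column $b_{i+1}$ fall out of a simple reshape. Returning an all-zero matrix when every feature is equal is an assumption added to avoid division by zero; the function name is hypothetical.

```python
from typing import List, Sequence


def to_feature_matrix(features: Sequence[float]) -> List[List[float]]:
    """Min-max normalize 65536 pair features to [0, 1] and lay them out as a 256 x 256 matrix.

    Row index = first byte b_i, column index = second byte b_{i+1} (b = 256*row + col).
    """
    if len(features) != 256 * 256:
        raise ValueError("expected one feature per byte pair (65536 values)")
    lo, hi = min(features), max(features)
    span = hi - lo
    # Constant input would divide by zero; map it to an all-zero matrix instead.
    norm = [(x - lo) / span if span > 0 else 0.0 for x in features]
    return [norm[row * 256:(row + 1) * 256] for row in range(256)]
```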
In one embodiment, inputting the feature matrix into the pre-trained convolutional neural network model comprises: passing the feature matrix through the model's four feature-extraction units for feature extraction, and then through a fully connected layer and an output layer to obtain the classification prediction; each feature-extraction unit comprises a convolution layer, a batch normalization layer, and a pooling layer.
FIG. 3 is a schematic diagram of the convolutional neural network structure provided by the invention. As shown in FIG. 3, the convolutional neural network model is divided into six layers: the first layer consists of the first convolution layer, the first batch normalization layer, and the first pooling layer; the second layer consists of the second convolution layer, the second batch normalization layer, and the second pooling layer; the third layer consists of the third convolution layer, the third batch normalization layer, and the third pooling layer; the fourth layer consists of the fourth convolution layer, the fourth batch normalization layer, and the fourth pooling layer; the fifth layer is a fully connected layer; and the sixth layer is the output layer.
In addition, during training the model parameters, including the convolution kernel size, the number of convolution kernels in each layer, the pooling-layer parameters, and the fully-connected-layer parameters, can be set according to evaluation metrics of the training stage (such as accuracy and F1 score). The labels of the training data serve as the target outputs of the network model.
FIG. 4 is a second flow chart of the method for real-time classification of encrypted traffic based on a convolutional neural network. The method is described in detail below with reference to this flow chart.
Take as an example the classification of encrypted traffic by application type on the ISCX VPN2016 dataset. The dataset contains 12 types of traffic generated under VPN encryption and transport-protocol encryption: chat (chat and vpn_chat), file transfer (file and vpn_file), P2P transfer (P2P and vpn_p2p), streaming (stream and vpn_stream), voice over IP (VoIP and vpn_voip), and email (email and vpn_email).
Step 1: Session splitting and sampling.
Step 1.1: Split the raw capture files with the SplitCap tool in session mode, which divides the raw traffic by identical source IP, source port, destination IP, destination port, and transport-layer protocol (source and destination may be swapped), yielding separate sessions that are saved in pcap format.
Step 1.2: Using Python's binary file reading, read the data in each pcap file and take 3 (n = 3) consecutive data packets from the raw traffic as the data sample of each session. The MAC header and the IP address fields of the sampled data packets are deleted to prevent the model from overfitting.
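The header-stripping part of Step 1.2 can be sketched in Python as below. The sketch assumes each packet has already been extracted from its pcap record as a raw untagged Ethernet II frame carrying IPv4 (no VLAN tag, no IPv6); under that assumption the MAC header is the first 14 bytes, and the source and destination addresses sit at offsets 12 to 19 of the IPv4 header. Constant and function names are hypothetical.

```python
from typing import List, Sequence

ETH_HEADER_LEN = 14  # Ethernet II: dst MAC (6) + src MAC (6) + EtherType (2)
IPV4_ETHERTYPE = b"\x08\x00"
IPV4_ADDR_START, IPV4_ADDR_END = 12, 20  # src (12..15) and dst (16..19) in the IPv4 header


def anonymize_frame(frame: bytes) -> bytes:
    """Delete the Ethernet header and the IPv4 source/destination address fields of one frame."""
    if frame[12:14] != IPV4_ETHERTYPE:
        raise ValueError("sketch only handles untagged Ethernet II frames carrying IPv4")
    ip_packet = frame[ETH_HEADER_LEN:]
    return ip_packet[:IPV4_ADDR_START] + ip_packet[IPV4_ADDR_END:]


def session_sample(frames: Sequence[bytes], n: int = 3) -> List[bytes]:
    """Take up to n consecutive frames from the start of a session and anonymize each one."""
    return [anonymize_frame(f) for f in frames[:n]]
```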
Step 2: Frequency feature matrix generation.
Step 2.1: Read the samples as bytes and, following the original byte order, count the term frequency (TF) of each byte pair according to formula (1),
where k (1 ≤ k ≤ 3) is the number of data packets actually contained in the sample, m is the total number of bytes in the sample, and $c_b$ is the number of times byte pair b occurs in the sample.
Step 2.2: Calculate the inverse document frequency (IDF) according to formula (2),
where $p_b$ ($p_b \le k$) is the number of data packets in the sample that contain byte pair b.
Step 2.3: Calculate the term frequency-inverse document frequency of byte pair b according to formula (3):
$\mathrm{TF\_IDF}_b = \mathrm{TF}_b \times \mathrm{IDF}_b \qquad (3)$
Step 2.4: Normalize the frequency feature of each byte pair b to [0, 1] according to formula (4) and store it in a 256 × 256 matrix. The first and second bytes of the pair give the row and column indices of the storage location, respectively; the frequency feature matrix of a sample is shown in FIG. 2. The normalization is:
$\mathrm{TF\_IDF}'_b = \dfrac{\mathrm{TF\_IDF}_b - \mathrm{TF\_IDF}_{\min}}{\mathrm{TF\_IDF}_{\max} - \mathrm{TF\_IDF}_{\min}} \qquad (4)$
where $\mathrm{TF\_IDF}_{\min}$ and $\mathrm{TF\_IDF}_{\max}$ are the smallest and largest frequency feature values in the matrix. Finally, the frequency feature matrix is saved as a grayscale image in PNG format.
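The text saves each normalized matrix as a grayscale PNG but does not name the library used. To keep the sketch dependency-free, the Python code below writes an 8-bit grayscale PNG using only `zlib` and `struct` from the standard library; an imaging library such as Pillow would do the same job. Mapping [0, 1] to 0-255 by rounding is an assumption, and the function names are hypothetical.

```python
import struct
import zlib
from typing import Sequence


def _png_chunk(tag: bytes, data: bytes) -> bytes:
    """One PNG chunk: big-endian length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(tag + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + tag + data + struct.pack(">I", crc)


def matrix_to_png(matrix: Sequence[Sequence[float]]) -> bytes:
    """Encode a [0, 1] matrix as an 8-bit grayscale PNG (0.0 -> black, 1.0 -> white)."""
    height, width = len(matrix), len(matrix[0])
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), compression 0, filter 0, no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    raw = bytearray()
    for row in matrix:
        raw.append(0)  # filter type 0 (None) at the start of every scanline
        raw.extend(min(255, max(0, round(v * 255))) for v in row)
    return (b"\x89PNG\r\n\x1a\n"
            + _png_chunk(b"IHDR", ihdr)
            + _png_chunk(b"IDAT", zlib.compress(bytes(raw)))
            + _png_chunk(b"IEND", b""))
```

For example, `with open("session.png", "wb") as f: f.write(matrix_to_png(matrix))` writes one 256 × 256 image per session (the file name is illustrative).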
Step 3: construction of convolutional neural network classification model and parameter setting
Feature extraction with the convolutional neural network: the network is built with the PyTorch library, as shown in FIG. 3. The layer dimensions are as follows: the network contains four convolution units with 3 × 3 kernels; within each unit the convolution, batch normalization, and pooling layers have the same number of channels, namely 64, 128, 256, and 256 in successive units; the activation function is ReLU; and there are two fully connected layers of sizes 1024 and 12.
The network must be trained before it is used for classification. The PNG images generated from the dataset are divided into 10 equal parts for 10-fold cross-validation, with the training, validation, and test sets in a ratio of 8:1:1 in each round. Training uses stochastic gradient descent (SGD) with momentum, with the learning rate set to 0.001 and the momentum parameter set to 0.8. The loss function is the cross-entropy between the predicted and actual labels, and the number of iterations is set to 50.
After the model's prediction accuracy converged, the test-set data were used to simulate real-time classification, yielding an average accuracy of 94.90% and an F1 score of 0.948.
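A minimal PyTorch sketch of the architecture described above. Several details are assumptions not stated in the text: the input is a single-channel 256 × 256 feature matrix, each 3 × 3 convolution uses padding 1, each pooling layer is 2 × 2 max pooling with stride 2, and ReLU follows batch normalization. With these choices the four units shrink the input to 256 channels of 16 × 16, which fixes the input size of the 1024-unit fully connected layer. The optimizer and loss use the training hyperparameters reported in the text (SGD, learning rate 0.001, momentum 0.8, cross-entropy). The names `conv_unit`, `TrafficCNN`, and `make_training_objects` are hypothetical.

```python
import torch
from torch import nn


def conv_unit(in_ch: int, out_ch: int) -> nn.Sequential:
    """One feature-extraction unit: 3x3 conv -> batch norm -> ReLU -> 2x2 max pooling."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),  # padding=1 keeps H, W (assumed)
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),  # halves H and W (assumed pooling parameters)
    )


class TrafficCNN(nn.Module):
    """Four conv units (64, 128, 256, 256 channels), FC 1024, then a 12-way output layer."""

    def __init__(self, num_classes: int = 12) -> None:
        super().__init__()
        self.features = nn.Sequential(
            conv_unit(1, 64), conv_unit(64, 128), conv_unit(128, 256), conv_unit(256, 256)
        )
        # A 256x256 input halved four times gives 16x16 maps with 256 channels.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 16 * 16, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_classes),  # raw logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def make_training_objects(model: nn.Module):
    """SGD with momentum and cross-entropy loss, using the hyperparameters reported in the text."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.8)
    loss_fn = nn.CrossEntropyLoss()
    return optimizer, loss_fn
```

`nn.CrossEntropyLoss` applies log-softmax internally, so the output layer deliberately has no softmax.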
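The text does not say how the ten parts are assigned to training, validation, and test in each round. The Python sketch below assumes a common rotation: in round r, part r is the test set, part r + 1 (mod 10) is the validation set, and the remaining eight parts form the training set, which gives the stated 8:1:1 ratio and tests every sample exactly once. The seeded shuffle and the function name are illustrative assumptions; a per-class (stratified) split would be a reasonable alternative for imbalanced classes.

```python
import random
from typing import List, Sequence, Tuple


def ten_fold_splits(items: Sequence, seed: int = 0) -> List[Tuple[list, list, list]]:
    """Split items into 10 near-equal parts and rotate them into (train, val, test) = 8:1:1."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::10] for i in range(10)]  # 10 near-equal parts
    rounds = []
    for r in range(10):
        test_idx, val_idx = r, (r + 1) % 10
        train = [x for i, fold in enumerate(folds) if i not in (test_idx, val_idx) for x in fold]
        rounds.append((train, folds[val_idx], folds[test_idx]))
    return rounds
```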
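The reported metrics can be computed as in the Python sketch below. The text does not state which F1 averaging is used; the sketch assumes the unweighted (macro) mean of per-class F1 scores, with per-class F1 = 2TP / (2TP + FP + FN). The per-round values would then be averaged over the 10 cross-validation rounds. Function names are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true or y_pred."""
    scores = []
    for c in set(y_true) | set(y_pred):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```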
The convolutional-neural-network-based real-time encrypted traffic classification device is described below; the device described below and the method described above may be read with reference to each other.
Fig. 5 is a schematic structural diagram of a convolutional-neural-network-based real-time encrypted traffic classification device according to an embodiment of the present invention. As shown in Fig. 5, the device includes an acquisition module 501, an extraction module 502 and a processing module 503. The acquisition module 501 is configured to sample a preset number of data packets from each encrypted flow; the extraction module 502 is configured to treat the sampled data packets as a byte stream, concatenate any two bytes into a byte pair, and determine the frequency features of all byte pairs; the processing module 503 is configured to input the frequency features of all byte pairs into a pre-trained convolutional neural network model and output the data flow type of each encrypted flow. The pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features.
This device embodiment implements the method embodiments above; for the specific procedure and details, refer to the method embodiments, which are not repeated here.
The convolutional-neural-network-based real-time encrypted traffic classification device provided by this embodiment represents the raw byte information of encrypted traffic by frequency features instead of building the input features directly from the raw bytes, which improves what the convolutional neural network learns and yields higher classification accuracy. In addition, the number of sampled data packets can be adjusted to the actual traffic capture conditions without redesigning the network structure, because the byte-pair feature matrix is always 256 × 256 regardless of how many packets are sampled; this gives the method better applicability. Finally, because byte-pair frequency features are used, fewer data packets are needed for classification, which benefits real-time classification.
Fig. 6 is a schematic structural diagram of an electronic device according to the present invention. As shown in Fig. 6, the electronic device may include a processor 601, a communications interface 602, a memory 603 and a communication bus 604, where the processor 601, the communications interface 602 and the memory 603 communicate with one another through the communication bus 604. The processor 601 may invoke logic instructions in the memory 603 to perform the convolutional-neural-network-based real-time encrypted traffic classification method, the method comprising: sampling a preset number of data packets from each encrypted flow; treating the sampled data packets as a byte stream, concatenating any two bytes into a byte pair, and determining the frequency features of all byte pairs; and inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model and outputting the data flow type of each encrypted flow, wherein the pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features.
Furthermore, the logic instructions in the memory 603 may be implemented as software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or some of the steps of the method according to the embodiments of the present invention. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
In another aspect, the present invention also provides a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium. The computer program comprises program instructions which, when executed by a computer, perform the convolutional-neural-network-based real-time encrypted traffic classification method provided above, the method comprising: sampling a preset number of data packets from each encrypted flow; treating the sampled data packets as a byte stream, concatenating any two bytes into a byte pair, and determining the frequency features of all byte pairs; and inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model and outputting the data flow type of each encrypted flow, wherein the pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features.
In yet another aspect, the present invention further provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, performs the convolutional-neural-network-based real-time encrypted traffic classification method provided in the above embodiments, the method comprising: sampling a preset number of data packets from each encrypted flow; treating the sampled data packets as a byte stream, concatenating any two bytes into a byte pair, and determining the frequency features of all byte pairs; and inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model and outputting the data flow type of each encrypted flow, wherein the pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
From the description of the embodiments above, those skilled in the art will clearly understand that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the technical solution above in essence, or the part of it that contributes over the prior art, may be embodied in the form of a software product. The software product may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A convolutional-neural-network-based method for classifying encrypted traffic in real time, characterized by comprising the following steps:
sampling a preset number of data packets from each encrypted flow;
treating the sampled data packets as a byte stream, concatenating any two bytes into a byte pair, and determining the frequency features of all byte pairs;
inputting the frequency features of all byte pairs into a pre-trained convolutional neural network model, and outputting the data flow type of each encrypted flow;
wherein the pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features;
the data flow types include chat, video, voice, P2P, file transfer, email, VPN chat, VPN video, VPN voice, VPN P2P, VPN file transfer and VPN email; and the convolutional neural network model comprises six layers: a first layer consisting of a first convolution layer, a first batch normalization layer and a first pooling layer; a second layer consisting of a second convolution layer, a second batch normalization layer and a second pooling layer; a third layer consisting of a third convolution layer, a third batch normalization layer and a third pooling layer; a fourth layer consisting of a fourth convolution layer, a fourth batch normalization layer and a fourth pooling layer; a fifth, fully connected layer; and a sixth, output layer.
2. The convolutional-neural-network-based method for classifying encrypted traffic in real time according to claim 1, wherein determining the frequency features of all byte pairs comprises:
determining the universality weight of each byte pair from the number of sampled data packets containing that byte pair and the total number of data packets;
weighting the number of occurrences of each byte pair by its universality weight to obtain the frequency feature of that byte pair.
3. The convolutional-neural-network-based method for classifying encrypted traffic in real time according to claim 2, wherein determining the universality weight of each byte pair from the number of sampled data packets containing that byte pair and the total number of data packets comprises:
where $p_b$ is the number of sampled data packets containing byte pair $b$, and $n$ is the total number of sampled data packets.
4. The convolutional-neural-network-based method for classifying encrypted traffic in real time according to claim 1, wherein before sampling a preset number of data packets from each encrypted flow, the method further comprises:
identifying each encrypted flow by its source IP address, source port, destination IP address, destination port and transport-layer protocol.
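Claim 4 identifies flows by the 5-tuple before packets are sampled. The sketch below groups packets by that key and keeps a preset number per flow. The `Packet` record and `group_into_flows` function are hypothetical; treating the 5-tuple as directional and interpreting "sampling" as taking the first packets in arrival order are assumptions, the latter chosen because it suits real-time classification.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record; a capture parser would fill these fields.
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


def group_into_flows(packets, max_packets):
    """Group packets by 5-tuple and keep at most `max_packets` per flow.

    Assumptions: the 5-tuple is directional (A->B and B->A are separate
    flows), and sampling keeps the first packets in arrival order.
    """
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        if len(flows[key]) < max_packets:
            flows[key].append(pkt)
    return dict(flows)
```

Because the classifier input is always a 256 × 256 matrix, `max_packets` can be tuned to capture conditions without changing the network, as noted in the description.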
5. The convolutional-neural-network-based method for classifying encrypted traffic in real time according to claim 1, wherein inputting the frequency features of all byte pairs into the pre-trained convolutional neural network model comprises:
normalizing the frequency features of all byte pairs and placing them in a 256 × 256 feature matrix, with the first and second bytes of each byte pair corresponding to the row and column indices of the feature matrix, respectively;
inputting the feature matrix into the pre-trained convolutional neural network model.
6. A convolutional-neural-network-based real-time encrypted traffic classification device, characterized by comprising:
an acquisition module configured to sample a preset number of data packets from each encrypted flow;
an extraction module configured to treat the sampled data packets as a byte stream, concatenate any two bytes into a byte pair, and determine the frequency features of all byte pairs;
a processing module configured to input the frequency features of all byte pairs into a pre-trained convolutional neural network model and output the data flow type of each encrypted flow;
wherein the pre-trained convolutional neural network model is obtained by sampling encrypted flows labeled with their known data flow types and extracting their frequency features;
the data flow types include chat, video, voice, P2P, file transfer, email, VPN chat, VPN video, VPN voice, VPN P2P, VPN file transfer and VPN email; and the convolutional neural network model comprises six layers: a first layer consisting of a first convolution layer, a first batch normalization layer and a first pooling layer; a second layer consisting of a second convolution layer, a second batch normalization layer and a second pooling layer; a third layer consisting of a third convolution layer, a third batch normalization layer and a third pooling layer; a fourth layer consisting of a fourth convolution layer, a fourth batch normalization layer and a fourth pooling layer; a fifth, fully connected layer; and a sixth, output layer.
7. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the convolutional-neural-network-based real-time encrypted traffic classification method according to any one of claims 1 to 5.
8. A non-transitory computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the convolutional-neural-network-based real-time encrypted traffic classification method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110081372.1A CN112839051B (en) 2021-01-21 2021-01-21 Encryption flow real-time classification method and device based on convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110081372.1A CN112839051B (en) 2021-01-21 2021-01-21 Encryption flow real-time classification method and device based on convolutional neural network

Publications (2)

Publication Number Publication Date
CN112839051A CN112839051A (en) 2021-05-25
CN112839051B CN112839051B (en) 2023-11-03

Family

ID=75929273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110081372.1A Active CN112839051B (en) 2021-01-21 2021-01-21 Encryption flow real-time classification method and device based on convolutional neural network

Country Status (1)

Country Link
CN (1) CN112839051B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114124437B (en) * 2021-09-28 2022-09-23 Xidian University Encrypted flow identification method based on prototype convolutional network
CN114254171B (en) * 2021-12-20 2024-07-23 Hubei Topsec Network Security Technology Co., Ltd. Data classification method, model training method, device, terminal and storage medium

Citations (1)

Publication number Priority date Publication date Assignee Title
CN111147396A (en) * 2019-12-26 2020-05-12 Harbin Engineering University Encrypted flow classification method based on sequence characteristics

Non-Patent Citations (3)

Title
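Deep Packet: A Novel Approach For Encrypted Traffic Classification Using Deep Learning; Mohammad Lotfollahi et al.; arXiv:1709.02656v3; 2018-07-04; Abstract, p. 6 *
TF-IDF algorithm for extracting article keywords; 修炼之路; CSDN; 2017-08-06; p. 1 *
Traffic payload type classification method based on n-gram multi-features; Ding Jie et al.; Computer Applications and Software; 2017-02-28; Vol. 34, No. 2, pp. 153-157 *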

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant