CN114448905B - Encryption traffic identification method, system, terminal and storage medium - Google Patents

Info

Publication number
CN114448905B
Authority
CN
China
Prior art keywords
data packet
network
traffic
network traffic
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011231169.XA
Other languages
Chinese (zh)
Other versions
CN114448905A (en)
Inventor
叶可江
林鹏
胡奕绅
须成忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202011231169.XA priority Critical patent/CN114448905B/en
Publication of CN114448905A publication Critical patent/CN114448905A/en
Application granted granted Critical
Publication of CN114448905B publication Critical patent/CN114448905B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application relates to an encrypted traffic identification method, an encrypted traffic identification system, a terminal and a storage medium. The method comprises the following steps: acquiring network traffic data packets, learning the relevance among the byte contents of each network traffic data packet, and training a neural network model with data packet encoding capability; encoding the byte content of the network traffic data packets with the neural network model; learning the time sequence relations among the encoded network traffic data packets to obtain a characteristic representation of each network traffic data packet, and learning the length information of each network traffic data packet; and fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet. The application ensures that the neural network learns both the original byte information and the length information of the data packets, and achieves a better encrypted traffic identification effect while maintaining the information integrity of the data packets.

Description

Encryption traffic identification method, system, terminal and storage medium
Technical Field
The application belongs to the technical field of traffic identification, and particularly relates to an encrypted traffic identification method, an encrypted traffic identification system, a terminal and a storage medium.
Background
Traffic identification, which aims to classify network traffic into appropriate categories, is a fundamental task in network management and cyberspace security. The traditional traffic identification method is mainly based on port numbers: ports are matched against the list provided by IANA (Internet Assigned Numbers Authority) to determine the type of traffic. This approach has become unreliable as more and more applications disguise themselves by using dynamically allocated ports or the ports of common communication protocols. Meanwhile, with people's increasing awareness of security and privacy, most application traffic is now encrypted with protocols such as IPsec, SSL/TLS and SSH, which renders traditional traffic classification methods ineffective.
In recent years, some researchers have modeled the traffic characteristics of encrypted flows (such as packet message types, packet length sequences and statistical features) in combination with machine learning methods, and have achieved a certain effect. These methods specifically include the following:
1. A classification method based on message type. The header of each SSL/TLS packet has a field identifying the message type of the packet, so a packet sequence can be abstracted into a sequence of message types, with different transition probabilities between message types for different traffic classes. The message-type-based method builds a Markov model of the message types and learns their state transition matrices. However, because of the computational cost, this approach can basically only be trained with a first-order or second-order Markov model, that is, with data from only 2 or 3 time steps, so the time information learned is very limited. At the same time, because the number of message types is very small, many different kinds of traffic produce similar message type sequences; the overlapping message types lead to low discrimination between traffic, and traffic of different categories with similar message types cannot be separated accurately. In addition, for methods that also use handshake information, not all SSL/TLS traffic contains this information in real scenarios: when a session that was just lost is resumed within a short time, the client and the server do not need to perform the handshake again, and the network traffic contains no handshake packet information.
2. A classification method based on length sequences. Similar to the message-type-based approach, this method abstracts the network flow into a sequence of packet lengths and then models the sequence with a Markov model or another machine learning method. Its disadvantage is that representing a packet by its length alone is a very naive simplification that loses a lot of detail. When the packet lengths are the same or close (e.g., because of packet fragmentation at the IP layer), the length sequence loses its discriminative power.
3. A method based on statistical features. The main idea of this type of method is to extract flow-level statistical features of the network packets to represent a communication flow, and then to classify the flow with a machine learning algorithm. These statistical features typically include the average packet size, the average inter-packet interval, the transmission rate and so on, and many open source tools can extract them. The disadvantages of this method are: (1) the features are highly abstract, so fine-grained operations (e.g., learning the association between two packets) are not possible; (2) extracting flow statistics requires setting a listening interval, such as 10 s or 15 s, which makes real-time traffic classification impossible.
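The first-order Markov modeling described in item 1 can be illustrated with a minimal sketch. The message-type names and the two example flows below are hypothetical, and the function only estimates transition probabilities by counting; it is not taken from any of the prior-art methods discussed here.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequences):
    """Estimate first-order Markov transition probabilities P(next | current)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair in the sequence.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for cur, nexts in counts.items()
    }


# Hypothetical message-type sequences for two flows of the same traffic class.
flows = [
    ["ClientHello", "ServerHello", "ApplicationData", "ApplicationData"],
    ["ClientHello", "ServerHello", "ApplicationData"],
]
probs = transition_probabilities(flows)
```

Because only adjacent pairs are counted, the model sees two time steps at a time, which is the limitation on learned time information noted above.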
Disclosure of Invention
The application provides an encrypted traffic identification method, an encrypted traffic identification system, a terminal and a storage medium, which aim to solve, at least to a certain extent, at least one of the technical problems in the prior art.
In order to solve the problems, the application provides the following technical scheme:
An encrypted traffic identification method comprising the steps of:
acquiring network traffic data packets, learning the relevance among the byte contents of each network traffic data packet by using a Transformer encoder, and training a neural network model with data packet encoding capability;
encoding the byte content of the network traffic data packets with the neural network model;
using a Transformer to learn the time sequence relations among the encoded network traffic data packets to obtain the characteristic representation of each network traffic data packet, and using a bidirectional LSTM to learn the length information of each network traffic data packet;
fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the training of a neural network model with data packet encoding capability includes:
constructing an encrypted traffic identification model, wherein the encrypted traffic identification model comprises a pre-training layer, a data packet coding layer, a time sequence layer, a supplementary layer and a classification layer, and training the neural network model at the pre-training layer of the encrypted traffic identification model.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the training of a neural network model with data packet encoding capability includes:
grouping all network traffic data packets by five-tuple (source IP, destination IP, source port, destination port and transport protocol), wherein each group represents one bidirectional communication flow;
extracting the byte content above the IP layer of each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
randomly masking the byte content in each hexadecimal file according to a set proportion, and adding a [PACKET] mark to the head of each file;
learning the associations among the byte contents of the network traffic data packets using a Transformer encoder and recovering the masked byte content, with the neural network model trained using cross entropy as the loss function.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the encoding byte content of the network traffic data packet by the neural network model includes:
extracting, at the data packet coding layer of the encrypted traffic identification model, the byte content above the IP layer of each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
adding a [PACKET] mark to the head of each hexadecimal file, and truncating or padding the byte content of each hexadecimal file to a preset length;
encoding the byte content of each hexadecimal file with the neural network model, and then using the vector corresponding to each [PACKET] mark as the vector representation of the corresponding network traffic data packet.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the using a Transformer to learn the time sequence relations among the encoded network traffic data packets to obtain the characteristic representation of each network traffic data packet includes:
at the time sequence layer of the encrypted traffic identification model, processing the vector $e_i$ of each network traffic data packet with a Transformer encoder so that the information of the other network traffic data packets is fused, obtaining a new vector representation $v_i = \mathrm{Transformer}(e_i)$ of each network traffic data packet;
concatenating the vector representations $v_i$ of the network traffic data packets to obtain the characteristic representation $h_1 = \mathrm{Concat}(v_1, v_2, \ldots, v_m) W_o$; where $W_o$ is a weight matrix of the neural network and Concat denotes vector concatenation.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the learning the length information of each network traffic packet using the bidirectional LSTM includes:
extracting, at the supplementary layer of the encrypted traffic identification model, the original length information of each network traffic data packet, and constructing a length sequence $L = \{l_1, l_2, \ldots, l_m\} = \{\mathrm{length}(p_1), \mathrm{length}(p_2), \ldots, \mathrm{length}(p_m)\}$, where $l_i$ represents the length of the $i$-th data packet, $p_i$ represents the $i$-th data packet, and length() extracts the length information of a data packet;
learning the length sequence $L$ using a bidirectional LSTM to obtain the length information $h_2 = \mathrm{Concat}(\overrightarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m), \overleftarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m))$ of the network traffic data packets, where $l_i$ represents the length of the $i$-th data packet and Concat denotes the concatenation of the two vectors.
The technical scheme adopted by the embodiment of the application further comprises the following steps: the fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification includes:
at the classification layer of the encrypted traffic identification model, performing full connection and Softmax classification on $h_1$ from the time sequence layer to obtain the predicted value $\gamma_1$, and calculating the cross entropy loss function value $loss_1$, where $W$ and $b$, the weights and bias of the fully connected layer, are the parameters to be learned by the neural network;
performing full connection and Softmax classification on $h_2$ from the supplementary layer to obtain the predicted value $\gamma_2$, and calculating the cross entropy loss function value $loss_2$;
concatenating $h_1$ and $h_2$ to obtain $h_3 = \mathrm{Concat}(h_1, h_2)$;
performing full connection and Softmax classification on $h_3$ to obtain the predicted value $\gamma_3$, and calculating the cross entropy loss function value $loss_3$;
calculating the sum of $loss_1$, $loss_2$ and $loss_3$, and updating the network parameters of the encrypted traffic identification model with a gradient descent algorithm according to the calculation result.
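The classification layer described above can be written out as a worked formula. This is a sketch under assumptions rather than the formulas of the original filing: it assumes that each head is a fully connected layer followed by Softmax with its own parameters $W_k$ and $b_k$, that $y$ is the one-hot label over $C$ traffic classes, and that the three losses are summed without weighting.

```latex
\begin{aligned}
\gamma_k &= \mathrm{Softmax}(W_k h_k + b_k), && k \in \{1, 2, 3\},\quad h_3 = \mathrm{Concat}(h_1, h_2) \\
loss_k   &= -\sum_{c=1}^{C} y_c \log \gamma_{k,c} \\
loss     &= loss_1 + loss_2 + loss_3
\end{aligned}
```

Summing the three terms means that the gradient reaching the time sequence layer and the supplementary layer comes both from their own heads and from the fused head.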
Another technical scheme adopted by the embodiment of the application is: an encrypted traffic identification system comprising:
a pre-training module, used for acquiring network traffic data packets, learning the relevance among the byte contents of each network traffic data packet by using a Transformer encoder, and training a neural network model with data packet encoding capability;
a data packet encoding module, used for encoding the byte content of the network traffic data packets with the neural network model;
a feature learning module, used for using a Transformer to learn the time sequence relations among the encoded network traffic data packets and obtaining the characteristic representation of each network traffic data packet;
a length learning module, used for learning the length information of each network traffic data packet using a bidirectional LSTM;
a fusion and classification module, used for fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet.
A further technical scheme adopted by the embodiment of the application is: a terminal comprising a processor and a memory coupled to the processor, wherein,
The memory stores program instructions for implementing the encrypted traffic identification method;
The processor is configured to execute the program instructions stored by the memory to control encrypted traffic identification.
A further technical scheme adopted by the embodiment of the application is: a storage medium storing program instructions executable by a processor, the program instructions being used for performing the encrypted traffic identification method.
Compared with the prior art, the embodiment of the application has the following beneficial effects: the encrypted traffic identification method, system, terminal and storage medium use an unsupervised pre-training method in which part of the byte content of the data packets is randomly masked and then recovered by a Transformer, so that the relevance among different data packets is learned well and network traffic bytes are represented better, giving better data packet encoding capability; a loss function value is calculated separately for the characteristic representation, for the length information, and for the fusion of the two, and the network parameters are updated by gradient descent on the sum of the three loss function values, so that the neural network learns both the original byte information and the length information of the data packets, the network performance is improved, and a better encrypted traffic identification effect is achieved while the information integrity of the data packets is maintained.
Drawings
FIG. 1 is a flow chart of an encrypted traffic identification method according to a first embodiment of the present application;
FIG. 2 is a flow chart of an encrypted traffic identification method according to a second embodiment of the present application;
FIG. 3 is a schematic diagram of an encrypted traffic identification model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an encrypted traffic identification system according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a terminal structure according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
In view of the defects of the prior art, the encrypted traffic identification method of the embodiment of the application provides a packet-level, end-to-end encrypted traffic identification framework. The framework adopts a strategy of raw packet bytes plus length sequence, and uses unsupervised traffic pre-training and a self-attention mechanism to learn the deep connections among data packets, so as to preserve the information integrity of the data packets and achieve a better identification effect.
Specifically, please refer to fig. 1, which is a flowchart of an encrypted traffic identification method according to a first embodiment of the present application. The encrypted traffic identification method of the first embodiment of the present application includes the steps of:
S1: acquiring network traffic data packets, learning the relevance among the byte contents of each network traffic data packet by using a Transformer encoder, and training a neural network model with data packet encoding capability;
S2: encoding the byte content of the network traffic data packets with the neural network model;
S3: using a Transformer to learn the time sequence relations among the encoded network traffic data packets to obtain the characteristic representation of each network traffic data packet, and using a bidirectional LSTM to learn the length information of each network traffic data packet;
S4: fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet.
Referring to fig. 2, a flow chart of an encrypted traffic identification method according to a second embodiment of the present application is shown. The encrypted traffic identification method according to the second embodiment of the present application includes the steps of:
S10: building a packet-level, end-to-end encrypted traffic identification model;
In this step, the structure of the encrypted traffic identification model is shown in fig. 3, and includes a pre-training layer, a data packet coding layer, a time sequence layer, a supplementary layer and a classification layer.
S20: in the pre-training layer, acquiring a large number of unlabeled network traffic data packets, and learning the relevance among the byte contents of each network traffic data packet by using a Transformer encoder, so as to train a neural network model with data packet encoding capability;
In this step, the training process of the pre-training layer specifically includes:
S21: grouping all network traffic data packets by five-tuple (source IP, destination IP, source port, destination port and transport protocol), wherein each group represents one bidirectional communication flow;
S22: extracting the byte content above the IP layer of each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
S23: randomly masking the byte content in each hexadecimal file according to a set proportion (set to 15% in the embodiment of the application; it can be adjusted according to actual operation), and adding a [PACKET] mark to the head of each file;
S24: learning the relevance among the byte contents of the network traffic data packets by using a Transformer encoder, recovering the masked byte content, and training the neural network model with data packet encoding capability using cross entropy as the loss function.
In the above, the embodiment of the application trains the neural network model in the pre-training layer with an unsupervised pre-training method that randomly masks part of the byte content of the data packets and recovers it through a Transformer, so that network traffic bytes are represented better and better data packet encoding capability is achieved.
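The data preparation in steps S21 to S23 can be sketched as follows. This is a minimal illustration under assumptions, not the implementation of the application: the `[MASK]` placeholder, the function names, the tuple layout of a packet, and the choice to make the five-tuple key direction-independent by sorting the two endpoints are all hypothetical, and the Transformer encoder that recovers the masked bytes in S24 is not shown.

```python
import random

PACKET_TOKEN = "[PACKET]"  # head mark named in the text
MASK_TOKEN = "[MASK]"      # assumed placeholder for a masked byte


def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent five-tuple key, so both directions share one group."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)


def to_hex_tokens(payload):
    """Represent each byte as a two-character hexadecimal token."""
    return [f"{byte:02x}" for byte in payload]


def mask_tokens(tokens, ratio=0.15, rng=None):
    """Randomly mask about `ratio` of the byte tokens.

    Returns the model input (with the [PACKET] mark at the head) and a dict
    mapping masked input positions to the original tokens to be recovered.
    """
    if rng is None:
        rng = random.Random()
    n_mask = max(1, round(len(tokens) * ratio)) if tokens else 0
    masked = list(tokens)
    targets = {}
    for pos in rng.sample(range(len(tokens)), n_mask):
        targets[pos + 1] = masked[pos]  # +1 accounts for the head mark
        masked[pos] = MASK_TOKEN
    return [PACKET_TOKEN] + masked, targets


# Two hypothetical packets travelling in opposite directions of one TCP flow.
packets = [
    ("10.0.0.1", "10.0.0.2", 50000, 443, "TCP", bytes(range(20))),
    ("10.0.0.2", "10.0.0.1", 443, 50000, "TCP", bytes(range(20, 40))),
]
flows = {}
for src, dst, sport, dport, proto, payload in packets:
    flows.setdefault(flow_key(src, dst, sport, dport, proto), []).append(payload)

inputs, targets = mask_tokens(to_hex_tokens(packets[0][5]), rng=random.Random(0))
```

The `targets` dictionary plays the role of the labels for the cross entropy loss: the model is asked to predict the original byte token at each masked position.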
S30: in the data packet coding layer, encoding the byte content of each network traffic data packet with the neural network model trained in the pre-training layer, and sending the encoded network traffic data packets to the time sequence layer;
In this step, the process by which the neural network model encodes the network traffic data packets specifically includes:
S31: extracting the byte content above the IP layer of each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
S32: adding a [PACKET] mark to the head of each hexadecimal file, and truncating or padding the byte content of each hexadecimal file to a preset length;
S33: encoding the byte content (including the [PACKET] mark) of each hexadecimal file with the neural network model, then using the vector corresponding to each [PACKET] mark as the vector representation of the corresponding network traffic data packet, and sending the vectors to the time sequence layer;
In the above, the embodiment of the present application supplements the length information of each data packet in the data packet coding layer, so as to prevent the loss of length information caused by truncation or padding and to maintain the integrity of the data packet information as far as possible.
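The truncation and padding in steps S31 and S32 can be sketched as follows. The `[PAD]` filler token, the function name and the preset length of 8 are assumptions made for illustration; the text only states that the content is cut or filled to a preset length.

```python
PACKET_TOKEN = "[PACKET]"  # head mark named in the text
PAD_TOKEN = "[PAD]"        # assumed filler token; the text only says "filling"


def prepare_packet(payload, max_len=8):
    """Hex-encode a payload, truncate or pad it to max_len tokens, prepend [PACKET].

    The original payload length is returned alongside the tokens, because the
    fixed-length token list no longer carries it.
    """
    tokens = [f"{b:02x}" for b in payload[:max_len]]
    tokens += [PAD_TOKEN] * (max_len - len(tokens))
    return [PACKET_TOKEN] + tokens, len(payload)


short_tokens, short_len = prepare_packet(b"\x16\x03\x01")
long_tokens, long_len = prepare_packet(bytes(range(100)))
```

Both calls produce token lists of the same size, which shows why the original length has to be kept separately if it is to be used later.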
S40: in the time sequence layer, using a Transformer to learn the time sequence relations among the network traffic data packets, and obtaining the characteristic representation of each network traffic data packet;
In this step, the process by which the time sequence layer learns the time sequence relations specifically includes:
S41: processing the vector $e_i$ of each network traffic data packet with a Transformer encoder so that the information of the other network traffic data packets is fused, obtaining a new vector representation $v_i = \mathrm{Transformer}(e_i)$ of each network traffic data packet;
S42: concatenating the vector representations $v_i$ of the network traffic data packets to obtain the characteristic representation $h_1 = \mathrm{Concat}(v_1, v_2, \ldots, v_m) W_o$; where $W_o$ is a weight matrix of the neural network and Concat denotes vector concatenation.
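The way each packet vector absorbs information from the other packets in S41, and the projection in S42, can be illustrated with a simplified sketch. It uses single-head scaled dot-product self-attention with identity query, key and value projections; a full Transformer encoder additionally has learned projections, multiple heads, feed-forward sublayers, residual connections and layer normalization, none of which are shown. The toy vectors and the toy matrix `w_o` are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted mix of all packet vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, vectors)) for j in range(d)])
    return out


def concat_project(vectors, w_o):
    """h1 = Concat(v_1, ..., v_m) W_o, with w_o given as (m*d) rows of d_out columns."""
    flat = [x for v in vectors for x in v]
    return [sum(flat[i] * w_o[i][j] for i in range(len(flat))) for j in range(len(w_o[0]))]


e = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy packet vectors e_1..e_3
v = self_attention(e)
w_o = [[0.5, 0.0], [0.0, 0.5]] * 3        # toy 6 x 2 weight matrix
h1 = concat_project(v, w_o)
```

Each row of `v` depends on all three inputs, which is the sense in which a packet's representation is fused with the information of the other packets in the flow.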
S50: in the supplementary layer, taking the length information of all network traffic data packets as input, and learning the hidden features of the packet length sequence using a bidirectional LSTM (Long Short-Term Memory network);
In this step, the process of learning the length information of the network traffic data packets specifically includes:
S51: extracting the original length information of each network traffic data packet, and constructing a length sequence $L = \{l_1, l_2, \ldots, l_m\} = \{\mathrm{length}(p_1), \mathrm{length}(p_2), \ldots, \mathrm{length}(p_m)\}$, where $l_i$ represents the length of the $i$-th data packet, $p_i$ represents the $i$-th data packet, and length() extracts the length information of a data packet;
S52: learning the length sequence $L$ using a bidirectional LSTM to obtain the length information $h_2 = \mathrm{Concat}(\overrightarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m), \overleftarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m))$ of the network traffic data packets, where $l_i$ represents the length of the $i$-th data packet and Concat denotes the concatenation of the two vectors.
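The bidirectional LSTM of S52 can be sketched in plain Python as follows. Several choices are assumptions made for illustration: the weights are random and untrained, the hidden size is 4, each direction contributes its final hidden state, and the lengths are divided by 1500 (a typical Ethernet MTU) to keep the inputs small. The example lengths are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_lstm(hidden, rng):
    """Random toy parameters: per gate, input weights, recurrent weights and bias."""
    def gate():
        return {
            "w": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
            "u": [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)],
            "b": [0.0] * hidden,
        }
    return {name: gate() for name in ("i", "f", "o", "g")}


def lstm_last_state(params, xs):
    """Run an LSTM over the scalar inputs xs and return the final hidden state."""
    hidden = len(params["i"]["b"])
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in xs:
        pre = {
            name: [
                p["w"][j] * x + sum(p["u"][j][k] * h[k] for k in range(hidden)) + p["b"][j]
                for j in range(hidden)
            ]
            for name, p in params.items()
        }
        i = [sigmoid(a) for a in pre["i"]]    # input gate
        f = [sigmoid(a) for a in pre["f"]]    # forget gate
        o = [sigmoid(a) for a in pre["o"]]    # output gate
        g = [math.tanh(a) for a in pre["g"]]  # candidate cell state
        c = [f[j] * c[j] + i[j] * g[j] for j in range(hidden)]
        h = [o[j] * math.tanh(c[j]) for j in range(hidden)]
    return h


def bilstm_h2(lengths, fwd, bwd, scale=1500.0):
    """h2 = Concat(forward LSTM(L), backward LSTM(L)) on scaled packet lengths."""
    xs = [l / scale for l in lengths]
    return lstm_last_state(fwd, xs) + lstm_last_state(bwd, xs[::-1])


rng = random.Random(0)
fwd, bwd = init_lstm(4, rng), init_lstm(4, rng)
lengths = [74, 1514, 66, 583]  # hypothetical packet lengths in bytes
h2 = bilstm_h2(lengths, fwd, bwd)
```

The backward direction simply runs a second LSTM over the reversed sequence, so `h2` summarizes the length sequence as read from both ends.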
S60: taking the characteristic representation $h_1$ and the length information $h_2$ of the network traffic data packets as the input of the classification layer, fusing $h_1$ and $h_2$ and performing Softmax classification, and outputting the traffic identification result of each network traffic data packet;
In this step, the process by which the classification layer fuses the characteristic representation $h_1$ and the length information $h_2$ specifically includes:
S61: performing full connection and Softmax classification on $h_1$ from the time sequence layer to obtain the predicted value $\gamma_1$, and calculating the cross entropy loss function value $loss_1$, where $W$ and $b$, the weights and bias of the fully connected layer, are the parameters to be learned by the neural network;
S62: performing full connection and Softmax classification on $h_2$ from the supplementary layer to obtain the predicted value $\gamma_2$, and calculating the cross entropy loss function value $loss_2$;
S63: concatenating $h_1$ and $h_2$ to obtain $h_3 = \mathrm{Concat}(h_1, h_2)$;
S64: performing full connection and Softmax classification on $h_3$ to obtain the predicted value $\gamma_3$, and calculating the cross entropy loss function value $loss_3$;
S65: calculating the sum of the three loss function values $loss_1$, $loss_2$ and $loss_3$, and updating the network parameters with a gradient descent algorithm according to the calculation result.
In the above, the embodiment of the present application calculates a loss function value for each of the three parts ($h_1$, $h_2$, $h_3$) in the classification layer, and updates the network parameters by gradient descent on the sum of the three loss function values, so as to ensure that the neural network learns both the original byte information and the length information of the data packets and to improve the network performance.
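Steps S61 to S65 can be sketched numerically as follows. The sketch assumes that each of the three heads is a fully connected layer with its own weights and bias followed by Softmax; the toy values for `h1`, `h2`, the weights and the label are hypothetical, and the gradient descent update itself is not shown.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dense(h, w, b):
    """Fully connected layer; w has one row of weights per class."""
    return [sum(wi * hi for wi, hi in zip(row, h)) + bj for row, bj in zip(w, b)]


def cross_entropy(probs, label):
    """Cross entropy against a one-hot label given as a class index."""
    return -math.log(probs[label])


def head(h, w, b, label):
    """Full connection + Softmax, returning the prediction and its loss."""
    probs = softmax(dense(h, w, b))
    return probs, cross_entropy(probs, label)


h1 = [0.2, -0.1]  # toy characteristic representation (time sequence layer)
h2 = [0.5]        # toy length information (supplementary layer)
h3 = h1 + h2      # Concat(h1, h2)
label = 1         # hypothetical true class index, 2 classes

w1, b1 = [[0.1, 0.3], [0.4, -0.2]], [0.0, 0.0]
w2, b2 = [[0.2], [-0.3]], [0.0, 0.0]
w3, b3 = [[0.1, 0.3, 0.2], [0.4, -0.2, -0.3]], [0.0, 0.0]

gamma1, loss1 = head(h1, w1, b1, label)
gamma2, loss2 = head(h2, w2, b2, label)
gamma3, loss3 = head(h3, w3, b3, label)
total_loss = loss1 + loss2 + loss3  # the quantity minimized by gradient descent
```

Because `total_loss` contains a term for each branch as well as for the fused vector, neither the byte branch nor the length branch can be ignored during training.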
Based on the above, the encrypted traffic identification method of the embodiment of the application constructs an end-to-end encrypted traffic identification model. At the pre-training layer of the model, an unsupervised pre-training method that randomly masks part of the byte content of the data packets and recovers it through a Transformer learns the relevance among different data packets well, represents network traffic bytes better and achieves better data packet encoding capability. The length information of the data packets is supplemented in the data packet coding layer, which prevents the loss of length information in the truncation or padding stage and maintains the integrity of the data packet information as far as possible. In the classification layer, a loss function value is calculated separately for the characteristic representation from the time sequence layer, for the length information from the supplementary layer, and for the fusion of the two, and the network parameters are updated by gradient descent on the sum of the three loss function values, so that the neural network learns both the original byte information and the length information of the data packets and the network performance is improved. The application uses an end-to-end strategy, requires no additional operations such as feature engineering, and achieves a better encrypted traffic identification effect while maintaining the information integrity of the data packets.
Fig. 4 is a schematic structural diagram of an encrypted network traffic identification system according to an embodiment of the application. The encrypted network traffic identification system 40 of the embodiment of the present application includes:
Pre-training module 41: used for acquiring network traffic data packets, learning the relevance among the byte contents of each network traffic data packet by using a Transformer encoder, and training a neural network model with data packet encoding capability;
Data packet encoding module 42: used for encoding the byte content of the network traffic data packets with the neural network model;
Feature learning module 43: used for using a Transformer to learn the time sequence relations among the encoded network traffic data packets and obtaining the characteristic representation of each network traffic data packet;
Length learning module 44: used for learning the length information of each network traffic data packet using a bidirectional LSTM;
Fusion and classification module 45: used for fusing the characteristic representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet.
Fig. 5 is a schematic diagram of a terminal structure according to an embodiment of the application. The terminal 50 includes a processor 51 and a memory 52 coupled to the processor 51.
The memory 52 stores program instructions for implementing the encrypted traffic identification method described above.
The processor 51 is operative to execute program instructions stored in the memory 52 to control encrypted traffic identification.
The processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capabilities. The processor 51 may also be a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
Fig. 6 is a schematic structural diagram of a storage medium according to an embodiment of the application. The storage medium of the embodiment of the present application stores a program file 61 capable of implementing all the methods described above. The program file 61 may be stored in the storage medium in the form of a software product, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code, or a terminal device such as a computer, a server, a mobile phone, or a tablet.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. An encrypted traffic identification method, comprising:
acquiring network traffic data packets, learning the associations among the byte contents of each network traffic data packet using a Transformer Encoder, and training a neural network model with data packet encoding capability;
encoding the byte content of the network traffic data packets using the neural network model;
using a Transformer to learn the temporal relationships among the encoded network traffic data packets to obtain the feature representation of each network traffic data packet, and using a bidirectional LSTM to learn the length information of each network traffic data packet;
fusing the feature representation and the length information of each network traffic data packet and performing Softmax classification to obtain the traffic identification result of each network traffic data packet;
wherein the training of a neural network model with data packet encoding capability comprises:
grouping all network traffic data packets by five-tuple (source IP, destination IP, source port, destination port, and transport protocol), wherein each group represents one bidirectional communication flow;
extracting the byte content above the IP layer from each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
randomly masking the byte content in each hexadecimal file according to a set proportion, and adding a [PACKET] tag at the head of each file;
learning the associations among the byte contents of the network traffic data packets using the Transformer Encoder and recovering the masked byte content, wherein the neural network model is trained using cross entropy as the loss function.
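The data preparation steps of the pre-training stage can be illustrated with a minimal sketch. It is not the implementation of the application: the packet dictionary fields, the `[MASK]` placeholder token, and the default masking ratio of 0.15 are assumptions introduced for illustration (the claim only specifies "a set proportion"), and the Transformer Encoder that recovers the masked bytes is omitted.

```python
import random
from collections import defaultdict

def flow_key(src_ip, dst_ip, src_port, dst_port, proto):
    """Direction-independent five-tuple key, so both directions share one flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (min(a, b), max(a, b), proto)

def group_flows(packets):
    """Group packet dicts (hypothetical field names) into bidirectional flows."""
    flows = defaultdict(list)
    for p in packets:
        key = flow_key(p["src_ip"], p["dst_ip"],
                       p["src_port"], p["dst_port"], p["proto"])
        flows[key].append(p)  # arrival order is preserved within each flow
    return dict(flows)

def mask_bytes(payload, ratio=0.15, rng=None):
    """Convert bytes to hex tokens, prepend [PACKET], and mask a share of them.

    Returns the masked token list and a dict mapping each masked position
    to the original hex token, which serves as the recovery target.
    """
    rng = rng or random.Random()
    tokens = [f"{b:02x}" for b in payload]
    n_mask = max(1, round(len(tokens) * ratio)) if tokens else 0
    positions = set(rng.sample(range(len(tokens)), n_mask))
    # Positions are shifted by one because [PACKET] occupies index 0.
    labels = {i + 1: tokens[i] for i in positions}
    masked = ["[PACKET]"] + [
        "[MASK]" if i in positions else t for i, t in enumerate(tokens)
    ]
    return masked, labels
```

Sorting the two endpoints inside `flow_key` is one simple way to make a client-to-server packet and the corresponding server-to-client packet fall into the same group; the cross-entropy loss would then be computed only at the positions recorded in `labels`.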
2. The encrypted traffic identification method according to claim 1, wherein before the training of a neural network model with data packet encoding capability, the method further comprises:
constructing an encrypted traffic identification model, wherein the encrypted traffic identification model comprises a pre-training layer, a data packet encoding layer, a time sequence layer, a supplement layer, and a classification layer, and the neural network model is trained at the pre-training layer of the encrypted traffic identification model.
3. The encrypted traffic identification method according to claim 2, wherein the encoding of the byte content of the network traffic data packets using the neural network model comprises:
at the data packet encoding layer of the encrypted traffic identification model, extracting the byte content above the IP layer from each network traffic data packet, and converting the extracted byte content into a hexadecimal file;
adding a [PACKET] tag at the head of each hexadecimal file, and truncating or padding the byte content of each hexadecimal file to a preset length;
encoding the byte content of each hexadecimal file using the neural network model, and then using the vector corresponding to each [PACKET] tag as the vector representation of the corresponding network traffic data packet.
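As an illustration of the input preparation in this claim, the sketch below converts a payload into hexadecimal tokens, truncates or pads it to a preset length, and prepends the [PACKET] tag. The `[PAD]` token, the function names, and the default length are assumptions made for the example; the claim does not specify how padding is represented.

```python
from typing import List

PACKET, PAD = "[PACKET]", "[PAD]"  # [PAD] is an assumed padding token

def to_hex_tokens(payload: bytes) -> List[str]:
    """Render each byte as a two-character hexadecimal token."""
    return [f"{b:02x}" for b in payload]

def prepare_packet(payload: bytes, max_len: int = 8) -> List[str]:
    """Truncate or pad the byte tokens to max_len, then prepend [PACKET]."""
    tokens = to_hex_tokens(payload)[:max_len]
    tokens += [PAD] * (max_len - len(tokens))
    return [PACKET] + tokens
```

After encoding, the output vector at index 0 (the [PACKET] position) would serve as the representation of the whole packet. Because truncation and padding discard the original packet length, the length is carried separately by the supplement layer.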
4. The encrypted traffic identification method according to claim 3, wherein the learning of the temporal relationships among the encoded network traffic data packets using a Transformer comprises:
at the time sequence layer of the encrypted traffic identification model, processing the vector $e_i$ of each network traffic data packet with a Transformer Encoder so that information from the other network traffic data packets is fused in, obtaining a new vector representation $v_i = \mathrm{Transformer}(e_i)$ of each network traffic data packet;
concatenating the vector representations $v_i$ of the network traffic data packets to obtain the feature representation $h_1 = \mathrm{Concat}(v_1, v_2, \ldots, v_m) W_o$, where $W_o$ is a weight matrix of the neural network and Concat denotes vector concatenation.
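The shape of this computation can be illustrated with a deliberately simplified sketch. A full Transformer Encoder (learned query, key, and value projections, multiple heads, feed-forward sublayers) is replaced here by a single scaled dot-product self-attention step with identity projections; this substitution and the function names are assumptions for illustration only.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def self_attention(E):
    """Single-head scaled dot-product self-attention with identity projections.

    Each output v_i is a weighted average of all packet vectors e_j, so it
    mixes in information from the other packets of the flow.
    """
    d = len(E[0])
    out = []
    for q in E:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in E]
        w = softmax(scores)
        out.append([sum(w[j] * E[j][t] for j in range(len(E))) for t in range(d)])
    return out

def flow_feature(E, W_o):
    """h_1 = Concat(v_1, ..., v_m) W_o, with W_o of shape (m * d, k)."""
    V = self_attention(E)
    flat = [x for v in V for x in v]  # Concat(v_1, ..., v_m)
    return [sum(flat[r] * W_o[r][c] for r in range(len(flat)))
            for c in range(len(W_o[0]))]
```

With m packets of dimension d, the concatenated vector has length m·d, so the number of rows of `W_o` fixes the number of packets per flow that the model expects.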
5. The encrypted traffic identification method according to claim 4, wherein the learning of the length information of each network traffic data packet using the bidirectional LSTM comprises:
at the supplement layer of the encrypted traffic identification model, extracting the original length information of each network traffic data packet and constructing a length sequence $L = \{l_1, l_2, \ldots, l_m\} = \{\mathrm{length}(p_1), \mathrm{length}(p_2), \ldots, \mathrm{length}(p_m)\}$, where $l_i$ denotes the length of the $i$-th data packet, $p_i$ denotes the $i$-th data packet, and length() extracts the length information of a data packet;
learning the length sequence $L$ using a bidirectional LSTM to obtain the length information $h_2 = \mathrm{Concat}(\overrightarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m), \overleftarrow{\mathrm{LSTM}}(l_1, l_2, \ldots, l_m))$ of the network traffic data packets, where $l_i$ denotes the length of the $i$-th data packet and Concat denotes the concatenation of the two vectors.
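A bidirectional LSTM over a scalar length sequence can be sketched in plain Python as follows. The sketch rests on several assumptions not stated in the claim: the weights are random rather than trained, lengths are scaled by an assumed maximum of 1500 bytes, the hidden size is 4, and the final hidden state of each direction is taken as that direction's output.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_lstm_params(hidden, rng):
    """Random weights for a scalar-input LSTM (gates i, f, o and candidate g)."""
    def gate():
        return {
            "wx": [rng.uniform(-0.5, 0.5) for _ in range(hidden)],
            "wh": [[rng.uniform(-0.5, 0.5) for _ in range(hidden)]
                   for _ in range(hidden)],
            "b": [0.0] * hidden,
        }
    return {name: gate() for name in ("i", "f", "o", "g")}

def lstm_final_state(seq, params, hidden):
    """Run the LSTM over seq and return the last hidden state."""
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        def pre(name, k):
            p = params[name]
            return (p["wx"][k] * x
                    + sum(p["wh"][k][j] * h[j] for j in range(hidden))
                    + p["b"][k])
        i = [sigmoid(pre("i", k)) for k in range(hidden)]
        f = [sigmoid(pre("f", k)) for k in range(hidden)]
        o = [sigmoid(pre("o", k)) for k in range(hidden)]
        g = [math.tanh(pre("g", k)) for k in range(hidden)]
        c = [f[k] * c[k] + i[k] * g[k] for k in range(hidden)]
        h = [o[k] * math.tanh(c[k]) for k in range(hidden)]
    return h

def bilstm_length_feature(lengths, hidden=4, max_len=1500.0, seed=0):
    """h_2 = Concat(forward LSTM(L), backward LSTM(L)) for a length sequence L."""
    rng = random.Random(seed)
    fwd = make_lstm_params(hidden, rng)
    bwd = make_lstm_params(hidden, rng)
    seq = [l / max_len for l in lengths]  # scaling is an assumption
    return (lstm_final_state(seq, fwd, hidden)
            + lstm_final_state(seq[::-1], bwd, hidden))
```

The backward direction is obtained by feeding the reversed sequence to a second LSTM with its own parameters, so the resulting vector has twice the hidden size.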
6. The encrypted traffic identification method according to claim 5, wherein the fusing of the feature representation and the length information of each network traffic data packet and the performing of Softmax classification comprise:
at the classification layer of the encrypted traffic identification model, applying a fully connected layer and Softmax classification to $h_1$ from the time sequence layer to obtain a predicted value $\gamma_1$, and calculating the cross-entropy loss function value $loss_1$, wherein the parameters of the fully connected layer are parameters to be learned by the neural network;
applying a fully connected layer and Softmax classification to $h_2$ from the supplement layer to obtain a predicted value $\gamma_2$, and calculating the cross-entropy loss function value $loss_2$, wherein the parameters of the fully connected layer are parameters to be learned by the neural network;
concatenating $h_1$ and $h_2$ to obtain $h_3 = \mathrm{Concat}(h_1, h_2)$;
applying a fully connected layer and Softmax classification to $h_3$ to obtain a predicted value $\gamma_3$, and calculating the cross-entropy loss function value $loss_3$, wherein the parameters of the fully connected layer are parameters to be learned by the neural network;
calculating the sum $loss_1 + loss_2 + loss_3$, and updating the network parameters of the encrypted traffic identification model by a gradient descent algorithm according to the calculation result.
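The three-branch loss of this claim can be illustrated with a small forward-pass sketch. The helper names and the representation of each fully connected layer as a weight matrix `W` plus bias `b` are assumptions for illustration; the gradient descent update itself is omitted.

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def dense(h, W, b):
    """Fully connected layer: one output per row of W."""
    return [sum(w * x for w, x in zip(row, h)) + bias
            for row, bias in zip(W, b)]

def cross_entropy(probs, label):
    """Cross entropy for a single sample with an integer class label."""
    return -math.log(probs[label])

def joint_loss(h1, h2, heads, label):
    """Return (loss_1 + loss_2 + loss_3, [loss_1, loss_2, loss_3]).

    heads holds one (W, b) pair for each of h_1, h_2 and h_3.
    """
    h3 = h1 + h2  # list concatenation, i.e. Concat(h_1, h_2)
    losses = [
        cross_entropy(softmax(dense(h, W, b)), label)
        for h, (W, b) in zip((h1, h2, h3), heads)
    ]
    return sum(losses), losses
```

Summing the three values means the gradient reaches the time sequence branch and the length branch both directly, through their own classifiers, and jointly, through the fused classifier.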
7. An encrypted traffic identification system using the encrypted traffic identification method according to claim 1, comprising:
a pre-training module, configured to acquire network traffic data packets, learn the associations among the byte contents of each network traffic data packet using a Transformer Encoder, and train a neural network model with data packet encoding capability;
a data packet encoding module, configured to encode the byte content of the network traffic data packets using the neural network model;
a feature learning module, configured to learn the temporal relationships among the encoded network traffic data packets using a Transformer and to obtain the feature representation of each network traffic data packet;
a length learning module, configured to learn the length information of each network traffic data packet using a bidirectional LSTM;
a fusion and classification module, configured to fuse the feature representation and the length information of each network traffic data packet and to perform Softmax classification, obtaining the traffic identification result of each network traffic data packet.
8. A terminal comprising a processor, a memory coupled to the processor, wherein,
The memory stores program instructions for implementing the encrypted traffic identification method according to any one of claims 1 to 6;
The processor is configured to execute the program instructions stored by the memory to control encrypted traffic identification.
9. A storage medium storing program instructions executable by a processor for performing the encrypted traffic identification method according to any one of claims 1 to 6.
CN202011231169.XA 2020-11-06 2020-11-06 Encryption traffic identification method, system, terminal and storage medium Active CN114448905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011231169.XA CN114448905B (en) 2020-11-06 2020-11-06 Encryption traffic identification method, system, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN114448905A (en) 2022-05-06
CN114448905B (en) 2024-04-19

Family

ID=81361532

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011231169.XA Active CN114448905B (en) 2020-11-06 2020-11-06 Encryption traffic identification method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN114448905B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063777A (en) * 2018-08-07 2018-12-21 北京邮电大学 Net flow assorted method, apparatus and realization device
CN109543824A (en) * 2018-11-30 2019-03-29 腾讯科技(深圳)有限公司 A kind for the treatment of method and apparatus of series model
CN109831422A (en) * 2019-01-17 2019-05-31 中国科学院信息工程研究所 A kind of encryption traffic classification method based on end-to-end sequence network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9210181B1 (en) * 2014-05-26 2015-12-08 Solana Networks Inc. Detection of anomaly in network flow data

Also Published As

Publication number Publication date
CN114448905A (en) 2022-05-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant