CN115883263A - Encryption application protocol type identification method based on multi-scale load semantic mining - Google Patents


Info

Publication number
CN115883263A
Authority
CN
China
Prior art keywords
sequence
features
load
characteristic
application protocol
Prior art date
Legal status
Granted
Application number
CN202310189712.1A
Other languages
Chinese (zh)
Other versions
CN115883263B (en)
Inventor
吉庆兵
谈程
罗杰
潘炜
康璐
倪绿林
尹浩
Current Assignee
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date
Filing date
Publication date
Application filed by CETC 30 Research Institute
Priority to CN202310189712.1A (granted as CN115883263B)
Publication of CN115883263A
Application granted
Publication of CN115883263B
Legal status: Active
Anticipated expiration

Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides an encryption application protocol type identification method based on multi-scale load semantic mining, which comprises the following steps: step 1, extracting payload features from the original traffic and converting them into a decimal byte sequence; step 2, constructing a pyramid neural network based on load semantic mining blocks, and processing the decimal byte sequence to obtain an input feature sequence; step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the window moves step by step to the end of the sequence, and the features extracted in the windows are concatenated to obtain the features of the input sequence; step 4, reducing the dimension of the features of the input sequence to form a new input sequence, repeating steps 3 to 4, and concatenating the features obtained each time to obtain multi-scale features; and step 5, completing the classification of the encrypted network application protocol type according to the multi-scale features. The invention can extract multi-scale features from encrypted network application protocol messages in complex scenarios, improving both the speed and the accuracy of encrypted traffic identification.

Description

Encryption application protocol type identification method based on multi-scale load semantic mining
Technical Field
The invention relates to the field of flow analysis, in particular to an encryption application protocol type identification method based on multi-scale load semantic mining.
Background
Traffic classification has an extremely wide range of applications and is the basis of network security and network management: from QoS provisioning at network service providers to security applications such as firewalls and intrusion detection systems, traffic classification is indispensable. At present, traffic classification mainly adopts methods based on port numbers, deep packet inspection, machine learning and the like, but these have certain defects:
(1) Traditional port number-based approaches have long failed because newer applications either use well-known port numbers to mask their traffic or do not use standard registered port numbers.
(2) Deep packet inspection relies on finding keywords in the packets, which fails in the face of encrypted traffic.
(3) Machine learning based methods of encrypted network traffic identification rely heavily on manually engineered features, which limits their applicability.
With the popularization of deep learning methods, researchers have studied their effects on traffic classification tasks and demonstrated high accuracy on early mobile application traffic data sets. However, with the continuous upgrading of encryption protocols, the explosive growth in the number of mobile applications and the changes in mobile application development patterns, shallow deep learning models can no longer meet the practical requirements of mobile application traffic identification in current complex scenarios. Although the recently proposed Transformer-based encrypted traffic identification methods perform well in feature learning, they attend mainly to global features during feature extraction and ignore the detail features hidden in high-resolution payload data, and these local features are in many cases the key to accurate classification.
Disclosure of Invention
In order to solve the problems that deep features in encrypted flow cannot be learned by a shallow neural network under the current complex scene and the existing deep neural network excessively focuses on global features to cause loss of detail features, the invention provides a new encryption network application protocol type identification method, which fully utilizes the global features and the local detail features of different scales in packet loads by extracting the features of different scales, thereby improving the identification precision.
The technical scheme adopted by the invention is as follows: the encryption application protocol type identification method based on multi-scale load semantic mining comprises the following steps:
step 1, preprocessing original flow of a mobile application encryption network, extracting load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, constructing a pyramid neural network based on a load semantic mining block, and acquiring a word embedding characteristic and a position coding characteristic of a decimal byte sequence, wherein an input characteristic sequence is obtained by adding the word embedding characteristic and the position coding characteristic;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves in sequence until the tail end of the input sequence, the features in the sliding window during each movement are extracted, and the features extracted in all the sliding windows are spliced in sequence to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3 to 4 k times, and concatenating the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encrypted network application protocol type according to the multi-scale features.
Further, the preprocessing process in step 1 is as follows:
step 1.1, dividing the data packet into session flows according to quintuple;
step 1.2, cleaning the session stream, and removing the data packet retransmitted overtime, the data packet of the address resolution protocol and the data packet of the dynamic host configuration protocol;
step 1.3, extracting load characteristics of a transmission layer load in a data packet, and splicing the extracted load characteristics according to the arrival sequence of the data packet until the byte length after splicing reaches the set load characteristic length;
and 1.4, converting the extracted spliced load characteristics into a decimal byte sequence.
Further, in step 1.3, if the byte length after concatenating the payload features of all the packets in the session stream is still smaller than the set payload feature length, the sequence is padded with 0x00.
Further, in step 2, the byte features of the decimal byte sequence are mapped to a d-dimensional vector space to obtain the word embedding feature F1 ∈ R^(N×d), where R denotes the real numbers and N is the set payload feature length.
Further, in step 2, the position coding feature is calculated as:

PE(pos, 2i) = sin(pos / 10000^(2i/d))  (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))  (2)

F2 = [PE(1); PE(2); …; PE(N)]  (3)

where pos denotes the position at which the byte appears in the byte sequence; PE(pos, 2i) on the left of formula (1) denotes the position code of the even dimensions and PE(pos, 2i+1) on the left of formula (2) that of the odd dimensions; i is the dimension index, formula (1) covering the even dimensions 2i and formula (2) the odd dimensions 2i+1; d is the position-coding dimension; F2 ∈ R^(N×d) is the position coding feature; and PE(pos) in formula (3) denotes the position code of the byte at position pos in the byte sequence.
Further, the substep of step 3 comprises:
step 3.1, constructing a sliding window with the size of L bytes on the input sequence;
step 3.2, performing feature extraction on the data in the sliding window by adopting a multi-head attention mechanism to obtain a feature F4;
step 3.3, carrying out residual error connection and layer normalization processing on the input sequence F3 and the characteristic F4 to obtain a characteristic F5;
step 3.4, performing two-layer full-connection layer operation on the characteristic F5 to obtain a characteristic F6;
step 3.5, carrying out residual error connection and layer normalization processing on the characteristic F5 and the characteristic F6 to obtain a characteristic F7;
step 3.6, moving the sliding window backwards by L bytes, and repeating steps 3.2 to 3.5 in the new window until the sliding window reaches the end of the input sequence;
and 3.7, splicing the features F7 in all the sliding windows to obtain a feature F8 which is used as the feature of the input sequence.
Further, the substeps of step 3.2 are:
step 3.2.1, performing multi-head self-attention calculation on the data in the sliding window, and extracting the incidence relation of byte sequences in the window;
and step 3.2.2, repeating step 3.2.1 M times according to the set number of attention heads M, and concatenating and linearly transforming the M extracted results to obtain the feature F4 of the data in the sliding window.
Further, in step 4, a one-dimensional maximum pooling layer is used to complete feature compression and dimension reduction, and each pooling operation halves the dimension of the first dimension of the feature.
Further, the substep of step 5 comprises:
step 5.1, inputting the extracted multi-scale features into a fully connected layer and an activation function, the output dimension being equal to the number of traffic categories;
and 5.2, calculating the type of the encrypted network application protocol according to the output.
Further, in step 5.2, the specific calculation method of the category is:

category = argmax(Z)

where Z represents the output obtained by feeding the multi-scale features through the fully connected layer and the activation function.
Compared with the prior art, the beneficial effects of adopting the technical scheme are as follows:
1. the pyramid network constructed based on the load semantic mining block can extract multi-scale features in the message type of the encryption network application protocol in the current complex scene, fully extract global features and multi-scale local features, and further improve the accuracy of encryption flow identification.
2. When extracting local features, a sliding window is adopted and each self-attention computation is confined to the range covered by the window, which avoids introducing noise during local feature extraction, greatly reduces the model parameters, and increases the computation speed of the model.
3. Learning and classification are performed on the payload data above the transport layer of the network traffic, without relying on the IP address and port number information of the network traffic packet header, giving strong generalization ability; strong identification information such as the IP address and port number of the packet header lacks universality and may strongly interfere with the final identification result.
Drawings
Fig. 1 is a flowchart of an encryption application protocol type identification method based on multi-scale load semantic mining according to the present invention.
Fig. 2 is a schematic diagram of a pyramid network model structure according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an implementation of a sliding window according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of multi-scale feature extraction according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar modules or modules having the same or similar functionality throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application. On the contrary, the embodiments of the application include all changes, modifications and equivalents coming within the spirit and terms of the claims appended hereto.
Aiming at the problems that, in current complex scenarios, shallow neural networks cannot learn the deep-level features in encrypted traffic and existing deep neural networks focus excessively on global features and thus lose detail features, this embodiment provides an encryption application protocol type identification method in which a deep neural network based on load semantic mining extracts multi-scale features. Features of different scales are extracted so that both the global features of the packet payload and local detail features of different scales are fully utilized, improving identification accuracy. At the same time, local features are extracted with a sliding window, confining the self-attention computation to the window range, which reduces the model parameters and increases the computation speed of the model. The specific scheme is as follows:
as shown in fig. 1, the method for identifying the type of the encryption application protocol based on the multi-scale load semantic mining includes:
step 1, preprocessing original flow of a mobile application encryption network, extracting load characteristics of a transmission layer load, and converting the load characteristics into a decimal byte sequence;
step 2, building a pyramid neural network based on the load semantic mining block; acquiring a word embedding characteristic and a position coding characteristic of the decimal byte sequence, and adding the word embedding characteristic and the position coding characteristic to obtain an input characteristic sequence;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window moves in sequence until the end of the input sequence, the features in the sliding window during each movement are extracted, and the features extracted in all the sliding windows are spliced in sequence to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3 to 4 k times, and concatenating the features of the input sequence obtained each time to obtain the multi-scale features of the input sequence;
and step 5, completing the classification of the encrypted network application protocol type according to the multi-scale features.
Since strong identification information such as the IP address and port number information of the network traffic packet header lacks universality and may strongly interfere with the identification result, this embodiment learns and classifies based on the payload data above the transport layer of the network traffic, without relying on the IP address and port number of the packet header.
Before parsing, the original flow needs to be preprocessed, specifically:
Step 1.1, dividing the received data packets into session flows according to the five-tuple (source IP, destination IP, source port, destination port, transport layer protocol), and identifying traffic in units of session flows.
Step 1.2, because the received data packets contain packets irrelevant to the traffic actually carrying the transmitted content, the session stream needs to be cleaned: packets retransmitted after timeout, Address Resolution Protocol (ARP) packets and Dynamic Host Configuration Protocol (DHCP) packets are removed. In this example, the cleaning is accomplished using the TShark tool from Wireshark.
Step 1.3, after the irrelevant packets are removed, the payload features of the transport layer payloads of the remaining packets are extracted and concatenated in packet arrival order until the extracted byte length reaches the set payload feature length N. It should be noted that, in this embodiment, if the concatenated byte length of the payload features of all packets in the session stream is smaller than N, the sequence is padded with 0x00.
Preferably, the present embodiment uses the rdpcap method of the Scapy tool to extract the load characteristics of the transport layer load.
Step 1.4, the extracted and concatenated binary payload features are converted into a decimal byte sequence, i.e. each byte is converted into the corresponding decimal number (0 to 255).
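Steps 1.3 and 1.4 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name, the toy payloads and the small feature length `n` are all assumptions for demonstration.

```python
# Hypothetical sketch of steps 1.3-1.4: concatenate transport-layer payloads
# in packet-arrival order, truncate or zero-pad to the configured payload
# feature length N, and convert to a decimal byte sequence (0-255).
def build_byte_sequence(payloads, n=16):
    """payloads: list of bytes objects, one per packet, in arrival order."""
    buf = b"".join(payloads)[:n]          # splice until length N is reached
    buf = buf + b"\x00" * (n - len(buf))  # pad with 0x00 if still short
    return [b for b in buf]               # each byte as a decimal 0..255

seq = build_byte_sequence([b"\x16\x03\x01", b"\x02\x00"], n=8)
print(seq)  # [22, 3, 1, 2, 0, 0, 0, 0]
```

A real pipeline would obtain `payloads` from the cleaned session flow (e.g. via Scapy's rdpcap, as the embodiment suggests); here they are hard-coded bytes.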
After the decimal byte sequence representing the transmission layer characteristics is obtained, the analysis of the traffic type can be started, and in this embodiment, the features of different scales in the payload (decimal byte sequence) are extracted by using the constructed Pyramid-type neural network (Pyramid-Transformer).
Current Transformer-based encrypted traffic identification models (the Transformer being a deep learning architecture) mostly use the self-attention mechanism to extract global features and neglect the extraction of local features, although local features may be the key to fine-grained classification; moreover, local features appear at inconsistent scales, so interference may arise during their extraction.
As shown in fig. 2 and 4, step 2 of this embodiment constructs a pyramid-type neural network (Pyramid-Transformer) from a number of load semantic mining blocks (Pyramid Transformer blocks), with a one-dimensional max pooling layer between consecutive blocks to compress and reduce the dimension of the features during extraction. Every load semantic mining block has the same composition: multi-head attention computation, residual connection and layer normalization, two fully connected layers with an activation function, and a further residual connection and layer normalization, connected in sequence. Deep multi-scale features are extracted by stacking several load semantic mining blocks: after each block extracts its features, the feature dimension is compressed to 1/2 and the compressed features are fed into the next block without changing the window size. In this way features of ever larger scale are extracted, the feature dimension produced by each block shrinks step by step to form a pyramid shape, and the features are finally concatenated to obtain the final features.
The process of realizing feature extraction by the pyramid type neural network is specifically explained as follows:
in the pyramid type neural network, feature extraction is mainly completed through a load semantic mining block, and the input of the load semantic mining block is the combination of word embedding features and position coding features of a byte sequence, so that a decimal byte sequence needs to be processed firstly.
A word embedding operation is performed on the byte sequence (B1, B2, …, B_{N-1}, B_N in figs. 2 and 4), mapping the byte features to a d-dimensional vector space to obtain the word embedding feature F1 ∈ R^(N×d) as the subsequent input, where R denotes the real numbers.
The position coding feature F2 ∈ R^(N×d) of the byte sequence is calculated as:

PE(pos, 2i) = sin(pos / 10000^(2i/d))  (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))  (2)

F2 = [PE(1); PE(2); …; PE(N)]  (3)

where pos denotes the position at which the byte appears in the byte sequence; PE(pos, 2i) on the left of formula (1) denotes the position code of the even dimensions and PE(pos, 2i+1) on the left of formula (2) that of the odd dimensions; i is the dimension index, formula (1) covering the even dimensions 2i and formula (2) the odd dimensions 2i+1; d is the position-coding dimension; and PE(pos) in formula (3) denotes the position code of the byte at position pos in the byte sequence. Since the Transformer uses global information and cannot otherwise exploit the order information of the bytes, which is very important for feature learning, this embodiment computes the position coding feature.
The word embedding feature and the position coding feature are combined according to formula (4) to obtain the input feature F3 ∈ R^(N×d) of the load semantic mining block:

F3 = F1 + F2  (4)
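Formulas (1) through (4) can be sketched in NumPy as follows. This is a minimal illustration: the random embedding table stands in for the learned word embedding, and positions are counted from 1, which is an assumption since the patent does not state the starting index.

```python
# NumPy sketch of formulas (1)-(4): sinusoidal position coding F2 added to
# a word embedding F1 to form the block input F3.
import numpy as np

def position_encoding(n, d):
    pe = np.zeros((n, d))
    pos = np.arange(1, n + 1)[:, None]            # byte positions (1-based here)
    i = np.arange(0, d, 2)[None, :]               # even dimension indices 2i
    pe[:, 0::2] = np.sin(pos / 10000 ** (i / d))  # formula (1)
    pe[:, 1::2] = np.cos(pos / 10000 ** (i / d))  # formula (2)
    return pe                                     # F2, shape (N, d)

rng = np.random.default_rng(0)
n, d, vocab = 8, 16, 256
embed = rng.normal(size=(vocab, d))     # stand-in for the learned embedding table
byte_seq = [22, 3, 1, 2, 0, 0, 0, 0]    # decimal byte sequence from step 1
f1 = embed[byte_seq]                    # word embedding feature F1
f3 = f1 + position_encoding(n, d)       # input feature F3, formula (4)
print(f3.shape)  # (8, 16)
```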
After determining the input of the load semantic mining block, feature extraction can be performed through the load semantic mining block, which specifically includes:
Step 3.1: because some detail features exist only over a small number of adjacent bytes, direct feature extraction over the whole input sequence may interfere with the local detail features, and a sliding window is used to ensure that the high-resolution local detail features are not damaged. A sliding window of size L is therefore constructed on the input feature F3, and feature extraction is performed on the data inside the window, as shown in fig. 3.
Step 3.2: the data inside the sliding window is taken as F3' ∈ R^(L×d), and the multi-head attention mechanism is applied to F3' for feature extraction, obtaining the feature F4 ∈ R^(L×d). F4 contains the global dependencies of the bytes within the window; viewed from the whole byte sequence, what is obtained here is a local feature of the window.
The specific process comprises the following steps:
Step 3.2.1: multi-head self-attention is computed on F3', extracting the association relations of the byte sequence inside the window.
Using the weight matrices W^Q, W^K and W^V, the query, key and value matrices Q, K and V of the feature F3' are computed as shown in formulas (5), (6) and (7):

Q = F3' W^Q  (5)

K = F3' W^K  (6)

V = F3' W^V  (7)
The matrix operations on Q, K and V implement the self-attention mechanism (Attention), producing the output Z ∈ R^(L×d):

Z = Attention(Q, K, V) = softmax(Q K^T / √d_k) V  (8)

where d_k is the number of columns of the matrix K, i.e. the vector dimension, the same as that of Q, and K^T is the matrix transpose. The formula computes the inner products of the row vectors of Q and K and divides them by √d_k to prevent the inner products from becoming too large. After Q is multiplied by the transpose of K, the resulting matrix has L rows and L columns, where L is the window size; this matrix represents the strength of association between bytes. After Q K^T / √d_k is obtained, the softmax function (normalized exponential function) computes the self-attention coefficient of each byte with respect to the other bytes, normalizing each row of the matrix so that the sum of every row becomes 1.
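Formulas (5) through (8) can be sketched in NumPy as follows; the random weight matrices are stand-ins for the learned parameters W^Q, W^K and W^V, and the small sizes L and d are assumptions for illustration.

```python
# NumPy sketch of formulas (5)-(8): within one window of size L, compute
# Q, K, V by linear maps and apply scaled dot-product self-attention.
import numpy as np

def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv            # formulas (5)-(7)
    dk = k.shape[-1]
    scores = q @ k.T / np.sqrt(dk)              # L x L association strengths
    a = np.exp(scores - scores.max(axis=-1, keepdims=True))
    a /= a.sum(axis=-1, keepdims=True)          # softmax: each row sums to 1
    return a @ v                                # output Z, formula (8)

rng = np.random.default_rng(1)
L, d = 4, 8                                     # window size, feature dimension
x = rng.normal(size=(L, d))                     # window data F3'
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))
z = self_attention(x, wq, wk, wv)
print(z.shape)  # (4, 8)
```

With zero query and key weights the attention matrix becomes uniform and each output row is the mean of the value rows, which is a quick sanity check on the softmax normalization.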
Step 3.2.2: the number of attention heads M is set, and step 3.2.1 is repeated M times to obtain M outputs Z, which are concatenated and linearly transformed to obtain the feature F4 ∈ R^(L×d):

F4 = Concat(Z_1, Z_2, …, Z_M) W^O

where Z_1 represents the output of the first computation, Z_M represents the output of the M-th computation, and W^O represents the weight matrix of the linear transformation.
Step 3.3: residual connection and layer normalization are applied to F3' and F4 to obtain the feature F5 ∈ R^(L×d):

F5 = LayerNorm(F3' + F4)  (9)

where LayerNorm denotes the layer normalization operation.
Step 3.4: a forward propagation (Feed Forward) operation is applied to F5 to obtain the feature F6 ∈ R^(L×d):

F6 = FeedForward(F5) = Linear(ReLU(Linear(F5)))  (10)

where Linear denotes one fully connected layer operation; Feed Forward consists of two fully connected layers, the first using the ReLU activation function and the second using no activation function.
Step 3.5: residual connection and layer normalization are applied to F5 and F6 to obtain the feature F7 ∈ R^(L×d):

F7 = LayerNorm(F5 + F6)  (11)
Step 3.6: the sliding window is moved backwards by L bytes, and steps 3.2 to 3.5 are re-executed in the new window until the sliding window reaches the end of the input feature F3.
Step 3.7: the features F7 obtained in all the sliding windows are concatenated to obtain F8 ∈ R^(N×d):

F8 = Concat(F7^(1), F7^(2), …, F7^(N/L))  (12)

where F7^(1) ∈ R^(L×d) represents the feature of the first window and F7^(N/L) represents the feature obtained in the last window.
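Steps 3.1 through 3.7 reduce to a simple window loop, sketched below. The `block` callable is a placeholder standing in for the per-window attention and feed-forward computation of one load semantic mining block (an identity-like doubling here, purely for illustration); the function name is an assumption, not the patent's.

```python
# Sketch of steps 3.1-3.7: slide a non-overlapping window of L bytes over
# the input features, run a per-window block, and concatenate the window
# features into F8 (formula (12)).
import numpy as np

def sliding_window_features(f3, L, block):
    outs = []
    for start in range(0, f3.shape[0], L):      # window moves by L bytes
        window = f3[start:start + L]            # data inside the window, F3'
        outs.append(block(window))              # F7 for this window
    return np.concatenate(outs, axis=0)         # F8, shape (N, d)

f3 = np.arange(24, dtype=float).reshape(8, 3)   # N=8 bytes, d=3
f8 = sliding_window_features(f3, L=4, block=lambda w: w * 2.0)
print(f8.shape)  # (8, 3)
```

Confining `block` to each window is what keeps the self-attention cost at L×L per window instead of N×N over the whole sequence.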
In order to extract the multi-scale features of the byte sequence, in step 4 of this embodiment a one-dimensional max pooling layer is first applied to F8 for feature compression and dimension reduction, obtaining the feature F9 ∈ R^((N/2)×d):

F9 = MaxPool1d(F8)  (13)

where MaxPool1d denotes the one-dimensional max pooling operation; each pooling operation halves the first dimension of the feature, while the new feature carries richer semantic information.
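Formula (13) is equivalent to max pooling with kernel and stride 2 along the sequence dimension; a NumPy sketch (assuming, for simplicity, an even sequence length) is:

```python
# NumPy sketch of formula (13): one-dimensional max pooling with kernel and
# stride 2 along the sequence dimension, halving the first dimension of F8.
import numpy as np

def max_pool1d(f8):
    n, d = f8.shape                 # n assumed even here for simplicity
    return f8.reshape(n // 2, 2, d).max(axis=1)   # F9, shape (n/2, d)

f8 = np.array([[1., 5.], [2., 4.], [9., 0.], [3., 7.]])
f9 = max_pool1d(f8)
print(f9)  # [[2. 5.]
           #  [9. 7.]]
```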
The repetition count k is set as required, and steps 3 to 4 are repeated k times; except for the first execution of step 3, whose input is the feature F3, each subsequent execution of step 3 takes the feature F9 obtained in the preceding step 4 as its input.
The features F8 obtained in the k repetitions are concatenated to obtain the feature F10:

F10 = Concat(F8_1, F8_2, …, F8_k)  (14)
As shown in fig. 4, the repeated operations correspond to stacking the load semantic mining blocks of the pyramid network model multiple times, extracting deeper and higher-level semantic features layer by layer. In fig. 4 the feature dimensions are denoted by N and d, where N equals the length of the input byte sequence and d equals the dimension to which each byte is expanded by the word embedding operation. F8_1 denotes the feature obtained by the first repetition and F8_k the feature obtained by the k-th repetition. The feature F10 obtained at this point is the required multi-scale feature of the payload. After the multi-scale features are obtained, traffic classification can be carried out.
in this embodiment, the classification process specifically includes:
Step 5.1: the extracted multi-scale feature F10 is fed into a fully connected layer and an activation function (Softmax), the output dimension being equal to the number of traffic categories C:

Z = Softmax(F10 W)  (15)

where W denotes the weight matrix of the fully connected layer and Z ∈ R^C.
Step 5.2: the type of the encrypted network application protocol is computed from the output:

category = argmax(Z)
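Formula (15) and step 5.2 can be sketched as follows; the random weight matrix stands in for the learned classifier, the feature is flattened before the linear layer (an assumption, since the patent does not state how the matrix-shaped F10 is vectorized), and C = 5 categories is arbitrary.

```python
# Sketch of formula (15) and step 5.2: project the multi-scale feature
# through a fully connected layer, apply Softmax, take argmax as the class.
import numpy as np

def classify(f10, w):
    z = f10.flatten() @ w                        # fully connected layer
    z = np.exp(z - z.max())
    z /= z.sum()                                 # Softmax over C classes
    return z, int(np.argmax(z))                  # probabilities, step 5.2

rng = np.random.default_rng(2)
f10 = rng.normal(size=(6, 4))                    # concatenated multi-scale features
w = rng.normal(size=(24, 5))                     # C = 5 traffic categories
z, category = classify(f10, w)
print(z.shape, 0 <= category < 5)  # (5,) True
```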
in the embodiment, a deep neural network, namely a pyramid neural network is constructed, and the network stacks load semantic mining blocks, so that deep features in an encryption protocol message type in a current complex scene can be extracted, and the accuracy of flow identification is improved.
It should be noted that, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "disposed" and "connected" should be interpreted broadly: for example, a connection may be fixed, detachable or integral, and may be direct or indirect through an intermediary. The specific meanings of the above terms in the present invention can be understood in specific cases by those of ordinary skill in the art. The drawings in the embodiments are used to describe the technical scheme of the embodiments of the invention clearly and completely; obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. The encryption application protocol type identification method based on multi-scale load semantic mining is characterized by comprising the following steps:
step 1, preprocessing the original traffic of a mobile application encrypted network, extracting load features from the transport layer load, and converting the load features into a decimal byte sequence;
step 2, constructing a pyramid neural network based on load semantic mining blocks, and obtaining word embedding features and position coding features of the decimal byte sequence, wherein an input feature sequence is obtained by adding the word embedding features and the position coding features;
step 3, the load semantic mining block constructs a sliding window on the input feature sequence, the sliding window slides step by step until it reaches the end of the input sequence, the features within the sliding window are extracted at each move, and the features extracted in all the sliding windows are spliced in order to obtain the features of the input sequence;
step 4, performing feature compression and dimension reduction on the features of the input sequence to serve as a new input sequence, repeating steps 3-4 a total of k times, and splicing the features of the input sequence obtained at each repetition of step 3 to obtain multi-scale features of the input sequence;
and step 5, completing classification of the encrypted network application protocol type according to the multi-scale features.
2. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the preprocessing in step 1 is as follows:
step 1.1, dividing the data packets into session flows according to their five-tuples;
step 1.2, cleaning the session flows by removing packets retransmitted after timeout, address resolution protocol packets, and dynamic host configuration protocol packets;
step 1.3, extracting load features from the transport layer load in the data packets, and splicing the extracted load features in order of packet arrival until the spliced byte length reaches the set load feature length;
and step 1.4, converting the spliced load features into a decimal byte sequence.
3. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 2, wherein in step 1.3, if the byte length of the spliced load features of all the data packets in the session flow is still smaller than the set load feature length, the sequence is padded with 0x00.
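As an illustration of the preprocessing in claims 2-3 (steps 1.3-1.4), the following sketch splices transport layer payloads in packet-arrival order, truncates to a set feature length, pads with 0x00 when the spliced data is too short, and converts the result to a decimal byte sequence. Function and variable names are assumptions for illustration, not taken from the patent.

```python
def to_byte_sequence(payloads, feature_len):
    """Splice payloads in arrival order, truncate/pad to feature_len,
    and return a decimal byte sequence (list of ints 0-255)."""
    buf = b"".join(payloads)[:feature_len]   # splice, then truncate
    buf = buf.ljust(feature_len, b"\x00")    # pad with 0x00 if too short
    return list(buf)                         # bytes iterate as decimal ints

seq = to_byte_sequence([b"\x16\x03\x01", b"\xff"], 6)
# -> [22, 3, 1, 255, 0, 0]
```

Note that iterating a Python `bytes` object already yields decimal integers, so no explicit hex-to-decimal conversion step is needed.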
4. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1 or 2, characterized in that in step 2, the byte features of the decimal byte sequence are mapped into a d-dimensional vector space to obtain word embedding features F1, with F1 ∈ R^(n×d), where R denotes the set of real numbers and n is the length of the byte sequence.
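A minimal sketch of the word embedding in claim 4: each byte value (0-255) is mapped to a d-dimensional vector, giving F1 of shape (n, d). The random lookup table here is a placeholder for the learned embedding matrix; all names are illustrative.

```python
import random

def word_embed(byte_seq, d, seed=0):
    """Map each byte value to a d-dimensional vector (stand-in for a
    learned embedding table), returning F1 as an n x d list of lists."""
    rng = random.Random(seed)
    table = [[rng.random() for _ in range(d)] for _ in range(256)]
    return [table[b] for b in byte_seq]

F1 = word_embed([22, 3, 1, 255], d=8)
# F1 has n = 4 rows (one per byte), each of dimension d = 8
```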
5. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 4, wherein in step 2, the position coding features are calculated as follows:

PE(pos, 2i) = sin(pos / 10000^(2i/d))    (1)

PE(pos, 2i+1) = cos(pos / 10000^(2i/d))    (2)

F2 = [PE(1), PE(2), ..., PE(n)]    (3)

where pos denotes the position at which a byte appears in the byte sequence; the left side of formula (1), PE(pos, 2i), denotes the position coding of the even dimensions, and the left side of formula (2), PE(pos, 2i+1), denotes the position coding of the odd dimensions; i is the position coding dimension subscript, so that 2i indexes the even dimensions and 2i+1 the odd dimensions; d is the position coding dimension; F2 is the position coding feature; and PE(pos) in formula (3) denotes the position code of each byte in the byte sequence.
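The position coding of claim 5 can be sketched as follows, assuming the standard sinusoidal formulation that formulas (1)-(2) describe: even dimensions use sine, odd dimensions use cosine, both with frequency term 10000^(2i/d).

```python
import math

def position_encoding(n, d):
    """Sinusoidal position codes: one d-dimensional row per byte position,
    sin on even dimensions, cos on odd dimensions (formulas (1)-(2))."""
    pe = []
    for pos in range(n):
        row = []
        for j in range(d):
            # (j - j % 2) equals 2i for both dimension indices 2i and 2i+1
            angle = pos / (10000 ** ((j - j % 2) / d))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe  # F2: the position code of each byte, as in formula (3)

F2 = position_encoding(4, 8)
```

Because the codes depend only on position and dimension, F2 can be precomputed once for the set load feature length and added elementwise to the word embedding features F1.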
6. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the substeps of step 3 comprise:
step 3.1, constructing a sliding window of size L bytes on the input feature sequence;
step 3.2, performing feature extraction on the data in the sliding window by a multi-head attention mechanism to obtain features F4;
step 3.3, performing residual connection and layer normalization on the input sequence F3 and the features F4 to obtain features F5;
step 3.4, passing the features F5 through two fully connected layers to obtain features F6;
step 3.5, performing residual connection and layer normalization on the features F5 and the features F6 to obtain features F7;
step 3.6, moving the sliding window backwards by L bytes, and repeating steps 3.2 to 3.6 until the sliding window reaches the end of the input sequence;
and step 3.7, splicing the features F7 from all the sliding windows to obtain features F8, which serve as the features of the input sequence.
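The window-and-splice skeleton of claim 6 can be sketched as below. The `transform` argument is a placeholder standing in for steps 3.2-3.5 (multi-head attention, residual connections, layer normalization, and the two fully connected layers); the toy transform used in the example is purely illustrative.

```python
def semantic_mining_block(seq, L, transform):
    """Slide a window of L items over seq in steps of L (step 3.6),
    transform each window (steps 3.2-3.5), and splice the per-window
    outputs in order (step 3.7) to obtain the sequence features F8."""
    out = []
    for start in range(0, len(seq), L):
        window = seq[start:start + L]
        out.extend(transform(window))  # F7 for this window
    return out                         # F8: spliced features

# toy transform: replace each item by its window's maximum
f8 = semantic_mining_block([3, 1, 2, 5, 4], 2, lambda w: [max(w)] * len(w))
# -> [3, 3, 5, 5, 4]
```

Because the window advances by its own length L, the windows are non-overlapping and attention cost grows linearly in sequence length rather than quadratically, which is the practical motivation for windowed attention designs.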
7. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 6, wherein the substeps of step 3.2 are:
step 3.2.1, performing multi-head self-attention computation on the data in the sliding window to extract the association relations of the byte sequence within the window;
and step 3.2.2, repeating step 3.2.1 M times according to the set number of attention heads M, and splicing and linearly transforming the results extracted each time to obtain the features F4 of the data in the sliding window.
8. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, characterized in that in step 4, a one-dimensional max pooling layer is used for feature compression and dimension reduction, and each pooling operation halves the first dimension of the features.
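The dimension reduction of claim 8 amounts to one-dimensional max pooling with kernel size 2 and stride 2, which halves the first dimension at each application; a minimal sketch over a flat sequence:

```python
def max_pool_1d(seq):
    """1-D max pooling, kernel 2, stride 2: halves the first dimension."""
    return [max(seq[i], seq[i + 1]) for i in range(0, len(seq) - 1, 2)]

pooled = max_pool_1d([1, 3, 2, 5, 0, 4])
# -> [3, 5, 4]: length 6 halved to 3
```

Applied after each load semantic mining block, successive halvings are what give the stacked network its pyramid shape and its progressively coarser feature scales.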
9. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 1, wherein the substeps of step 5 comprise:
step 5.1, inputting the extracted multi-scale features into a fully connected layer and an activation function, the output dimension being consistent with the number of traffic categories;
and step 5.2, computing the encrypted network application protocol type from the output.
10. The encryption application protocol type identification method based on multi-scale load semantic mining according to claim 9, wherein in step 5.2, the category is calculated as:

Category = argmax(Softmax(Z))

where Category denotes the predicted class, and Z denotes the output obtained by feeding the multi-scale features into the fully connected layer and activation function.
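As a sketch of the class computation in claim 10, assuming the usual argmax over a Softmax of the network output Z (the patent's original formula image is not reproduced in this text):

```python
import math

def predict_class(z):
    """Return the index of the largest Softmax probability of logits z."""
    exps = [math.exp(v) for v in z]
    total = sum(exps)
    probs = [e / total for e in exps]                     # Softmax(Z)
    return max(range(len(probs)), key=probs.__getitem__)  # argmax

label = predict_class([0.1, 2.3, -0.7, 1.1])
# -> 1, the index of the largest logit
```

Since Softmax is monotonic, the argmax of the probabilities equals the argmax of the raw logits; the Softmax is still useful when calibrated class probabilities are needed alongside the label.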
CN202310189712.1A 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining Active CN115883263B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310189712.1A CN115883263B (en) 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining

Publications (2)

Publication Number Publication Date
CN115883263A true CN115883263A (en) 2023-03-31
CN115883263B CN115883263B (en) 2023-05-09

Family

ID=85761794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310189712.1A Active CN115883263B (en) 2023-03-02 2023-03-02 Encryption application protocol type identification method based on multi-scale load semantic mining

Country Status (1)

Country Link
CN (1) CN115883263B (en)

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3111612A1 (en) * 2014-02-28 2017-01-04 British Telecommunications Public Limited Company Profiling for malicious encrypted network traffic identification
CN104052749A (en) * 2014-06-23 2014-09-17 中国科学技术大学 Method for identifying link-layer protocol data types
CN104506484A (en) * 2014-11-11 2015-04-08 中国电子科技集团公司第三十研究所 Proprietary protocol analysis and identification method
US20180115567A1 (en) * 2015-03-17 2018-04-26 British Telecommunications Public Limited Company Learned profiles for malicious encrypted network traffic identification
CN105430021A (en) * 2015-12-31 2016-03-23 中国人民解放军国防科学技术大学 Encrypted traffic identification method based on load adjacent probability model
CN110532564A (en) * 2019-08-30 2019-12-03 中国人民解放军陆军工程大学 Application layer protocol online identification method based on CNN and LSTM mixed model
CN111211948A (en) * 2020-01-15 2020-05-29 太原理工大学 Shodan flow identification method based on load characteristics and statistical characteristics
WO2022041394A1 (en) * 2020-08-28 2022-03-03 南京邮电大学 Method and apparatus for identifying network encrypted traffic
CN112163594A (en) * 2020-08-28 2021-01-01 南京邮电大学 Network encryption traffic identification method and device
WO2022094926A1 (en) * 2020-11-06 2022-05-12 中国科学院深圳先进技术研究院 Encrypted traffic identification method, and system, terminal and storage medium
CN112511555A (en) * 2020-12-15 2021-03-16 中国电子科技集团公司第三十研究所 Private encryption protocol message classification method based on sparse representation and convolutional neural network
CN113949653A (en) * 2021-10-18 2022-01-18 中铁二院工程集团有限责任公司 Encryption protocol identification method and system based on deep learning
CN114358118A (en) * 2021-11-29 2022-04-15 南京邮电大学 Multi-task encrypted network traffic classification method based on cross-modal feature fusion
CN115348215A (en) * 2022-07-25 2022-11-15 南京信息工程大学 Encrypted network flow classification method based on space-time attention mechanism
CN115277888A (en) * 2022-09-26 2022-11-01 中国电子科技集团公司第三十研究所 Method and system for analyzing message type of mobile application encryption protocol
CN115348198A (en) * 2022-10-19 2022-11-15 中国电子科技集团公司第三十研究所 Unknown encryption protocol identification and classification method, device and medium based on feature retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JINHAI ZHANG: "Research on Key Technology of VPN Protocol Recognition" *
刘帅: "基于机器学习的加密流量识别研究与实现" *

Also Published As

Publication number Publication date
CN115883263B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN109818930B (en) Communication text data transmission method based on TCP protocol
CN104918046B (en) A kind of local description compression method and device
CN112511555A (en) Private encryption protocol message classification method based on sparse representation and convolutional neural network
CN112702235B (en) Method for automatically and reversely analyzing unknown protocol
CN103955539B (en) Method and device for obtaining control field demarcation point in binary protocol data
WO2020207410A1 (en) Data compression method, electronic device, and storage medium
CN115473850B (en) AI-based real-time data filtering method, system and storage medium
CN112887291A (en) I2P traffic identification method and system based on deep learning
CN115277888B (en) Method and system for analyzing message type of mobile application encryption protocol
CN113037646A (en) Train communication network flow identification method based on deep learning
CN111130942B (en) Application flow identification method based on message size analysis
CN116975733A (en) Traffic classification system, model training method, device, and storage medium
CN108563795B (en) Pairs method for accelerating matching of regular expressions of compressed flow
CN110796182A (en) Bill classification method and system for small amount of samples
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
CN113128626A (en) Multimedia stream fine classification method based on one-dimensional convolutional neural network model
CN115883263A (en) Encryption application protocol type identification method based on multi-scale load semantic mining
CN108573069B (en) Twins method for accelerating matching of regular expressions of compressed flow
CN114553790A (en) Multi-mode feature-based small sample learning Internet of things traffic classification method and system
CN113852605B (en) Protocol format automatic inference method and system based on relation reasoning
CN105938562B (en) A kind of automated network employing fingerprint extracting method and system
CN101262493B (en) Method for accelerating inter-network data transmission via stream buffer
CN114519390A (en) QUIC flow classification method based on multi-mode deep learning
US7657559B2 (en) Method to exchange objects between object-oriented and non-object-oriented environments
CN114048799A (en) Zero-day traffic classification method based on statistical information and payload coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant