CN115277216A - Vulnerability exploitation attack encryption flow classification method based on multi-head self-attention mechanism - Google Patents
- Publication number
- CN115277216A (application number CN202210905960.7A)
- Authority
- CN
- China
- Prior art keywords
- feature
- features
- flow
- traffic
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/50—Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
- G06F21/57—Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
- G06F21/577—Assessing vulnerabilities and evaluating computer system security
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2483—Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1433—Vulnerability analysis
Abstract
The invention provides a method for classifying vulnerability exploitation attack encrypted traffic based on a multi-head self-attention mechanism. The method comprises the following steps. Step 1: parse the encrypted traffic data to be classified into json-format data and filter the parsed data. Step 2: analyze the metadata in the exploit-attack encrypted traffic and the key features of the TLS protocol, extract the core features required for traffic classification, and finally convert the core features into a CSV-format file. Step 3: divide the processed encrypted traffic data proportionally into a training set and a test set, take the malicious-attack traffic type as the label, train a multi-head self-attention model with the training set, evaluate the model with the test set, and then optimize it to obtain the final vulnerability exploitation attack encrypted-traffic classification model. Step 4: preprocess the encrypted traffic to be detected according to step 1, extract features according to step 2, and input the extracted features into the trained classification model to obtain the final traffic classification result.
Description
Technical Field
The invention belongs to the field of network-traffic vulnerability-attack classification, and relates to a method for classifying vulnerability exploitation attack encrypted traffic based on a multi-head self-attention mechanism.
Background
In recent years, as users' awareness of network privacy has grown, using the TLS protocol to protect communications has become increasingly popular. Hackers who carry out malicious attacks over the network have also noticed the excellent properties of this protocol and exploit it to conceal malicious attack behavior. Because the protocol encrypts most of the information in the data packets, with the session keys negotiated through asymmetric cryptography, the data content of such traffic essentially cannot be decrypted by force, and traditional traffic-classification methods based on payload matching cannot act on this kind of malicious encrypted traffic, which poses a serious challenge to network protection and management. It is therefore imperative to study how to classify and detect encrypted malicious-attack traffic without decryption.
To classify encrypted malicious-attack traffic without decryption, the invention first studies the TLS protocol in detail and performs feature selection around the relevant characteristics of TLS traffic, such as metadata-related information and TLS-protocol-related information, so that benign and malicious TLS traffic can be distinguished and different kinds of malicious-attack encrypted traffic can be classified and identified. After feature extraction, the mainstream approach is simply to reduce the dimensionality of the traffic features, which ignores the correlations among them and yields low classification accuracy. To make efficient use of the TLS traffic features, the invention proposes a multi-head self-attention mechanism that learns the key features in TLS traffic and looks for the potential associations between different features. Features of different types are mapped into the same low-dimensional space, feature crosses are modeled in that space, feature combinations with high correlation are identified by the multi-head self-attention mechanism, and key high-order features are constructed for model classification. On this basis, a multi-head self-attention vulnerability exploitation attack encrypted-traffic classification model (TLS-MHSA) is constructed; it can classify encrypted traffic without decryption, lets the neural network emphasize the TLS-protocol characteristics of exploit-attack encrypted traffic, and classifies malicious encrypted traffic with high accuracy.
Disclosure of Invention
In view of the fact that the accuracy of classifying vulnerability exploitation attack encrypted traffic still needs to be improved over the prior art, because current deep-learning models cannot make full use of the relevant characteristics of malicious encrypted traffic, the invention solves this problem with a method for classifying exploit-attack encrypted traffic based on a multi-head self-attention mechanism.
The invention provides a method for classifying vulnerability exploitation attack encrypted traffic based on a multi-head self-attention mechanism, which comprises the following steps:
step 1, parsing the encrypted traffic data to be classified into json-format data, and filtering the parsed data;
step 2, analyzing metadata in the vulnerability exploitation attack encryption flow and key features in a TLS protocol, extracting core features required by flow classification, and finally converting the core features into a CSV format file;
step 3, dividing the processed encrypted traffic data into a training set and a test set in proportion, taking the malicious-attack traffic type as the label, training the multi-head self-attention network model with the training set, evaluating the model with the test set, and optimizing the model to obtain the final vulnerability exploitation attack encrypted-traffic classification model;
step 4, preprocessing the encrypted traffic to be detected according to step 1, then extracting features according to step 2, and inputting the extracted features into the trained classification model to obtain the final traffic classification result.
Further, the step 1 specifically comprises the following steps:
step 1.1, analyzing the encrypted flow data set into json format data;
step 1.2, cleaning and filtering the analyzed flow data, deleting redundant flow data, and only keeping TLS encrypted flow data;
step 1.3, shuffling the order of the original traffic data to improve the generalization ability of the model after learning.
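Steps 1.1 to 1.3 can be sketched as follows. This is a minimal illustration, assuming each flow record has already been exported as one JSON object per line; the field name `protocol` and the helper `preprocess` are illustrative, not part of the patent:

```python
import json
import random

def preprocess(json_lines, seed=42):
    """Step 1: parse flow records, keep only TLS flows, shuffle the order."""
    flows = [json.loads(line) for line in json_lines]             # step 1.1: parse into json objects
    tls_flows = [f for f in flows if f.get("protocol") == "TLS"]  # step 1.2: drop non-TLS/redundant flows
    random.Random(seed).shuffle(tls_flows)                        # step 1.3: shuffle for generalization
    return tls_flows

records = [
    '{"protocol": "TLS", "src_port": 443,  "bytes_out": 5120}',
    '{"protocol": "DNS", "src_port": 53,   "bytes_out": 120}',
    '{"protocol": "TLS", "src_port": 8443, "bytes_out": 900}',
]
flows = preprocess(records)
print(len(flows))  # only the two TLS flows remain
```

A fixed seed is used so the shuffle is reproducible across runs.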
Further, the specific implementation of the step 2 includes the following steps:
step 2.1, TLS-protocol encrypted-traffic feature selection: obtaining the differences in the parameters and extension information of the TLS (Transport Layer Security) protocol between exploit-attack encrypted traffic and normal encrypted traffic, and extracting the differing parameters and extension information of the TLS protocol as the key features of TLS encrypted traffic;
step 2.2, extracting metadata features: extracting the metadata features possessed by the traffic data, such as IP address, port and ingress/egress bytes, as auxiliary features;
step 2.3, marking a label on each piece of traffic data after feature extraction, i.e. marking the actual type of the traffic.
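The labelled feature rows of step 2 can be serialized to the CSV file mentioned above roughly as follows. The field names here (`cipher_suites`, `tls_ext_count`, etc.) are hypothetical placeholders for the TLS and metadata features, not the patent's actual feature list:

```python
import csv
import io

# Hypothetical per-flow feature rows: TLS key features plus metadata auxiliary
# features (steps 2.1/2.2), with the actual traffic type as the label (step 2.3).
FIELDS = ["cipher_suites", "tls_ext_count", "src_port", "bytes_in", "bytes_out", "label"]

def to_csv(feature_rows):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    for row in feature_rows:
        writer.writerow(row)
    return buf.getvalue()

rows = [
    {"cipher_suites": 12, "tls_ext_count": 7, "src_port": 443,
     "bytes_in": 20480, "bytes_out": 5120, "label": "benign"},
    {"cipher_suites": 3, "tls_ext_count": 2, "src_port": 4433,
     "bytes_in": 512, "bytes_out": 90112, "label": "exploit_attack"},
]
csv_text = to_csv(rows)
print(csv_text.splitlines()[0])  # header line
```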
Further, the specific implementation of step 3 includes the following steps:
step 3.1, dividing the processed traffic data into a training set and a test set in the ratio of 8:2;
step 3.2, constructing a neural-network exploit-attack encrypted-traffic classification model, TLS-MHSA, based on a multi-head self-attention (MHSA) mechanism, and feeding the preprocessed traffic data into TLS-MHSA; the model comprises an input layer, an embedding layer, multi-head self-attention layers and an output layer; first, the preprocessed network-traffic feature vector x is fed into the input layer, all features are mapped into the same low-dimensional space by the embedding layer, and low-dimensional vectors are output; the multi-head self-attention mechanism then maps these vectors into several subspaces, where they are combined into different high-order features; by stacking multiple multi-head self-attention layers, multiple high-order feature combinations can be obtained, and the effectiveness of each combination is judged by the attention mechanism; finally, the combined feature vector from the last layer is fed into the fully connected layer, and the classification result is output through a softmax function;
step 3.3, training the classification model with the training set divided in step 3.1, and evaluating and optimizing the parameters with the corresponding test set to obtain the final vulnerability exploitation attack encrypted-traffic classification model.
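The proportional split of step 3.1 can be sketched as below. An 8:2 ratio is assumed here for illustration; the helper name and seed are not from the patent:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=7):
    """Step 3.1: proportionally split labelled flows into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# Toy labelled samples: (flow id, malicious-attack traffic type used as label)
samples = [(i, "exploit" if i % 3 == 0 else "benign") for i in range(100)]
train, test = split_dataset(samples)
print(len(train), len(test))  # 80 20
```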
Further, the method also comprises extracting the cipher suite, the TLS extensions and their related information as features; the related information of the TLS certificate, an important link in the handshake process when a TLS connection is established, is likewise used as a feature, and non-critical parameter information in the TLS protocol, such as the client key length, is retained as an auxiliary feature.
Further, the step of obtaining the high-order feature combinations is as follows, taking feature a as an example:
(1) Select any other feature b and calculate the relevance score of features a and b under attention head h:
α_{a,b}^{(h)} = exp(s^{(h)}(e_a, e_b)) / Σ_{l=1}^{A} exp(s^{(h)}(e_a, e_l))
where s^{(h)} is the attention scoring function, for which the common dot-product model is chosen:
s^{(h)}(e_a, e_b) = ⟨W_Query^{(h)} e_a, W_Key^{(h)} e_b⟩
where W_Query^{(h)}, W_Key^{(h)} and W_Value^{(h)} are all transformation matrices of dimension d′ × d; that is, they map the original embedding space R^d into a new space R^{d′}, converting the vector representations e_a and e_b of the traffic features x_a and x_b from the d-dimensional space into the d′-dimensional space;
(2) Through the coefficients α_{a,b}^{(h)}, all associated features are used to update the representation of feature a in subspace h, i.e. each feature is represented as the weighted sum of all other relevant features. The newly learned feature can be expressed as:
ẽ_a^{(h)} = Σ_{b=1}^{A} α_{a,b}^{(h)} (W_Value^{(h)} e_b)
where W_Value^{(h)} is also a d′ × d transformation matrix, and ẽ_a^{(h)}, in the d′-dimensional space, is the combined feature obtained by crossing feature x_a with its related features under subspace h, i.e. a new combined feature learned by the method;
(3) The multi-head self-attention mechanism extends self-attention from one head to several heads, so that different feature-cross information can be learned from the subspaces represented by the different heads. The feature-cross results learned by the different heads are concatenated according to the following formula, where the symbol ⊕ denotes the concatenation operation, H is the number of heads used by the multi-head self-attention mechanism, and ẽ_a^{(i)} (i = 1, 2, …, H) is the combined feature of x_a crossed with its related features under subspace i; the feature-cross result ẽ_a is:
ẽ_a = ẽ_a^{(1)} ⊕ ẽ_a^{(2)} ⊕ … ⊕ ẽ_a^{(H)}
To let the model learn high-order traffic features while retaining the low-order original features, the method also adds a classical residual network to the multi-head self-attention layer, where W_Res is a transformation matrix that aligns the dimension of e_a with that of ẽ_a, and ReLU(t) = max(0, t) is a nonlinear activation function:
e_a^{Res} = ReLU(ẽ_a + W_Res e_a)
The output layer is a classifier composed of a fully connected layer and softmax. The fully connected layer maps the input feature vector through a linear transformation into the sample label space R^C, obtaining a vector z ∈ R^C, where C is the total number of traffic classes to be distinguished; the softmax classification function then yields the final classification result.
Compared with the prior art, the invention has the beneficial effects that:
1. The method makes full use of the key feature information carried by the TLS protocol in encrypted traffic and provides TLS-MHSA, a multi-head self-attention exploit-attack encrypted-traffic classification model; the model uses the self-attention mechanism to learn the key features in the preprocessed traffic data and the potential correlations between different features, and further uses the multi-head mechanism to learn important high-order combined-feature information, thereby improving the classification accuracy of malicious-attack encrypted traffic.
Drawings
Fig. 1 is a general flowchart of a vulnerability detection method based on an improved time convolution network.
Fig. 2 is a frame diagram of a malicious attack encryption traffic classification method based on a multi-head self-attention mechanism.
FIG. 3 is a diagram of the TLS-MHSA model architecture.
FIG. 4 is a diagram of a multi-headed self-attention layer structure in the TLS-MHSA model.
FIG. 5 shows the information of the data sample set used in the experimental section of the invention.
FIG. 6 is a confusion matrix of the TLS-MHSA classification results.
Fig. 7 is a comparison of classification results of three malicious attack encryption traffic classification methods.
FIG. 8 compares the accuracy of the three exploit-attack encrypted-traffic classification models in the comparison experiments: RF, DeepFM, and the TLS-MHSA proposed by the invention.
FIG. 9 compares the recall ratio of the three models in the comparison experiments: RF, DeepFM, and the proposed TLS-MHSA.
FIG. 10 compares the F1-measure of the three models in the comparison experiments: RF, DeepFM, and the proposed TLS-MHSA.
Detailed Description
The invention will be further described with reference to the accompanying drawings and embodiments, which are described for the purpose of facilitating an understanding of the invention and are not intended to be limiting in any way.
The invention aims at the classification of vulnerability exploitation attack encrypted traffic and provides a classification method based on a multi-head self-attention mechanism that classifies such traffic effectively. The invention builds a complete classification model and carries out thorough experiments to demonstrate the effectiveness and feasibility of the method.
As shown in figs. 1 to 10, the method for classifying vulnerability exploitation attack encrypted traffic based on the multi-head self-attention mechanism provided by the invention includes:
Step 1, parsing the encrypted traffic data to be classified into json-format data and filtering the parsed data:
step 1.1, analyzing the encrypted flow data set into json format data;
step 1.2, cleaning and filtering the analyzed flow data, deleting redundant flow data, and only keeping TLS encrypted flow data;
step 1.3, shuffling the order of the original traffic data.
In the embodiment of the invention, the data are parsed into json format to facilitate subsequent operations such as traffic filtering; redundant traffic data are deleted to prevent the model from learning useless information that would degrade the classification effect; and the original traffic order is shuffled so that the model generalizes better after learning.
Step 2, analyzing metadata in the vulnerability exploitation attack encryption flow and key features in a TLS protocol, and extracting core features required by flow classification;
step 2.1, TLS-protocol encrypted-traffic feature selection: obtaining the differences in the parameters and extension information of the TLS (Transport Layer Security) protocol between exploit-attack encrypted traffic and normal encrypted traffic, and extracting the differing parameters and extension information of the TLS protocol as the key features of TLS encrypted traffic;
step 2.2, metadata feature extraction: extracting the metadata features possessed by the traffic data, such as IP address, port and ingress/egress bytes, as auxiliary features;
step 2.3, marking a label on each piece of traffic data after feature extraction, i.e. marking the actual type of the traffic.
In the embodiment of the invention, feature extraction from traffic falls into two categories: metadata feature selection and TLS-protocol encrypted-traffic feature selection. The TLS-protocol features mainly refer to the parameter information and TLS extension information in the protocol; the invention extracts the cipher suite, the TLS extensions and their related information as features. As an important link in the handshake process when a TLS connection is established, the related information of the TLS certificate is also worth using as a traffic feature. In addition, the invention retains non-critical TLS parameter information, such as the client key length, as auxiliary features, so that the overall TLS encrypted-traffic features are more comprehensive and the subsequent classification achieves better results.
Metadata features are features of the metadata possessed by all traffic and are the traffic features used by traditional traffic classification, such as IP address, port, and ingress/egress bytes. Most of the time the IP address is meaningless for identifying malicious traffic and easily misleads the model, so all IP-address information is deleted during feature extraction. Some metadata, such as ingress/egress bytes and ingress/egress packets, is worth keeping for identifying malicious attackers, since these quantities are affected only by the transmitted data, and normal and malicious traffic differ somewhat in these behaviors. Borrowing from the traditional standard-port-matching identification method, the invention takes the port as one of the traffic features. Considering the behavioral differences between malicious and benign traffic, the invention also adds window-sequence statistical features, for which the system uses a Markov transition matrix to capture the relationship between adjacent packets. In addition to the above features, the mean byte distance, the standard deviation, and the byte entropy are retained for feature identification; as traffic metadata, these features help the model improve the classification accuracy of malicious-attack encrypted traffic.
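The byte-entropy and adjacent-packet statistics mentioned above can be sketched as follows. This is an illustrative pure-Python version; the bucket size and number of states for the Markov transition matrix are assumptions, not values from the patent:

```python
import math
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy (bits per byte) of a payload; close to 8 for encrypted data."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def markov_transition_matrix(packet_sizes, bucket=150, n_states=10):
    """Capture the relationship between adjacent packets: bucket each packet
    length into one of n_states and count state-to-state transitions."""
    state = lambda s: min(s // bucket, n_states - 1)
    m = [[0.0] * n_states for _ in range(n_states)]
    for prev, cur in zip(packet_sizes, packet_sizes[1:]):
        m[state(prev)][state(cur)] += 1
    for row in m:  # row-normalise the counts into transition probabilities
        total = sum(row)
        if total:
            for j in range(n_states):
                row[j] /= total
    return m

print(round(byte_entropy(bytes(range(256))), 2))  # uniform byte distribution -> 8.0
m = markov_transition_matrix([60, 1500, 1500, 60, 1500])
```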
Step 3, dividing the preprocessed encrypted traffic data into a training set and a test set in a suitable proportion, taking the different types of malicious-attack traffic as labels, training the multi-head self-attention network model with the training-set data, evaluating the model with the test-set data, and constructing the final vulnerability exploitation attack encrypted-traffic classification model after optimization;
step 3.1, dividing the processed flow data into a training set and a test set according to a proper proportion;
step 3.2, constructing the neural-network exploit-attack encrypted-traffic classification model TLS-MHSA based on the multi-head self-attention mechanism, and feeding the preprocessed traffic data into TLS-MHSA; the model comprises an input layer, an embedding layer, multi-head self-attention layers and an output layer.
The input layer converts the input features into corresponding feature vectors according to their feature types; for example, discrete features are converted into one-hot vectors. Here A denotes the total number of feature classes and x_i the i-th feature class:
x = [x_1; x_2; ...; x_A]
The main role of the embedding layer is to convert sparse feature vectors into dense feature vectors suitable for learning. Mapping the feature vectors processed by the input layer into a low-dimensional space, and representing each flow classification feature by a plurality of low-dimensional vectors, OiRepresenting an embedded matrix with a feature type i correspondence, xiThen it is the one-hot coded expression vector of the corresponding feature of the feature type, and the conversion formula is as follows:
ei=Oixiif the feature is a multi-valued feature, in this case the class variable xiNot a single heat vector but a multiple heat vector. In order to be compatible with such a case of multi-valued input, a multi-valued feature type S is expressed as an average value of corresponding feature vector vectors to normalize the features, where v represents the number of the multi-valued feature types, and a conversion formula is as follows:
ei=1/vOixi
in order to enable the discrete features and the continuous features to be combined with each other, the continuous features are mapped into a low latitude dense feature vector space, and the continuous features are expressed as the multiplication result of feature values and corresponding embedded vectors. Wherein v isaIs an embedded vector of feature type a, xaIs a scalar value. The continuous features can be expressed as the following formula:
ea=vaxa
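The three embedding-layer conversions above, one-hot, averaged multi-hot, and scaled continuous, can be sketched in NumPy. The dimensions and random weights are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4          # embedding dimension d (illustrative)
n_values = 6   # vocabulary size of one categorical feature field

O = rng.normal(size=(d, n_values))  # embedding matrix O_i for feature type i

# One-hot feature: e_i = O_i x_i (equivalent to selecting one column of O_i)
x_onehot = np.zeros(n_values); x_onehot[2] = 1.0
e_onehot = O @ x_onehot

# Multi-valued feature: e_i = (1/v) O_i x_i, the average of the v active embeddings
x_multi = np.zeros(n_values); x_multi[[1, 3, 4]] = 1.0
e_multi = (O @ x_multi) / 3

# Continuous feature: e_a = v_a x_a, the embedding vector scaled by the value
v_a = rng.normal(size=d)
e_cont = v_a * 0.75

print(e_onehot.shape, e_multi.shape, e_cont.shape)  # all (4,)
```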
The role of the multi-head self-attention layer is to capture the correlations among the traffic features and select meaningful features for high-order combination. The core operation of the attention mechanism is to obtain the attention weight of a feature combination directly through computation, and then judge the importance of the combination through the weighted sum. The multi-head mechanism performs several groups of self-attention on the input classified feature vectors, then concatenates all the self-attention results and applies a linear transformation to obtain the final result.
The steps for obtaining the important high-order feature combinations are as follows; taking feature a as an example, we explain how to find the important high-order features of feature a:
(1) Select any other feature b and calculate the relevance score of features a and b under attention head h:
α_{a,b}^{(h)} = exp(s^{(h)}(e_a, e_b)) / Σ_{l=1}^{A} exp(s^{(h)}(e_a, e_l))
where s^{(h)} is the attention scoring function, for which the common dot-product model is chosen here:
s^{(h)}(e_a, e_b) = ⟨W_Query^{(h)} e_a, W_Key^{(h)} e_b⟩
where W_Query^{(h)}, W_Key^{(h)} and W_Value^{(h)} are all transformation matrices of dimension d′ × d; that is, they map the original embedding space R^d into a new space R^{d′}, converting the vector representations e_a and e_b of the traffic features x_a and x_b from the d-dimensional space into the d′-dimensional space.
(2) Through the coefficients α_{a,b}^{(h)}, all associated features are used to update the representation of feature a in subspace h, i.e. each feature is represented as the weighted sum of all other relevant features. The newly learned feature can be expressed as:
ẽ_a^{(h)} = Σ_{b=1}^{A} α_{a,b}^{(h)} (W_Value^{(h)} e_b)
where W_Value^{(h)} is also a d′ × d transformation matrix, and ẽ_a^{(h)}, in the d′-dimensional space, is the combined feature obtained by crossing feature x_a with its related features under subspace h, representing a new combined feature learned by the method.
(3) The multi-head self-attention mechanism extends self-attention from one head to several heads, so that different feature-cross information can be learned from the subspaces represented by the different heads. The feature-cross results learned by the different heads are concatenated according to the following formula, where the symbol ⊕ denotes the concatenation operation, H is the number of heads used by the multi-head self-attention mechanism, and ẽ_a^{(i)} (i = 1, 2, …, H) is the combined feature of x_a crossed with its related features under subspace i; the feature-cross result ẽ_a is:
ẽ_a = ẽ_a^{(1)} ⊕ ẽ_a^{(2)} ⊕ … ⊕ ẽ_a^{(H)}
To let the model learn high-order traffic features while retaining the low-order original features, the method also adds a classical residual network to the multi-head self-attention layer, where W_Res is a transformation matrix that aligns the dimension of e_a with that of ẽ_a, and ReLU(t) = max(0, t) is a nonlinear activation function:
e_a^{Res} = ReLU(ẽ_a + W_Res e_a)
The output layer is a classifier composed of a fully connected layer and softmax. The fully connected layer maps the input feature vector through a linear transformation into the sample label space R^C, obtaining a vector z ∈ R^C, where C is the total number of traffic classes to be distinguished; the softmax classification function then yields the final classification result.
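A compact NumPy sketch of the forward pass described above: per-head attention weights, weighted value sums, head concatenation, the residual connection with W_Res, and the softmax output layer. All shapes and random weights are illustrative assumptions, not the patent's trained parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
A, d, d_head, H, C = 8, 16, 4, 2, 3   # features, embed dim d, per-head dim d', heads, classes

E = rng.normal(size=(A, d))            # embedded features e_1..e_A as rows
W_Q = rng.normal(size=(H, d_head, d))  # per-head d' x d transformation matrices
W_K = rng.normal(size=(H, d_head, d))
W_V = rng.normal(size=(H, d_head, d))
W_Res = rng.normal(size=(H * d_head, d))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    ez = np.exp(z)
    return ez / ez.sum(axis=axis, keepdims=True)

heads = []
for h in range(H):
    Q, K, V = E @ W_Q[h].T, E @ W_K[h].T, E @ W_V[h].T  # map into the head's subspace
    alpha = softmax(Q @ K.T, axis=1)                    # alpha[a, b]: relevance of b to a
    heads.append(alpha @ V)                             # weighted sum of value vectors
E_tilde = np.concatenate(heads, axis=1)                 # concatenate the H heads
E_res = np.maximum(0.0, E_tilde + E @ W_Res.T)          # residual connection + ReLU

# Output layer: fully connected mapping into R^C, then softmax
W_out = rng.normal(size=(C, A * H * d_head)) * 0.1
z = W_out @ E_res.reshape(-1)
y = softmax(z)
print(E_res.shape, y.shape, round(float(y.sum()), 6))  # (8, 8) (3,) 1.0
```

A single interacting layer is shown; stacking several such layers, as the patent describes, repeats this block with E_res as the next input.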
And 3.3, training the classification model by using the training set divided in the step 3.1, and judging and optimizing parameters by using a corresponding test set to obtain a final vulnerability exploitation attack encryption flow classification model.
The optimization method used by the invention is the adaptive-learning-rate algorithm Adam, and the model parameters are saved once the optimal model has been trained. The training objective minimized by Adam is the cross-entropy loss

L = −Σ_{i=1}^{K} y_i · log(p_i)

wherein p_i is the probability vector of the predicted detection result, y_i is the actual sample label category, and K is the total number of sample label categories.
Step 4: parse and clean the encrypted traffic to be detected following the procedure of step 1, extract features in the manner of step 2, and input the extracted feature data into the trained classification model to obtain the final traffic classification result.
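The detection pipeline of this step (parse to JSON, filter down to TLS traffic, extract feature rows for the classifier) can be sketched as below. The patent does not specify a JSON schema, so every field name and value here is a hypothetical placeholder.

```python
import json

# Hypothetical flow records as produced by step 1's JSON parsing; the field
# names ("protocol", "cipher_suite", "dst_port") are illustrative only.
raw = """
[{"protocol": "TLS", "cipher_suite": "TLS_AES_128_GCM_SHA256", "dst_port": 443},
 {"protocol": "DNS", "dst_port": 53},
 {"protocol": "TLS", "cipher_suite": "TLS_CHACHA20_POLY1305_SHA256", "dst_port": 8443}]
"""

flows = json.loads(raw)
# Cleaning/filtering: keep only TLS-encrypted flows, discard the rest.
tls_flows = [f for f in flows if f.get("protocol") == "TLS"]

# Feature extraction: one row per flow, ready to be written to CSV and
# fed to the trained classification model.
rows = [[f["dst_port"], f["cipher_suite"]] for f in tls_flows]
print(len(rows))
```

The same filter-then-extract shape applies whatever the real parsed schema looks like; only the field names would change.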
Claims (6)
1. A vulnerability exploitation attack encryption flow classification method based on a multi-head self-attention mechanism is characterized by comprising the following steps:
step 1, parsing the encrypted traffic data to be classified into JSON-format data, and filtering the parsed data;
step 2, analyzing the metadata in the vulnerability exploitation attack encrypted traffic and the key features of the TLS protocol, extracting the core features required for traffic classification, and finally converting the core features into a CSV-format file;
step 3, dividing the processed encrypted traffic data proportionally into a training set and a test set, taking the malicious attack traffic type as the label, training the multi-head self-attention mechanism network model with the training set, evaluating the model with the test set, and optimizing it to obtain the final vulnerability exploitation attack encrypted traffic classification model;
and step 4, preprocessing the encrypted traffic to be detected according to step 1, extracting features according to step 2, and inputting the extracted features into the trained classification model to obtain the final traffic classification result.
2. The method for classifying the exploit attack encrypted traffic based on the multi-head self-attention mechanism as claimed in claim 1, wherein the step 1 is implemented by the following steps:
step 1.1, parsing the encrypted traffic data set into JSON-format data;
step 1.2, cleaning and filtering the parsed traffic data, deleting redundant traffic data, and keeping only TLS-encrypted traffic data;
and step 1.3, shuffling the order of the original traffic data to improve the generalization ability of the trained model.
3. The method as claimed in claim 1, wherein the step 2 is implemented by the following steps:
step 2.1, TLS protocol encrypted-traffic feature selection: obtain the differences in the TLS (Transport Layer Security) protocol parameters and extension information between vulnerability exploitation attack encrypted traffic and normal encrypted traffic, and extract the differing parameters and extension information as the key features of the TLS-encrypted traffic;
step 2.2, metadata feature extraction: extract metadata features contained in the traffic data, such as IP addresses, ports, and byte counts, as auxiliary features;
and step 2.3, after feature extraction, attach a label to each piece of traffic data, i.e., mark the actual type of the traffic.
4. The method as claimed in claim 1, wherein the step 3 is implemented by the following steps:
step 3.1, dividing the processed traffic data into a training set and a test set at a ratio of 8:2;
step 3.2, constructing a neural-network vulnerability exploitation attack encrypted-traffic classification model, TLS-MHSA, based on the multi-head self-attention (MHSA) mechanism, and inputting the preprocessed traffic data into TLS-MHSA, the model comprising an input layer, an embedding layer, multi-head self-attention layers, and an output layer; first, the preprocessed network-traffic feature vector x is fed into the input layer, then the embedding layer maps all features into the same low-dimensional space and outputs low-dimensional vectors; next, the multi-head self-attention mechanism maps the vectors into a plurality of subspaces and combines them into different high-order features; stacking multiple multi-head self-attention layers yields multiple high-order feature combinations, and the effectiveness of the feature combinations is judged by the attention mechanism; finally, the feature-combination vector obtained from the previous layer is input into the fully connected layer, and the classification result is output through the softmax function;
and step 3.3, training the classification model with the training set divided in step 3.1, and evaluating and optimizing the parameters with the corresponding test set to obtain the final vulnerability exploitation attack encrypted traffic classification model.
5. The method of claim 3, further comprising extracting the cipher suites, the TLS extensions, and their related information as features, these fields being an important part of the handshake process when the TLS protocol establishes a connection, while non-critical parameter information in the TLS protocol, such as the client key length, is retained as an auxiliary feature.
6. The method of claim 1, wherein the step of obtaining high-order feature combinations comprises, for a given feature a:
(1) selecting any other feature b, and calculating the relevance score of features a and b under attention head h as

α^(h)_{a,b} = exp(s^(h)(e_a, e_b)) / Σ_l exp(s^(h)(e_a, e_l))
wherein s^(h) is the attention scoring function, for which the common dot-product model is chosen:

s^(h)(e_a, e_b) = ⟨W_Q^(h) · e_a, W_K^(h) · e_b⟩
wherein W_Q^(h) and W_K^(h) are transformation matrices, each of dimension d' × d, i.e., they map the original embedding space R^d into a new space R^{d'}; the traffic features x_a and x_b are represented as the vectors e_a and e_b respectively, and the feature vectors are thereby mapped from the d-dimensional space into the d'-dimensional space;
(2) using the coefficients α^(h)_{a,b}, all related features are combined to update the representation of feature a in subspace h, i.e., each feature is represented as a weighted sum of all other relevant features; the newly learned feature can be expressed as

ẽ_a^(h) = Σ_b α^(h)_{a,b} · (W_V^(h) · e_b)
wherein W_V^(h) is likewise a d' × d transformation matrix into the d'-dimensional space, and ẽ_a^(h) is the combined feature obtained by crossing feature x_a with its related features under subspace h, representing a new combined feature learned by the method;
(3) using the multi-head self-attention mechanism, self-attention is expanded from one head to a plurality of heads, so that different feature-crossing information can be learned in the subspaces represented by the different heads; the feature-crossing results learned by the different heads are concatenated as

ẽ_a = ẽ_a^(1) ⊕ ẽ_a^(2) ⊕ … ⊕ ẽ_a^(H)

wherein ⊕ denotes the concatenation operation, H denotes the number of heads used by the multi-head self-attention mechanism, and ẽ_a^(i) (i = 1, 2, …, H) is the combined feature obtained by crossing feature x_a with its related features under subspace i;
to enable the model to learn high-order traffic features while retaining the low-order original traffic features, the invention also adds a classical residual connection to the multi-head self-attention layer:

e_a^Res = ReLU(ẽ_a + W_Res · e_a)

wherein W_Res is a transformation matrix that aligns the dimension of e_a with that of ẽ_a, and ReLU(t) = max(0, t) is the nonlinear activation function;
the output layer is a classifier composed of a fully connected layer followed by softmax, wherein the fully connected layer maps the input feature vector through a linear transformation into the sample label space R^C, producing a vector z ∈ R^C, where C is the total number of traffic classes to be distinguished, and the softmax classification function then yields the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210905960.7A CN115277216A (en) | 2022-07-29 | 2022-07-29 | Vulnerability exploitation attack encryption flow classification method based on multi-head self-attention mechanism |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115277216A true CN115277216A (en) | 2022-11-01 |
Family
ID=83771352
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210905960.7A Pending CN115277216A (en) | 2022-07-29 | 2022-07-29 | Vulnerability exploitation attack encryption flow classification method based on multi-head self-attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115277216A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113705619A (en) * | 2021-08-03 | 2021-11-26 | 广州大学 | Malicious traffic detection method, system, computer and medium |
CN114330544A (en) * | 2021-12-28 | 2022-04-12 | 国网冀北电力有限公司信息通信分公司 | Method for establishing business flow abnormity detection model and abnormity detection method |
CN114745155A (en) * | 2022-03-14 | 2022-07-12 | 河海大学 | Network abnormal flow detection method, device and storage medium |
Non-Patent Citations (3)
Title |
---|
JINFU CHEN et al.: "TLS-MHSA: An Efficient Detection Model for Encrypted Malicious Traffic based on Multi-Head Self-Attention Mechanism", ACM, vol. 26, no. 4, 31 October 2023 (2023-10-31), XP059195532, DOI: 10.1145/3613960 *
LI Heng; SHEN Huawei; CHENG Xueqi; ZHAI Yong: "Survey of defense mechanisms against high-volume distributed denial-of-service attacks" (in Chinese), Netinfo Security (信息网络安全), no. 05, 10 May 2017 (2017-05-10) *
JIANG Tongtong et al.: "Malicious encrypted traffic identification based on hierarchical spatio-temporal features and multi-head attention" (in Chinese), Computer Engineering (计算机工程), vol. 47, no. 7, 31 July 2021 (2021-07-31) *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116090537A (en) * | 2023-03-07 | 2023-05-09 | 特斯联科技集团有限公司 | Training method of commodity recommendation model |
CN116506216A (en) * | 2023-06-19 | 2023-07-28 | 国网上海能源互联网研究院有限公司 | Lightweight malicious flow detection and evidence-storage method, device, equipment and medium |
CN116506216B (en) * | 2023-06-19 | 2023-09-12 | 国网上海能源互联网研究院有限公司 | Lightweight malicious flow detection and evidence-storage method, device, equipment and medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Thing | IEEE 802.11 network anomaly detection and attack classification: A deep learning approach | |
CN115277216A (en) | Vulnerability exploitation attack encryption flow classification method based on multi-head self-attention mechanism | |
CN111340191B (en) | Bot network malicious traffic classification method and system based on ensemble learning | |
Liu et al. | A byte-level CNN method to detect DNS tunnels | |
CN108768986A (en) | A kind of encryption traffic classification method and server, computer readable storage medium | |
CN110611640A (en) | DNS protocol hidden channel detection method based on random forest | |
CN113989583A (en) | Method and system for detecting malicious traffic of internet | |
CN113364787B (en) | Botnet flow detection method based on parallel neural network | |
CN112804253B (en) | Network flow classification detection method, system and storage medium | |
CN113329023A (en) | Encrypted flow malice detection model establishing and detecting method and system | |
CN112688928A (en) | Network attack flow data enhancement method and system combining self-encoder and WGAN | |
CN113949531B (en) | Malicious encrypted flow detection method and device | |
CN109831422A (en) | A kind of encryption traffic classification method based on end-to-end sequence network | |
Fries | A fuzzy-genetic approach to network intrusion detection | |
CN104135385A (en) | Method of application classification in Tor anonymous communication flow | |
CN113821793B (en) | Multi-stage attack scene construction method and system based on graph convolution neural network | |
CN114330544A (en) | Method for establishing business flow abnormity detection model and abnormity detection method | |
Kong et al. | Identification of abnormal network traffic using support vector machine | |
Thom et al. | Smart recon: Network traffic fingerprinting for iot device identification | |
CN112507336A (en) | Server-side malicious program detection method based on code characteristics and flow behaviors | |
CN114615088A (en) | Terminal service flow abnormity detection model establishing method and abnormity detection method | |
Liu et al. | Spatial‐Temporal Feature with Dual‐Attention Mechanism for Encrypted Malicious Traffic Detection | |
CN117318980A (en) | Small sample scene-oriented self-supervision learning malicious traffic detection method | |
CN111211948A (en) | Shodan flow identification method based on load characteristics and statistical characteristics | |
Zhang et al. | Network traffic classification method based on improved capsule neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||