CN109309630B

CN109309630B - Network traffic classification method and system and electronic equipment

Info

Publication number: CN109309630B
Application number: CN201811113686.XA
Authority: CN
Inventors: 叶可江; 赵世林; 须成忠
Original assignee: Shenzhen Institute of Advanced Technology of CAS
Current assignee: Shenzhen Institute of Advanced Technology of CAS
Priority date: 2018-09-25
Filing date: 2018-09-25
Publication date: 2021-09-21
Anticipated expiration: 2038-09-25
Also published as: CN109309630A; WO2020062390A1

Abstract

The application relates to a network traffic classification method, a network traffic classification system, and an electronic device. The method comprises the following steps: step a: collecting network traffic data and labeling the network traffic data; step b: extracting a bidirectional flow feature set from the labeled network traffic data; step c: constructing a classification model based on the bidirectional flow feature set, and outputting a classification result for the network traffic data through the classification model. Because the network traffic is classified using the bidirectional flow features in the network traffic data, a large number of new Internet applications can be accurately identified and classified, which improves classification accuracy and helps ensure high precision and high performance in network traffic classification.

Description

Network traffic classification method and system and electronic equipment

Technical Field

The present application relates to the field of network traffic classification technologies, and in particular, to a method, a system, and an electronic device for classifying network traffic.

Background

With the rapid spread of the Internet, modern network environments have become increasingly complex and diverse due to the emergence of a large number of new applications. Traffic classification and network application identification play an important role in network management services and security systems, such as quality of service, intrusion detection systems, and traffic management systems. If the traffic in a network system can be accurately classified and the applications generating it accurately identified, network security and the efficiency of network management services are greatly improved, and system time and memory overhead can be reduced.

At present, the existing network traffic classification method mainly includes:

First, network traffic classification based on representation learning: the acquired network traffic data is preprocessed, features of the preprocessed network traffic data are extracted with a representation learning algorithm, network flow vectors are generated from the network traffic data, and the network traffic data is classified according to the network flow vectors, so that the network traffic can be classified efficiently.

Second, network traffic classification based on semi-supervised learning: network flows of labeled and unlabeled types are acquired, and a preset fixed number of flow features are extracted from each network flow to obtain a network flow feature vector; the information gain of each flow feature is calculated according to the labeled types of the network flows, and each flow feature is weighted by its information gain; the labeled and unlabeled network flows are mixed, and the mixed network flows are clustered with the k-means algorithm to obtain k clusters; the number of labeled network flow feature vectors in each of the k clusters is counted, and the proportion of each type in each cluster is determined, where the proportion equals the ratio of the number of labeled network flow feature vectors of that type to the total number of labeled network flow feature vectors in the cluster; when the total number of labeled network flow feature vectors in a cluster is smaller than a preset network flow threshold, the cluster is judged to be an unknown-protocol cluster, and otherwise the cluster is assigned the type with the largest proportion among its labeled network flow feature vectors; the two preceding steps are repeated until the traffic type of each of the k clusters has been determined; and the clusters whose traffic types have been determined are used as training data to train an online traffic classifier. This method exploits the advantages of semi-supervised learning and achieves better accuracy and stability than a traditional supervised learning algorithm that trains the model using labeled data only.
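The cluster-typing rule of this prior-art method can be illustrated with a minimal Python sketch. The function name `label_clusters`, the `UNKNOWN` marker, and the sample data are hypothetical and chosen only for illustration; the clustering itself (k-means) is assumed to have been run already.

```python
from collections import Counter

UNKNOWN = "unknown"  # hypothetical marker for an unknown-protocol cluster


def label_clusters(cluster_ids, labels, min_labeled):
    """Map each cluster id to a traffic type or UNKNOWN.

    cluster_ids[i] is the cluster of flow i; labels[i] is its known type,
    or None when the flow is unlabeled.
    """
    per_cluster = {}
    for cid, lab in zip(cluster_ids, labels):
        counter = per_cluster.setdefault(cid, Counter())
        if lab is not None:
            counter[lab] += 1

    result = {}
    for cid, counter in per_cluster.items():
        total_labeled = sum(counter.values())
        if total_labeled < min_labeled:
            # too few labeled flows: treat as an unknown-protocol cluster
            result[cid] = UNKNOWN
        else:
            # type with the largest proportion among the labeled flows
            result[cid] = counter.most_common(1)[0][0]
    return result


cluster_ids = [0, 0, 0, 0, 1, 1, 2, 2, 2]
labels = ["http", "http", "dns", None, None, "dns", None, None, None]
mapping = label_clusters(cluster_ids, labels, min_labeled=2)
print(mapping)
```

In this toy input only cluster 0 holds enough labeled flows to be typed; the other two fall below the threshold.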

Third, adaptive semi-supervised network traffic classification: network flows of labeled and unlabeled types are acquired, and a preset fixed number of flow features are extracted from each network flow to obtain a network flow feature vector; the centroid of the set of network flow feature vectors of each type is calculated from the labeled network flow feature vectors to obtain a vector set M; the vector set M is used as the initial center points of k-means clustering, adaptive semi-supervised k-means clustering is performed on the mixed set X of labeled and unlabeled network flow feature vectors, and the k-means clusters are output; the network flows in each output cluster are mapped to a traffic type according to the maximum posterior probability of the labeled network flow feature vectors in that cluster, to obtain traffic clusters of known types; and the traffic clusters of known types are used as training data to train an online traffic classifier.

In summary, the existing network traffic classification methods mainly focus on network traffic classification at the algorithm level, and all kinds of optimization and improvement algorithms are proposed for the classification algorithm part in the training phase, but the problem of how to extract a large number of relevant effective feature sets from network data packets is not solved, and a large number of new applications in the internet cannot be accurately identified and classified.

Disclosure of Invention

The application provides a network traffic classification method, a network traffic classification system and electronic equipment, and aims to solve at least one of the technical problems in the prior art to a certain extent.

In order to solve the above problems, the present application provides the following technical solutions:

a network traffic classification method comprises the following steps:

step a: collecting network flow data and labeling the network flow data;

step b: extracting a bidirectional flow characteristic set according to the labeled network flow data;

step c: constructing a classification model based on the bidirectional flow feature set, and outputting a classification result for the network traffic data through the classification model.

The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step a, the acquiring network traffic data and the labeling the network traffic data specifically include:

step a1: selecting the application categories in the network traffic;

step a2: collecting the network traffic data packets corresponding to each application and the system network logs of the corresponding time period;

step a3: parsing the network traffic data packets to find the natural attributes of each application, and the IP addresses and transport protocols it uses to communicate with other applications;

step a4: extracting, from the system network logs, the IP endpoints and the number of transmitted packets associated with each application, and performing association and fusion in combination with the IP addresses and transport protocols to complete the labeling of the network traffic data.

The technical scheme adopted by the embodiment of the application further comprises the following steps: in the step b, the extracting a bidirectional flow feature set according to the labeled network traffic data specifically includes:

step b1: parsing the labeled network traffic data, and separately collecting statistics on the bidirectional network flow information between each pair of { source IP address -> destination IP address } and { destination IP address -> source IP address }, based on the different port numbers in the network traffic data;

step b2: finding the forward network flows between each pair of { source IP address -> destination IP address }, and extracting all forward network flow feature sets from the forward network flows;

step b3: finding the reverse network flows between each pair of { destination IP address -> source IP address }, and extracting all reverse network flow feature sets from the reverse network flows;

step b4: combining the forward and reverse network flow feature sets between each pair of { source IP address, destination IP address } to form a bidirectional flow feature set of M-dimensional features.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the step b further comprises the following steps: and optimizing the bidirectional flow feature set by using a maximum variance interpretation mechanism.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the optimizing the bidirectional flow feature set by using the maximum variance interpretation mechanism specifically comprises:

step b5: performing standard normalization on the network traffic data;

step b6: calculating, over the network traffic data, the average value of each feature in the bidirectional flow feature set;

step b7: subtracting the average value of each feature from the normalized network traffic data to obtain a new result for each feature, and performing variance normalization on the new result of each feature;

step b8: calculating the covariance matrix of the bidirectional flow feature set, and sorting the features in ascending order of the variance value of each feature on the main diagonal of the covariance matrix, to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;

step b9: calculating the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by size, and selecting the eigenvectors that correspond to the first N optimized bidirectional flow features;

step b10: projecting the network traffic data onto the N eigenvectors;

step b11: optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.

Another technical scheme adopted by the embodiment of the application is as follows: a network traffic classification system comprising:

a data acquisition module: used for collecting network traffic data;

a data preprocessing module: used for labeling the network traffic data;

a feature extraction module: used for extracting a bidirectional flow feature set from the labeled network traffic data;

a model construction module: used for constructing a classification model based on the bidirectional flow feature set, and outputting a classification result for the network traffic data through the classification model.

The technical scheme adopted by the embodiment of the application further comprises the following steps:

the collecting of network traffic data by the data acquisition module specifically comprises: selecting the application categories in the network traffic, and collecting the network traffic data packets corresponding to each application and the system network logs of the corresponding time period;

the labeling of the network traffic data by the data preprocessing module specifically comprises: parsing the network traffic data packets to find the natural attributes of each application, and the IP addresses and transport protocols it uses to communicate with other applications; and extracting, from the system network logs, the IP endpoints and the number of transmitted packets associated with each application, and performing association and fusion in combination with the IP addresses and transport protocols to complete the labeling of the network traffic data.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the feature extraction module specifically extracts a bidirectional flow feature set according to the labeled network traffic data, and includes:

parsing the labeled network traffic data, and separately collecting statistics on the bidirectional network flow information between each pair of { source IP address -> destination IP address } and { destination IP address -> source IP address }, based on the different port numbers in the network traffic data;

finding the forward network flows between each pair of { source IP address -> destination IP address }, and extracting all forward network flow feature sets from the forward network flows;

finding the reverse network flows between each pair of { destination IP address -> source IP address }, and extracting all reverse network flow feature sets from the reverse network flows;

combining the forward and reverse network flow feature sets between each pair of { source IP address, destination IP address } to form a bidirectional flow feature set of M-dimensional features.

The technical scheme adopted by the embodiment of the application further comprises a feature optimization module, wherein the feature optimization module is used for optimizing the bidirectional flow feature set by utilizing a maximum variance interpretation mechanism.

The technical scheme adopted by the embodiment of the application further comprises the following steps: the feature optimization module specifically optimizes the bidirectional flow feature set by using a maximum variance interpretation mechanism, and comprises the following steps:

performing standard normalization on the network traffic data;

calculating, over the network traffic data, the average value of each feature in the bidirectional flow feature set;

subtracting the average value of each feature from the normalized network traffic data to obtain a new result for each feature, and performing variance normalization on the new result of each feature;

calculating the covariance matrix of the bidirectional flow feature set, and sorting the features in ascending order of the variance value of each feature on the main diagonal of the covariance matrix, to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set;

calculating the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by size, and selecting the eigenvectors that correspond to the first N optimized bidirectional flow features;

projecting the network traffic data onto the N eigenvectors;

optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.

The embodiment of the application adopts another technical scheme that: an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

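the memory stores instructions executable by the at least one processor, the instructions causing the at least one processor to perform the following operations of the network traffic classification method described above:

step a: collecting network traffic data and labeling the network traffic data;

step b: extracting a bidirectional flow feature set from the labeled network traffic data;

step c: constructing a classification model based on the bidirectional flow feature set, and outputting a classification result for the network traffic data through the classification model.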

Compared with the prior art, the embodiment of the application has the advantages that: the network traffic classification method, the network traffic classification system and the electronic equipment in the embodiment of the application classify the network traffic by using the bidirectional flow characteristics in the network traffic data, and can accurately identify and classify a large number of new applications in the internet; meanwhile, the method of the maximum variance interpretation mechanism is used for carrying out optimization association on the bidirectional flow characteristics, so that the high cohesion of the bidirectional flow characteristics is guaranteed, the classification accuracy is improved, and the high precision and the high performance of network flow classification can be effectively guaranteed.

Drawings

Fig. 1 is a flowchart of a network traffic classification method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of the process of collecting and labeling network traffic data;

Fig. 3 is a schematic diagram of the bidirectional flow feature set extraction and optimization process;

Fig. 4 is a schematic structural diagram of a network traffic classification system according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a hardware device for the network traffic classification method according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Please refer to fig. 1, which is a flowchart illustrating a network traffic classification method according to an embodiment of the present application. The network traffic classification method of the embodiment of the application comprises the following steps:

step 100: collecting network flow data and labeling the network flow data;

in step 100, the process of collecting and labeling network traffic data is shown in fig. 2, and the specific steps are as follows:

step 101: selecting an application category in the network traffic;

step 102: continuously capturing fixed application traffic through high-performance network monitoring software;

step 103: collecting a network flow data packet corresponding to each application type and a system network log of a corresponding time period;

step 104: analyzing the network flow data packet, and finding out the natural attribute of each application and key information communicated with other applications, such as an IP address, a transmission protocol and the like;

step 105: extracting, from the system network logs, the IP endpoints and the number of transmitted packets associated with each application, and performing association and fusion in combination with the IP addresses and transport protocols to complete the labeling of the network traffic data.
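The association and fusion in steps 104 and 105 can be pictured as a join between parsed packets and log entries on (IP endpoint, transport protocol). The following Python sketch is only an illustration under assumed record layouts: the dictionary keys `src_ip`, `dst_ip`, `protocol`, `remote_ip`, and `app`, and both function names, are hypothetical and are not specified by the application.

```python
def build_endpoint_index(log_entries):
    """Index log entries by (remote IP, protocol) -> application name."""
    index = {}
    for entry in log_entries:
        index[(entry["remote_ip"], entry["protocol"])] = entry["app"]
    return index


def label_packets(packets, log_entries):
    """Attach an application label to each packet, or None if unmatched."""
    index = build_endpoint_index(log_entries)
    labeled = []
    for pkt in packets:
        # a packet matches when either of its endpoints appears in the log
        app = (index.get((pkt["dst_ip"], pkt["protocol"]))
               or index.get((pkt["src_ip"], pkt["protocol"])))
        labeled.append({**pkt, "label": app})
    return labeled


# Illustrative records using documentation-only IP ranges.
packets = [
    {"src_ip": "10.0.0.5", "dst_ip": "192.0.2.10", "protocol": "TCP", "length": 120},
    {"src_ip": "192.0.2.10", "dst_ip": "10.0.0.5", "protocol": "TCP", "length": 1400},
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.7", "protocol": "UDP", "length": 80},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9", "protocol": "TCP", "length": 60},
]
log_entries = [
    {"app": "video", "remote_ip": "192.0.2.10", "protocol": "TCP", "packets": 2},
    {"app": "voip", "remote_ip": "198.51.100.7", "protocol": "UDP", "packets": 1},
]
labeled = label_packets(packets, log_entries)
for pkt in labeled:
    print(pkt["src_ip"], "->", pkt["dst_ip"], pkt["label"])
```

Packets whose endpoints do not appear in any log entry stay unlabeled (`None`) in this sketch; how such packets are handled in practice is not stated in the application.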

Step 200: extracting a bidirectional flow characteristic set from the labeled network flow data, and optimizing the bidirectional flow characteristic set by using a maximum variance interpretation mechanism;

in step 200, the process of extracting and optimizing the bidirectional flow feature set is shown in fig. 3, and specifically includes the following steps:

step 201: parsing the labeled network traffic data, and separately collecting statistics on the bidirectional (forward and reverse) network flow information between each pair of { source IP address -> destination IP address } and { destination IP address -> source IP address }, based on the different port numbers in the network traffic data, wherein each pair of { source IP address, destination IP address } has network flow information in two opposite directions;

step 202: finding forward network flows between each pair of { source IP address- > destination IP address }, and extracting all forward network flow feature sets F1 in each forward network flow;

step 203: finding out reverse network flows between each pair of { destination IP address- > source IP address }, and extracting all reverse network flow feature sets F2 in each reverse network flow;

step 204: combining the forward and reverse network flow feature sets { F1, F2 } between each pair of { source IP address, destination IP address } to form a bidirectional flow feature set F of M-dimensional features, denoted as F = { F1, F2 };

in step 204, all forward and reverse network flow feature sets are combined so that they can be optimized uniformly.
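Steps 201 to 204 can be sketched in Python as follows. The application does not enumerate the individual flow features, so the three per-direction statistics used here (packet count, total bytes, mean packet size) are assumptions, as are the packet field names and the convention that the first packet seen between two endpoints defines the forward direction.

```python
from collections import defaultdict


def direction_features(packets):
    """Per-direction statistics: packet count, total bytes, mean packet size."""
    if not packets:
        return [0, 0, 0.0]
    sizes = [p["length"] for p in packets]
    return [len(sizes), sum(sizes), sum(sizes) / len(sizes)]


def bidirectional_features(packets):
    """Return {flow_key: F}, where F = F1 (forward) + F2 (reverse)."""
    forward = defaultdict(list)
    reverse = defaultdict(list)
    for p in packets:
        a = (p["src_ip"], p["src_port"])
        b = (p["dst_ip"], p["dst_port"])
        if (b, a) in forward:
            # travels against the first-seen direction of this endpoint pair
            reverse[(b, a)].append(p)
        else:
            forward[(a, b)].append(p)
    return {key: direction_features(forward[key]) + direction_features(reverse[key])
            for key in forward}


client = ("10.0.0.5", 50000)
server = ("192.0.2.10", 443)


def packet(src, dst, length):
    return {"src_ip": src[0], "src_port": src[1],
            "dst_ip": dst[0], "dst_port": dst[1], "length": length}


packets = [packet(client, server, 100), packet(server, client, 1500),
           packet(client, server, 200), packet(server, client, 500)]
features = bidirectional_features(packets)
print(features[(client, server)])
```

With three statistics per direction, each flow yields a 6-dimensional vector in this sketch, so M = 6 here; a real feature set would typically be much larger.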

Step 205: performing standard normalization on the network traffic data, so that the network traffic data set is normalized into a data set with a mean of 0 and a variance of 1; the normalization formula is: x' = (x - u) / δ, where x is an original value, x' is the normalized value, u is the mean of all network traffic data, and δ is the standard deviation of all network traffic data;

step 206: on the network flow data, the average value of each feature on a bidirectional flow feature set F is obtained;

step 207: subtracting the average value corresponding to each feature from the normalized network flow data to obtain a new result of each feature, and performing variance normalization on the new result of each feature;

step 208: calculating the covariance matrix of the bidirectional flow feature set F, and sorting the features in ascending order of the variance value of each feature on the main diagonal of the covariance matrix, to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set F;

in step 208, the main diagonal of the covariance matrix holds the variance of each feature, and the off-diagonal entries hold the covariance between every two features. A covariance greater than 0 indicates that the two features are positively correlated; a covariance less than 0 indicates that they are negatively correlated; a covariance equal to 0 indicates that they are linearly uncorrelated; the larger the absolute value of the covariance, the closer the relationship between the two features, and vice versa. According to these conditions, the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set F can be determined. The present application uses the maximum variance interpretation mechanism to preferentially combine the most closely associated features of the bidirectional network flow feature sets in the network traffic data, and screens out the feature set that best reflects the network traffic categories.

Step 209: calculating the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by size, and selecting the eigenvectors that correspond to the first N optimized bidirectional flow features;

step 210: projecting the network traffic data onto the selected N eigenvectors: assuming that the number of samples of the network traffic data is p and the number of features is q, that the sample matrix obtained by subtracting the feature means from the network traffic data is DataTransform (p × q), that the covariance matrix of the bidirectional flow feature set is q × q, and that the matrix formed by the N selected eigenvectors is EigenVectors (q × N), the projected network traffic data is: OptimizeData (p × N) = DataTransform (p × q) × EigenVectors (q × N);

in step 210, projecting the network traffic data onto the eigenvectors corresponding to the optimized bidirectional flow features makes the data more tightly clustered, reduces the influence of noisy data, and improves classification accuracy.

Step 211: optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.
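Steps 205 to 211 follow the usual pattern of a principal component projection, and a minimal NumPy sketch of that pattern is given below. It is an interpretation rather than the application's exact procedure: standardization is assumed to be applied per feature, the eigenvectors are assumed to be chosen by largest eigenvalue, and the function name `optimize_features` and the random input data are hypothetical.

```python
import numpy as np


def optimize_features(X, n_components):
    """Reduce a (p x M) feature matrix to (p x N) by eigen-decomposition."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0                       # guard against constant features
    Z = (X - mean) / std                      # zero mean, unit variance per feature
    cov = np.cov(Z, rowvar=False)             # (M x M) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigh suits symmetric matrices
    order = np.argsort(eigvals)[::-1][:n_components]
    W = eigvecs[:, order]                     # (M x N) selected eigenvectors
    return Z @ W, eigvals[order]              # OptimizeData = DataTransform x EigenVectors


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))                  # p = 50 samples, M = 6 features
X[:, 3] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=50)  # make two features correlated
reduced, eigenvalues = optimize_features(X, n_components=3)
print(reduced.shape)
print(eigenvalues)
```

After projection the covariance matrix of the reduced data is diagonal, with the selected eigenvalues on its diagonal, which is one way to check that the projection was computed consistently.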

Step 300: based on the optimized bidirectional flow feature set, constructing a classification model using a supervised-learning random forest algorithm, and outputting a classification result for the network traffic data through the classification model;

in step 300, a supervised-learning random forest algorithm is used for modeling: the optimized bidirectional flow feature set is input into the classification model for classification training, and the classification model is tuned according to its performance evaluation. In the verification stage, the trained classification model is tested on a test data set; the test results show that the classification model built on the optimized bidirectional flow feature set achieves high classification precision, so that classification efficiency can be improved while maintaining high classification accuracy, improving overall performance.
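The training and verification described in step 300 might look like the following scikit-learn sketch. The data here is synthetic and stands in for the optimized N-dimensional feature set; the class separation, the 75/25 split, and the hyperparameters are assumptions made for illustration, and the printed accuracy says nothing about the results reported for the application.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in for optimized flow features: two well-separated classes.
X = np.vstack([rng.normal(loc=0.0, size=(100, 4)),
               rng.normal(loc=3.0, size=(100, 4))])
y = np.array([0] * 100 + [1] * 100)          # hypothetical application labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)                   # classification training
accuracy = accuracy_score(y_test, model.predict(X_test))  # verification stage
print(f"held-out accuracy: {accuracy:.2f}")
```

Holding out a test split, as done here, mirrors the verification stage in the text; cross-validation would be a reasonable alternative when labeled traffic is scarce.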

Please refer to fig. 4, which is a block diagram of a network traffic classification system according to an embodiment of the present application. The network traffic classification system comprises a data acquisition module, a data preprocessing module, a feature extraction module, a feature optimization module and a model construction module.

A data acquisition module: used for collecting network traffic data; the network traffic data is collected as follows: selecting the application categories in the network traffic, continuously capturing the traffic of the fixed application categories through high-performance network monitoring software, and collecting the network traffic data packets corresponding to each application category and the system network logs of the corresponding time period.

A data preprocessing module: used for labeling the network traffic data; the labeling process specifically comprises: parsing the network traffic data packets to find the natural attributes of each application and the key information of its communication with other applications, such as IP addresses and transport protocols; and extracting, from the system network logs, the IP endpoints and the number of transmitted packets associated with each application, and performing association and fusion in combination with the IP addresses and transport protocols to complete the labeling of the network traffic data.

A feature extraction module: used for extracting the bidirectional flow feature set from the labeled network traffic data; specifically, the bidirectional flow feature set is extracted as follows:

a. parsing the labeled network traffic data, and separately collecting statistics on the bidirectional (forward and reverse) network flow information between each pair of { source IP address -> destination IP address } and { destination IP address -> source IP address }, based on the different port numbers in the network traffic data, wherein each pair of { source IP address, destination IP address } has network flow information in two opposite directions;

b. finding the forward network flows between each pair of { source IP address -> destination IP address }, and extracting all forward network flow feature sets F1 from each forward network flow;

c. finding the reverse network flows between each pair of { destination IP address -> source IP address }, and extracting all reverse network flow feature sets F2 from each reverse network flow;

d. combining the forward and reverse network flow feature sets { F1, F2 } between each pair of { source IP address, destination IP address } to form a bidirectional flow feature set F of M-dimensional features, denoted as F = { F1, F2 }.

A feature optimization module: the device is used for optimizing the extracted bidirectional flow characteristic set by utilizing a maximum variance interpretation mechanism; specifically, the bidirectional flow feature set optimization method includes:

a. performing standard normalization on the network traffic data, so that the network traffic data set is normalized into a data set with a mean of 0 and a variance of 1; the normalization formula is: x' = (x - u) / δ, where x is an original value, x' is the normalized value, u is the mean of all network traffic data, and δ is the standard deviation of all network traffic data;

b. calculating, over the network traffic data, the average value of each feature in the bidirectional flow feature set F;

c. subtracting the average value of each feature from the normalized network traffic data to obtain a new result for each feature, and performing variance normalization on the new result of each feature;

d. calculating the covariance matrix of the bidirectional flow feature set F, and sorting the features in ascending order of the variance value of each feature on the main diagonal of the covariance matrix, to obtain the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set F; the main diagonal of the covariance matrix holds the variance of each feature, and the off-diagonal entries hold the covariance between every two features. A covariance greater than 0 indicates that the two features are positively correlated; a covariance less than 0 indicates that they are negatively correlated; a covariance equal to 0 indicates that they are linearly uncorrelated; the larger the absolute value of the covariance, the closer the relationship between the two features, and vice versa. According to these conditions, the N-dimensional features with the highest and closest degree of association in the bidirectional flow feature set F can be determined. The present application uses the maximum variance interpretation mechanism to preferentially combine the most closely associated features of the bidirectional network flow feature sets in the network traffic data, and screens out the feature set that best reflects the network traffic categories.

e. calculating the eigenvalues and eigenvectors of the covariance matrix, sorting the eigenvalues by size, and selecting the eigenvectors that correspond to the first N optimized bidirectional flow features;

f. projecting the network traffic data onto the selected N eigenvectors: assuming that the number of samples of the network traffic data is p and the number of features is q, that the sample matrix obtained by subtracting the feature means from the network traffic data is DataTransform (p × q), that the covariance matrix of the bidirectional flow feature set is q × q, and that the matrix formed by the N selected eigenvectors is EigenVectors (q × N), the projected network traffic data is: OptimizeData (p × N) = DataTransform (p × q) × EigenVectors (q × N);

g. optimizing the M-dimensional bidirectional flow feature set of the network traffic data into an N-dimensional bidirectional flow feature set.

A model construction module: used for constructing a classification model with a supervised-learning random forest algorithm based on the optimized bidirectional flow feature set, and outputting a classification result for the network traffic data through the classification model. The optimized bidirectional flow feature set is input into the classification model for classification training, and the classification model is tuned according to its performance evaluation. In the verification stage, the trained classification model is tested on a test data set; the test results show that the classification model built on the optimized bidirectional flow feature set achieves high classification precision, so that classification efficiency can be improved while maintaining high classification accuracy, improving overall performance.
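The way the five modules hand data to one another can be sketched as a simple pipeline. The class name `TrafficClassificationSystem`, its field names, and the placeholder stage functions below are hypothetical; the application describes the modules functionally and does not prescribe this interface.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TrafficClassificationSystem:
    acquire: Callable[[], Any]                 # data acquisition module
    preprocess: Callable[[Any], Any]           # data preprocessing (labeling) module
    extract: Callable[[Any], Any]              # feature extraction module
    optimize: Callable[[Any], Any]             # feature optimization module
    build_and_classify: Callable[[Any], Any]   # model construction module

    def run(self):
        raw = self.acquire()
        labeled = self.preprocess(raw)
        features = self.extract(labeled)
        reduced = self.optimize(features)
        return self.build_and_classify(reduced)


trace = []


def stage(name):
    """Build a placeholder stage that only records that it was called."""
    def fn(data=None):
        trace.append(name)
        return name
    return fn


system = TrafficClassificationSystem(
    stage("acquire"), stage("preprocess"), stage("extract"),
    stage("optimize"), stage("classify"))
result = system.run()
print(trace, result)
```

Keeping each module behind a plain callable makes it straightforward to swap in a real capture, labeling, or modeling routine without changing the surrounding flow.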

Fig. 5 is a schematic structural diagram of a hardware device for the network traffic classification method according to an embodiment of the present application. As shown in fig. 5, the device includes one or more processors and a memory. Taking one processor as an example, the device may further include an input system and an output system.

The processor, memory, input system, and output system may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.

The memory, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules. The processor executes various functional applications and data processing of the electronic device, i.e., implements the processing method of the above-described method embodiment, by executing the non-transitory software program, instructions and modules stored in the memory.

The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data and the like. Further, the memory may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the processing system over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input system may receive input numeric or character information and generate a signal input. The output system may include a display device such as a display screen.

The one or more modules are stored in the memory and, when executed by the one or more processors, perform the following operations of any of the above method embodiments:

step a: collecting network flow data and labeling the network flow data;

The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the methods provided in the embodiments of the present application.

Embodiments of the present application provide a non-transitory (non-volatile) computer storage medium having stored thereon computer-executable instructions that may perform the following operations:

step a: collecting network flow data and labeling the network flow data;

Embodiments of the present application provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the following:

step a: collecting network flow data and labeling the network flow data;

The network traffic classification method, the network traffic classification system and the electronic equipment in the embodiments of the application classify network traffic by using the bidirectional flow features in the network traffic data, and can accurately identify and classify a large number of new applications in the internet; meanwhile, the maximum variance interpretation mechanism is used to optimize and correlate the bidirectional flow features, which ensures high cohesion of the bidirectional flow features, improves classification accuracy, and effectively guarantees high precision and high performance of network traffic classification.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A network traffic classification method is characterized by comprising the following steps:

step a: collecting network flow data and labeling the network flow data;

step c: constructing a classification model based on the bidirectional flow feature set, and outputting a classification result of the network traffic data through the classification model;

the step b further comprises the following steps: optimizing the bidirectional flow feature set by using a maximum variance interpretation mechanism;

the optimizing the bidirectional flow feature set by using the maximum variance interpretation mechanism specifically comprises:

step b 5: performing standard normalization on the network traffic data;

step b 10: projecting the network traffic data onto the N eigenvectors;

2. The method for classifying network traffic according to claim 1, wherein in the step a, the collecting network traffic data and labeling the network traffic data specifically include:

step a 1: selecting an application category in the network traffic;

3. The method according to claim 2, wherein in the step b, the extracting a bidirectional flow feature set according to the labeled network traffic data specifically includes:

4. A network traffic classification system, comprising:

a data acquisition module: used for collecting network traffic data;

a model construction module: used for constructing a classification model based on the bidirectional flow feature set, and outputting a classification result of the network traffic data through the classification model;

the system also comprises a feature optimization module, wherein the feature optimization module is used for optimizing the bidirectional flow feature set by utilizing a maximum variance interpretation mechanism;

the optimizing, by the feature optimization module, of the bidirectional flow feature set by using the maximum variance interpretation mechanism specifically comprises:

performing standard normalization on the network traffic data;

projecting the network traffic data onto the N eigenvectors;

5. The network traffic classification system of claim 4,

6. The network traffic classification system according to claim 5, wherein the extracting, by the feature extraction module, the bidirectional flow feature set according to the labeled network traffic data specifically includes:

7. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of classifying network traffic of any of claims 1 to 3.