CN116155821A - ET-BERT flow classification method, storage medium and equipment based on multitask learning - Google Patents


Info

Publication number
CN116155821A
CN116155821A (application CN202310084193.2A)
Authority
CN
China
Prior art keywords
bert
bandwidth
duration
task
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310084193.2A
Other languages
Chinese (zh)
Inventor
刘兰
余永杰
吴亚峰
惠占发
陈桂铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN202310084193.2A priority Critical patent/CN116155821A/en
Publication of CN116155821A publication Critical patent/CN116155821A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 47/00 Traffic control in data switching networks
    • H04L 47/10 Flow control; Congestion control
    • H04L 47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L 47/2441 Traffic characterised by specific attributes, e.g. priority or QoS, relying on flow classification, e.g. using integrated services [IntServ]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses an ET-BERT traffic classification method, a storage medium and a device based on multi-task learning. The method is based on the assumption that multiple learning tasks are not completely independent and that several auxiliary tasks can promote the learning of another task through hard parameter sharing, so that, by combining a Bidirectional Encoder Representations from Transformers model (the ET-BERT model), the requirement for a large number of labeled training samples when executing the main task is reduced. The method comprises the steps of: acquiring a traffic dataset and preprocessing it; acquiring the time series features of the dataset; pre-training an ET-BERT model with bandwidth and duration prediction as auxiliary tasks according to the time series features; obtaining the optimal values of the bandwidth and duration dividers and converting them into tokens for batch optimization and training with an Adam optimizer; and fine-tuning the parameters of the pre-trained ET-BERT model and performing main-task traffic class prediction with the fine-tuned model.

Description

ET-BERT flow classification method, storage medium and equipment based on multitask learning
Technical Field
The invention relates to the technical fields of deep learning, network traffic analysis and network space security application, in particular to an ET-BERT traffic classification method, a storage medium and equipment based on multi-task learning.
Background
Network traffic classification has a wide range of applications in today's Internet, such as resource allocation, QoS provisioning, ISP billing and anomaly detection. Early approaches relied on human labor to continually find patterns in unencrypted payloads or to match port numbers. Owing to their inefficiency and poor accuracy, new approaches based on classical machine learning algorithms, such as Random Forest (RF) and K-Nearest Neighbor (KNN), emerged.
Classical machine learning algorithms achieved state-of-the-art accuracy in traffic classification tasks for several years. However, these relatively simple methods cannot capture the more complex patterns present in today's Internet traffic, so their accuracy has degraded. Recently, deep learning models have achieved state-of-the-art performance in traffic classification. Their ability to learn complex patterns and to perform automatic feature extraction makes them an ideal choice for traffic classification.
Although deep learning methods can achieve high accuracy, they require a large amount of labeled training data, and labeling is a time-consuming and cumbersome task in network traffic classification. To label each flow correctly, researchers typically isolate and capture the traffic of each class in a controlled environment with minimal background traffic, a time-consuming and laborious process. Furthermore, the traffic patterns observed in a controlled environment may differ considerably from real-world traffic, which makes inference inaccurate.
Disclosure of Invention
In order to overcome the above technical defects, the invention provides an ET-BERT traffic classification method, a storage medium and a device based on multi-task learning, which can reduce the requirement for a large number of labeled training samples in network traffic classification tasks.
In order to solve the problems, the invention is realized according to the following technical scheme:
In a first aspect, the present invention provides an ET-BERT traffic classification method based on multi-task learning, comprising the steps of:
acquiring a traffic dataset, and preprocessing the traffic dataset;
acquiring the time series features of the dataset;
pre-training an ET-BERT model with bandwidth and duration prediction as auxiliary tasks according to the time series features;
obtaining the optimal values of the bandwidth and duration dividers, and converting them into tokens for batch optimization and training with an Adam optimizer;
and fine-tuning the parameters of the pre-trained ET-BERT model, and performing main-task traffic class prediction with the fine-tuned ET-BERT model.
As an improvement to the above solution, the pre-training of the ET-BERT model further comprises multiplying the input of the traffic class softmax layer by a mask vector.
As an improvement of the above solution, obtaining the optimal values of the bandwidth and duration dividers comprises the steps of:
dividing the bandwidth and duration values into five classes and finding the average value of each class;
sorting the bandwidth class averages and taking the midpoint between every two consecutive bandwidth averages, the bandwidth midpoints being the optimal values obtained from the bandwidth dataset;
sorting the duration class averages and taking the midpoint between every two consecutive duration averages, the duration midpoints being the optimal values obtained from the duration dataset.
As an improvement of the above solution, converting the optimal values into tokens for batch optimization and training with an Adam optimizer using the Token3Embedding method comprises the steps of:
converting the optimal values of the bandwidth and duration dividers into hexadecimal sequences, and encoding the sequences;
representing the tokens with byte-pair encoding, and adding special token markers to the encoded sequence.
As an improvement of the above solution, the time series features are the packet length, arrival time and payload of the packets.
As an improvement of the above solution, the multi-task learning parameter tuning formula in the ET-BERT model is expressed as:
$$\min_{W}\ \sum_{i=1}^{N}\Big[\,l\big(\hat{y}_i^{B},\,y_i^{B}\big)+l\big(\hat{y}_i^{D},\,y_i^{D}\big)+\lambda\,l\big(\hat{y}_i^{T},\,y_i^{T}\big)\Big]+\rho\,\lVert W\rVert_{2,1}$$
wherein $l(\cdot,\cdot)$ is the cross-entropy loss function; $\lambda$ is a weight representing the importance of the main traffic class prediction task; $\rho$ is a regularization weight factor that shrinks the model coefficients and reduces model complexity to prevent overfitting; $W\in\mathbb{R}^{n\times k}$ is the weight matrix under multi-task learning, whose $i$-th row is $w_i=[W_{i,1},W_{i,2},\ldots,W_{i,k}]$ (each column playing the role of the weights in ordinary linear regression) and $\lVert W\rVert_{2,1}=\sum_{i=1}^{n}\lVert w_i\rVert_2$; $A_i$ denotes the input of the $i$-th data sample; and $\hat{y}_i^{B}$, $\hat{y}_i^{D}$ and $\hat{y}_i^{T}$ denote the respective outputs of the bandwidth, duration and traffic class prediction tasks, denoted $B$, $D$ and $T$.
In a second aspect, the present invention provides a computer-readable storage medium having stored therein at least one instruction, at least one program, code set or instruction set, which is loaded and executed by a processor to implement the ET-BERT traffic classification method based on multi-task learning as described in the first aspect.
In a third aspect, the present invention provides a device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, code set or instruction set loaded and executed by the processor to implement the ET-BERT traffic classification method based on multi-task learning as described in the first aspect.
Compared with the prior art, the invention has the following beneficial effects:
according to the method and the device, bandwidth and duration are used as auxiliary tasks to pretrain an ET-BERT model according to the acquired time sequence characteristics, parameters of the ET-BERT model are finely adjusted, main task flow category prediction is carried out by using the ET-BERT model, flow category prediction can be improved, manual marking of a data set is reduced, and the requirement of a large number of marking training samples in network flow classification tasks is reduced.
Drawings
The invention is described in further detail below with reference to the attached drawing figures, wherein:
FIG. 1 is a flow chart of an ET-BERT flow classification method based on multi-task learning in one embodiment of the present application;
FIG. 2 is a schematic diagram of a multi-task learning in one embodiment of the present application;
FIG. 3 is a schematic flow chart of step S4 according to one embodiment of the present application;
FIG. 4 is an architecture diagram of the ET-BERT model based on the multi-task learning framework in one embodiment of the present application.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
It should be noted that, the numbers mentioned herein, such as S1 and S2 … …, are merely used as distinction between steps and do not represent that the steps must be strictly performed according to the order of the numbers.
The invention provides an ET-BERT traffic classification method based on a multi-task learning framework. The method is based on the assumption that multiple learning tasks are not completely independent and that several auxiliary tasks can promote the learning of another task through hard parameter sharing, so that, by combining a Bidirectional Encoder Representations from Transformers model (the ET-BERT model), the requirement for a large number of labeled training samples when executing the main task is reduced.
In one embodiment, as shown in fig. 1, an ET-BERT traffic classification method based on multi-task learning includes the following steps:
S1: acquiring a traffic dataset, and preprocessing the traffic dataset;
Specifically, the acquired dataset is the ISCX VPN-nonVPN dataset, captured at the University of New Brunswick, which contains the original PCAP files of several traffic types. The dataset provides fine-grained labels that allow different classifications: by application (e.g., AIM Chat, Gmail, Facebook, etc.), by traffic type (e.g., chat, streaming media, VoIP, etc.), and VPN/non-VPN. The flows are classified into 5 classes with different QoS requirements and bandwidth/duration characteristics: chat, email, file transfer, streaming media and VoIP. All flows are associated with one traffic type label; in traffic class prediction, only a small fraction of these labels is used to predict the traffic class in main-task learning.
Since the dataset is captured at the data link layer, it includes an Ethernet header. The data link header contains information about the physical link, such as the Media Access Control (MAC) address, which is necessary for forwarding frames in the network but not for traffic classification; therefore, in the preprocessing stage, the Ethernet header is first removed. In the transport layer, the header lengths of the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP) differ: the former typically has a 20-byte header, the latter an 8-byte header. To make the transport layer segments uniform, zeros are appended to the end of the UDP segment header to match the length of the TCP header. The packet is then converted from bits to bytes, which helps reduce the input size. The dataset also contains some irrelevant data packets that should be discarded. In particular, it includes TCP segments in which the SYN, ACK or FIN flag is set to 1 and which contain no payload. These segments are required for the three-way handshake when setting up or closing a connection, but they carry no information about the application that generated them and can therefore be safely discarded. In addition, there are some Domain Name Service (DNS) segments in the dataset. These segments are used for hostname resolution, i.e. converting URLs to IP addresses; they are independent of either the application identity or the traffic characteristics and may therefore be omitted from the dataset.
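For illustration only, this preprocessing may be sketched in Python as follows; the Scapy library, the function name and the truncation of TCP options to the 20-byte fixed header are assumptions for the sketch, not part of the invention. Only transport-layer bytes are kept, so the Ethernet and IP headers are dropped implicitly:

```python
from scapy.all import rdpcap, IP, TCP, UDP, DNS

def preprocess(pcap_path):
    """Return cleaned transport-layer byte segments from a PCAP file."""
    cleaned = []
    for pkt in rdpcap(pcap_path):
        if not pkt.haslayer(IP) or pkt.haslayer(DNS):
            continue                          # skip non-IP frames and DNS segments
        if pkt.haslayer(TCP):
            tcp = pkt[TCP]
            payload = bytes(tcp.payload)
            # payload-less SYN/ACK/FIN handshake segments carry no app info
            if not payload and int(tcp.flags) & 0x13:
                continue
            segment = bytes(tcp)[:20] + payload            # 20-byte TCP header
        elif pkt.haslayer(UDP):
            udp = pkt[UDP]
            # pad the 8-byte UDP header with zeros to the 20-byte TCP length
            segment = bytes(udp)[:8] + b"\x00" * 12 + bytes(udp.payload)
        else:
            continue
        cleaned.append(segment)               # Ethernet/IP headers discarded
    return cleaned
```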
S2: acquiring the time series features of the dataset;
Specifically, in the traffic classification task only the first few packets of a flow are available, rather than the entire data stream; the time series features are therefore obtained by observing the first k packets, and the obtained features are the packet lengths, arrival times and payloads of those first k packets.
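A small Python sketch of gathering these three features is given below; the value k = 5, the dictionary keys and the assumption that the flow's packets are Scapy objects are illustrative, not part of the invention:

```python
from scapy.all import TCP, UDP

def first_k_features(flow_packets, k=5):
    """Collect length, arrival time and payload of the first k packets."""
    feats = []
    for pkt in flow_packets[:k]:
        if pkt.haslayer(TCP):
            payload = bytes(pkt[TCP].payload)
        elif pkt.haslayer(UDP):
            payload = bytes(pkt[UDP].payload)
        else:
            payload = b""
        feats.append({
            "length": len(pkt),               # packet length in bytes
            "arrival_time": float(pkt.time),  # capture timestamp
            "payload": payload,               # transport-layer payload
        })
    return feats
```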
S3: pre-training an ET-BERT model with bandwidth and duration prediction as auxiliary tasks according to the time series features;
Specifically, according to the time series features, bandwidth and duration prediction are used as auxiliary training tasks. An auxiliary task has two characteristics: 1. it is highly correlated with the primary traffic class task; 2. its labels are readily available. Bandwidth and duration are treated as the outputs of separate tasks, rather than as inputs as in the usual traffic classification methods.
The ET-BERT multi-task learning model architecture uses a Bidirectional Encoder Representations from Transformers (BERT) model; in natural language processing, this model achieves optimal results on many tasks. In addition, its widespread use in intersecting fields such as visual language and computer vision has demonstrated the advantage of using unlabeled data to help learn robust feature representations when labeled data are limited. The overall architecture of the method is shown in fig. 2, with the rectified linear unit (ReLU) serving as the activation function throughout the model. The bandwidth, duration and traffic class prediction tasks are denoted $B$, $D$ and $T$, respectively. There are $N$ training data, $A_i$ denotes the input of the $i$-th data sample, and $\hat{y}_i^{B}$, $\hat{y}_i^{D}$ and $\hat{y}_i^{T}$ denote the outputs of the bandwidth, duration and traffic class prediction tasks. The parameter tuning target formula of the multi-task learning method can be expressed as:
$$\min_{W}\ \sum_{i=1}^{N}\Big[\,l\big(\hat{y}_i^{B},\,y_i^{B}\big)+l\big(\hat{y}_i^{D},\,y_i^{D}\big)+\lambda\,l\big(\hat{y}_i^{T},\,y_i^{T}\big)\Big]+\rho\,\lVert W\rVert_{2,1}$$
wherein $W\in\mathbb{R}^{n\times k}$ is the weight matrix under multi-task learning, whose $i$-th row is $w_i=[W_{i,1},W_{i,2},\ldots,W_{i,k}]$, each column playing the role of the weights in ordinary linear regression, and $\lVert W\rVert_{2,1}=\sum_{i=1}^{n}\lVert w_i\rVert_2$. The $\ell_{2,1}$ term corresponds to a one-time row-wise sparsification of the parameter matrix $W$, i.e. row-wise feature selection.
Here $l$ is the cross-entropy loss function and $\lambda$ is a weight representing the importance of the traffic class prediction task. Since this task has far fewer training data samples than the two auxiliary tasks, $\lambda$ can be increased to compensate somewhat for the shortage of labeled data. Bandwidth and duration labels are available for all training data, but only a small portion of the data samples have traffic class labels.
During the training process, we multiply the input of the traffic class softmax layer by the mask vector to prevent back propagation of this task for data samples without traffic class labels.
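For illustration, the objective and the masking step may be sketched in PyTorch as follows; the function signature, default values and the use of dummy labels for masked samples are assumptions of the sketch, not part of the invention:

```python
import torch
import torch.nn.functional as F

def multitask_loss(out_b, y_b, out_d, y_d, out_t, y_t, mask, W,
                   lam=1.0, rho=1e-4):
    """Cross-entropy per head, masked and lambda-weighted main task,
    plus a row-wise l2,1 penalty on the shared weight matrix W."""
    loss_b = F.cross_entropy(out_b, y_b)
    loss_d = F.cross_entropy(out_d, y_d)
    per_sample = F.cross_entropy(out_t, y_t, reduction="none")
    # the mask vector zeroes the gradient for samples without class labels
    # (y_t may hold dummy indices for those samples)
    loss_t = (mask * per_sample).sum() / mask.sum().clamp(min=1)
    l21 = W.norm(dim=1).sum()            # sum of row-wise l2 norms of W
    return loss_b + loss_d + lam * loss_t + rho * l21
```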
S4: obtaining the optimal values of the bandwidth and duration dividers, and converting them into tokens for batch optimization and training with an Adam optimizer;
In one embodiment, as shown in fig. 3, step S4 includes the following steps:
S41: dividing the bandwidth and duration values into five classes and finding the average value of each class;
Specifically, the labels defining the bandwidth and duration classes are shown in Table 1; the bandwidth and duration values are divided into five classes, where [bw1, bw2, bw3, bw4] and [d1, d2, d3, d4] are the bandwidth and duration dividers. For example, if the bandwidth of a flow is between bw1 and bw2, class number 2 is assigned to the flow as its label. The number of classes for the bandwidth and duration prediction tasks may differ from the number of traffic classes; it may depend on the application, the scenario and the needs of the ISP. For example, an ISP that only cares about the difference between short and long flows may define only two duration classes.
Class number    Bandwidth B        Duration D
1               B < bw1            D < d1
2               bw1 < B < bw2      d1 < D < d2
3               bw2 < B < bw3      d2 < D < d3
4               bw3 < B < bw4      d3 < D < d4
5               B > bw4            D > d4
Table 1: Bandwidth and duration class definitions
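As a small illustration of Table 1, the class number for a value can be looked up against the sorted dividers; the numeric dividers below are placeholders, not values from the patent:

```python
import bisect

def to_class(value, dividers):
    """Map a bandwidth or duration value to class 1..len(dividers)+1."""
    return bisect.bisect_left(dividers, value) + 1

bw_dividers = [1e3, 1e4, 1e5, 1e6]    # placeholder [bw1, bw2, bw3, bw4]
print(to_class(5e4, bw_dividers))      # -> 3, since bw2 < B < bw3
```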
S42: sorting the bandwidth class averages from top to bottom and taking the midpoint between every two consecutive bandwidth averages, the bandwidth midpoints being the optimal values obtained from the bandwidth dataset;
S43: sorting the duration class averages from top to bottom and taking the midpoint between every two consecutive duration averages, the duration midpoints being the optimal values obtained from the duration dataset;
Specifically, to find the optimal values of the duration divider [d1, d2, d3, d4], the average duration of each class is first found. The averages are then ranked from top to bottom, and the midpoint between every two consecutive averages is taken as [d1, d2, d3, d4]. A similar approach is used to obtain the bandwidth divider. These values are the optimal values obtained from the entire dataset; the small amount of traffic-class labeled data, by contrast, is used only in the main task.
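The divider search just described can be sketched in a few lines of Python; the function assumes every class is non-empty, and the helper name is illustrative:

```python
import numpy as np

def find_dividers(values, labels, n_classes=5):
    """Average each class, sort the averages, and return midpoints of
    consecutive averages as the divider values [d1, d2, d3, d4]."""
    avgs = sorted(np.mean([v for v, c in zip(values, labels) if c == k])
                  for k in range(1, n_classes + 1))
    return [(a + b) / 2 for a, b in zip(avgs, avgs[1:])]
```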
S44: converting the optimal value of the bandwidth and duration frequency divider into hexadecimal sequences, and encoding the sequences;
s45: the token is represented by a byte pair code, and a special tag for the token is added to the code sequence.
In particular, encrypted traffic differs significantly from natural language and images in that it contains no human-understandable content or explicit semantic elements. A Token3Embedding method is therefore provided, which converts the found optimal values of the bandwidth and duration dividers into language-like tokens for batch optimization and training with an Adam optimizer.
The main principle of the Token3Embedding method is bidirectional encoding according to the contextual relationships of the traffic bytes. The found optimal values of the bandwidth and duration dividers are first converted into a hexadecimal sequence and then encoded, where each unit consists of two adjacent bytes. The tokens are then represented using byte-pair encoding, with each token ranging from 0 to 65535. The special markers [CLS], [SEP], [PAD] and [MASK] are added for the training tasks. As shown in fig. 4, the first token of each sequence is always [CLS], and the final hidden-layer state associated with this token is used to represent the complete sequence for classification tasks. The token [PAD] is a filler symbol used to meet the minimum length requirement. The token [MASK] may appear during pre-training to learn the context of the traffic. For the SBP task, the optimal values of each set of bandwidth and duration dividers are divided into two positions, and the special marker [SEP] indicates whether a segment belongs to segment A or segment B. Segment A is denoted position A and segment B is denoted position B, where position A is the network packet with the optimal bandwidth divider value and position B is the network packet with the optimal duration divider value.
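A hedged sketch of this byte tokenization is shown below: two adjacent bytes form one unit, giving token ids in 0-65535, framed by special markers. The marker ids, the maximum length and the omission of [MASK] (used only during pre-training) are assumptions of the sketch:

```python
CLS, SEP, PAD = 65536, 65537, 65538    # special-token ids are assumptions

def tokenize_bytes(data: bytes, max_len: int = 128):
    """Encode a byte sequence as 2-byte units framed by [CLS]/[SEP]."""
    units = [int.from_bytes(data[i:i + 2], "big")   # one unit = 2 bytes
             for i in range(0, len(data) - 1, 2)]
    toks = [CLS] + units[:max_len - 2] + [SEP]
    toks += [PAD] * (max_len - len(toks))           # pad to fixed length
    return toks
```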
Each token obtained by the Token3Embedding method is represented by three embeddings: a token embedding, a position embedding and a segment embedding. A complete token representation is constructed by summing the three embeddings, and the complete tokenized datagram is taken as the original input. The first set of embedding vectors is randomly initialized, with embedding dimension H = 768. After N Transformer encoder layers, the final token embedding is obtained.
Position embedding: since the transfer of traffic data is closely related to order, position embedding is used to ensure that the model learns to attend to the duration and bandwidth relationships of the tokens through their relative positions. An H-dimensional vector is assigned to each input token to represent its position in the sequence, with the embedding dimension H set to 768.
Segment embedding: segment embeddings A and B are learned from the input position sequence, with the embedding dimension set to 768.
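A minimal PyTorch sketch of the three summed embeddings is given below, with H = 768 as stated; the vocabulary size, maximum length and segment count are assumptions for illustration:

```python
import torch
import torch.nn as nn

class Token3Embedding(nn.Module):
    """Sum of token, position and segment embeddings (sketch)."""
    def __init__(self, vocab_size=65540, max_len=128, n_segs=2, h=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, h)
        self.pos = nn.Embedding(max_len, h)
        self.seg = nn.Embedding(n_segs, h)

    def forward(self, ids, seg_ids):
        pos_ids = torch.arange(ids.size(1), device=ids.device)
        return self.tok(ids) + self.pos(pos_ids) + self.seg(seg_ids)
```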
S5: fine-tuning the parameters of the pre-trained ET-BERT model, and performing main-task traffic class prediction with the fine-tuned ET-BERT model.
In particular, fine-tuning can serve the traffic classification task well because: 1. the representations pre-trained on the auxiliary tasks are highly correlated with traffic class classification; 2. since the input of the auxiliary-task pre-trained model is at the data packet byte level, the main task of classifying packets and flows is converted into classifying the corresponding data packet byte tokens with the model; 3. the special [CLS] token output by the pre-trained model is a representation of the overall input traffic, which can be used directly for classification. Since the main-task model and the auxiliary-task pre-trained model have substantially the same structure, the task-specific data packet byte token representation is input into the pre-trained ET-BERT and all parameters of the end-to-end model are fine-tuned.
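For illustration, the fine-tuning head may be sketched as below: the [CLS] hidden state of the pre-trained encoder feeds a linear classifier and all parameters remain trainable end to end. The encoder interface and class count are assumptions, not ET-BERT's actual API:

```python
import torch.nn as nn

class FineTuneClassifier(nn.Module):
    """Classify flows from the [CLS] representation (sketch)."""
    def __init__(self, encoder, h=768, n_classes=5):
        super().__init__()
        self.encoder = encoder               # pre-trained, left trainable
        self.head = nn.Linear(h, n_classes)

    def forward(self, ids, seg_ids):
        hidden = self.encoder(ids, seg_ids)  # (batch, seq_len, h)
        return self.head(hidden[:, 0])       # logits from the [CLS] token
```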
When the number of training samples of one task in multi-task learning is significantly smaller than that of the other tasks, the shared parameters of the ET-BERT model are influenced more by the data-rich tasks during training. Increasing the weight of the loss function of the task with less data can therefore compensate for the lack of data and increase the influence of this task on the training process. Increasing λ helps the model fit the traffic class prediction task until maximum accuracy is reached; however, increasing λ further reduces the accuracy of all tasks, because when λ is very large the model heavily overfits the traffic classification training data and therefore performs poorly on the test data of all tasks. Furthermore, when λ is very large, the gradient update values for traffic class prediction become very large compared to the other tasks, which makes it extremely difficult for the training process to converge to a local minimum without fine-tuning the learning rate; this affects the execution of all tasks. Thus, for the multi-task learning method, a suitable λ value should be found as a hyper-parameter. A good starting point is to set λ to the ratio of the number of samples of the bandwidth and duration tasks to the number of samples of the traffic classification task, and to compare experimental results from there.
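Written out, this starting point is a single ratio; the sample counts below are illustrative placeholders, not values from the patent:

```python
# Starting value for the hyper-parameter lambda suggested above.
n_aux = 100_000      # samples with bandwidth/duration labels (placeholder)
n_cls = 2_000        # samples with traffic-class labels (placeholder)
lam = n_aux / n_cls  # then tune around this value experimentally
```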
According to the invention, an ET-BERT model is pre-trained with bandwidth and duration prediction as auxiliary tasks based on the acquired time series features, the parameters of the ET-BERT model are fine-tuned, and the fine-tuned model is used for main-task traffic class prediction. This improves traffic class prediction, reduces the manual labeling of datasets, and reduces the requirement for a large number of labeled training samples in network traffic classification tasks.
In one embodiment, a computer readable storage medium is provided, the computer readable storage medium storing a computer program, which when executed by a processor, causes the processor to implement the ET-BERT traffic classification method based on multi-task learning provided in the first aspect.
Those of ordinary skill in the art will appreciate that all or some of the steps, systems, functional modules/units in the apparatus, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed cooperatively by several physical components. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer-readable storage media, which may include computer-readable storage media (or non-transitory media) and communication media (or transitory media).
The term computer-readable storage medium includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer-readable instructions, data structures, program modules or other data, as known to those skilled in the art. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. Furthermore, as is well known to those of ordinary skill in the art, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media.
The computer readable storage medium may be an internal storage unit of the network management device according to the foregoing embodiment, for example, a hard disk or a memory of the network management device. The computer readable storage medium may also be an external storage device of the network management device, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the network management device.
In one embodiment, an apparatus is provided that includes a processor and a memory for storing a computer program; the processor is configured to execute the computer program and implement the ET-BERT traffic classification method based on the multi-task learning provided in the first aspect of the present invention when the computer program is executed.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), but may also be other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. Wherein the general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.

Claims (8)

1. An ET-BERT traffic classification method based on multi-task learning, characterized by comprising the following steps:
acquiring a traffic dataset, and preprocessing the traffic dataset;
acquiring the time series features of the dataset;
pre-training an ET-BERT model with bandwidth and duration prediction as auxiliary tasks according to the time series features;
obtaining the optimal values of the bandwidth and duration dividers, and converting them into tokens for batch optimization and training with an Adam optimizer;
and fine-tuning the parameters of the pre-trained ET-BERT model, and performing main-task traffic class prediction with the fine-tuned ET-BERT model.
2. The ET-BERT traffic classification method based on multi-task learning of claim 1, wherein the pre-training of the ET-BERT model further comprises multiplying an input of a traffic class softmax layer by a mask vector.
3. The ET-BERT traffic classification method based on multi-task learning according to claim 1, wherein obtaining the optimal values of the bandwidth and duration dividers comprises the steps of:
dividing the bandwidth and duration values into five classes and finding the average value of each class;
sorting the bandwidth class averages and taking the midpoint between every two consecutive bandwidth averages, the bandwidth midpoints being the optimal values obtained from the bandwidth dataset;
sorting the duration class averages and taking the midpoint between every two consecutive duration averages, the duration midpoints being the optimal values obtained from the duration dataset.
4. The ET-BERT traffic classification method based on multi-task learning according to claim 1, wherein converting the optimal values into tokens for batch optimization and training with an Adam optimizer by the Token3Embedding method comprises the steps of:
converting the optimal values of the bandwidth and duration dividers into hexadecimal sequences, and encoding the sequences;
representing the tokens with byte-pair encoding, and adding special token markers to the encoded sequence.
5. The ET-BERT traffic classification method based on multi-task learning according to any one of claims 1 to 4, wherein the time series features are the packet length, arrival time and payload of the packets.
6. The ET-BERT traffic classification method based on multi-task learning according to claim 5, wherein the multi-task learning objective formula of the ET-BERT model is expressed as:
$$\min_{W}\ \sum_{i=1}^{N}\Big[\,l\big(\hat{y}_i^{B},\,y_i^{B}\big)+l\big(\hat{y}_i^{D},\,y_i^{D}\big)+\lambda\,l\big(\hat{y}_i^{T},\,y_i^{T}\big)\Big]+\rho\,\lVert W\rVert_{2,1}$$
wherein $l$ is the cross-entropy loss function; $\lambda$ is the weight of the importance of the main traffic class prediction task; $\rho$ is a regularization weight factor for reducing the model coefficients and the model complexity, preventing overfitting; $W\in\mathbb{R}^{n\times k}$ is the weight matrix under multi-task learning, whose $i$-th row is $w_i=[W_{i,1},W_{i,2},\ldots,W_{i,k}]$, with $\lVert W\rVert_{2,1}=\sum_{i=1}^{n}\lVert w_i\rVert_2$; $A_i$ denotes the input of the $i$-th data sample; and $\hat{y}_i^{B}$, $\hat{y}_i^{D}$ and $\hat{y}_i^{T}$ denote the respective outputs of the bandwidth, duration and traffic class prediction tasks, denoted $B$, $D$ and $T$.
7. A computer-readable storage medium having stored therein at least one instruction, at least one program, code set or instruction set loaded and executed by a processor to implement the ET-BERT traffic classification method based on multi-task learning according to any one of claims 1 to 6.
8. A device comprising a processor and a memory, the memory having stored therein at least one instruction, at least one program, code set or instruction set loaded and executed by the processor to implement the ET-BERT traffic classification method based on multi-task learning according to any one of claims 1 to 6.
CN202310084193.2A 2023-01-16 2023-01-16 ET-BERT flow classification method, storage medium and equipment based on multitask learning Pending CN116155821A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310084193.2A CN116155821A (en) 2023-01-16 2023-01-16 ET-BERT flow classification method, storage medium and equipment based on multitask learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310084193.2A CN116155821A (en) 2023-01-16 2023-01-16 ET-BERT flow classification method, storage medium and equipment based on multitask learning

Publications (1)

Publication Number Publication Date
CN116155821A 2023-05-23

Family

ID=86350258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310084193.2A Pending CN116155821A (en) 2023-01-16 2023-01-16 ET-BERT flow classification method, storage medium and equipment based on multitask learning

Country Status (1)

Country Link
CN (1) CN116155821A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220398462A1 (en) * 2021-06-14 2022-12-15 Microsoft Technology Licensing, Llc. Automated fine-tuning and deployment of pre-trained deep learning models
CN115118653A (en) * 2022-08-26 2022-09-27 南京可信区块链与算法经济研究院有限公司 Real-time service traffic classification method and system based on multi-task learning
CN115563533A (en) * 2022-09-23 2023-01-03 哈尔滨理工大学 Encrypted flow classification system, method, computer and storage medium based on multi-task learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINJIE LIN et al.: "ET-BERT: A Contextualized Datagram Representation with Pre-training Transformers for Encrypted Traffic Classification", pages 1-4, retrieved from the Internet: <URL:http://arxiv.org/abs/2202.06335> *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230523