CN115913992A - Anonymous network traffic classification method based on small sample machine learning - Google Patents
- Publication number
- CN115913992A (application CN202211592847.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- flow
- flow sequence
- classified
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/50—Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate
Abstract
The invention discloses an anonymous network traffic classification method based on small-sample machine learning. The method maps the originally collected traffic data and the data to be classified into a feature space through a deep neural network; the original labeled data are used to pre-train a deep classification model; a small amount of newly collected labeled data is used to compute class centers of the traffic features in the feature space; these class centers serve as cluster centers for clustering the target traffic data to be classified, which are then assigned pseudo labels; finally, knowledge transfer from the original labeled data is completed by optimizing the classification losses of the originally labeled traffic data and the pseudo-labeled target data. This reduces the influence of data aging on the model and eliminates the distribution difference between training data and data to be classified caused by data aging. The method thus addresses the performance drop of anonymous network traffic classification algorithms that occurs when updates to the anonymity system reduce the timeliness of the originally collected traffic sequence data.
Description
Technical Field
The invention relates to a network security technology, in particular to an anonymous network traffic classification method based on small sample machine learning.
Background
With the development of the internet, various anonymous communication systems have been designed and deployed, and corresponding attack methods have appeared. Website fingerprinting (WF) attacks can effectively break the anonymity of the Tor network: because different websites load different items and content, the traffic sequence between client and server exhibits distinctive patterns during page loading, which gives an attacker a way to break anonymity. Deep-learning-based anonymous network traffic classification significantly outperforms non-deep methods, but it requires a large amount of labeled data for training; when the data set changes (for example, Tor traffic collected under different Tor Browser versions), the performance of the anonymous network traffic classification algorithm degrades.
Currently, two methods address the performance drop caused by the scarcity of labeled traffic data: TF (Triplet Fingerprinting) [1] and TLFA (Transfer Learning Fingerprinting Attack) [2]. However, TF is computationally expensive, and TLFA only fine-tunes a pre-trained classification model with a small amount of newly collected labeled traffic, so the improvement in classification performance is limited.
Therefore, the scarcity of labeled data caused by data set changes in anonymous network traffic classification poses a great challenge to the practical performance and deployment of such algorithms.
Disclosure of Invention
Purpose of the invention: the invention aims to overcome the defects of the prior art and provides an anonymous network traffic classification method based on small-sample machine learning.
The timeliness problem caused by data set changes in anonymous network traffic classification poses a great challenge to the practical deployment and application of such algorithms. To address this challenge, the invention is based on the clustering assumption that samples belonging to the same cluster belong to the same class, and proposes an anonymous network traffic classification algorithm based on cluster analysis. The originally collected traffic data, a small amount of newly collected labeled data, and the data to be classified are mapped into a feature space through a deep neural network; the class centers of the newly collected labeled data are computed in the feature space and used as cluster centers to cluster the target traffic data to be classified, which are assigned pseudo labels; knowledge transfer from the original labeled data is completed by optimizing the classification losses of the originally labeled traffic data and the pseudo-labeled target data, thereby reducing the influence of data aging on the model.
The technical scheme is as follows: the invention discloses an anonymous network traffic classification method based on small sample machine learning, which comprises the following steps:
Step (1): collect network traffic to obtain the original traffic sequence X_s, a small amount of newly collected labeled traffic X'_s, and the traffic sequence to be classified X_t;
Wherein the original traffic sequence X_s is labeled: X_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, where n_s is the number of original traffic sequence samples and x_i^s and y_i^s denote a traffic sequence record and its corresponding label, respectively. The newly collected labeled traffic is expressed as X'_s = {(x'_i, y'_i)}_{i=1}^{n}, and the traffic sequence to be classified as X_t = {x_j^t}_{j=1}^{m}, where n and m are the numbers of newly collected labeled samples and data samples to be classified, respectively;
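As a rough illustration only (all sizes are hypothetical, and the ±1 packet-direction encoding is a common convention in website-fingerprinting work rather than something this patent specifies), the three data sets can be sketched as numpy arrays:

```python
import numpy as np

# Hypothetical sizes: n_s original labeled flows, n newly collected labeled
# flows, m unlabeled flows to classify; each flow is a sequence of packet
# directions (+1 outgoing, -1 incoming), a common website-fingerprinting format.
n_s, n, m, seq_len, K = 1000, 50, 200, 5000, 10

rng = np.random.default_rng(0)
X_s = rng.choice([-1.0, 1.0], size=(n_s, seq_len))   # original traffic sequences
y_s = rng.integers(0, K, size=n_s)                   # their website labels
X_new = rng.choice([-1.0, 1.0], size=(n, seq_len))   # few new labeled flows
y_new = rng.integers(0, K, size=n)
X_t = rng.choice([-1.0, 1.0], size=(m, seq_len))     # unlabeled flows to classify
```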
step (2) constructing a classification model
Splicing a feature extractor G and a task classifier C to form a classification model, wherein the feature extractor G adopts a deep convolution network, and the task classifier C comprises two layers of fully-connected neural networks;
step (3) pre-training classification model
Input the labeled original traffic sequence X_s into the classification model, compute a classification loss function from the obtained class prediction probabilities of the original traffic data and the true labels, and pre-train the deep classification model constructed in the previous step;
step (4) training classification model
Step (4.1) will have marked original flow sequence X s And a newly collected small amount of annotation flow X' s Mapping the flow sequence characteristics to a characteristic space through a neural network, and calculating the central point of each category of the newly acquired small quantity of marked flow sequence characteristics;
step (4.2) taking the obtained category central point as a clustering central point of newly acquired flow sequence features to be classified, calculating the distance from each flow sequence feature to be classified to each clustering central point, and giving a category label of the nearest category center of the flow sequence features to be classified, wherein the category label is used as a pseudo label of the flow sequence to be classified;
step (4.3) mapping the features of the feature space by a classifier to obtain class prediction probability, and calculating a clustering loss function according to the pseudo label and the prediction probability; updating the network weight of the feature extractor G and the task classifier C according to the obtained cluster adaptation loss;
the steps (4.1) to (4.3) are circulated for multiple times to finish model training; finally, the feature center of the newly acquired flow sequence in the feature space is aligned with the feature center of the original flow sequence, so that the features of the same category are mapped to the same region by the classifier, and the problem of performance reduction of a deep anonymous network flow classification algorithm caused by training data aging is effectively solved.
Further, the structures of the feature extractor G and the task classifier C in the step (2) are as follows:
the feature extractor G is provided with three convolution modules, wherein the first convolution module comprises two convolution layers, the second convolution modules comprise three convolution layers, a maximum pooling layer (Max boosting) and a Dropout layer are adopted behind each convolution module, an ELU activation function is adopted in each convolution module, and the activation function is beneficial to shortening training time and improving accuracy in a neural network; the task classifier C adopts two layers of fully-connected neural networks, and a dropout layer is added behind each layer of network, so that the overfitting problem is avoided.
Further, when the classification model is pre-trained on the labeled original traffic sequence data in step (3), the classification loss function is computed as in conventional supervised deep model training:
L_cls = L_ce(y'_s, y_s)    (1)
where y'_s is the classifier's predicted probability output over the classes for the original traffic data and y_s is the one-hot encoding of the true traffic label; L_ce denotes the cross-entropy loss, computed as:
L_ce(p, q) = -Σ_x q(x) log p(x)    (2)
where p(x) is the predicted probability that sample x belongs to each class and q(x) is the one-hot encoding of the true label of sample x.
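The cross-entropy loss described here can be sketched in numpy (a toy illustration, not the patent's training code):

```python
import numpy as np

def cross_entropy(p, q):
    # Formula (2): L_ce(p, q) = -sum_x q(x) * log p(x), averaged over samples.
    # p: (N, K) predicted class probabilities; q: (N, K) one-hot true labels.
    eps = 1e-12  # guard against log(0)
    return float(-np.mean(np.sum(q * np.log(p + eps), axis=1)))
```

For a uniform prediction over K classes the loss is log K, and it approaches 0 as the predicted probability of the true class approaches 1.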
Further, the specific calculation method of the clustering center in the step (4.1) is as follows:
given a newly acquired small amount of flow sequence data input asAssuming that the original flow sequence data has K categories, there is a clustering center C k Comprises the following steps:
wherein, f' i =G(x′ i ) When y' i When k is not less than k, I i =1, otherwise I i =0。n k And the number of samples of the original flow sequence data with the label of K belongs to K e {1,2,3, …, K }.
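The per-class center computation can be sketched in numpy (here on already-extracted feature vectors, standing in for G's output):

```python
import numpy as np

def class_centers(features, labels, K):
    # Formula (3): C_k = (1/n_k) * sum_i I_i * f'_i, where I_i = 1 iff y'_i = k;
    # i.e. the mean feature vector of each class.
    return np.stack([features[labels == k].mean(axis=0) for k in range(K)])
```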
Further, the method for calculating the pseudo label of the traffic sequence to be classified in the step (4.2) comprises the following steps:
after the newly acquired flow sequence passes through a neural network for mapping the original flow sequence, the distance between the new flow sequence characteristic and the clustering center is measured by adopting cosine similarity in a characteristic space, and the distance is calculated as follows:
Calculating the distance between each newly collected sample and all cluster centers; then, the class of the nearest clustering center of the newly acquired flow sequence is given, a pseudo label is given to the new flow sequence in each class cluster, and the pseudo label is obtained in the following way:
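The cosine-distance pseudo-labeling step can be sketched in numpy (feature matrix F and center matrix C are toy stand-ins for the mapped traffic features and class centers):

```python
import numpy as np

def cosine_distance(F, C):
    # Formula (4): d(f, c) = 1 - (f . c) / (||f|| ||c||), rows of F vs rows of C
    Fn = F / np.linalg.norm(F, axis=1, keepdims=True)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return 1.0 - Fn @ Cn.T

def pseudo_labels(F, C):
    # Formula (5): each feature takes the label of its nearest cluster center
    return cosine_distance(F, C).argmin(axis=1)
```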
further, the step (4.3) of calculating the cluster adaptation loss and updating the network weight comprises the following specific processes:
the clustering loss function is calculated as follows:
whereinFor the classifier on the newly acquired flow sequence->In a respective category, a prediction probability output in conjunction with which a decision is made>A pseudo tag (one-hot encoded form) obtained by the above formula (5);
the overall optimization objective function of the final anonymous network traffic classification algorithm is as follows:
min G,C L=L clu (x s ,y s )+λL clu (x t ) (7)
wherein, the lambda is a hyper-parameter of the classification loss and the clustering loss in the balance training.
Beneficial effects: the method maps the originally collected traffic data, the small-sample labeled data, and the data to be classified into a feature space through a deep neural network; computes the class centers of the small-sample labeled data in the feature space; clusters the target traffic data to be classified around these class centers and assigns pseudo labels; uses the originally collected labeled traffic data to pre-train the deep model; and optimizes the classification losses of the originally labeled traffic data and the pseudo-labeled target data, thereby completing knowledge transfer from the original labeled data and reducing the influence of data aging on the model.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of a feature extractor of the present invention;
FIG. 3 is a diagram of a task classifier according to the present invention.
Detailed Description
The technical solution of the present invention is described in detail below, but the scope of the present invention is not limited to the embodiments.
To address the above challenges, the invention is based on the clustering assumption that samples belonging to the same cluster belong to the same class. The algorithm maps the originally collected labeled traffic data, the newly collected labeled data, and the data to be classified into a feature space through a deep neural network; computes the class centers of the newly collected labeled data in the feature space; clusters the target traffic data to be classified around these centers and assigns pseudo labels; and optimizes the classification losses of the originally labeled traffic data and the pseudo-labeled target data, thereby completing knowledge transfer from the original labeled data and reducing the influence of data aging on model performance.
The invention realizes effective anonymous network traffic classification by the following technical characteristics, and solves the problem of reduced anonymous network traffic classification performance caused by the distribution difference of training data and test data:
1. Based on the clustering assumption that samples in the same cluster belong to the same class, the cluster centers of the target traffic sequences are aligned with the class centers of the original labeled data in the feature space, solving the distribution-difference problem caused by data aging.
2. A deep network extracts the traffic sequence features; the class centers of the original labeled data and the cluster centers of the traffic sequences to be classified are aligned in the feature space, and the model is optimized end to end.
As shown in fig. 1, the anonymous network traffic classification method based on small sample machine learning of this embodiment includes the following steps:
Step (1): collect network traffic to obtain the original traffic sequence X_s, a small amount of newly collected labeled traffic X'_s, and the traffic sequence to be classified X_t;
Wherein the original traffic sequence X_s is labeled: X_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, where n_s is the number of original traffic sequence samples and x_i^s and y_i^s denote a traffic sequence record and its corresponding label, respectively; the newly collected labeled traffic is expressed as X'_s = {(x'_i, y'_i)}_{i=1}^{n}; the traffic sequence to be classified is expressed as X_t = {x_j^t}_{j=1}^{m}, where n and m are the numbers of newly collected labeled samples and data samples to be classified, respectively;
collecting a small amount of new access flow by using a packet capturing tool, converting the captured flow into an available format, and marking a label of a corresponding website on each flow;
step (2) of constructing a classification model
Splicing a feature extractor G and a task classifier C to form a classification model, wherein the feature extractor G adopts a deep convolution network, and the task classifier C comprises two layers of fully-connected neural networks;
step (3) pre-training classification model
The marked original flow sequence X s Inputting the data into the classification model, calculating a classification loss function based on the obtained original flow data class prediction probability and the real label, and pre-training the depth classification model constructed in the previous step;
step (4), training classification model
Step (4.1): map the labeled original traffic sequence X_s and the small amount of newly collected labeled traffic X'_s into the feature space through the neural network, and compute the class center points of all classes of the newly collected labeled sample features;
Step (4.2): use the obtained class center points as the cluster centers for the traffic sequence features to be classified, compute the distance from each feature to each cluster center, and assign each feature the class label of its nearest class center as its pseudo label;
step (4.3) mapping the characteristics of the characteristic space by a classifier to obtain a class prediction probability, and calculating a clustering loss function through a pseudo label and the prediction probability; updating the network weight of the feature extractor G and the task classifier C according to the obtained cluster adaptation loss;
and (4) circulating the steps (4.1) to (4.3) for multiple times to finish model training.
A pre-collected test set is used to evaluate the performance of the classification algorithm.
Embodiment: the detailed process of the algorithm is as follows.
the present embodiment is based on the anonymous communication system Tor as an environment for acquiring traffic. Tor is based on an onion routing technology, data packets of anonymous network users are transmitted through a plurality of proxy nodes, source IP, target IP and information in the data packets are encrypted, so that the real source and destination of the data packets cannot be tracked, and privacy information of the users is effectively protected. In an actual application scenario, due to the problems of the Tor-browser version (TBB), the setting of the Tor-browser, the aging between newly acquired data and originally acquired data, and the like, distribution difference exists between the originally acquired data and the newly acquired data, and the difference causes performance degradation of an anonymous network traffic classification model on originally acquired traffic sequence data, so that the requirements of actual application are difficult to meet, and time and labor are consumed for re-acquiring labeled data trained by a deep anonymous network traffic classification model. The present embodiment solves this problem by the following steps.
Step (1) of collecting network flow
Download the Tor proxy service source code from the official Tor site (https://www.torproject.org/download/), upload it to a cloud server and install it. Collect access traffic with a packet-capture tool by simulating a Tor user's browsing habits and then visiting websites, capturing the traffic between the user and the first-hop Tor node. Convert the captured traffic into a usable format, label each flow with its corresponding website, and split the collected traffic data set into a training set and a test set. For ease of describing the subsequent experiments, the originally collected labeled traffic sequence data is denoted X_s = {(x_i^s, y_i^s)}; a small number of labeled flows from the test set are set aside as X'_s = {(x'_i, y'_i)}; and the remaining flows of the test set serve as the traffic sequence to be classified, denoted X_t = {x_j^t}.
When collecting traffic data, following common practice in the field, the visited websites are divided into two types: monitored websites and non-monitored websites. Monitored websites are those the attacker is interested in; non-monitored websites are those the user does not visit or the attacker is not interested in. The data set composition is shown in Table 1.
Table 1: tor network traffic data set
Step (2) constructing a classification model
The feature extractor G and the task classifier C are spliced to form the classification model; the structure of G is shown in FIG. 2, and that of the task classifier in FIG. 3. The feature extractor G consists of a convolutional neural network, and the task classifier C consists of two fully-connected layers.
Step (3) pre-training classification model
Input the labeled original traffic sequence data X_s = {(x_i^s, y_i^s)} into the classification model, compute the classification loss from the predicted class probabilities of the original traffic data and the true labels, and minimize the classification loss of the deep classification model constructed above by stochastic gradient descent to complete pre-training. The loss value is computed as shown in formulas (1) and (2):
L_cls = L_ce(y'_s, y_s)    (1)
L_ce(p, q) = -Σ_x q(x) log p(x)    (2)
where p (x) represents the prediction probability that sample x belongs to each class and q (x) represents the one-hot coding of the true label of sample x.
Step (4) training classification model
Map the newly collected labeled traffic sequences X'_s = {(x'_i, y'_i)} into the feature space through the pre-trained neural network, and compute the center point of each class of the newly collected traffic features as shown in formula (3):
C_k = (1/n_k) Σ_{i=1}^{n} I_i · f'_i    (3)
where f'_i = G(x'_i); I_i = 1 when y'_i = k and I_i = 0 otherwise; n_k is the number of newly collected samples with label k, and k ∈ {1, 2, …, K}.
Use the obtained class center points as the cluster centers for the traffic sequence features to be classified, and compute the distance from each feature to each cluster center by cosine similarity, as shown in formula (4):
d(f_j^t, C_k) = 1 − (f_j^t · C_k) / (‖f_j^t‖ ‖C_k‖)    (4)
The distance from each sample to all cluster centers is computed; each traffic sequence is then assigned the class of its nearest cluster center as its pseudo label, obtained as:
ŷ_j^t = argmin_k d(f_j^t, C_k)    (5)
the class prediction probability is obtained after the features of the feature space are mapped by a classifier, and a cluster adaptation loss function is calculated through the pseudo label and the prediction probability, wherein the cluster adaptation loss function is shown in a formula (6) as follows:
whereinFor the classifier on the newly acquired flow sequence->Is in each category, is output based on the prediction probability in the respective category>A pseudo tag (one-hot encoded form) obtained by the above formula (5);
the overall optimization objective function of the final anonymous network traffic classification algorithm is shown in the following formula (7):
min G,C L=L clu (x s ,y s )+λL clu (x t ) (7)
wherein, lambda is a hyper-parameter of the classification loss and the clustering loss in the balance training. And updating the network weights of the feature extractor G and the task classifier C based on a random gradient descent algorithm. And circulating the process for multiple times to finish model training.
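One training iteration of steps (4.1) to (4.3) can be sketched end to end with toy stand-ins (a linear map in place of the real deep feature extractor G and classifier C; all dimensions and the λ value are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3                                     # number of website classes (toy)
W_g = rng.normal(size=(20, 8)) * 0.1      # stand-in feature extractor G
W_c = rng.normal(size=(8, K)) * 0.1       # stand-in task classifier C

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy_on(X, y):
    # Cross-entropy of the classifier's predictions against (pseudo) labels
    p = softmax(X @ W_g @ W_c)
    return -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12))

def one_iteration(X_s, y_s, X_new, y_new, X_t, lam=0.1):
    # (4.1) map the few newly collected labeled flows, take per-class centers
    F_new = X_new @ W_g
    centers = np.stack([F_new[y_new == k].mean(axis=0) for k in range(K)])
    # (4.2) pseudo-label the flows to classify by nearest center (cosine)
    F_t = X_t @ W_g
    Fn = F_t / np.linalg.norm(F_t, axis=1, keepdims=True)
    Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    y_t = (1.0 - Fn @ Cn.T).argmin(axis=1)
    # (4.3) combined objective of formula (7): L = L_cls + lambda * L_clu
    return cross_entropy_on(X_s, y_s) + lam * cross_entropy_on(X_t, y_t)
```

In the real method this combined loss would be minimized by stochastic gradient descent over the weights of G and C for multiple rounds.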
Based on the training, validation, and test set split from step (1), five-fold cross-validation and grid search are used to tune the hyper-parameters of the classifier and determine the optimal hyper-parameters for classification. The classification performance of the model is then evaluated on the test set by computing the classification accuracy; results are shown in Table 2.
Table 2: the classification effect of different models (N-shot represents that N newly collected samples with labels are provided, and the evaluation index in the table is classification accuracy (%))
TF [1] in Table 2 refers to "Triplet Fingerprinting: More Practical and Portable Website Fingerprinting with N-shot Learning" by P. Sirinam et al., and TLFA [2] refers to "Few-shot Website Fingerprinting Attack" by M. Chen et al.
This embodiment shows that deep anonymous network traffic classification requires a large amount of labeled training data, and that updates to the anonymity system, such as Tor Browser version updates, reduce the validity of the labeled data and thus degrade the performance of current deep anonymous network traffic classification algorithms. The proposed algorithm is based on the clustering assumption that samples in the same cluster belong to the same class; by aligning the cluster centers of the target traffic sequences with the class centers of the original labeled data in the feature space, it resolves the distribution difference caused by data aging and effectively improves anonymous network traffic classification performance.
Claims (6)
1. An anonymous network traffic classification method based on small sample machine learning is characterized by comprising the following steps: the method comprises the following steps:
step (1): collect network traffic to obtain the original traffic sequence X_s, a small amount of newly collected labeled traffic X'_s, and the traffic sequence to be classified X_t;
wherein the original traffic sequence X_s is labeled: X_s = {(x_i^s, y_i^s)}_{i=1}^{n_s}, where n_s is the number of original traffic sequence samples and x_i^s and y_i^s denote a traffic sequence record and its corresponding label, respectively; the newly collected labeled traffic is expressed as X'_s = {(x'_i, y'_i)}_{i=1}^{n}; the traffic sequence to be classified is expressed as X_t = {x_j^t}_{j=1}^{m}, where n and m are the numbers of newly collected labeled samples and data samples to be classified, respectively;
step (2) constructing a classification model
Splicing a feature extractor G and a task classifier C to form a classification model, wherein the feature extractor G adopts a deep convolution network, and the task classifier C comprises two layers of fully-connected neural networks;
step (3) pre-training classification model
input the labeled original traffic sequence X_s into the classification model, compute a classification loss function from the obtained class prediction probabilities of the original traffic data and the true labels, and pre-train the deep classification model constructed in the previous step;
step (4) training classification model
Step (4.1) will have the original flow sequence markedAnd a newly acquired small number of marked flow>Mapping the flow sequence characteristics to a characteristic space through a neural network, and calculating each category center of the newly acquired small quantity of labeled flow sequence characteristicsPoint;
step (4.2) taking the obtained category central point as a clustering central point of newly acquired flow sequence features to be classified, calculating the distance from each flow sequence feature to be classified to each clustering central point, and giving a category label of the nearest category center of the flow sequence features to be classified, wherein the category label is used as a pseudo label of the flow sequence to be classified;
step (4.3) mapping the features of the feature space by a classifier to obtain class prediction probability, and calculating a clustering loss function according to the pseudo label and the prediction probability; updating the network weight of the feature extractor G and the task classifier C according to the obtained cluster adaptation loss;
and (4) circulating the steps (4.1) to (4.3) for multiple times to finish model training.
2. The anonymous network traffic classification method based on small sample machine learning of claim 1, wherein: the structures of the feature extractor G and the task classifier C in the step (2) are as follows:
the feature extractor G is provided with three convolution modules, wherein the first convolution module comprises two convolution layers, the last two convolution modules comprise three convolution layers, a maximum pooling layer and a Dropout layer are adopted behind each convolution module, and an ELU activation function is adopted in each convolution module; and the task classifier C adopts two layers of fully-connected neural networks, and a dropout layer is added behind each layer of network.
3. The anonymous network traffic classification method based on small sample machine learning of claim 1, wherein, when the classification model is pre-trained on the labeled original flow sequence data, the classification loss function is calculated as follows:
L_cls = L_CE(ŷ, y), wherein ŷ is the classifier's predicted probability for each class of original flow data, y is the one-hot encoding of the flow's true label, and L_CE represents the cross-entropy loss function, calculated as follows: L_CE(ŷ, y) = −Σ_k y_k · log(ŷ_k).
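The pre-training loss of claim 3 is the standard cross-entropy between softmax probabilities and one-hot labels; a minimal NumPy sketch (the logit values are toy data):

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class axis
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, onehot):
    # L_CE(y_hat, y) = -sum_k y_k * log(y_hat_k), averaged over the batch
    return -(onehot * np.log(probs + 1e-12)).sum(axis=1).mean()

logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
onehot = np.eye(3)[[0, 1]]  # true labels 0 and 1, one-hot encoded
loss = cross_entropy(softmax(logits), onehot)
```

A perfect prediction (probability 1 on the true class) drives the loss to zero, which is why minimizing it pre-trains the classifier toward the true labels.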
4. The anonymous network traffic classification method based on small sample machine learning of claim 1, wherein: the specific calculation method of the clustering center in the step (4.1) comprises the following steps:
given the newly acquired small amount of labeled flow sequence data {(x_j, y_j)} as input, and assuming the original flow sequence data has K categories, the clustering center c_k of class k is: c_k = (1/|S_k|) · Σ_{(x_j, y_j) ∈ S_k} G(x_j), wherein S_k is the set of newly acquired labeled samples with y_j = k and G(x_j) is the feature extracted from x_j by the feature extractor G.
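The center computation of claim 4 reduces to a per-class mean of the extracted features; a minimal NumPy sketch, where the input array is assumed to already be the output of the feature extractor G:

```python
import numpy as np

def cluster_centers(features, labels, k):
    # c_k = mean of the G-features of the labeled new flows of class k
    return np.stack([features[labels == c].mean(axis=0) for c in range(k)])

feats = np.array([[1.0, 0.0],
                  [3.0, 0.0],
                  [0.0, 2.0],
                  [0.0, 4.0]])
labels = np.array([0, 0, 1, 1])
centers = cluster_centers(feats, labels, k=2)
# → [[2., 0.], [0., 3.]]
```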
5. The anonymous network traffic classification method based on small sample machine learning of claim 1, wherein: the method for calculating the pseudo label of the traffic sequence to be classified in the step (4.2) comprises the following steps:
the cosine similarity is adopted in the feature space to measure the distance between a new flow sequence feature and each cluster center, calculated as follows: cos(G(x_i), c_k) = (G(x_i) · c_k) / (‖G(x_i)‖ · ‖c_k‖);
the similarity between each newly collected sample and all cluster centers is calculated; each newly acquired flow sequence is then assigned the class of its nearest (most similar) clustering center, giving the new flow sequences in each class cluster their pseudo labels, obtained as follows: ŷ_i = argmax_k cos(G(x_i), c_k).
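The pseudo-labeling of claim 5 can be sketched with NumPy: normalize features and centers, take the dot product as the cosine similarity, and pick the most similar center (the center and feature values here are toy data):

```python
import numpy as np

def cosine_sim(feats, centers):
    # cos(f, c_k) = f . c_k / (||f|| * ||c_k||), computed for all pairs
    f = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    c = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + 1e-8)
    return f @ c.T

centers = np.array([[1.0, 0.0], [0.0, 1.0]])
new_feats = np.array([[0.9, 0.1], [0.2, 2.0]])
sims = cosine_sim(new_feats, centers)   # one row per new flow, one column per center
pseudo = sims.argmax(axis=1)            # class of the most similar (nearest) center
```

Because cosine similarity grows as vectors align, the "nearest" center is the argmax of similarity, not the argmin of a distance.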
6. The anonymous network traffic classification method based on small sample machine learning of claim 1, wherein the specific process of calculating the cluster adaptation loss and updating the network weights in step (4.3) is as follows:
the clustering loss function is calculated as follows: L_clu = L_CE(C(G(x_u)), ŷ_u), wherein C(G(x_u)) is the classifier's prediction output on a newly acquired flow sequence x_u and ŷ_u is its pseudo label;
the final overall optimization objective function is as follows: min_{G,C} L = L_cls + λ · L_clu, wherein λ is a weight balancing the classification loss and the clustering adaptation loss.
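The combined objective of claim 6 can be sketched with NumPy; the balance weight `lam` and the function names are illustrative assumptions, and the probabilities here stand in for the classifier outputs C(G(x)):

```python
import numpy as np

def cross_entropy(probs, onehot):
    # averaged cross-entropy, as used for both loss terms
    return -(onehot * np.log(probs + 1e-12)).sum(axis=1).mean()

def total_loss(labeled_probs, labeled_onehot, new_probs, pseudo, k, lam=1.0):
    # L = L_cls on labeled original flows + lam * L_clu on pseudo-labeled new flows
    l_cls = cross_entropy(labeled_probs, labeled_onehot)
    l_clu = cross_entropy(new_probs, np.eye(k)[pseudo])
    return l_cls + lam * l_clu

labeled_probs = np.array([[0.8, 0.2], [0.3, 0.7]])
labeled_onehot = np.eye(2)[[0, 1]]
new_probs = np.array([[0.6, 0.4]])
pseudo = np.array([0])              # pseudo label from the nearest cluster center
loss = total_loss(labeled_probs, labeled_onehot, new_probs, pseudo, k=2)
```

In training, this scalar would be minimized over the weights of G and C, so that both terms pull the feature space toward the new traffic's cluster structure.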
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211592847.4A CN115913992A (en) | 2022-12-13 | 2022-12-13 | Anonymous network traffic classification method based on small sample machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115913992A true CN115913992A (en) | 2023-04-04 |
Family
ID=86477931
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115913992A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117155707A (en) * | 2023-10-30 | 2023-12-01 | 广东省通信产业服务有限公司 | Harmful domain name detection method based on passive network flow measurement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||