CN105516020A - Parallel network traffic classification method based on ontology knowledge inference - Google Patents

Publication number
CN105516020A
Authority
CN
China
Prior art keywords
network
network traffics
inference
traffics
decision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510974162.XA
Other languages
Chinese (zh)
Other versions
CN105516020B (en)
Inventor
陶晓玲
韦毅
王勇
孔德艳
亢蕊楠
伍欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yunche Technology Co ltd
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201510974162.XA
Publication of CN105516020A
Application granted
Publication of CN105516020B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention relates to a parallel network traffic classification method based on ontology knowledge inference, which comprises the steps of: I, training a decision tree algorithm on a network traffic training sample set labeled with application types, establishing a decision tree classification model of network traffic, and converting the decision tree classification model into an inference rule set; and II, constructing the inference rule set into an inference engine with the Jena toolkit, invoking the inference engine through the MapReduce parallel computing framework to carry out parallel knowledge inference, mining the correspondence between network traffic instances in the network traffic ontology and network application types, and labeling the network traffic instances with the network application types so as to complete network traffic classification. The method introduces the parallel processing technology MapReduce and uses cloud computing as the storage and computing resource for network traffic ontology knowledge inference, classifying the network traffic instances in parallel so that classification efficiency is effectively improved. It also combines machine learning with ontology knowledge inference: by constructing the inference rule set, it classifies the network traffic instances in the network traffic ontology directly and effectively.

Description

A parallel network traffic classification method based on ontology knowledge inference
Technical field
The present invention relates to the technical field of network management, and specifically to a parallel network traffic classification method based on ontology knowledge inference.
Background
With the rapid development of Web technology and the growing demand for enterprise informatization, many new network application models and application requirements have emerged. The resulting network traffic data is growing explosively, which poses unprecedented challenges for network supervision and strengthens users' demand for fine-grained management of network traffic. As a key technology for managing and optimizing network resources, network traffic classification is widely used in network monitoring, QoS (Quality of Service) management, network security, trend analysis and other fields, and is an important step in implementing effective network management, traffic control and security detection.
Network traffic classification refers to classifying, on an Internet based on the TCP/IP protocol suite, the bidirectional TCP flows or UDP flows generated by network communication according to the network application type (for example WWW, FTP, MAIL, P2P).
In recent years many researchers have turned their attention to machine learning classification methods based on the statistical features of network traffic. These methods classify traffic with machine learning using statistics of certain flow attributes (such as average packet length and average packet inter-arrival time), and are not affected by dynamic ports, payload encryption or network address translation. The machine learning methods most widely used for network traffic classification at present are Bayesian methods, neural networks, support vector machines (SVM) and decision trees.
The network traffic classification research of Moore at the University of Cambridge focuses mainly on Bayesian methods and their improvements. Charalampos Rotsos, Moore and others introduced a semi-supervised traffic classification method to train classifiers, modeling the classifier with two algorithms, naive Bayes (NB) and kernel-estimation NB; their experimental results show that the method achieves higher classification performance than conventional methods. However, such algorithms are learning methods based on probability statistics; they depend heavily on the distribution of the sample space and are potentially unstable.
Network traffic classification using a feedforward neural network effectively avoids the drawbacks of port-based and payload-based classification. Experiments have verified that this approach is more stable and robust than NB and shows good performance and promise for network traffic classification. However, even the widely used BP (back-propagation) algorithm exposes many defects in practice: it easily falls into local minima and fails to reach the global optimum, it needs many training iterations so learning efficiency is low, and it converges slowly.
Other work obtains network flow parameters from packet headers and then compares SVM classification under biased training and unbiased training; when processing large sample sets, the computational complexity is high and training is slow. Classifying network traffic with an SVM decision tree addresses the unrecognizable regions and long training time of SVM traffic classification. However, this research still does not fully resolve the computational performance bottleneck, and because the method is supervised, it cannot readily discover new applications in network traffic.
To avoid inspecting packet payloads, Wei Li and Moore extracted 12 statistical features from the packets of network traffic while taking latency and throughput into account; with a C4.5 decision tree traffic classification method the classification accuracy reached 99.8%. Tomasz Bujlow et al. proposed a traffic classification approach using the C5.0 machine learning algorithm and verified experimentally that its average classification accuracy reaches 99.3-99.9%. However, decision trees lack scalability; when processing large data sets they tend to increase the system overhead of the classification algorithm and reduce classification accuracy.
In high-speed, large-scale, complex network environments, each sensor network node collects packets with a different traffic acquisition system, so the resulting network traffic data differs in format, semantics and syntax. Current network traffic data is therefore multi-source, heterogeneous and massive. Most existing network traffic classification techniques can only apply simple formatting to network traffic data. They lack an effective solution for heterogeneous data (format, syntactic and semantic heterogeneity), and they lack description of traffic information (such as the acquisition environment) and knowledge inference. The traffic data obtained is inconsistent, cannot be shared and lacks traffic classification knowledge, so existing traffic classification methods struggle to provide the resource information needed for network management decision analysis.
In the field of artificial intelligence, ontologies are increasingly applied to knowledge engineering, intelligent information integration, data mining, and the organization and processing of massive information. Ontologies provide an effective way to describe resources in a standardized, unambiguous and extensible manner, and offer generality, openness, intelligence, accuracy and comprehensiveness in resource description. Ontologies are also used as a knowledge representation tool in decision support systems, where knowledge inference is an important ontology function; it has also been applied to classification problems (for example image classification).
In recent years researchers have tried to introduce ontologies into network traffic classification. Marcin Pietrzyk made a first attempt at formally defining flow classes: using classical ontology development guidelines, an ontology-based taxonomy tree of class instances was built iteratively, with the aim of eliminating ambiguity in traffic class definitions. Chengjie Gu et al. proposed an automatic online network traffic classification framework based on flow profiles and ontologies, which classifies traffic through the mapping between flow profiles and traffic classes. However, current ontology-based traffic classification methods cannot yet be applied to large-scale complex networks, and the application of ontologies in this field is still at an early stage.
Cloud computing is a data-centric, data-intensive supercomputing technology that processes and analyzes large data sets and provides efficient services to users; it is characterized by parallelization, virtualization and on-demand service. Its parallel processing technology, MapReduce, provides sufficient parallel computing semantics for large-scale data processing problems that can be partitioned, and has been widely adopted. Cloud computing thus offers a new approach to the massive data processing problem in network traffic classification. Combining ontologies with cloud computing for network traffic classification exploits their respective strengths in describing and processing massive heterogeneous data: the ontology provides consistent description and information management for network traffic information resources, while cloud computing provides storage and computing resources for ontology construction and information management.
Summary of the invention
The object of the invention is to disclose a parallel network traffic classification method based on ontology knowledge inference which, for the network traffic instances in a large-scale network traffic ontology, classifies network traffic by combining machine learning with ontology knowledge inference.
In the parallel network traffic classification method based on ontology knowledge inference designed by the invention, a multi-layer network traffic ontology is constructed from the traffic collection environment of the Internet and the information resources of the traffic, each network flow on the Internet corresponds to one network traffic instance in the network traffic ontology, and network traffic is classified as follows:
I. Establish a decision tree classification model and generate an inference rule set
Network traffic is selected from the Internet as samples, and the samples labeled with application types form the network traffic training sample set. A decision tree algorithm is trained on this set to establish a decision tree classification model of network traffic, and the model is converted into an inference rule set;
II. Classify network traffic instances in parallel through knowledge inference
The Jena toolkit is used to construct the inference rule set generated in step I into a corresponding inference engine. For the constructed network traffic ontology, the inference engine is invoked through the MapReduce parallel computing framework to carry out parallel knowledge inference, that is, to mine the correspondence between network traffic instances in the network traffic ontology and network application types, and to label the network traffic instances with network application types, thereby completing network traffic classification. The Jena toolkit is a toolkit for ontology construction and inference; it is an open-source, Java-based Semantic Web toolkit developed by Hewlett-Packard in 2004.
Each step is described in detail below.
Step I specifically comprises the following sub-steps:
I-1. Train a decision tree algorithm on the network traffic training sample set labeled with application types to establish a decision tree classification model of network traffic. Here the set A = {a_1, a_2, ..., a_i} denotes the set of i statistical feature values of network traffic in the training sample set; the set T = {t_1, t_2, ..., t_j} denotes the set of the j application types to which network traffic in the training sample set belongs; and the set V = {v_1, v_2, ..., v_k} denotes the set of k decision reference values, which the decision tree algorithm computes statistically from the elements of A and which serve as the basis for choosing a decision path in the decision tree;
I-2. Every path from the root node to a leaf node in the decision tree classification model is treated as a classification path. Using the decision reference values as the basis, each classification path is converted into an "IF-THEN" structure, giving a network traffic classification model in IF-THEN form;
I-3. Describe the IF-THEN network traffic classification model established in step I-2 using the inference rule syntax of the Jena toolkit, and generate the inference rule set.
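The conversion in sub-step I-2 can be illustrated with a minimal Python sketch. It assumes a binary tree whose internal nodes test one feature from A against one reference value from V and whose leaves carry an application type from T. The class names, the feature names `server_port` and `push_pkts_fwd`, and the toy thresholds are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    app_type: str          # an element of T


@dataclass
class Split:
    feature: str           # an element of A
    threshold: float       # an element of V
    low: "Node"            # branch taken when feature <= threshold
    high: "Node"           # branch taken when feature > threshold


Node = Union[Leaf, Split]
Condition = Tuple[str, str, float]        # (feature, operator, threshold)
Rule = Tuple[List[Condition], str]        # (conditions, application type)


def tree_to_rules(node: Node, path: Tuple[Condition, ...] = ()) -> List[Rule]:
    """Turn every root-to-leaf path into one (conditions, label) rule."""
    if isinstance(node, Leaf):
        return [(list(path), node.app_type)]
    low_path = path + ((node.feature, "<=", node.threshold),)
    high_path = path + ((node.feature, ">", node.threshold),)
    return tree_to_rules(node.low, low_path) + tree_to_rules(node.high, high_path)


def format_rule(rule: Rule) -> str:
    """Render a rule in a plain IF-THEN form."""
    conditions, app_type = rule
    clause = " AND ".join(f"{f} {op} {v:g}" for f, op, v in conditions)
    return f"IF {clause} THEN app_type = {app_type}"


# Hypothetical toy tree; the thresholds are invented for illustration only.
toy_tree = Split("server_port", 1024,
                 Split("push_pkts_fwd", 3, Leaf("MAIL"), Leaf("WWW")),
                 Leaf("P2P"))
rules = tree_to_rules(toy_tree)
for rule in rules:
    print(format_rule(rule))
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves in the tree.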
Step II specifically comprises the following sub-steps:
II-1. Use the Jena toolkit to construct the inference rule set generated in step I into a corresponding inference engine;
II-2. According to the performance of each computing node and the data scale of the network traffic instances in the network traffic ontology, split the constructed network traffic ontology into multiple network traffic ontology shards, upload the shards to the Hadoop Distributed File System, and assign an identifier to each shard;
II-3. Start multiple MapReduce map functions and input key-value pairs of the form <network traffic ontology shard identifier, network traffic ontology shard> to the map functions;
II-4. Each map function uses the inference engine constructed in step II-1 to carry out knowledge inference on its network traffic ontology shard, obtaining the network application type label of every network traffic instance in the shard;
II-5. Output key-value pairs of the form <network application type label, network traffic instance> to the reduce function;
II-6. The reduce function merges network traffic instances by network application type label to form classified network traffic instance sets;
II-7. Output the classified network traffic instance sets, completing network traffic classification.
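The data flow of sub-steps II-3 to II-7 can be sketched in plain Python without Hadoop. This is a sequential simulation only: a real MapReduce job would run the map calls in parallel on different nodes and let the framework group the emitted pairs by key. The function `toy_reasoner` is a hypothetical stand-in for the Jena inference engine, and its thresholds are invented.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Flow = Dict[str, float]


def toy_reasoner(flow: Flow) -> str:
    """Stand-in for the inference engine; thresholds are invented."""
    if flow["server_port"] <= 25:
        return "MAIL"
    if flow["server_port"] <= 80:
        return "WWW"
    return "P2P"


def map_shard(shard_id: str, shard: List[Flow],
              reasoner: Callable[[Flow], str]) -> List[Tuple[str, Flow]]:
    """II-3 to II-5: take <shard id, shard>, emit <label, instance> pairs."""
    return [(reasoner(flow), flow) for flow in shard]


def reduce_by_label(pairs: Iterable[Tuple[str, Flow]]) -> Dict[str, List[Flow]]:
    """II-6: merge instances that share the same application type label."""
    classified: Dict[str, List[Flow]] = defaultdict(list)
    for label, flow in pairs:
        classified[label].append(flow)
    return dict(classified)


# Two hypothetical shards keyed by shard identifier.
shards = {
    "O1": [{"server_port": 25}, {"server_port": 80}],
    "O2": [{"server_port": 6881}, {"server_port": 80}],
}
emitted = [pair
           for shard_id, shard in shards.items()
           for pair in map_shard(shard_id, shard, toy_reasoner)]
classified = reduce_by_label(emitted)   # II-7: the classified instance sets
```

Keying the map output by label is what lets the reduce phase collect all instances of one application type in a single place, regardless of which shard they came from.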
Compared with the prior art, the parallel network traffic classification method based on ontology knowledge inference of the invention has the following advantages: 1. it introduces MapReduce, a parallel processing technology for large-scale data sets, so cloud computing can serve as the storage and computing resource for network traffic ontology knowledge inference and provide users with efficient services featuring parallelization, virtualization and on-demand provisioning; 2. it classifies network traffic instances in parallel through knowledge inference, which effectively improves classification efficiency, and adding computing nodes appropriately can speed up classification; 3. it combines machine learning with ontology knowledge inference and, by building an inference rule set, directly and effectively classifies the network traffic instances in the network traffic ontology.
Brief description of the drawings
Fig. 1 is the overall framework diagram of the embodiment of the parallel network traffic classification method based on ontology knowledge inference;
Fig. 2 is the architecture diagram of step II of the embodiment of the parallel network traffic classification method based on ontology knowledge inference;
Fig. 3 compares the knowledge inference classification time of the embodiment in a stand-alone environment and in a cluster environment;
Fig. 4 shows the speedup curves of the embodiment in a cluster environment for different data scales and different numbers of nodes.
Detailed description of the embodiment
This embodiment of the parallel network traffic classification method based on ontology knowledge inference uses as its network traffic information resource the data set collected and published by the team of Professor Moore at the University of Cambridge, referred to here as the Moore dataset. The Moore dataset used in this example contains 377,526 network traffic samples. Each sample is a complete bidirectional Transmission Control Protocol (TCP) flow with 248 network traffic statistical features, made up of basic attributes such as the source port number and destination port number and statistical attributes such as the mean packet inter-arrival time; the last item is the label of the application type to which the flow belongs.
This example selects 12 network application types from the Moore dataset as classification targets: World Wide Web (www), Games, Service, Mail, Attack, Database, Interactive, FTP control (FTP-Control), FTP passive mode (FTP-Pasv), FTP data (FTP-Data), Multimedia and peer-to-peer (P2P). Ten network traffic statistical features are selected as the basis for knowledge inference: the server port number; the client port number; the total bytes of data carried in packets sent in the forward direction; the total bytes of data carried in packets sent in the reverse direction; the number of forward packets with the PUSH flag set in the TCP header; the number of reverse packets with the PUSH flag set in the TCP header; the number of forward packets with the FIN flag set in the TCP header; the number of reverse packets with the FIN flag set in the TCP header; the total bytes of the initial window in the forward direction; and the total bytes of the initial window in the reverse direction.
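As a rough illustration of the data involved, the ten selected features and twelve labels could be held in a small Python record like the one below. The field names are hypothetical abbreviations chosen here (`fwd` and `rev` stand for the two directions of the flow); they are not the attribute names used in the Moore dataset or in the patent, and the example values are invented.

```python
from dataclasses import dataclass, astuple

# The 12 application type labels listed in this example.
APP_TYPES = ("www", "Games", "Service", "Mail", "Attack", "Database",
             "Interactive", "FTP-Control", "FTP-Pasv", "FTP-Data",
             "Multimedia", "P2P")


@dataclass(frozen=True)
class FlowFeatures:
    """The 10 statistical features used for inference (hypothetical names)."""
    server_port: int
    client_port: int
    data_bytes_fwd: int          # payload bytes, one direction
    data_bytes_rev: int          # payload bytes, opposite direction
    push_pkts_fwd: int           # packets with the TCP PUSH flag set
    push_pkts_rev: int
    fin_pkts_fwd: int            # packets with the TCP FIN flag set
    fin_pkts_rev: int
    init_window_bytes_fwd: int   # bytes in the initial window
    init_window_bytes_rev: int


# Invented values, for illustration only.
example_flow = FlowFeatures(80, 51234, 420, 15360, 1, 4, 1, 1, 65535, 5840)
feature_vector = astuple(example_flow)
```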
To make the evaluation more objective, this example splits the Moore dataset into two parts that serve as the training sample set and the test sample set. 3,000 samples are drawn at random from the training sample set as training samples, and 300,000 samples are drawn at random from the test sample set as test samples.
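A minimal Python sketch of this split-then-sample procedure follows. The source does not state how the dataset is divided into the two parts, so the `train_fraction` parameter and the seed are assumptions, and the demonstration runs on 1,000 placeholder items rather than on the real data.

```python
import random
from typing import List, Sequence, Tuple


def split_and_sample(flows: Sequence, train_fraction: float,
                     n_train: int, n_test: int,
                     seed: int = 0) -> Tuple[List, List]:
    """Split into two disjoint pools, then sample from each without replacement."""
    rng = random.Random(seed)
    shuffled = list(flows)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    train_pool, test_pool = shuffled[:cut], shuffled[cut:]
    return rng.sample(train_pool, n_train), rng.sample(test_pool, n_test)


# Scaled-down demonstration: 30 training and 300 test items out of 1,000.
train, test = split_and_sample(range(1000), train_fraction=0.2,
                               n_train=30, n_test=300)
```

Because the two pools are disjoint, no test sample can also appear among the training samples.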
The overall framework of this embodiment is shown in Fig. 1. This example constructs a multi-layer network traffic ontology from the Moore dataset, with each network flow in the test samples of the Moore dataset corresponding to one network traffic instance in the ontology. A decision tree algorithm is trained on the network traffic training samples labeled with application types to establish a decision tree classification model of network traffic, and the model is converted into an inference rule set. The Jena toolkit is used to construct the inference rule set into a corresponding inference engine. For the constructed network traffic ontology, the inference engine is invoked through the MapReduce parallel computing framework to carry out parallel knowledge inference, that is, to mine the correspondence between network traffic instances in the ontology and network application types, and to label the network traffic instances with network application types, thereby completing network traffic classification.
I. Establish a decision tree classification model and generate an inference rule set
I-1. The decision tree algorithm bundled with the machine learning and data mining software Weka 3.7.10 is trained on the training sample set of this example to establish the decision tree classification model of network traffic. In this example the set A denotes the set of statistical feature values of network traffic in the training sample set: A = {server port number, client port number, total bytes of data carried in forward packets, total bytes of data carried in reverse packets, number of forward packets with the PUSH flag set in the TCP header, number of reverse packets with the PUSH flag set in the TCP header, number of forward packets with the FIN flag set in the TCP header, number of reverse packets with the FIN flag set in the TCP header, total bytes of the forward initial window, total bytes of the reverse initial window}. The set T denotes the set of application types to which network traffic in the training sample set belongs: T = {World Wide Web, Games, Service, Mail, Attack, Database, Interactive, FTP control, FTP passive mode, FTP data, Multimedia, peer-to-peer}. The set V = {v_1, v_2, ..., v_k} denotes the set of k decision reference values, which the decision tree algorithm computes statistically from the elements of A and which serve as the basis for choosing a decision path in the decision tree.
I-2. Every path from the root node to a leaf node in the decision tree classification model is treated as a classification path. Using the decision reference values as the basis, each classification path is converted into an "IF-THEN" structure, giving a network traffic classification model in IF-THEN form;
I-3. Describe the IF-THEN network traffic classification model established in step I-2 using the inference rule syntax of the Jena toolkit, and generate the inference rule set.
II. Classify network traffic instances in parallel through knowledge inference
This step uses the Jena toolkit to construct the inference rule set generated in step I into a corresponding inference engine. For the constructed network traffic ontology, the Jena inference engine is invoked through the MapReduce parallel computing framework to carry out parallel knowledge inference, that is, to mine the correspondence between network traffic instances in the ontology and network application types, and to label the network traffic instances with network application types, thereby completing network traffic classification. As shown in Fig. 2, it specifically comprises the following sub-steps:
II-1. Use the Jena toolkit to construct the inference rule set generated in step I into a corresponding inference engine;
II-2. According to the performance of each computing node and the data scale of the network traffic instances in the network traffic ontology, split the constructed network traffic ontology into multiple network traffic ontology shards (ontology shards O_1 to O_n in Fig. 2), upload the shards to the Hadoop Distributed File System, and assign an identifier to each shard;
II-3. Start multiple MapReduce map functions (Map1 to Mapn in Fig. 2) and input key-value pairs of the form <network traffic ontology shard identifier, network traffic ontology shard> to the map functions;
II-4. Each map function uses the inference engine constructed in step II-1 to carry out knowledge inference on its network traffic ontology shard, obtaining the network application type label of every network traffic instance in the shard (types L_1 to L_m in Fig. 2);
II-5. Output key-value pairs of the form <network application type label, network traffic instance> to the reduce function;
II-6. The reduce functions (Reduce1 to Reducem in Fig. 2) merge network traffic instances by network application type label to form classified network traffic instance sets (traffic sets C_1 to C_m in Fig. 2);
II-7. Output the classified network traffic instance sets, completing network traffic classification.
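Sub-step II-2 says the ontology is split according to node performance and data scale but does not give a formula. One simple possibility, shown here purely as an assumption, is to size each shard in proportion to a per-node performance weight. The weights and the shard identifiers `O1`, `O2`, ... are hypothetical.

```python
from typing import Dict, List, Sequence


def split_by_node_weight(instances: Sequence,
                         weights: Sequence[float]) -> Dict[str, List]:
    """Cut the instance list into contiguous shards sized by node weight."""
    total = sum(weights)
    shards: Dict[str, List] = {}
    start = 0
    cumulative = 0.0
    for i, weight in enumerate(weights, start=1):
        cumulative += weight
        # The last boundary always lands on len(instances).
        end = round(len(instances) * cumulative / total)
        shards[f"O{i}"] = list(instances[start:end])
        start = end
    return shards


# 12 placeholder instances over three nodes, the third twice as powerful.
shards = split_by_node_weight(list(range(12)), weights=[1, 1, 2])
shard_sizes = {shard_id: len(shard) for shard_id, shard in shards.items()}
```

Each `(shard identifier, shard)` item of the resulting dictionary corresponds to one input key-value pair of sub-step II-3.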
To verify the effectiveness of the method, the knowledge inference classification time was compared between the stand-alone environment and the cluster environment for different scales of network traffic data; the results are shown in Fig. 3. In Fig. 3 the abscissa is the number of network traffic instances, in units of ten thousand, and the ordinate is the classification time in seconds. The ▽ line represents a single machine, another line represents 2 machines, the ◇ line represents 3 machines and the △ line represents 4 machines. As Fig. 3 shows, when the number of network traffic instances is small, the classification time differs little between different numbers of computing nodes. In the small-scale classification task with only 60,000 flow samples, the stand-alone environment needs even less classification time than the cluster with only 2 nodes enabled, and comes close to the cluster with 3 nodes enabled. This is because, when the volume of network traffic instance data is small, MapReduce steps such as task scheduling and splitting and recombining data still take a certain amount of time. The advantage of the method therefore does not show for small-scale data. As the scale of network traffic instance data grows, however, the gap in classification time between the single machine and the cluster widens; the system overhead of MapReduce gradually stabilizes, and the advantage of parallel processing in the method becomes increasingly apparent, reflecting the efficiency of its parallel processing.
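The qualitative behaviour described above can be reproduced with a toy cost model, which is an assumption made here and not part of the patent: single-machine time grows linearly with the number of instances, while cluster time adds a fixed MapReduce overhead to the work divided across the nodes. All constants below are invented and are not measurements from Fig. 3.

```python
def single_time(n_instances: int, per_instance: float) -> float:
    """Stand-alone time: every instance is processed on one machine."""
    return n_instances * per_instance


def cluster_time(n_instances: int, per_instance: float,
                 nodes: int, overhead: float) -> float:
    """Cluster time: fixed scheduling/split overhead plus evenly divided work."""
    return overhead + n_instances * per_instance / nodes


def break_even(per_instance: float, nodes: int, overhead: float) -> float:
    """Instance count above which the cluster beats the single machine."""
    return overhead / (per_instance * (1 - 1 / nodes))


PER_INSTANCE = 0.001   # seconds per instance (invented)
OVERHEAD = 40.0        # seconds of fixed MapReduce overhead (invented)

small = (single_time(60_000, PER_INSTANCE),
         cluster_time(60_000, PER_INSTANCE, 2, OVERHEAD))
large = (single_time(300_000, PER_INSTANCE),
         cluster_time(300_000, PER_INSTANCE, 2, OVERHEAD))
threshold = break_even(PER_INSTANCE, 2, OVERHEAD)
```

Under these invented constants the single machine wins at 60,000 instances and the 2-node cluster wins at 300,000, which matches the direction of the reported trend but says nothing about the actual timings.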
To measure more precisely the performance gain that the method obtains from parallelization, the speedup R is used as the evaluation index:
R = T_s / T_p
In the formula, T_s is the running time of the method in the stand-alone environment and T_p is its running time in the parallel environment. Fig. 4 gives the speedup curves of the method when the cluster uses 2, 3 and 4 machines, that is, 2, 3 and 4 computing nodes. In Fig. 4 the abscissa is the number of network traffic instances, in units of ten thousand, and the ordinate is the speedup of the classification time. The ▽ line represents 2 machines, another line represents 3 machines and the ◇ line represents 4 machines. As Fig. 4 shows, for a fixed number of network traffic instances the speedup changes in a stepwise manner as computing nodes are added; as the number of instances increases, the speedup rises to a maximum, then gradually decreases and finally levels off. Observation and analysis of the running state of each node shows that when the number of instances is small, cluster resource utilization is low and the resources of the computing nodes are not used effectively. As the number of instances increases, the speedup rises to its maximum, at which point cluster resource utilization is highest and the resources of every node are well scheduled. As the number of instances continues to grow, the speedup gradually decreases and then levels off. This is because cluster resource utilization reaches a bottleneck when the speedup reaches its maximum, after which the cluster scheduler starts to adjust its scheduling policy and eventually reaches a steady state.
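As a worked illustration of the speedup index, the short Python sketch below divides a stand-alone running time by the running time on 2, 3 and 4 nodes. The timings are hypothetical placeholders and are not values read from Fig. 3 or Fig. 4.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Speedup R: stand-alone running time divided by parallel running time."""
    if t_parallel <= 0:
        raise ValueError("parallel running time must be positive")
    return t_single / t_parallel


T_SINGLE = 240.0                                  # seconds (hypothetical)
T_PARALLEL = {2: 150.0, 3: 100.0, 4: 80.0}        # nodes -> seconds (hypothetical)

ratios = {nodes: speedup(T_SINGLE, t) for nodes, t in T_PARALLEL.items()}
for nodes, r in ratios.items():
    print(f"{nodes} nodes: R = {r:.2f}")
```

A value of R above 1 means the cluster finished faster than the single machine; a value below 1, as in the small-scale case described for Fig. 3, means the parallel overhead outweighed the benefit.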
The experimental results above show that the method effectively improves execution efficiency, and that MapReduce parallelization effectively improves the efficiency of classifying network traffic instances in a large-scale network traffic ontology.
The embodiment described above is only a specific example that further illustrates the object, technical solution and beneficial effects of the invention; the invention is not limited to it. Any modification, equivalent replacement or improvement made within the scope of the disclosure of the invention falls within the scope of protection of the invention.

Claims (3)

1. A parallel network traffic classification method based on ontology knowledge inference, in which a multi-layer network traffic ontology is constructed from the traffic collection environment of the Internet and the information resources of the traffic, each network flow on the Internet corresponds to one network traffic instance in the network traffic ontology, and classification proceeds as follows:
I. Establish a decision tree classification model and generate an inference rule set
Network traffic is selected from the Internet as samples, and the samples labeled with application types form the network traffic training sample set; a decision tree algorithm is trained on the network traffic training sample set labeled with application types to establish a decision tree classification model of network traffic, and the decision tree classification model is converted into an inference rule set;
II. Classify network traffic instances in parallel through knowledge inference
The Jena toolkit is used to construct the inference rule set generated in step I into a corresponding inference engine; for the constructed network traffic ontology, the inference engine is invoked through the MapReduce parallel computing framework to carry out parallel knowledge inference, that is, to mine the correspondence between network traffic instances in the network traffic ontology and network application types, and to label the network traffic instances with network application types, thereby completing network traffic classification.
2. The parallel network traffic classification method based on ontology knowledge reasoning according to claim 1, characterized in that:
Step I specifically comprises the following sub-steps:
I-1. A decision tree algorithm is trained on the network traffic training sample set labeled with application types to build a decision tree classification model of network traffic, wherein the set $A=\{a_1, a_2, \ldots, a_i\}$ denotes the set of i network traffic statistical feature values in the training sample set; the set $T=\{t_1, t_2, \ldots, t_j\}$ denotes the set of the j application types to which network traffic in the training sample set belongs; and the set $V=\{v_1, v_2, \ldots, v_k\}$ denotes the set of k decision threshold values, which the decision tree algorithm computes statistically from the elements of A and which serve as the basis for choosing a decision path in the decision tree;
I-2. Every path from the root node to a leaf node in the decision tree classification model of network traffic is regarded as a classification path; based on the decision threshold values, each classification path in the model is converted into "if ... then ..." form, that is, an "IF-THEN" structure, establishing a network traffic classification model with IF-THEN structure;
I-3. The inference rule syntax of the Jena toolkit is used to describe the IF-THEN network traffic classification model established in step I-2, generating the inference rule set.
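Sub-steps I-2 and I-3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the tree is a hand-written nested dictionary standing in for a trained decision tree, the feature names (`avgPktSize`, `duration`), the application types, the thresholds, and the `tc:` prefix are all hypothetical, and the emitted strings only approximate the Jena rule syntax (using its `le` and `greaterThan` builtins).

```python
# Hypothetical decision tree: internal nodes test "feature <= threshold",
# leaves carry an application type. All names and values are illustrative.
TREE = {
    "feature": "avgPktSize", "threshold": 120.0,
    "le": {"label": "DNS"},
    "gt": {
        "feature": "duration", "threshold": 30.0,
        "le": {"label": "WWW"},
        "gt": {"label": "P2P"},
    },
}


def classification_paths(node, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path (step I-2)."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    feat, thr = node["feature"], node["threshold"]
    yield from classification_paths(
        node["le"], conditions + ((feat, "le", thr),))
    yield from classification_paths(
        node["gt"], conditions + ((feat, "greaterThan", thr),))


def to_if_then(conditions, label):
    """Render one classification path as a readable IF-THEN statement."""
    ops = {"le": "<=", "greaterThan": ">"}
    tests = " AND ".join(f"{f} {ops[op]} {t}" for f, op, t in conditions)
    return f"IF {tests} THEN appType = {label}"


def to_jena_style_rule(index, conditions, label, prefix="tc"):
    """Render one path as a Jena-style rule string (step I-3, approximate)."""
    body = []
    for n, (feat, op, thr) in enumerate(conditions):
        var = f"?v{n}"
        body.append(f"(?flow {prefix}:{feat} {var})")  # bind the feature value
        body.append(f"{op}({var}, {thr})")             # compare to threshold
    head = f"(?flow {prefix}:hasAppType {prefix}:{label})"
    return f"[rule{index}: {' '.join(body)} -> {head}]"


paths = list(classification_paths(TREE))
if_then_rules = [to_if_then(c, l) for c, l in paths]
jena_rules = [to_jena_style_rule(i, c, l) for i, (c, l) in enumerate(paths, 1)]

for rule in if_then_rules + jena_rules:
    print(rule)
```

Each leaf yields exactly one rule, so the size of the rule set equals the number of leaves in the tree. In a real deployment the rule strings would be loaded by the Jena toolkit rather than printed.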
3. The parallel network traffic classification method based on ontology knowledge reasoning according to claim 1, characterized in that:
Step II specifically comprises the following sub-steps:
II-1. The Jena toolkit is used to build a corresponding reasoner from the inference rule set generated in step I;
II-2. According to the performance of each computing node and the data scale of the network traffic instances in the network traffic ontology, the constructed network traffic ontology is split into multiple network traffic ontology shards; the shards are uploaded to the Hadoop Distributed File System, and each shard is assigned an identifier;
II-3. Multiple MapReduce map functions are started, and key-value pairs of the form <network traffic ontology shard identifier, network traffic ontology shard> are input to the map functions;
II-4. Each map function uses the reasoner built in step II-1 to perform knowledge reasoning on its network traffic ontology shard, obtaining the network application type label of each network traffic instance in the shard;
II-5. Key-value pairs of the form <network application type label, network traffic instance> are output to the reduce function;
II-6. The reduce function merges network traffic instances by network application type label, forming classified network traffic instance sets;
II-7. The classified network traffic instance sets are output, completing network traffic classification.
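The data flow of sub-steps II-2 through II-7 can be sketched in plain Python. This is only a single-process simulation under stated assumptions: it uses neither Hadoop nor Jena, instances are plain dictionaries rather than ontology individuals, the `reason` function is a hypothetical stand-in for the reasoner, and the node weights used to size the shards are made up.

```python
from collections import defaultdict


def reason(instance):
    """Hypothetical stand-in for the reasoner: apply illustrative IF-THEN rules."""
    if instance["avgPktSize"] <= 120.0:
        return "DNS"
    return "WWW" if instance["duration"] <= 30.0 else "P2P"


def split_into_shards(instances, node_weights):
    """Step II-2: split instances into identified shards, sized by node weight."""
    total = sum(node_weights)
    shards, start = {}, 0
    for shard_id, weight in enumerate(node_weights):
        if shard_id == len(node_weights) - 1:
            end = len(instances)  # last shard takes the remainder
        else:
            share = round(len(instances) * weight / total)
            end = min(len(instances), start + share)
        shards[f"shard-{shard_id}"] = instances[start:end]
        start = end
    return shards


def map_fn(shard_id, shard):
    """Steps II-3 to II-5: emit <application type label, instance> pairs."""
    for instance in shard:
        yield reason(instance), instance


def reduce_fn(pairs):
    """Step II-6: merge instances that share an application type label."""
    grouped = defaultdict(list)
    for label, instance in pairs:
        grouped[label].append(instance["id"])
    return dict(grouped)


instances = [
    {"id": "f1", "avgPktSize": 80.0, "duration": 1.0},
    {"id": "f2", "avgPktSize": 900.0, "duration": 12.0},
    {"id": "f3", "avgPktSize": 1400.0, "duration": 600.0},
    {"id": "f4", "avgPktSize": 100.0, "duration": 0.5},
]
shards = split_into_shards(instances, node_weights=[1, 1])
pairs = [pair for sid, shard in shards.items() for pair in map_fn(sid, shard)]
classified = reduce_fn(pairs)  # step II-7: the classified instance sets
print(classified)
```

Because each map call reasons over its shard independently, the shards can be processed on separate nodes; only the grouping by label in the reduce step needs to see pairs from all shards.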
CN201510974162.XA 2015-12-22 2015-12-22 A parallel network traffic classification method based on ontology knowledge reasoning Active CN105516020B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510974162.XA CN105516020B (en) 2015-12-22 2015-12-22 A parallel network traffic classification method based on ontology knowledge reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510974162.XA CN105516020B (en) 2015-12-22 2015-12-22 A parallel network traffic classification method based on ontology knowledge reasoning

Publications (2)

Publication Number Publication Date
CN105516020A (en) 2016-04-20
CN105516020B (en) 2018-09-11

Family

ID=55723670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510974162.XA Active CN105516020B (en) 2015-12-22 2015-12-22 A parallel network traffic classification method based on ontology knowledge reasoning

Country Status (1)

Country Link
CN (1) CN105516020B (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104102700A (en) * 2014-07-04 2014-10-15 华南理工大学 Categorizing method oriented to Internet unbalanced application flow
CN104702465A (en) * 2015-02-09 2015-06-10 桂林电子科技大学 Parallel network flow classification method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENGJIE GU,ET AL.: "Online Self-learning Internet Traffic Classification based on Profile and Ontology", 《JOURNAL OF CONVERGENCE TECHNOLOGY》 *
M.PIETRZYK,ET AL.: "Toward systematic methods comparison in traffic classification", 《WIRELESS COMMUNICATION AND MOBILE COMPUTING CONFERENCE》 *
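HU TING, ET AL.: "Comparative study of network traffic classification methods", 《JOURNAL OF GUILIN UNIVERSITY OF ELECTRONIC TECHNOLOGY》 *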

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878307A (en) * 2017-02-21 2017-06-20 电子科技大学 Unknown communication protocol recognition method based on bit error rate model
CN106878307B (en) * 2017-02-21 2019-10-29 电子科技大学 Unknown communication protocol recognition method based on bit error rate model
CN107800787A (en) * 2017-10-23 2018-03-13 广州百兴网络科技有限公司 Distributed big data real-time exchange sharing computer network system
CN107800787B (en) * 2017-10-23 2020-10-16 图斯崆南京科技有限公司 Distributed big data real-time exchange sharing computer network system
CN110322037A (en) * 2018-03-28 2019-10-11 普天信息技术有限公司 Method for predicting and device based on inference pattern
US11206218B2 (en) 2018-09-11 2021-12-21 Cisco Technology, Inc. Packet flow classification in spine-leaf networks using machine learning based overlay distributed decision trees
US10673765B2 (en) 2018-09-11 2020-06-02 Cisco Technology, Inc. Packet flow classification in spine-leaf networks using machine learning based overlay distributed decision trees
CN109784370A (en) * 2018-12-14 2019-05-21 中国平安财产保险股份有限公司 Data map generation method, device and computer equipment based on decision tree
CN109784370B (en) * 2018-12-14 2024-05-10 中国平安财产保险股份有限公司 Decision tree-based data map generation method and device and computer equipment
CN110245874A (en) * 2019-03-27 2019-09-17 中国海洋大学 Decision fusion method based on machine learning and knowledge reasoning
CN110245874B (en) * 2019-03-27 2024-05-10 中国海洋大学 Decision fusion method based on machine learning and knowledge reasoning
CN111914100A (en) * 2020-08-11 2020-11-10 中科院合肥技术创新工程院 Emergency decision knowledge representation method based on ontology
CN112784990A (en) * 2021-01-22 2021-05-11 支付宝(杭州)信息技术有限公司 Training method of member inference model
CN117313004A (en) * 2023-11-29 2023-12-29 南京邮电大学 QoS flow classification method based on deep learning in Internet of things
CN117313004B (en) * 2023-11-29 2024-03-12 南京邮电大学 QoS flow classification method based on deep learning in Internet of things

Also Published As

Publication number Publication date
CN105516020B (en) 2018-09-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20160420

Assignee: Guangxi Jun'an Network Security Technology Co.,Ltd.

Assignor: GUILIN University OF ELECTRONIC TECHNOLOGY

Contract record no.: X2022450000459

Denomination of invention: A parallel network traffic classification method based on ontology knowledge reasoning

Granted publication date: 20180911

License type: Common License

Record date: 20221228

TR01 Transfer of patent right

Effective date of registration: 20240409

Address after: 230000 B-1015, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee after: HEFEI MINGLONG ELECTRONIC TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 541004 1 Jinji Road, Qixing District, Guilin, the Guangxi Zhuang Autonomous Region

Patentee before: GUILIN University OF ELECTRONIC TECHNOLOGY

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240415

Address after: Room 301, 3rd Floor, Building 3, No. 18 Ziyue Road, Laiguangying Township, Chaoyang District, Beijing, 100020

Patentee after: Beijing Yunche Technology Co.,Ltd.

Country or region after: China

Address before: 230000 B-1015, wo Yuan Garden, 81 Ganquan Road, Shushan District, Hefei, Anhui.

Patentee before: HEFEI MINGLONG ELECTRONIC TECHNOLOGY Co.,Ltd.

Country or region before: China