CN107819646A - Network traffic classification system and method with distributed transmission - Google Patents



Publication number
CN107819646A
Authority
CN
China
Prior art keywords
module
data flow
feature
flow
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710993791.6A
Other languages
Chinese (zh)
Inventor
邢宁哲
闫忠平
纪雨彤
来骥
陈重韬
马跃
彭柏
金燊
赵庆凯
万莹
张阳洋
尚芳剑
张东辉
那琼澜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC and Information and Telecommunication Branch of State Grid Jibei Electric Power Co Ltd
Priority to CN201710993791.6A
Publication of CN107819646A


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/08 Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L43/0876 Network utilisation, e.g. volume of load or congestion level
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/19 Flow control; Congestion control at layers above the network layer
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2483 Traffic characterised by specific attributes, e.g. priority or QoS involving identification of individual flows
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/31 Flow control; Congestion control by tagging of packets, e.g. using discard eligibility [DE] bits
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/51 Discovery or management thereof, e.g. service location protocol [SLP] or web services

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Environmental & Geological Engineering (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network traffic classification system and method with distributed transmission. In the DPI service identification system: a flow table detection module receives a data flow and detects whether the current data flow is already marked; a data flow feature library stores the features of data flows; a traffic identification module checks whether the data flow matches any traffic feature in the data flow feature library; and a protocol processing module processes data flows separately by class, according to their marks. In the DFI traffic identification system: a sample acquisition module extracts the flow features of the services that can be identified and divides them into classes; a classifier training module trains on the samples provided by the sample acquisition module to obtain a classification model; and a classifier prediction module classifies the unidentifiable data flows according to the classification model, marks the classified data flows, and sends them to the protocol processing module using parallel transmission.

Description

Network traffic classification system and method with distributed transmission
Technical field
The present invention relates to network traffic classification systems and methods, and in particular to a network traffic classification system and method with distributed transmission.
Background
With the rapid development of network application-layer services, a key question is how to identify different kinds of network traffic by technical means so that the traffic can be controlled and managed. The main methods currently used to identify the service carried by a network data flow are as follows.
Port-based identification: this technique identifies applications by the port numbers they have registered with IANA (Internet Assigned Numbers Authority). For example, when port 80 is detected, the application is assumed to be ordinary web browsing. However, some illegitimate applications in today's networks evade detection and supervision by hiding or spoofing port numbers, so that flows disguised as legitimate traffic erode the network. The ports used by newer P2P protocols, for instance, change dynamically. The accuracy of port-based identification is therefore falling, and the method is increasingly unsuitable for identifying the services in current network traffic.
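The weakness of port-based identification can be illustrated with a minimal sketch. The lookup table below is a tiny hypothetical excerpt, not the full IANA registry, and the function name classify_by_port is an assumption made for this example.

```python
# Hypothetical excerpt of well-known port assignments; not the full IANA registry.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS", 53: "DNS"}

def classify_by_port(dst_port):
    """Return the application registered for dst_port, or None if unknown."""
    return WELL_KNOWN_PORTS.get(dst_port)

print(classify_by_port(80))     # HTTP
print(classify_by_port(51413))  # None: a dynamically chosen port is not recognised
```

An application that picks its port at run time, or that borrows port 80, is either missed or misclassified by such a lookup, which is the limitation the paragraph above describes.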
Deep packet inspection (DPI): port-based identification is helpless against newer protocols that use dynamic ports. In addition to analysing the basic information at layer 4 and below, DPI analyses the application layer and identifies applications and their content. It examines the application-layer payload of a series of packets to find the application-layer signature and so identify the service. This method has great difficulty when the application-layer data is encrypted.
Deep flow inspection (DFI): when the application-layer data is encrypted, DPI can hardly identify a flow by analysing application-layer features. DFI identifies services from flow characteristics, on the basis that different application types exhibit different states in their session connections or data flows. DFI analyses features of the whole data flow, such as the average packet length of each flow and the inter-arrival time of packets. Because it does not inspect application-layer data, it makes no difference to DFI whether that data is encrypted. However, flows belonging to the same type of service usually have very similar features; the traffic features of the two IM applications QQ and MSN, for example, may be very close. The drawback of DFI is therefore that it can only distinguish a few major classes of network traffic, such as IM, P2P and WEB.
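The two flow features named above, average packet length and packet inter-arrival time, can be computed as in the following minimal sketch. The input format, a list of (arrival time in seconds, length in bytes) tuples, and the function name flow_features are assumptions made for illustration.

```python
def flow_features(packets):
    """Compute (average packet length, average inter-arrival time) for one flow.

    packets: list of (arrival_time_seconds, length_bytes) tuples in arrival order.
    """
    if not packets:
        raise ValueError("flow has no packets")
    lengths = [length for _, length in packets]
    times = [t for t, _ in packets]
    avg_len = sum(lengths) / len(lengths)
    # Gaps between consecutive packets; a single-packet flow has none.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return avg_len, avg_gap

flow = [(0.000, 1000), (0.004, 1200), (0.010, 800)]
print(flow_features(flow))
```

Neither feature requires reading the payload, which is why encryption does not affect this kind of analysis.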
In summary, in the prior art port-based identification has low accuracy, DPI has great difficulty identifying services whose application-layer data is encrypted, and DFI can only separate network traffic into major classes.
Summary of the invention
In view of this, the present invention proposes a network traffic classification system and method with distributed transmission that combines DPI and DFI, improving the accuracy and processing speed of network traffic identification and classification.
To this end, the network traffic classification system with distributed transmission provided by the invention comprises a DPI service identification system and a DFI traffic identification system.
The DPI service identification system comprises:
A flow table detection module, for receiving a data flow and detecting whether the current data flow is already marked; if so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
A data flow feature library, for storing the features of data flows;
A traffic identification module, for checking whether the data flow matches any traffic feature in the data flow feature library; if so, it marks the current data flow according to the matching traffic feature, sends it to the protocol processing module, and updates the state table in the flow table detection module; otherwise, it sends the unidentifiable data flow to the classifier prediction module of the DFI traffic identification system;
A protocol processing module, for processing data flows separately by class according to their different marks.
The DFI traffic identification system comprises:
A classifier prediction module, for classifying the unidentifiable data flows according to the classification model, marking the classified data flows, and sending them to the protocol processing module using parallel transmission.
In one embodiment, the DFI traffic identification system further comprises: a sample acquisition module, for extracting the flow features of the services that the DPI service identification system can identify accurately and dividing them into classes, as training samples for the classifier training module; it is also used, after obtaining the sample file of the data flows online, to send the sample file to the classifier training module;
A classifier training module, for training on the samples provided by the sample acquisition module to obtain a classification model;
In the DPI service identification system, the traffic identification module is also used to send the data flows it can identify to the sample acquisition module.
In one embodiment, the DPI service identification system of the system is connected into a network based on the TCP/IP protocol.
In one embodiment, the data flow feature library of the system contains application-layer features of multiple services belonging to several major classes of network traffic.
In one embodiment, the flow table detection module of the system maintains a state table whose entries include: source IP address, destination IP address, source port, destination port, and protocol number;
The traffic identification module first analyses the application-layer data of the network data flow and compares its application-layer features with the features in the data flow feature library. If the feature string of the application-layer data matches one or more features in the library, the traffic identification module marks the flow with the corresponding protocol ID and updates the flow into the flow table detection module. If no feature in the library matches the feature string, the traffic identification module leaves the flow unmarked and sends it to the classifier prediction module for further identification.
In another aspect, the invention provides a network traffic classification method with distributed transmission, applied to a system that combines a deep packet inspection (DPI) service identification system with a deep flow inspection (DFI) traffic identification system. The DPI service identification system comprises a flow table detection module, a data flow feature library, a traffic identification module and a protocol processing module; the DFI traffic identification system comprises a sample acquisition module, a classifier training module and a classifier prediction module. The method comprises the following steps:
The flow table detection module receives a data flow and detects whether the current data flow is already marked; if so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
The traffic identification module checks whether the data flow matches any traffic feature in the data flow feature library; if so, it marks the current data flow according to the matching traffic feature, sends it to the protocol processing module, and updates the state table in the flow table detection module; otherwise, it sends the unidentifiable data flow to the classifier prediction module of the DFI traffic identification system;
The classifier prediction module classifies the unidentifiable data flows according to the classification model, marks the classified data flows, and sends them to the protocol processing module using parallel transmission;
The protocol processing module processes data flows separately by class according to their different marks.
In one embodiment, the DFI traffic identification system of the method further comprises: a sample acquisition module, for extracting the flow features of the services that the DPI service identification system can identify accurately and dividing them into classes, as training samples for the classifier training module; and, after obtaining the sample file of the data flows online, sending the sample file to the classifier training module;
A classifier training module, for training on the samples provided by the sample acquisition module to obtain a classification model;
In the DPI service identification system, the traffic identification module sends the data flows it can identify to the sample acquisition module.
In one embodiment, the DPI service identification system of the method is connected into a network based on the TCP/IP protocol.
In one embodiment, the data flow feature library of the method contains application-layer features of multiple services belonging to several major classes of network traffic.
In one embodiment, the flow table detection module of the method maintains a state table whose entries include: source IP address, destination IP address, source port, destination port, and protocol number;
The traffic identification module first analyses the application-layer data of the network data flow and compares its application-layer features with the features in the data flow feature library. If the feature string of the application-layer data matches one or more features in the library, the traffic identification module marks the flow with the corresponding protocol ID and updates the flow into the flow table detection module. If no feature in the library matches the feature string, the traffic identification module leaves the flow unmarked and sends it to the classifier prediction module for further identification.
As can be seen from the above, the network traffic classification system and method with distributed transmission provided by the invention first apply DPI identification to network data and then use DFI to classify the data flows that DPI cannot identify. Data flows are sent from the classifier prediction module to the protocol processing module using parallel distributed transmission. This effectively improves the accuracy of network traffic classification and greatly improves processing efficiency.
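The source does not specify how the parallel transmission to the protocol processing module is implemented. The following is only a minimal sketch of one possible reading, in which marked flows are handed to a pool of worker threads; the function names protocol_process and send_parallel and the worker count are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def protocol_process(marked_flow):
    """Hypothetical stand-in for the protocol processing module."""
    flow_id, class_id = marked_flow
    return f"flow {flow_id} handled as class {class_id}"

def send_parallel(marked_flows, workers=4):
    """Dispatch already-marked flows to the processing step concurrently.

    Executor.map returns results in the order of the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(protocol_process, marked_flows))

print(send_parallel([("a", 1), ("b", 2), ("c", 1)]))
```

A real deployment might instead distribute flows across processes or hosts; a thread pool is used here only because it keeps the sketch self-contained.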
Brief description of the drawings
Fig. 1 is a structural diagram of the network traffic classification system with distributed transmission according to an embodiment of the invention;
Fig. 2 is a structural block diagram of the DPI identification module in the network traffic classification system with distributed transmission according to an embodiment of the invention;
Fig. 3 is a structural block diagram of the DFI identification module in the network traffic classification system with distributed transmission according to an embodiment of the invention;
Fig. 4 is a flow chart of the network traffic classification method with distributed transmission according to an embodiment of the invention.
Detailed description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Referring to Fig. 1, the network traffic classification system with distributed transmission that combines DPI and DFI consists of two systems: a DPI service identification system and a DFI traffic identification system.
The DPI service identification system comprises:
A flow table detection module 11, for receiving a data flow and detecting whether the current data flow is already marked; if so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
A data flow feature library 12, for storing the features of data flows;
A traffic identification module 13, for checking whether the data flow matches any traffic feature in the data flow feature library; if so, it marks the current data flow according to the matching traffic feature, sends it to the protocol processing module, and updates the state table in the flow table detection module; otherwise, it sends the unidentifiable data flow to the classifier prediction module of the DFI traffic identification system;
A protocol processing module 14, for processing data flows separately by class according to their different marks.
The DFI traffic identification system comprises:
A sample acquisition module 15, for extracting the flow features of the services that the DPI service identification system can identify accurately and dividing them into classes, as training samples for the classifier training module;
A classifier training module 16, for training on the samples provided by the sample acquisition module to obtain a classification model;
A classifier prediction module 17, for classifying the unidentifiable data flows according to the classification model, marking the classified data flows, and sending them to the protocol processing module using parallel transmission.
The traffic identification module 13 is also used to send the data flows it can identify to the sample acquisition module;
The sample acquisition module 15 is also used, after obtaining the sample file of the data flows online, to send the sample file to the classifier training module.
In the DPI service identification system of the invention, the data flow feature library contains the application-layer features of some of the services in each major class of network traffic. For example, services in the instant messaging class include QQ and Baidu HI: QQ's application-layer feature is that a packet begins with 0x02 and ends with 0x03, and Baidu HI's is that the first eight bytes are 0x0000010031564d49. Services in the P2P class include TTlive and Sopcast: TTlive's application-layer feature is that the payload of the first packet of each flow is 52 bytes long, with first three bytes 0xffff01 and last two bytes 0x0002; Sopcast's is that the signature of the first packet carrying payload, written as a regular expression, is ^DESCRIBE.*User-Agent:WMPlayer.
The invention is described in more detail below with reference to the drawings.
As shown in Fig. 2, in the network traffic classification system combining DPI and DFI, the DPI service identification system is connected into a network based on the TCP/IP protocol and comprises a flow table detection module, a protocol processing module, a traffic identification module and a data flow feature library.
The data flow feature library contains multiple services belonging to several major classes of network traffic. For example:
(1) Services in the IM (instant messaging) class include QQ and Baidu HI. QQ's application-layer feature is that a packet begins with 0x02 and ends with 0x03; Baidu HI's application-layer feature is that the first eight bytes are 0x0000010031564d49.
(2) Services in the P2P class include TTlive and Sopcast. TTlive's application-layer feature is that the payload of the first packet of each flow is 52 bytes long, with first three bytes 0xffff01 and last two bytes 0x0002; Sopcast's application-layer feature is that the signature of the first packet carrying payload, written as a regular expression, is ^DESCRIBE.*User-Agent:WMPlayer.
The features of all the services above are stored in the data flow feature library.
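The four example signatures above can be expressed as predicates over a packet payload, as in this minimal sketch. The protocol IDs 6, 7 and 8 are hypothetical (the source only gives 5 for QQ as an example), the name match_signature is an assumption, the payload is assumed to be the first payload-carrying packet of the flow, and the use of re.DOTALL so that the pattern can span header lines is also an assumption.

```python
import re

# (protocol ID, service name, predicate over the payload bytes).
# IDs other than QQ's example value 5 are hypothetical.
SIGNATURES = [
    (5, "QQ", lambda p: len(p) >= 2 and p[0] == 0x02 and p[-1] == 0x03),
    (6, "Baidu HI", lambda p: p.startswith(bytes.fromhex("0000010031564d49"))),
    (7, "TTlive", lambda p: len(p) == 52
        and p.startswith(b"\xff\xff\x01") and p.endswith(b"\x00\x02")),
    # Assumption: DOTALL lets ".*" cross line breaks between headers.
    (8, "Sopcast", lambda p: re.match(rb"^DESCRIBE.*User-Agent:WMPlayer",
                                      p, re.DOTALL) is not None),
]

def match_signature(payload):
    """Return (protocol_id, name) for the first matching signature, else None."""
    for proto_id, name, predicate in SIGNATURES:
        if predicate(payload):
            return proto_id, name
    return None

print(match_signature(b"\x02hello\x03"))  # (5, 'QQ')
print(match_signature(b"unknown"))        # None
```

A flow for which match_signature returns None corresponds to the unidentifiable case that is handed on to the DFI side.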
The flow table detection module maintains a state table. Each entry contains the five-tuple of a data flow (source IP address, destination IP address, source port, destination port, protocol number) and the ID of the protocol type it belongs to. When a network data flow enters, its five-tuple is first compared with the entries in the state table; if it is found there, the flow is marked with the ID of its protocol type and sent to the protocol processing module.
For example, the entries maintained in the state table have the format shown in the second row of the following table:
Source IP address   Destination IP address   Source port   Destination port   Protocol type   Protocol ID
119.147.18.47       10.8.7.43                8000          4000               0x11            5
Here 119.147.18.47 is the source IP address, 10.8.7.43 is the destination IP address, 8000 is the source port, 4000 is the destination port, 0x11 is the protocol number (UDP), and 5 is a protocol ID that can be freely defined; for example, if QQ's protocol ID is defined as 5, then 5 denotes a QQ data flow. When a new data flow enters the flow table detection module, its five-tuple is first compared with the first five fields (the five-tuple) of the entries in the table. If its five-tuple is found in the state table, the data flow is marked with the protocol ID and sent to the protocol processing module; if no matching entry is found, the data flow enters the traffic identification module.
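A state table keyed by five-tuple can be sketched with a dictionary, as below. The class and field names (FlowTable, FiveTuple, lookup, update) are assumptions made for illustration; the example entry reuses the values from the table above.

```python
from collections import namedtuple

# The five-tuple that identifies a flow in the state table.
FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port protocol")

class FlowTable:
    """Maps a flow's five-tuple to the protocol ID it was marked with."""

    def __init__(self):
        self._entries = {}

    def lookup(self, key):
        """Return the protocol ID for a known flow, or None for a new flow."""
        return self._entries.get(key)

    def update(self, key, protocol_id):
        """Record the protocol ID once the traffic identification step has matched a flow."""
        self._entries[key] = protocol_id

table = FlowTable()
qq_flow = FiveTuple("119.147.18.47", "10.8.7.43", 8000, 4000, 0x11)
table.update(qq_flow, 5)
print(table.lookup(qq_flow))  # 5
```

Because lookup is a hash-table access, later packets of an already identified flow skip signature matching entirely.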
The traffic identification module first analyses the application-layer data of the network data flow and compares its application-layer features with the features in the data flow feature library. If the feature string of the application-layer data matches one or more features in the library, the traffic identification module marks the flow with the corresponding protocol ID and updates the flow into the flow table detection module. If no feature in the library matches the feature string, the traffic identification module does not mark the flow but sends it to the classifier prediction module of the DFI traffic identification system for further identification.
The data flow feature library stores the application-layer signatures of services identified in advance. For example, the first 20 bytes of BitSpirit's application layer are always 0x13426974546f7272656e742070726f746f636f6c, and the first 5 bytes of the application layer when PP Diandiantong downloads a file are always 0x3c00000001. The traffic identification module compares a data flow against the features in the library to decide whether the flow can be identified and which protocol it belongs to.
Fig. 3 shows the structural block diagram of the DFI part of the network traffic classification system combining DPI and DFI. It mainly comprises a sample acquisition module, a classifier training module and a classifier prediction module. The sample acquisition module takes as samples the data flows that the traffic identification module of Fig. 1 can accurately identify, assigns them to the major classes of network traffic defined earlier, and extracts the required flow features from them. For example, QQ can be accurately identified by the traffic identification module, and QQ belongs to the IM (instant messaging) class, so every QQ network data flow can serve as a sample of the IM class. Baidu HI can likewise be accurately identified and also belongs to the IM class, so every Baidu HI network data flow can also serve as a sample of the IM class. After a sample is obtained, its flow features are computed, such as the average packet length of the flow and the average inter-arrival time of packets, and the sample is labelled with the major class it belongs to. In the same way, samples of the P2P class can be extracted from TTlive and Sopcast network data flows, along with samples of the other major classes. Gathering all these samples together yields a sample file, whose format is shown in the following table:
Major class ID   Feature index   Feature value   Feature index   Feature value   ...
1                1               1000            2               0.005           ...
2                1               450             2               0.03            ...
1                1               950             2               0.006           ...
3                1               100             2               0.07            ...
...
Each row of this file represents one sample, and the first field of each row gives the major class the sample belongs to. For example, if the P2P class is represented by ID 1, the IM (instant messaging) class by 2 and the WEB application class by 3, then the first and third rows of this file are P2P samples, the second row is an IM sample and the fourth row is a WEB application sample. The major class ID in each row is followed by feature indices and the values of those features. For example, if the average packet length of the flow is given index 1 and the average packet inter-arrival time is given index 2, then the sample in the first row has an average packet length of 1000 and an average inter-arrival time of 0.005. Each flow certainly has more than two features; the others are not listed here. The role of the sample acquisition module is to extract flow features from the data flows that the traffic identification module can accurately identify and to save them in the form of a sample file.
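One row of the sample file described above can be written and read back as in the following minimal sketch. The whitespace-separated layout follows the table; the function names format_sample and parse_sample are assumptions made for illustration.

```python
def format_sample(class_id, features):
    """Build one sample-file row: class ID, then (index, value) pairs from index 1."""
    parts = [str(class_id)]
    for index, value in enumerate(features, start=1):
        parts.append(str(index))
        parts.append(str(value))
    return " ".join(parts)

def parse_sample(line):
    """Parse one row back into (class_id, {feature_index: value})."""
    fields = line.split()
    class_id = int(fields[0])
    pairs = fields[1:]
    features = {int(pairs[i]): float(pairs[i + 1]) for i in range(0, len(pairs), 2)}
    return class_id, features

print(format_sample(1, [1000, 0.005]))  # 1 1 1000 2 0.005
print(parse_sample("2 1 450 2 0.03"))   # (2, {1: 450.0, 2: 0.03})
```

Storing explicit feature indices, as the table does, lets a row omit features without shifting the meaning of the remaining values.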
The classifier training module obtains a classification model by training on the samples obtained by the sample acquisition module.
The classifier prediction module uses the classification model to classify the traffic that the traffic identification module cannot identify.
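The source does not name a classification algorithm. As a stand-in only, the sketch below trains a nearest-centroid classifier on the four example rows of the sample file and uses it to predict the major class of an unseen flow; the algorithm choice and the function names train and predict are assumptions, not the patent's method.

```python
from collections import defaultdict

def train(samples):
    """samples: list of (class_id, feature_vector). Returns {class_id: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for class_id, vec in samples:
        if class_id not in sums:
            sums[class_id] = [0.0] * len(vec)
        sums[class_id] = [s + v for s, v in zip(sums[class_id], vec)]
        counts[class_id] += 1
    return {c: [s / counts[c] for s in total] for c, total in sums.items()}

def predict(model, vec):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(model, key=lambda c: sq_dist(model[c]))

# Rows from the example sample file: 1 = P2P, 2 = IM, 3 = WEB.
samples = [(1, [1000, 0.005]), (2, [450, 0.03]), (1, [950, 0.006]), (3, [100, 0.07])]
model = train(samples)
print(predict(model, [980, 0.004]))  # 1
```

With unscaled features the distance is dominated by packet length, so a practical classifier would normalise the features first; that step is omitted here to keep the sketch short.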
As shown in Fig. 1, the modules fall into two groups, online and offline: the flow table detection module, protocol processing module, traffic identification module, data flow feature library, sample acquisition module and classifier prediction module are online, while the classifier training module is offline. Before online classification can begin, sample acquisition and classifier training must first be carried out to produce a classification model; during this phase the traffic identification module sends the data flows it can accurately identify directly to the sample acquisition module.
After the sample acquisition module has obtained the sample file online, the classifier can be trained offline to obtain the classification model. When the traffic identification module of the DPI service identification system cannot identify a data flow, the flow is passed to the classifier prediction module of the DFI system, which classifies it according to the trained classification model.
In another aspect of the invention, based on the above system, a network traffic classification method for distributed transmission is also provided. The method is applied to a system that combines a DPI service identification system with a DFI traffic identification system. The DPI service identification system includes a flow table detection module, a data flow feature library, a traffic identification module, and a protocol processing module; the DFI traffic identification system includes a sample acquisition module, a classifier training module, and a classifier classification prediction module.
The method comprises the following steps:
(a) A data flow first passes through the flow table detection module in the DPI service identification system, which checks whether the current data flow is in the state table that the module maintains. If the data flow is in the state table, the flow table detection module marks the current data flow and sends it directly to the protocol processing module. If the data flow is not in the state table, the flow table detection module sends it to the traffic identification module, and the method proceeds to step (b);
(b) The traffic identification module checks whether the data flow contains any feature in the data flow feature library of the DPI service identification system. If it finds a traffic feature in the library that matches the data flow, it marks the data flow corresponding to the current packet as a specific data flow, updates the state table maintained in the flow table detection module, and sends the marked data flow to the protocol processing module. If it finds no matching traffic feature in the library, it sends the data flow to the DFI traffic identification system, and the method proceeds to step (c);
(c) The traffic identification module sends the data flows it can identify to the sample acquisition module in the DFI traffic identification system. After the sample acquisition module has obtained the sample file of such a data flow online, it sends the sample file to the classifier training module for offline training, which yields a classification model. The classifier training module sends this classification model to the classifier classification prediction module, which uses the trained model to classify the data flows that the traffic identification module could not identify in step (b);
(d) The classifier classification prediction module marks each classified data flow accordingly and sends it to the protocol processing module. The protocol processing module then handles each data flow according to the mark assigned in the preceding steps, processing it either as a specific service or as a member of a major class.
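Steps (a) to (d) amount to a three-stage lookup: state table first, then signature matching, then the trained classifier. The sketch below is only an illustration under stated assumptions: the signature patterns in `FEATURE_LIBRARY`, the payload-length rule inside `dfi_classify` (a placeholder for a trained model), and all function names are hypothetical and do not come from the patent.

```python
# Hypothetical signature library: payload prefix -> specific service label.
FEATURE_LIBRARY = {b"GET ": "http", b"SSH-": "ssh"}

# State table kept by the flow table detection stage: flow key -> label.
state_table = {}


def dpi_identify(payload):
    """Step (b): return the label of the first matching signature, if any."""
    for pattern, label in FEATURE_LIBRARY.items():
        if payload.startswith(pattern):
            return label
    return None


def dfi_classify(payload):
    """Steps (c)-(d): placeholder rule standing in for a trained model."""
    return "class:bulk" if len(payload) > 1000 else "class:interactive"


def classify(flow_key, payload):
    """Return (label, stage) for one packet of the given flow."""
    if flow_key in state_table:                  # step (a): already marked
        return state_table[flow_key], "flow_table"
    label = dpi_identify(payload)                # step (b): signature match
    if label is not None:
        state_table[flow_key] = label            # later packets hit step (a)
        return label, "dpi"
    return dfi_classify(payload), "dfi"          # steps (c)-(d): fall back


# Flow key: (source IP, destination IP, source port, destination port, protocol)
key = ("10.0.0.1", "10.0.0.2", 40000, 80, 6)
print(classify(key, b"GET /index.html HTTP/1.1\r\n"))  # ('http', 'dpi')
print(classify(key, b"\x00\x01"))                      # ('http', 'flow_table')
```

The second call shows why the state table is updated after a DPI match: later packets of the same flow skip signature matching entirely.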
Figure 4 shows a flow chart of the network traffic classification method for distributed transmission. In one embodiment, the method comprises the following steps:
Step 1: the flow table detection module receives the input data flow;
Step 2: the flow table detection module detects whether the current data flow is already marked. If so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
Step 3: the traffic identification module checks whether the data flow matches any traffic feature in the data flow feature library. If so, it marks the current data flow according to the matching traffic feature, sends it to the protocol processing module, and proceeds to step 4; otherwise, it sends the unidentifiable data flow to the classifier classification prediction module of the DFI traffic identification system and proceeds to step 5;
Step 4: the state table in the flow table detection module is updated, and the method returns to step 1;
Step 5: the classifier classification prediction module classifies the unidentifiable data flow according to the classification model, marks the classified data flow, and sends it to the protocol processing module using a parallel transmission mode;
In addition, the traffic identification module sends the data flows it can identify to the sample acquisition module; after the sample acquisition module has obtained the sample file of such a data flow online, it sends the sample file to the classifier training module;
The classifier training module performs offline training to obtain the classification model and sends it to the classifier classification prediction module;
Step 6: the protocol processing module processes each data flow according to its mark, that is, according to its classification.
As the steps above show, this flow describes how network data is processed during online classification; it presupposes that the classifier has already been trained and a classification model obtained.
First, when network traffic arrives, it reaches the flow table detection module, which checks from the packet header whether the current packet has already been marked. If the type of the data flow to which the current packet belongs is marked, the flow is handled in the manner corresponding to that type. If it is not marked, the packet is passed to the traffic identification module, which bases its identification on the data flow feature library shown in Fig. 1. If the traffic identification module can identify the flow, it updates the flow table detection module so that subsequent packets belonging to the same flow are recognized at the flow table detection stage. If it cannot, the flow is passed to the classifier classification prediction module, which classifies the unidentifiable traffic according to the classification model obtained by DFI offline training. Because all network data traffic necessarily belongs to one of several major classes, all traffic that the DPI traffic identification module cannot identify is classified here by major class. After classification, the traffic is sent to the protocol processing module, which processes it according to its classification. The protocol processing module handles two kinds of objects: specific services and network major classes.
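The two kinds of objects handled by the protocol processing module can be pictured as a dispatch on the kind of mark a flow carries. The following sketch is an assumption-laden illustration: the `Mark` type, the kind names `"service"` and `"major_class"`, and the handler functions are hypothetical, and the returned strings merely stand in for whatever processing a deployment would apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    kind: str  # "service" (assigned by DPI) or "major_class" (assigned by DFI)
    name: str  # e.g. a protocol name or a traffic class name


def handle_service(name):
    """Processing for a flow identified as a specific service."""
    return f"apply per-service policy for {name}"


def handle_major_class(name):
    """Processing for a flow known only by its major class."""
    return f"apply class-level policy for {name}"


HANDLERS = {"service": handle_service, "major_class": handle_major_class}


def process(mark):
    """Route a marked flow to the handler for its kind of mark."""
    handler = HANDLERS.get(mark.kind)
    if handler is None:
        raise ValueError(f"unknown mark kind: {mark.kind!r}")
    return handler(mark.name)


print(process(Mark("service", "http")))
print(process(Mark("major_class", "streaming")))
```

Keeping the handlers in a table means a new kind of mark can be supported by adding one entry rather than changing the dispatch logic.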
Processing network traffic in this way is more comprehensive than using DPI or DFI alone: services that are not encrypted at the application layer can be identified precisely, and services that are encrypted at the application layer can still be assigned to a major class.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Within the spirit of the invention, technical features of the above embodiment or of different embodiments may also be combined, steps may be carried out in any order, and many other variations of the different aspects of the invention described above exist that are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent substitution, improvement, and the like made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A network traffic classification system for distributed transmission, characterized in that it comprises a DPI service identification system and a DFI traffic identification system;
wherein the DPI service identification system comprises:
a flow table detection module, for receiving a data flow and detecting whether the current data flow is marked; if so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
a data flow feature library, for storing features of data flows;
a traffic identification module, for checking whether the data flow matches any traffic feature in the data flow feature library; if so, the current data flow is marked according to the matching traffic feature and sent to the protocol processing module, and the state table in the flow table detection module is updated; otherwise, the unidentifiable data flow is sent to the classifier classification prediction module of the DFI traffic identification system;
a protocol processing module, for processing each data flow according to its mark, that is, according to its classification;
and the DFI traffic identification system comprises:
a classifier classification prediction module, for classifying the unidentifiable data flow according to the classification model, marking the classified data flow, and sending it to the protocol processing module using a parallel transmission mode.
2. The system according to claim 1, characterized in that the DFI traffic identification system further comprises: a sample acquisition module, for extracting the flow features of services that the DPI service identification system can accurately identify and dividing them into different classes to serve as training samples for the classifier training module, and further for sending the sample file to the classifier training module after obtaining the sample file of the data flow online;
a classifier training module, for training on the samples provided by the sample acquisition module to obtain a classification model;
and in the DPI service identification system, the traffic identification module is further used to send the data flows that can be identified to the sample acquisition module.
3. The system according to claim 1, characterized in that the DPI service identification system is connected to a network based on the TCP/IP protocol.
4. The system according to claim 1, characterized in that the data flow feature library contains application layer features of multiple services belonging to multiple network traffic major classes.
5. The system according to claim 1, characterized in that the flow table detection module maintains a state table, the information in which includes: source IP address, destination IP address, source port, destination port, and protocol number;
the traffic identification module first analyzes the application layer data of the network data flow and compares its application layer features with the features in the data flow feature library; if the feature string of the application layer data matches one or more features in the data flow feature library, the traffic identification module marks the data flow with the corresponding protocol ID and updates the flow table detection module with the data flow; if no feature matching its feature string exists in the data flow feature library, the traffic identification module does not mark the data flow and sends it to the classifier classification prediction module for further identification.
6. A network traffic classification method for distributed transmission, characterized in that it is applied to a system combining a deep packet inspection (DPI) service identification system with a deep flow inspection (DFI) traffic identification system, wherein the DPI service identification system comprises a flow table detection module, a data flow feature library, a traffic identification module, and a protocol processing module, the DFI traffic identification system comprises a sample acquisition module, a classifier training module, and a classifier classification prediction module, and the method comprises the following steps:
the flow table detection module receives a data flow and detects whether the current data flow is marked; if so, the data flow is sent to the protocol processing module; otherwise, the flow table detection module sends the data flow to the traffic identification module;
the traffic identification module checks whether the data flow matches any traffic feature in the data flow feature library; if so, it marks the current data flow according to the matching traffic feature, sends it to the protocol processing module, and updates the state table in the flow table detection module; otherwise, it sends the unidentifiable data flow to the classifier classification prediction module of the DFI traffic identification system;
the classifier classification prediction module classifies the unidentifiable data flow according to the classification model, marks the classified data flow, and sends it to the protocol processing module using a parallel transmission mode;
the protocol processing module processes each data flow according to its mark, that is, according to its classification.
7. The method according to claim 6, characterized in that
the DFI traffic identification system further comprises: a sample acquisition module, for extracting the flow features of services that the DPI service identification system can accurately identify and dividing them into different classes to serve as training samples for the classifier training module, and for sending the sample file to the classifier training module after obtaining the sample file of the data flow online;
a classifier training module, for training on the samples provided by the sample acquisition module to obtain a classification model;
and in the DPI service identification system, the traffic identification module sends the data flows that can be identified to the sample acquisition module.
8. The method according to claim 6, characterized in that the DPI service identification system is connected to a network based on the TCP/IP protocol.
9. The method according to claim 6, characterized in that the data flow feature library contains application layer features of multiple services belonging to multiple network traffic major classes.
10. The method according to claim 6, characterized in that the flow table detection module maintains a state table, the information in which includes: source IP address, destination IP address, source port, destination port, and protocol number;
the traffic identification module first analyzes the application layer data of the network data flow and compares its application layer features with the features in the data flow feature library; if the feature string of the application layer data matches one or more features in the data flow feature library, the traffic identification module marks the data flow with the corresponding protocol ID and updates the flow table detection module with the data flow; if no feature matching its feature string exists in the data flow feature library, the traffic identification module does not mark the data flow and sends it to the classifier classification prediction module for further identification.
CN201710993791.6A 2017-10-23 2017-10-23 A kind of net flow assorted system and method for distributed transmission Pending CN107819646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710993791.6A CN107819646A (en) 2017-10-23 2017-10-23 A kind of net flow assorted system and method for distributed transmission


Publications (1)

Publication Number Publication Date
CN107819646A true CN107819646A (en) 2018-03-20

Family

ID=61608425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710993791.6A Pending CN107819646A (en) 2017-10-23 2017-10-23 A kind of net flow assorted system and method for distributed transmission

Country Status (1)

Country Link
CN (1) CN107819646A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101605067A (en) * 2009-04-22 2009-12-16 网经科技(苏州)有限公司 Network behavior active analysis diagnostic method
CN101986609A (en) * 2009-07-29 2011-03-16 中兴通讯股份有限公司 Method and system for realizing network flow cleaning
CN101645806A (en) * 2009-09-04 2010-02-10 东南大学 Network flow classifying system and network flow classifying method combining DPI and DFI
CN102984076A (en) * 2012-12-03 2013-03-20 中国联合网络通信集团有限公司 Method and device for identifying flow service types

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108900374A (en) * 2018-06-22 2018-11-27 网宿科技股份有限公司 A kind of data processing method and device applied to DPI equipment
CN109275045A (en) * 2018-09-06 2019-01-25 东南大学 Mobile terminal encrypted video ad traffic recognition methods based on DFI
CN109275045B (en) * 2018-09-06 2020-12-25 东南大学 DFI-based mobile terminal encrypted video advertisement traffic identification method
CN109361618A (en) * 2018-10-11 2019-02-19 平安科技(深圳)有限公司 Data traffic labeling method, device, computer equipment and storage medium
CN109361618B (en) * 2018-10-11 2022-10-28 平安科技(深圳)有限公司 Data flow marking method and device, computer equipment and storage medium
CN109327389A (en) * 2018-11-13 2019-02-12 南京中孚信息技术有限公司 Traffic classification label forwarding method, device and system
CN109327389B (en) * 2018-11-13 2021-06-08 南京中孚信息技术有限公司 Traffic classification label forwarding method, device and system
CN111404832A (en) * 2019-01-02 2020-07-10 ***通信有限公司研究院 Service classification method and device based on continuous TCP link
WO2021104444A1 (en) * 2019-11-27 2021-06-03 华为技术有限公司 Data flow classification method, apparatus and system
CN111917665A (en) * 2020-07-23 2020-11-10 华中科技大学 Terminal application data stream identification method and system
CN112350956B (en) * 2020-10-23 2022-07-01 新华三大数据技术有限公司 Network traffic identification method, device, equipment and machine readable storage medium
CN112350956A (en) * 2020-10-23 2021-02-09 新华三大数据技术有限公司 Network traffic identification method, device, equipment and machine readable storage medium
CN112383489A (en) * 2020-11-16 2021-02-19 中国信息通信研究院 Network data traffic forwarding method and device
CN113313216B (en) * 2021-07-30 2021-11-30 深圳市永达电子信息股份有限公司 Method and device for extracting main body of network data, electronic equipment and storage medium
CN113313216A (en) * 2021-07-30 2021-08-27 深圳市永达电子信息股份有限公司 Method and device for extracting main body of network data, electronic equipment and storage medium
CN114050926A (en) * 2021-11-09 2022-02-15 南方电网科学研究院有限责任公司 Data message depth detection method and device
CN115174240A (en) * 2022-07-13 2022-10-11 中国国家铁路集团有限公司 Railway encrypted flow monitoring system and method

Similar Documents

Publication Publication Date Title
CN107819646A (en) A kind of net flow assorted system and method for distributed transmission
CN101741744B (en) Network flow identification method
CN101645806B (en) Network flow classifying system and network flow classifying method combining DPI and DFI
CN111340191B (en) Bot network malicious traffic classification method and system based on ensemble learning
CN109361617B (en) Convolutional neural network traffic classification method and system based on network packet load
EP3469770B1 (en) Spam classification system based on network flow data
CN110796196B (en) Network traffic classification system and method based on depth discrimination characteristics
CN110113345A (en) A method of the assets based on Internet of Things flow are found automatically
CN104320304B (en) A kind of core network user flow application recognition methods of the multimode fusion easily extended
CN104270392A (en) Method and system for network protocol recognition based on tri-classifier cooperative training learning
CN101414939B (en) Internet application recognition method based on dynamical depth package detection
CN109117634A (en) Malware detection method and system based on network flow multi-view integration
CN107465643A (en) A kind of net flow assorted method of deep learning
CN110034966B (en) Data flow classification method and system based on machine learning
CN104468252A (en) Intelligent network service identification method based on positive transfer learning
CN109151880A (en) Mobile application flow identification method based on multilayer classifier
CN111698260A (en) DNS hijacking detection method and system based on message analysis
CN108173705A (en) First packet recognition methods, device, equipment and the medium of flow drainage
Kong et al. Identification of abnormal network traffic using support vector machine
CN109660656A (en) A kind of intelligent terminal method for identifying application program
CN106789416A (en) The recognition methods of industrial control system specialized protocol and system
CN111224998B (en) Botnet identification method based on extreme learning machine
CN114189350A (en) LightGBM-based train communication network intrusion detection method
CN105429817A (en) Illegal business identification device and illegal business identification method based on DPI and DFI
CN101764754A (en) Sample acquiring method in business identifying system based on DPI and DFI

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20180320
