CN102739522A - Method and device for classifying Internet data streams - Google Patents

Info

Publication number
CN102739522A
Authority
CN
China
Prior art keywords
characteristic
data
bunch
center
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2012101808261A
Other languages
Chinese (zh)
Inventor
王磊
孙灵燕
吴富强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN2012101808261A
Publication of CN102739522A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and a device for classifying Internet data streams. The method comprises the following steps: characteristic data of a data stream to be classified is extracted according to a classification requirement; a diversity index between the characteristic data and each cluster center is calculated, wherein the cluster centers are formed by clustering the training data in a characteristic training set, and the diversity index represents the degree of feature difference between the characteristic data and a cluster center; and if the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold value, the data stream to be classified is determined to belong to the class represented by that cluster center. With this scheme, the classification of Internet data streams depends only on characteristics and is independent of protocols; new protocols and protocol variants can be classified and processed promptly without being stored in a protocol database, so the classification of network data streams can keep pace with the rapid change of network protocols, and software and hardware resources do not need to be upgraded frequently.

Description

Method and device for classifying Internet data streams
Technical field
The present invention relates to communication technology, and in particular to a method and a device for classifying Internet data streams.
Background
Existing methods for classifying Internet data streams fall into several types: Simple Packet Inspection (SPI), Deep Packet Inspection (DPI) feature matching, DPI behavior identification, and Deep Flow Inspection (DFI). SPI determines the basic information of the current data stream mainly by analyzing the five-tuple of a packet (source address, destination address, source port, destination port and protocol type). DPI feature matching determines the application carried by the traffic mainly by identifying fingerprint information such as specific character strings or bit sequences in the packet. DPI behavior identification studies the behavior of a terminal, builds a behavior recognition model, and uses that model to judge the action the terminal is performing or is about to perform. For example, judged by the content of the e-mail alone, the traffic of spam and that of ordinary mail are indistinguishable; only further analysis, namely a comprehensive recognition model built from factors such as the size and frequency of the messages sent, the frequency and variation of destination and source e-mail addresses, and the frequency of rejection, can determine whether a message is spam. DFI is an application identification technique based on data stream behavior: different application types exhibit different states on a session connection or data stream, so stream information such as packet length, connection rate, number of bytes transmitted and the interval between packets is compared with a data stream model to identify the application type.
All of the above classification methods can classify data streams only on the basis of a protocol feature library. Existing methods therefore have the following drawbacks: the protocol feature library must be updated continually, otherwise new protocols and protocol variants cannot be identified in time; and the classification method needs enough software and hardware resources to keep up with an ever-growing set of protocols, which leads to frequent software and hardware upgrades and steadily rising cost.
Summary of the invention
The present invention provides a protocol-independent method and device for classifying Internet data streams, so as to achieve a coarse, protocol-independent classification of network data streams that can adapt to the rapid change of network protocols while keeping software and hardware resource usage stable. The method comprises:
extracting characteristic data of a data stream to be classified according to a classification requirement;
calculating a diversity index between the characteristic data and each cluster center, where the cluster centers are formed by clustering the training data in a characteristic training set, and the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center; and
if the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold, determining that the data stream to be classified belongs to the class represented by that cluster center.
An embodiment of the present invention also provides a device for classifying Internet data streams, comprising:
an extraction module, configured to extract characteristic data of a data stream to be classified according to a classification requirement;
a computing module, configured to obtain the characteristic data from the extraction module, calculate a diversity index between the characteristic data and each cluster center, and send the resulting diversity indexes to a classification module, where the cluster centers are formed by clustering the training data in a characteristic training set, and the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center; and
a classification module, configured to receive the diversity indexes sent by the computing module and, when the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold, determine that the data stream to be classified belongs to the class represented by that cluster center.
With the above technical scheme, the embodiments of the present invention extract the characteristic data of a data stream according to the classification requirement, and determine the class of the data stream to be classified from the diversity index between the characteristic data and each cluster center formed from the training data in the characteristic training set. The classification of Internet data streams therefore depends only on characteristics and not on protocols; new protocols and protocol variants can be classified and processed promptly, and no protocol database needs to be stored, so the classification of network data streams can adapt to the rapid change of network protocols without frequent upgrades of software and hardware resources.
Brief description of the drawings
Fig. 1 is a flowchart of the method for classifying Internet data streams provided by Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the method for classifying Internet data streams provided by Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the method for classifying Internet data streams provided by Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the method for classifying Internet data streams provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 7 of the present invention.
Detailed description of the embodiments
Fig. 1 is a flowchart of the method for classifying Internet data streams provided by Embodiment 1 of the present invention. As shown in Fig. 1, the method may comprise:
Step 101: extract characteristic data of the data stream to be classified according to the classification requirement.
Specifically, the characteristic data may include, but is not limited to: flow-table relationship features between a packet and other packets; packet statistics features (for example, packet length and time interval); topological features of the packet in the network (for example, connection relationships between hosts); packet payload features (for example, ASCII distribution and encryption); and so on. The classification requirement can be understood as the purpose of classifying the Internet data streams. For example, if the purpose is to divide the data streams in the network into long-packet streams and short-packet streams, the packet-length feature can be extracted; if the data streams transmitted by busy nodes in the network are to be distinguished, the host-connection-count feature can be extracted.
It should also be noted that step 101 does not limit the number of feature dimensions extracted. If the extracted characteristic data is expressed as a matrix, each row of the matrix can represent one data stream to be classified and each column one feature dimension; step 101 places no limit on the number of columns. In other words, multi-dimensional features can be extracted according to the classification requirement.
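The matrix view of step 101 can be illustrated with a minimal Python sketch. The packet representation (a list of `(timestamp, length)` tuples per stream), the function name `extract_features` and the choice of two statistical features are assumptions made for illustration; the patent does not prescribe them.

```python
from statistics import mean

def extract_features(packets):
    """Return [mean packet length, mean inter-packet interval] for one stream.

    `packets` is a hypothetical representation: a list of
    (timestamp_seconds, length_bytes) tuples.
    """
    lengths = [length for _, length in packets]
    times = sorted(t for t, _ in packets)
    gaps = [b - a for a, b in zip(times, times[1:])]
    # A single-packet stream has no interval; use 0.0 as a placeholder.
    return [mean(lengths), mean(gaps) if gaps else 0.0]

# Each row of the matrix is one stream, each column one feature dimension.
streams = {
    "s1": [(0.0, 1500), (0.1, 1500), (0.2, 1400)],
    "s2": [(0.0, 60), (1.0, 80)],
}
feature_matrix = [extract_features(p) for p in streams.values()]
```

Adding a further feature, such as a host connection count, would simply add a column to every row.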
Step 103: calculate the diversity index between the characteristic data and each cluster center, where the cluster centers are formed by clustering the training data in a known characteristic training set, and the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center.
It should be noted that a cluster center is the center of a cluster obtained by clustering the characteristic data of the training data in the characteristic training set. Each cluster center can represent a different class of data stream, and each cluster can be regarded as the set of characteristic data of data streams of the same class. The diversity index can be a distance, such as the Euclidean distance or the Manhattan distance, or any other measure that clearly expresses the degree of difference. It should also be noted that every cluster center is formed in the feature space by an unsupervised machine learning algorithm (such as a clustering algorithm). The main property of an unsupervised algorithm is that it does not require the training set to be labeled. This is what allows this kind of machine learning algorithm, when applied to data stream classification, to remove the dependence on a protocol database. Here, not requiring labels means that the protocol information of the data in the training set does not need to be entered manually; the training set does not need to know which protocol its data belongs to.
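The two distances named above can be written out as a short Python sketch. The function names are illustrative, and the vectors are assumed to have equal length.

```python
import math

def euclidean(a, b):
    """Euclidean distance: square root of the sum of squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Manhattan distance: sum of absolute differences per dimension."""
    return sum(abs(x - y) for x, y in zip(a, b))

features = [3.0, 4.0]
center = [0.0, 0.0]
print(euclidean(features, center))  # 5.0
print(manhattan(features, center))  # 7.0
```

Either function can serve as the diversity index, provided the same one is used for training and for classification.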
Step 105: if the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold, determine that the data stream to be classified belongs to the class represented by that cluster center.
Specifically, for the data stream to be classified, the distance between its characteristic data and each cluster center is calculated, and the final class of the data stream is judged from the resulting distances. If the distance between the characteristic data and a certain cluster center is smaller than the preset threshold, the characteristic data of the data stream to be classified is considered to differ little from the characteristic data of the data streams of the class represented by that cluster center, and the data stream to be classified can be considered to belong to that class. Note that the preset threshold can be set flexibly according to the accuracy required of the classification; a common approach is to take the class represented by the cluster center nearest to the characteristic data of the data stream to be classified as the class of that data stream.
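Step 105 can be sketched as follows, combining the nearest-center rule with the threshold test. The sketch assumes the Euclidean distance as the diversity index and a single threshold shared by all cluster centers; the names `classify`, `centers` and `threshold` are illustrative only.

```python
import math

def classify(features, centers, threshold):
    """Return the label of the nearest cluster center, or None when even
    the nearest center is not closer than `threshold`.

    `centers` maps a class label to its center vector.
    """
    best_label, best_dist = None, math.inf
    for label, center in centers.items():
        d = math.dist(features, center)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label if best_dist < threshold else None

centers = {"long-packet": [1400.0], "short-packet": [80.0]}
print(classify([1350.0], centers, threshold=200.0))  # long-packet
print(classify([700.0], centers, threshold=200.0))   # None
```

A stream that returns `None` here corresponds to a data stream that cannot be classified with the current cluster centers.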
The method for classifying Internet data streams provided by this embodiment extracts the characteristic data of a data stream according to the classification requirement, and determines the class of the data stream to be classified from the diversity index between the characteristic data and each cluster center formed from the training data in the characteristic training set. The classification of Internet data streams therefore depends only on characteristics and not on protocols; new protocols and protocol variants can be classified and processed promptly, and no protocol database needs to be stored, so the classification of network data streams can adapt to the rapid change of network protocols without frequent upgrades of software and hardware resources.
Fig. 2 is a flowchart of the method for classifying Internet data streams provided by Embodiment 2 of the present invention. As shown in Fig. 2, on the basis of the method shown in Fig. 1, after step 101 the method may further comprise:
Step 102: preprocess the characteristic data extracted in step 101 to obtain preprocessed characteristic data.
Accordingly, step 103 becomes:
Step 103': calculate the diversity index between the preprocessed characteristic data and each cluster center.
The preprocessing may include, but is not limited to, one or more of the following:
1. Clean up abnormal characteristic data and noisy characteristic data in the extracted characteristic data. This helps avoid abnormal escalation and misclassification in step 103.
2. Discretize the extracted characteristic data into intervals, which ensures that step 103 can accommodate the various types of characteristic data.
For example, a feature whose value range is 0 to 255 takes 16 values, 0 to 15, after discretization, and each of these 16 values represents a range in the original space; for example, 3 may represent the range [10, 40]. Discretizing features into intervals is in effect a variant of data normalization. Features with different value ranges are discretized into the same value range, which weakens the influence of the magnitude of the feature values themselves on the calculation of the diversity index.
3. Map the extracted characteristic data into the same data space, which ensures a uniform data processing method in step 103. For example, an M-dimensional feature space is mapped to an N-dimensional feature space, where generally M < N. The mapping may have physical meaning or be purely mathematical; its purpose is to construct more effective feature combinations for building the classifier. For example, the two features X and Y are mapped to the three features X+2Y, X*Y and X/Y.
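Preprocessing modes 2 and 3 can be sketched in Python as below. The equal-width binning in `discretize` is an assumption (the text does not say how the 16 intervals are chosen), and `map_features` assumes a non-zero Y; both function names are illustrative.

```python
def discretize(value, low=0, high=255, bins=16):
    """Map an integer value in [low, high] to a bin index in [0, bins - 1].

    Equal-width bins are an assumption made for this sketch.
    """
    width = (high - low + 1) / bins
    index = int((value - low) // width)
    # Clamp values that fall outside the expected range.
    return min(max(index, 0), bins - 1)

def map_features(x, y):
    """Map the two features (X, Y) to the three features named in the text:
    X + 2Y, X * Y and X / Y. The caller must ensure y is non-zero."""
    return [x + 2 * y, x * y, x / y]

print(discretize(200))      # 12
print(map_features(4, 2))   # [8, 8, 2.0]
```

With equal-width bins, every feature lands in the same range 0 to 15 regardless of its original scale, which is the normalization effect described above.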
On the basis of the above embodiments, Fig. 3 is a flowchart of the method for classifying Internet data streams provided by Embodiment 3 of the present invention. As shown in Fig. 3, on the basis of the method shown in Fig. 1 or Fig. 2, before step 101 the method may further comprise:
Step 100: extract the characteristic data of the training data in the characteristic training set, and train on the characteristic data of the training data with an unsupervised machine learning algorithm to form the cluster centers in the feature space.
It should be noted that the number of cluster centers formed can be preset, and the preset threshold corresponding to each cluster center can also be set from experience or according to actual needs; details are not repeated here. The characteristic data of the training data can of course also be preprocessed as described in step 102, which is likewise not repeated here.
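Step 100 names only "an unsupervised machine learning algorithm". As one possible instance, the sketch below uses plain k-means with a preset number of cluster centers; the choice of k-means, the deterministic initialization from the first `k` points and the fixed iteration count are all assumptions made for illustration.

```python
import math

def kmeans(points, k, iterations=20):
    """Form k cluster centers from unlabeled feature vectors."""
    # Deterministic initialization: the first k points (an assumption).
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
    return centers

# No protocol labels are supplied; only feature vectors.
training = [[1.0], [2.0], [10.0], [11.0]]
print(kmeans(training, k=2))  # [[1.5], [10.5]]
```

The training data carries no protocol labels, which matches the point made about unsupervised algorithms removing the dependence on a protocol database.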
Fig. 4 is a flowchart of the method for classifying Internet data streams provided by Embodiment 4 of the present invention. As shown in Fig. 4, on the basis of Fig. 3, after step 105 the method may further comprise:
Step 107: retrain the original cluster centers with the characteristic data of classified data streams and/or the characteristic data of data streams that cannot be classified, in the form of supplementary data, to form new cluster centers.
Supplementary data enters the classifier model of the previous round incrementally and acts as a correction to the classifier model. In the embodiments of the present invention, a classifier means a classification model, which can be implemented in software, in hardware, or in a combination of the two. The classifier is a general concept in machine learning and data mining: it can be obtained by training on the training data in the characteristic training set, and it plays its role during prediction. Supplementary data is characteristic data. It can be the characteristic data of unclassified data streams, of classified data streams, or of data streams that cannot be classified. Whether or not the supplementary data has been classified, it can be added as an increment: adding data that can be classified enlarges the capacity of an existing class, and adding data that cannot be classified enlarges the set of classes.
The formation of new cluster centers may proceed as follows. After supplementary data enters the classifier model, it can affect the cluster centers of the existing classifier model in the following ways:
When the class of the supplementary data already exists, a new cluster center is calculated from the supplementary data together with the existing sample set; when the class of the supplementary data does not exist, a new cluster center is formed. If the total number of cluster centers is limited, clusters may be merged.
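The two cases above can be sketched as an incremental update. The sketch assumes the cluster center is the mean of its samples, so an existing center can be updated as a running mean, and it treats "the class already exists" as "the nearest center is within the threshold". Merging clusters under a cap on the number of centers is not shown. The name `update_centers` and its parameters are illustrative.

```python
import math

def update_centers(centers, counts, sample, threshold):
    """Fold one supplementary sample into the cluster centers in place.

    `centers` is a list of center vectors and `counts[i]` is the number of
    samples behind centers[i]. Returns the index of the affected center.
    """
    if centers:
        idx = min(range(len(centers)),
                  key=lambda i: math.dist(sample, centers[i]))
        if math.dist(sample, centers[idx]) < threshold:
            # Existing class: recompute the center as a running mean.
            counts[idx] += 1
            n = counts[idx]
            centers[idx] = [c + (s - c) / n
                            for c, s in zip(centers[idx], sample)]
            return idx
    # No matching class: the sample forms a new cluster center.
    centers.append(list(sample))
    counts.append(1)
    return len(centers) - 1

centers, counts = [[0.0]], [1]
update_centers(centers, counts, [2.0], threshold=5.0)    # updates center 0
update_centers(centers, counts, [100.0], threshold=5.0)  # adds center 1
print(centers)  # [[1.0], [100.0]]
```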
Step 107 allows the cluster centers to be updated in real time, which helps refine the classifier and improves the accuracy of Internet data stream classification.
On the basis of the above embodiments, after the data stream to be classified has been classified, the method may further comprise the following step: analyzing, organizing and reporting the classification results.
For example, the characteristics of each class of data stream are described: strongly interactive/weakly interactive, many packets per stream/few packets per stream, short packets/long packets, encrypted/unencrypted packets, and so on. The class distribution of Internet data streams can also be analyzed: after the network data streams of a node are classified, the proportion of each class among all data streams is counted and the overall characteristics of the node's network data streams are examined, such as the average load and the distribution of each class of data stream over time. A single stream can also be coarsely classified: for a given input stream, the class it belongs to can be determined. For example, the cluster to which the input stream belongs is determined after comprehensive judgment based on the diversity index between the input stream and the existing cluster centers. The typical features of that cluster represent the typical features of the class. Note that the scheme provided by the embodiments of the present invention is a protocol-independent classification, so the result is not a specific protocol class (such as MSN or PPLIVE) but a general class sharing the same features.
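The class-distribution analysis mentioned above amounts to counting how often each class label occurs at a node. A minimal sketch, with the hypothetical helper `class_shares` and with `None` standing for streams that could not be classified:

```python
from collections import Counter

def class_shares(labels):
    """Return the fraction of streams that fall in each class."""
    counts = Counter(labels)
    total = len(labels)
    return {label: n / total for label, n in counts.items()}

# Labels as produced by a classifier for the streams seen at one node.
labels = ["long-packet", "long-packet", "short-packet", None]
print(class_shares(labels))
# {'long-packet': 0.5, 'short-packet': 0.25, None: 0.25}
```

The labels are generic class names rather than protocol names, in line with the protocol-independent result described above.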
A person of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the storage medium includes any medium that can store program code, such as ROM, RAM, a magnetic disk or an optical disc.
The scheme provided by the present invention extracts the characteristic data of a data stream according to the classification requirement, and determines the class of the data stream to be classified from the diversity index between the characteristic data and each cluster center formed from the training data in the characteristic training set. The classification of Internet data streams therefore depends only on characteristics and not on protocols; new protocols and protocol variants can be classified and processed promptly, and no protocol database needs to be stored, so the classification of network data streams can adapt to the rapid change of network protocols without frequent upgrades of software and hardware resources.
Fig. 5 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 5 of the present invention. As shown in Fig. 5, the device may comprise an extraction module 501, a computing module 502 and a classification module 503. The extraction module 501 is configured to extract characteristic data of the data stream to be classified according to the classification requirement. The computing module 502 is configured to obtain the characteristic data from the extraction module 501, calculate the diversity index between the characteristic data and each cluster center formed from the training data in the characteristic training set, and send the resulting diversity indexes to the classification module 503, where the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center. The classification module 503 is configured to receive the diversity indexes sent by the computing module 502 and, when the diversity index between the characteristic data and one of the cluster centers is smaller than the preset threshold, determine that the data stream to be classified belongs to the class represented by that cluster center.
It should be noted that, for the specific working steps of the extraction module 501, the computing module 502 and the classification module 503 of the device provided by Embodiment 5, reference can be made to step 101, step 103 and step 105 of Embodiment 1; they are not repeated here.
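The division of labor among modules 501, 502 and 503 can be pictured as one small Python class. This is a hypothetical software rendering only: the class name `StreamClassifier`, the use of the Euclidean distance, the single shared threshold and the assumption that at least one cluster center exists are all choices made for the sketch, and the patent allows hardware or mixed implementations as well.

```python
import math

class StreamClassifier:
    """Hypothetical wiring of the extraction, computing and
    classification modules (501, 502, 503)."""

    def __init__(self, extract, centers, threshold):
        self.extract = extract      # extraction module: stream -> vector
        self.centers = centers      # {label: center vector}, from training
        self.threshold = threshold  # preset threshold

    def compute(self, features):
        # Computing module: diversity index to every cluster center.
        return {label: math.dist(features, center)
                for label, center in self.centers.items()}

    def classify(self, stream):
        # Classification module: nearest center, subject to the threshold.
        indexes = self.compute(self.extract(stream))
        label = min(indexes, key=indexes.get)
        return label if indexes[label] < self.threshold else None

# Here a stream is just a list of packet lengths (an assumption).
clf = StreamClassifier(
    extract=lambda lengths: [sum(lengths) / len(lengths)],
    centers={"long-packet": [1400.0], "short-packet": [80.0]},
    threshold=200.0,
)
print(clf.classify([1500, 1300]))  # long-packet
```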
Fig. 6 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 6 of the present invention. As shown in Fig. 6, the device may further comprise a preprocessing module 504, configured to obtain the characteristic data from the extraction module 501, preprocess the characteristic data, and send the preprocessed characteristic data to the computing module 502. The preprocessing performed by the preprocessing module 504 may specifically comprise: cleaning up abnormal data and noise data in the characteristic data; and/or discretizing the characteristic data into intervals; and/or mapping the characteristic data into the same data space.
It should be noted that, for the specific working steps of the preprocessing module 504 of the device provided by Embodiment 6, reference can be made to step 102 of Embodiment 2; they are not repeated here.
Fig. 7 is a schematic structural diagram of the device for classifying Internet data streams provided by Embodiment 7 of the present invention. As shown in Fig. 7, the device may further comprise a training module 505, configured to extract the characteristic data of the training data in the characteristic training set and train on the characteristic data of the training data with an unsupervised machine learning algorithm to form the cluster centers in the feature space. Accordingly, the computing module 502 is further configured to obtain the cluster centers from the training module 505.
On the basis of the above embodiments, the training module 505 is further configured to retrain the original cluster centers with the characteristic data of classified data streams and/or the characteristic data of data streams that cannot be classified, in the form of supplementary data, to form new cluster centers.
It should be noted that, for the specific working steps of the training module 505 of the device provided by Embodiment 7, reference can be made to step 100 of Embodiment 3 and step 107 of Embodiment 4; they are not repeated here.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical scheme of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical schemes described in those embodiments can still be modified, or some or all of their technical features can be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical schemes to depart from the scope of the technical schemes of the embodiments of the present invention.

Claims (10)

1. A method for classifying an Internet data stream, characterized by comprising:
extracting characteristic data of a data stream to be classified according to a classification requirement;
calculating a diversity index between the characteristic data and each cluster center, where the cluster centers are formed by clustering the training data in a characteristic training set, and the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center; and
if the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold, determining that the data stream to be classified belongs to the class represented by that cluster center.
2. The method according to claim 1, characterized in that, after the extracting of characteristic data of the data stream to be classified according to the classification requirement, the method further comprises:
preprocessing the extracted characteristic data.
3. The method according to claim 2, characterized in that the preprocessing of the extracted characteristic data specifically comprises:
cleaning up abnormal data and noise data in the extracted characteristic data; and/or
discretizing the extracted characteristic data into intervals; and/or
mapping the extracted characteristic data into the same data space.
4. The method according to claim 1 or 2, characterized in that, before the extracting of characteristic data of the data stream to be classified according to the classification requirement, the method further comprises:
extracting the characteristic data of the training data in the characteristic training set, and training on the characteristic data of the training data with an unsupervised machine learning algorithm to form the cluster centers in a feature space.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
retraining the original cluster centers with the characteristic data of classified data streams and/or the characteristic data of data streams that cannot be classified, in the form of supplementary data, to form new cluster centers.
6. A device for classifying an Internet data stream, characterized by comprising:
an extraction module, configured to extract characteristic data of a data stream to be classified according to a classification requirement;
a computing module, configured to obtain the characteristic data from the extraction module, calculate a diversity index between the characteristic data and each cluster center, and send the resulting diversity indexes to a classification module, where the cluster centers are formed by clustering the training data in a characteristic training set, and the diversity index characterizes the degree of feature difference between the characteristic data and a cluster center; and
a classification module, configured to receive the diversity indexes sent by the computing module and, when the diversity index between the characteristic data and one of the cluster centers is smaller than a preset threshold, determine that the data stream to be classified belongs to the class represented by that cluster center.
7. The device according to claim 6, characterized by further comprising:
a preprocessing module, configured to obtain the characteristic data from the extraction module, preprocess the characteristic data, and send the preprocessed characteristic data to the computing module.
8. The device according to claim 7, characterized in that the preprocessing module is specifically configured to:
clean up abnormal data and noise data in the characteristic data; and/or
discretize the characteristic data into intervals; and/or
map the characteristic data into the same data space.
9. The device according to claim 6 or 7, characterized by further comprising:
a training module, configured to extract the characteristic data of the training data in the characteristic training set, and train on the characteristic data of the training data with an unsupervised machine learning algorithm to form the cluster centers in a feature space;
accordingly, the computing module is further configured to obtain the cluster centers from the training module.
10. The device according to any one of claims 6 to 9, characterized in that the training module is further configured to retrain the original cluster centers with the characteristic data of classified data streams and/or the characteristic data of data streams that cannot be classified, in the form of supplementary data, to form new cluster centers.
CN2012101808261A 2012-06-04 2012-06-04 Method and device for classifying Internet data streams Pending CN102739522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012101808261A CN102739522A (en) 2012-06-04 2012-06-04 Method and device for classifying Internet data streams

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012101808261A CN102739522A (en) 2012-06-04 2012-06-04 Method and device for classifying Internet data streams

Publications (1)

Publication Number Publication Date
CN102739522A (en) 2012-10-17

Family

ID=46994336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012101808261A Pending CN102739522A (en) 2012-06-04 2012-06-04 Method and device for classifying Internet data streams

Country Status (1)

Country Link
CN (1) CN102739522A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101252541A (en) * 2008-04-09 2008-08-27 中国科学院计算技术研究所 Method for establishing network flow classified model and corresponding system thereof
US20090116394A1 (en) * 2007-11-07 2009-05-07 Satyam Computer Services Limited Of Mayfair Centre System and method for skype traffice detection
CN102033965A (en) * 2011-01-17 2011-04-27 安徽海汇金融投资集团有限公司 Method and system for classifying data based on classification model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JEFFREY ERMAN ET AL: "Traffic Classification Using Clustering Algorithms", 《SIGCOMM’06 WORKSHOPS》 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104917739A (en) * 2014-03-14 2015-09-16 腾讯科技(北京)有限公司 False account identification method and device
CN104917739B (en) * 2014-03-14 2018-11-09 腾讯科技(北京)有限公司 False account identification method and device
CN108141377A (en) * 2015-10-12 2018-06-08 华为技术有限公司 Early classification of network flows
CN108141377B (en) * 2015-10-12 2020-08-07 华为技术有限公司 Early classification of network flows
WO2017152883A1 (en) * 2016-03-11 2017-09-14 华为技术有限公司 Coflow recognition method and system, and server using method
CN107181724A (en) * 2016-03-11 2017-09-19 华为技术有限公司 Coflow identification method and system, and server using the method
US10567299B2 (en) 2016-03-11 2020-02-18 Huawei Technologies Co., Ltd. Coflow identification method and system, and server using method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20121017