CN109286622A - A network intrusion detection method based on a learned rule set - Google Patents

Publication number
CN109286622A
Authority
CN
China
Prior art keywords
classifying rules
data
data item
rules
sample
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811122445.1A
Other languages
Chinese (zh)
Other versions
CN109286622B (en)
Inventor
王劲松
杨传印
黄玮
莫敬涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University of Technology
Original Assignee
Tianjin University of Technology
Application filed by Tianjin University of Technology
Priority to CN201811122445.1A
Publication of CN109286622A
Application granted
Publication of CN109286622B
Current legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A network intrusion detection method based on a learned rule set. The network connection data in the international standard data set KDDCup99 is first preprocessed; redundant data items are then removed and classification rules are extracted using an improved FOIL algorithm; finally, network connection test data is classified according to the classification rules to determine whether a connection is an attack connection and, if so, its specific attack type. The method is verified experimentally on network connection data from KDDCup99, and the original FOIL algorithm is modified to suit the characteristics of that data set. The experimental results show that the improved algorithm raises the efficiency of both classification rule extraction and test data classification and also improves the accuracy of the detection results, overcoming the low classification efficiency and high false alarm rate of traditional intrusion detection systems.

Description

A network intrusion detection method based on a learned rule set
Technical field
The present method relates to the field of network intrusion detection systems, and in particular to a network intrusion detection method based on a learned rule set.
Background art
As an important supplement to the firewall, an intrusion detection system can collect and analyze information from several key points in a computer network or computer system without affecting network performance, and determine whether there are signs of intrusion in the network or system, thereby protecting the network system. It plays a very important role in network system security.
Intrusion detection based on data mining has become a research hotspot and has produced many results at home and abroad, but some problems remain widespread: data-mining-based intrusion detection methods still need improvement in detection accuracy, false alarm rate, and real-time performance. In particular, models that tightly couple data mining techniques with intrusion detection are needed to improve the accuracy and timeliness of intrusion detection.
Summary of the invention
To address the low classification efficiency and high false alarm rate of traditional intrusion detection systems, the present invention proposes a network intrusion detection method based on a learned rule set, which processes network connection data with an improved FOIL algorithm to improve the timeliness and accuracy of intrusion detection. Experiments on the KDDCup99 data set, compared against the original FOIL algorithm, show that applying the improved algorithm to intrusion detection is feasible.
Applying a rule-set learning algorithm to intrusion detection is primarily a viewpoint centered on data mining and processing; the acquisition and processing of network connection data is outside the scope of the invention. Taking the international standard network connection data set KDDCup99 as an example, the invention classifies intrusive network connections with data mining as its theoretical basis.
Technical solution of the present invention:
A network intrusion detection method based on a learned rule set, the method comprising the following steps:
Step 1: divide the network connection data selected from the international standard data set KDDCup99 into training set data and test set data, then preprocess each data item in the training set and test set so as to specialize each data item in the network connection data.
Step 2: using the improved FOIL algorithm (the rule-set learning algorithm), remove redundant attribute data items from the training set data, train on the remaining attribute data items, extract classification rules, and store the resulting classification rules in a classification rule base.
Step 3: match each network connection record in the test set against the classification rules in the classification rule base one by one; for the matched classification rules, calculate the average accuracy of each rule from the training set samples it covers, and accumulate the accuracies of the rules separately for each connection type appearing in the rule consequents.
Step 4: save the accuracies of the different connection types from step 3; the connection type with the largest accuracy is the classification result for the network connection test record and is the final detection result. After the test set data has been classified, the correctly classified test set data is added to the training set together with its detection result, serving as training data for subsequent rule extraction, so that the classification rules can be updated dynamically to adapt to changes in network connections.
The data set preprocessing described in step 1 comprises the following steps:
Step 1.1: using the method of cross-validation, take 60% of the network connection data in the KDDCup99 data set as the training set and the remaining 40% as the test set.
Step 1.2: add a sequence parameter to each data item in every network connection record, specializing each data item and making the data easier to distinguish. The KDDCup99 data set contains many identical data items; for example, a single network connection record may contain multiple "0" values, while each column has its own specific meaning. When processing a network connection record, the original FOIL algorithm treats identical data items as the same data, so processing the data set with the original FOIL algorithm degrades both the speed of rule extraction and the accuracy of the classification results. To remedy this defect, a sequence parameter is added to the data item in each column during preprocessing. This distinguishes identical values within a network connection record while preserving the specific meaning of the data.
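The effect of the sequence parameter in step 1.2 can be seen in a small sketch. The record below is a hypothetical, shortened connection record (real KDDCup99 records have 41 attributes), and representing a tagged item as a `(place, data)` tuple is an assumption made for illustration.

```python
# Hypothetical, shortened record: three fields share the value "0".
record = ["0", "tcp", "http", "SF", "0", "0"]

# Without positions, identical values collapse into a single item.
untagged = set(record)

# With a 1-based position attached, every field stays distinct.
tagged = {(place, data) for place, data in enumerate(record, start=1)}

print(len(untagged), len(tagged))  # prints "4 6"
```

Without positions the three "0" fields are indistinguishable; once tagged they become (1, "0"), (5, "0") and (6, "0") and each keeps its column meaning.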
Removing redundant data items from the training set and extracting classification rules with the improved FOIL algorithm, as described in step 2, comprises the following steps:
Step 2.1: divide the preprocessed training set data into two classes, positive examples and negative examples, according to network connection type, and collect the attribute data items in the positive example set. Enumerate all network connection types in the training set; classify the network connection data of one connection type as positive examples and the data of all other connection types as negative examples. Collect the distinct attribute data items in the positive example set, add them to the positive example attribute data item set Vset, and set the antecedent of classification rule r to empty.
Step 2.2: calculate the gain of each attribute data item v in the positive example attribute data item set Vset, remove redundant data items that do not satisfy the restriction, and add the attribute data item with the largest gain that satisfies the restriction to the antecedent of classification rule r to obtain a new classification rule r'. The gain of attribute data item v is calculated from the quantities P, N, P* and N*, which are defined as follows.
When the antecedent of classification rule r is empty, P and N denote the numbers of samples in the positive example set and the negative example set, respectively, and P* and N* denote the numbers of samples in the positive and negative example sets covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r. In this case, the gains of all attribute data items are calculated and compared, and the attribute data item with the largest gain is added to the antecedent of r.
When the antecedent of classification rule r is non-empty, P and N denote the numbers of samples covered by r in the positive and negative example sets, respectively, and P* and N* denote the numbers of samples in the positive and negative example sets covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r. In this case, adding the attribute data item with the largest gain to the antecedent of r must satisfy the following restriction: the new classification rule r' must cover fewer samples in the negative example set, i.e. N* < N. If the negative example samples covered by r' do not change after the item is added, and the item has the same attribute value as some item already in the antecedent of r, the data item is considered redundant and is deleted from the positive example attribute data item set Vset; the attribute data item with the largest gain is then sought among the remaining attribute data items and added to the antecedent of r according to the above requirement, to specialize the classification rule.
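The gain formula referred to in step 2.2 is not reproduced in this text. A minimal sketch, assuming it is the standard FOIL gain, which is built from exactly the four counts defined above (the base-2 logarithm and the function name `foil_gain` are assumptions):

$$\mathrm{Gain}(v) = P^{*}\left(\log_2\frac{P^{*}}{P^{*}+N^{*}} - \log_2\frac{P}{P+N}\right)$$

```python
import math

def foil_gain(p, n, p_star, n_star):
    """Standard FOIL gain (assumed form, not confirmed by the source).

    p, n: positive/negative samples covered before adding item v.
    p_star, n_star: positive/negative samples covered after adding v.
    """
    if p_star == 0:
        return 0.0  # the extended rule covers no positive sample
    before = math.log2(p / (p + n))
    after = math.log2(p_star / (p_star + n_star))
    return p_star * (after - before)
```

Under this form, an item that keeps the ratio of positives to negatives unchanged has zero gain, and an item that excludes negatives while keeping positives has positive gain.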
Step 2.3: save the classification rule r' from step 2.2 and delete all samples in the negative example set that are not covered by r'. Traverse all samples in the negative example set and delete every sample that does not contain the antecedent of r'. If all samples in the negative example set have been deleted, r' can serve as a classification rule. If samples remain in the negative example set, collect the other attribute data items in the positive examples that contain the antecedent of r', then return to step 2.2, add the attribute data item with the largest gain that satisfies the requirement to the antecedent of r' to obtain a new classification rule R, and again delete all samples in the negative example set not covered by R. Repeat this process until all samples in the negative example set are deleted.
Step 2.4: save the classification rule R (or r') obtained in step 2.3 and delete all samples in the positive example set that are covered by R (or r'). Traverse all samples in the positive example set, check one by one whether each sample contains the antecedent of R (or r'), and delete all samples that do. If all samples in the positive example set have been deleted, the extraction of classification rules for this type is complete. If samples remain in the positive example set, collect all attribute data items in the remaining samples, return to step 2.2 and repeat the preceding steps until all samples in the positive example set have been deleted, at which point the extraction of classification rules for this type is complete. Every rule used to delete positive examples serves as a classification rule for samples of this type; the consequent of these rules is the network connection type of the samples, and the rules are stored in the classification rule base.
Step 2.5: return to step 2.1 and extract classification rules for the next class of samples, until classification rules have been found for all sample types; the process of extracting classification rules from the training set then ends.
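Steps 2.1 to 2.5 amount to a sequential covering loop. The following is a minimal sketch under stated assumptions, not the patented implementation: it assumes the standard FOIL gain, represents each sample as a frozenset of (position, value) pairs, and simplifies the redundancy test to skipping any item that fails N* < N once the antecedent is non-empty. All function names are hypothetical.

```python
import math

def covers(antecedent, sample):
    """A rule covers a sample when the sample contains every antecedent item."""
    return antecedent <= sample

def foil_gain(p, n, p_star, n_star):
    """Standard FOIL gain (assumed form)."""
    if p_star == 0:
        return float("-inf")
    return p_star * (math.log2(p_star / (p_star + n_star)) - math.log2(p / (p + n)))

def learn_one_rule(pos, neg):
    """Steps 2.2-2.3: grow one antecedent until it covers no negative sample."""
    antecedent = frozenset()
    pos_cov, neg_cov = list(pos), list(neg)
    while neg_cov:
        candidates = set().union(*pos_cov) - antecedent
        best, best_gain = None, float("-inf")
        for item in sorted(candidates, key=repr):  # sorted for determinism
            p_star = sum(item in s for s in pos_cov)
            n_star = sum(item in s for s in neg_cov)
            if antecedent and n_star >= len(neg_cov):
                continue  # restriction N* < N: item excludes no negative
            gain = foil_gain(len(pos_cov), len(neg_cov), p_star, n_star)
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:
            return None  # positives and negatives cannot be separated
        antecedent |= {best}
        pos_cov = [s for s in pos_cov if best in s]
        neg_cov = [s for s in neg_cov if best in s]
    return antecedent

def extract_rules(samples):
    """Steps 2.1-2.5: samples is a list of (frozenset of items, label)."""
    rules = []
    for label in sorted({lab for _, lab in samples}):
        pos = [s for s, lab in samples if lab == label]
        neg = [s for s, lab in samples if lab != label]
        while pos:
            antecedent = learn_one_rule(pos, neg)
            if antecedent is None:
                break
            rules.append((antecedent, label))  # step 2.4: save the rule
            pos = [s for s in pos if not covers(antecedent, s)]
    return rules
```

Each returned pair is an antecedent and its consequent connection type. Ties between equal gains are broken by sort order here, which is an arbitrary choice.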
Calculating the average accuracy of the matched classification rules, as described in step 3, comprises the following steps:
Step 3.1: read the test set data, compare every network connection record in the test set with every classification rule in the classification rule base, and record the rules that match. Every network connection record in the KDDCup99 data set has many data items, and the antecedent of each of the many classification rules extracted in step 2 may contain several attribute data items, so when a network connection record of unknown type in the test set is classified according to the extracted rules, it may match several classification rules; all matched rules are recorded.
Step 3.2: for the m matched classification rules, if their consequents are all the same, the network connection record of unknown type has the connection type given by those consequents. If the consequents differ, calculate the average accuracy of each of these rules. The average accuracy of classification rule R_i is calculated with the Laplace accuracy estimate:

$$\mathrm{LaplaceAccuracy}(R_i) = \frac{e + 1}{n + k}$$

where k is the number of distinct network connection types in the training set, n is the number of samples in the training set that contain the antecedent of R_i, and e is the number of samples in the training set whose connection type is the consequent of R_i and that contain the antecedent of R_i. After the average accuracy of every matched rule has been obtained, the accuracies are accumulated separately by connection type to give the accuracy of connection type t_i for the network connection test record:

$$\mathrm{Accuracy}(t_i) = \sum_{j=1}^{s} \mathrm{LaplaceAccuracy}(R_j)$$

where the sum runs over the s rules, among the m matched classification rules, whose consequent connection type is t_i.
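Steps 3.1 and 3.2 can be sketched as follows. This is a minimal sketch, assuming that the per-rule accuracy is the Laplace estimate (e + 1) / (n + k) built from the quantities k, n and e defined above, that rules are (antecedent, consequent) pairs, and that samples are frozensets of (position, value) items; the function names are hypothetical.

```python
from collections import defaultdict

def laplace_accuracy(antecedent, label, training, k):
    """Laplace estimate (e + 1) / (n + k) for one rule (assumed form)."""
    # n: training samples containing the antecedent
    n = sum(antecedent <= s for s, _ in training)
    # e: those samples whose connection type equals the rule consequent
    e = sum(antecedent <= s and lab == label for s, lab in training)
    return (e + 1) / (n + k)

def score_types(sample, rules, training):
    """Sum the accuracies of all matching rules per connection type."""
    k = len({lab for _, lab in training})  # number of connection types
    scores = defaultdict(float)
    for antecedent, label in rules:
        if antecedent <= sample:  # step 3.1: the rule matches this record
            scores[label] += laplace_accuracy(antecedent, label, training, k)
    return dict(scores)
```

When all matching rules share one consequent, the returned dictionary has a single key, which corresponds to the first case of step 3.2.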
Obtaining the classification result from the accuracies of the connection types of the matched rules, and adding the classified test data to the training set, as described in step 4, comprises the following steps:
Step 4.1: save the accuracies of the different network connection types calculated in step 3; by comparison, the connection type with the largest accuracy is the final classification result for the network connection test record.
Step 4.2: to ensure the self-learning property of the method and the dynamic updating of the classification rules, and considering that real networks change dynamically, so that rules obtained from a single round of training may not adapt to constantly changing network data, the method adds the classified test data, together with its classification result, to the training set for retraining, generating new classification rules and updating the classification rule base.
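Steps 4.1 and 4.2 reduce to picking the highest-scoring type and appending the classified record to the training data. A minimal sketch, assuming `scores` is a dictionary from connection type to accumulated accuracy and that the training set is a list of (sample, label) pairs; both function names are hypothetical, and deciding which records count as correctly classified would need ground truth that this sketch does not model.

```python
def classify(scores):
    """Step 4.1: the connection type with the largest accuracy wins."""
    return max(scores, key=scores.get) if scores else None

def feed_back(training, sample, predicted):
    """Step 4.2: append the classified record for later re-extraction of rules."""
    if predicted is not None:
        training.append((sample, predicted))
    return training
```

A record that matches no rule yields `None` and is not fed back; how such records should be handled is not specified in the text.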
The invention has the following advantages that
The present invention is divided into training by taking KDDCup99 international standard data set as an example, first, in accordance with the method for crosscheck Collection and test set add sequential parameter to 41 attribute data items in training set and test set.Then pass through improved FOIL Algorithm removes the data item of redundancy in training set and classifying rules is extracted in training.Finally by Laplce's accuracy estimation formulas The bat that matching test concentrates the network connection data classifying rules of UNKNOWN TYPE is calculated, most by the comparison of accuracy Eventually obtain classification results, while by test set data and corresponding classification results be added in training set and instructed with real-time update Practice collection data, generate new classifying rules, makes this method that there is good adaptivity and self-learning property.The invention, which uses, to be changed Into FOIL algorithm, a large amount of when efficiently avoiding original FOIL algorithm process KDDCup99 data set repeat traversal and meter It calculates, reduces the time complexity of algorithm, greatly accelerate the efficiency for extracting classifying rules and classification, improve number of network connections According to the accuracy of classification and Detection result, the characteristic of adaptivity and self study is but also this method has stronger stability.
Description of the drawings
Fig. 1 is a flowchart of the network intrusion detection method based on a learned rule set according to the present invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the accompanying drawing.
Applying a rule-set learning algorithm to intrusion detection is primarily a data-centered viewpoint; the acquisition and processing of network connection data is outside the scope of the invention. Taking the international standard network connection data set KDDCup99 as an example, the invention classifies network connection data with data mining as its theoretical basis.
Fig. 1 illustrates the steps of the network intrusion detection method based on a learned rule set in detail. The method provided by the invention comprises the following steps:
Step 1: divide the network connection data selected from the international standard data set KDDCup99 into training set data and test set data, then preprocess each data item in the training set and test set so as to specialize each data item in the network connection data.
Step 1.1: using the method of cross-validation, take 60% of the network connection data in the KDDCup99 data set as the training set and the remaining 40% as the test set. Specifically, 10 network connection records are randomly selected from the KDDCup99 data set to form a group; 6 records are then arbitrarily chosen from each group and added to the training set, and the remaining 4 records are added to the test set.
Step 1.2: add a sequence parameter to each data item in every network connection record, specializing each data item and making the data easier to distinguish. The KDDCup99 data set contains many identical data items; for example, a single network connection record may contain multiple "0" or "1" values, while each column has its own specific meaning. When processing a network connection record, the original FOIL algorithm treats identical data items as the same data, so processing the data set with the original FOIL algorithm degrades both the speed of rule extraction and the accuracy of the classification results. To remedy this defect, a sequence parameter place, i.e. the column in which the data item appears, is added to the data item in each column during preprocessing. Each data item in the data set can then be represented by the structure DataItem { int place, string data }. Consider, for example, the following randomly selected network connection record:
Table 1. A randomly selected network connection record
Column | 1 | 2   | 3    | 4  | 5   | 6    | ... | 40 | 41 | Type
Value  | 0 | tcp | http | SF | 279 | 1129 | ... | 0  | 0  | normal
In the preprocessing stage, sequence parameters are added to the data items as follows: (1,0), (2,tcp), (3,http), (4,SF), (5,279), ..., (40,0), (41,0). This preserves the position of each data item, so that the meaning of the data cannot be confused, and during subsequent rule extraction only the data in the corresponding column of the data set needs to be traversed, which greatly reduces the amount of data traversed and significantly improves the efficiency of rule extraction.
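Steps 1.1 and 1.2 can be sketched together. This is a minimal sketch: `DataItem` mirrors the structure named above, while `tag_record`, `split_60_40` and the `seed` parameter are hypothetical names, and shuffling before cutting groups of 10 is one assumed way to realize the random grouping.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class DataItem:
    place: int  # 1-based column index (the sequence parameter)
    data: str   # raw field value

def tag_record(fields):
    """Attach the column position to every field of one connection record."""
    return [DataItem(place, data) for place, data in enumerate(fields, start=1)]

def split_60_40(records, seed=0):
    """Shuffle, then send 6 of every 10 records to training and 4 to testing."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    train, test = [], []
    for start in range(0, len(shuffled), 10):
        group = shuffled[start:start + 10]
        train.extend(group[:6])
        test.extend(group[6:])
    return train, test
```

For the first four fields of the record in Table 1, `tag_record(["0", "tcp", "http", "SF"])` yields the items (1, 0), (2, tcp), (3, http), (4, SF).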
Step 2: using the improved FOIL algorithm, remove redundant data items from the training set, train on the remaining network connection data, extract classification rules, and store the classification rules in the classification rule base.
The following terms are explained first:
Network connection: each row of data in the international standard data set KDDCup99 is one network connection. Each connection has 42 data items, of which the first 41 are attributes of the connection and the last is its connection type. A connection whose type is normal is a normal network connection; all other connection types are attack types.
Sample: the training set selected from the international standard data set is a set of network connection data of multiple connection types; each network connection record is one sample.
Attribute data item: each data item in a sample, after the sequence parameter has been added, is called an attribute data item; it contains the sequence parameter and the data item at that position.
Classification rule: a classification rule consists of an antecedent and a consequent; the antecedent is composed of several attribute data items, and the consequent is a connection type.
Covering: if the data items of a sample include all data items in the antecedent of a classification rule, the sample is said to be covered by that rule.
Gain: while extracting classification rules, the FOIL algorithm must select attribute data items to specialize the rule antecedent. The gain is the difference in information encoding before and after an attribute data item is added to the antecedent. The larger the gain of an attribute data item, the more it reduces the information encoding; gain is the main criterion by which FOIL chooses which attribute data item to add to the antecedent.
Step 2.1: divide the preprocessed training set data into two classes according to network connection type and collect the attribute data items in the positive example set. Enumerate all network connection types in the training set; classify the network connection data of one connection type as positive examples and the data of all other connection types as negative examples. Collect the distinct attribute data items in the positive example set, add them to the positive example attribute data item set Vset, and set the antecedent of classification rule r to empty.
Step 2.2: calculate the gain of each attribute data item v in the positive example attribute data item set Vset, remove redundant data items that do not satisfy the restriction, and add the attribute data item with the largest gain that satisfies the restriction to the antecedent of classification rule r to obtain a new classification rule r'. The gain of attribute data item v is calculated from the quantities P, N, P* and N*, which are defined as follows.
When the antecedent of classification rule r is empty, P and N denote the numbers of samples in the positive example set and the negative example set, respectively, and P* and N* denote the numbers of samples in the positive and negative example sets covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r. In this case, after the gains of all attribute data items have been calculated and compared, the attribute data item with the largest gain can be added directly to the rule antecedent.
When the antecedent of classification rule r is non-empty, P and N denote the numbers of samples covered by r in the positive and negative example sets, respectively, and P* and N* denote the numbers of samples in the positive and negative example sets covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r. In this case, adding the attribute data item with the largest gain to the antecedent of r must satisfy the following restriction: the new classification rule r' must cover fewer samples in the negative example set, i.e. N* < N. If the negative example samples covered by r' do not change after the item is added, and the item has the same attribute value as some item already in the antecedent of r, the data item is considered redundant and is deleted from the positive example attribute data item set Vset; the attribute data item with the largest gain is then sought among the remaining attribute data items and added to the antecedent of r according to the above requirement, to specialize the classification rule.
Redundant attribute data items are removed in this step mainly because, when the training set is small, a redundant attribute data item may be added to a rule antecedent; a redundant attribute data item contributes nothing to classification and can also harm classification accuracy. Consider the following example:
Table 2. Eight records chosen to illustrate the removal of redundant attribute data items
... | F1 | F2 | F3 | F4 | ... | Type
... | 0  | 0  | 0  | 0  | ... | normal
... | 0  | 0  | 1  | 0  | ... | land
... | 0  | 1  | 0  | 0  | ... | ipsweep
... | 0  | 1  | 1  | 0  | ... | normal
... | 1  | 0  | 0  | 1  | ... | teardrop
... | 1  | 0  | 1  | 1  | ... | normal
... | 1  | 1  | 0  | 1  | ... | normal
... | 1  | 1  | 1  | 1  | ... | back
For these 8 samples, the number of samples with connection type normal (positive examples) equals the number with other types (negative examples), and every column has the same values in the same quantities. All values of attribute F4 are identical to those of F1, so F4 is a redundant attribute. When extracting classification rules for network connections of type normal, the gains of the attribute data items in the normal samples are all the same. If attribute data item (F1,0) is added to the rule antecedent first and the second item is chosen by gain alone, the redundant attribute data item (F4,0) may be introduced into the antecedent, which contributes nothing to subsequent classification. Adding the restriction N* < N guarantees that the maximum-gain attribute data item chosen each time is not redundant.
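The restriction can be checked directly on the eight records of Table 2. A minimal sketch, using only the four visible columns F1 to F4; the helper name `negatives_covered` is hypothetical.

```python
# Rows of Table 2: (F1, F2, F3, F4) and connection type.
ROWS = [
    ((0, 0, 0, 0), "normal"), ((0, 0, 1, 0), "land"),
    ((0, 1, 0, 0), "ipsweep"), ((0, 1, 1, 0), "normal"),
    ((1, 0, 0, 1), "teardrop"), ((1, 0, 1, 1), "normal"),
    ((1, 1, 0, 1), "normal"), ((1, 1, 1, 1), "back"),
]

def negatives_covered(antecedent, positive_type="normal"):
    """Count negative rows matching every (column, value) item; columns are 1-based."""
    return sum(
        label != positive_type
        and all(row[col - 1] == val for col, val in antecedent)
        for row, label in ROWS
    )

n = negatives_covered([(1, 0)])                  # rule with antecedent {(F1,0)}
n_star_f4 = negatives_covered([(1, 0), (4, 0)])  # after adding (F4,0)
n_star_f2 = negatives_covered([(1, 0), (2, 0)])  # after adding (F2,0)

print(n, n_star_f4, n_star_f2)  # prints "2 2 1"
```

Adding (F4,0) leaves the number of covered negatives at 2, so it fails N* < N and is rejected as redundant, whereas adding (F2,0) reduces the count to 1 and is acceptable.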
Step 2.3, save the classification rule r' obtained in step 2.2 and delete from the negative example set all samples not covered by r'. That is, traverse every sample in the negative example set and delete those that do not contain the antecedent of r'. If all samples in the negative example set have been deleted, r' can serve as a classification rule. If samples remain in the negative example set, count the other attribute data items in the positive examples that contain the antecedent of r', then return to step 2.2: add the qualifying attribute data item with the largest gain to the antecedent of r' to obtain a new classification rule R, and continue deleting from the negative example set all samples not covered by R. Repeat this process until all samples in the negative example set have been deleted;
Step 2.4, save the classification rule R (or r') obtained in step 2.3 and delete from the positive example set all samples covered by R (or r'). That is, traverse every sample in the positive example set, check one by one whether it contains the antecedent of R (or r'), and delete every sample that does. If all samples in the positive example set have been deleted, the extraction of classification rules for this type is complete. If samples remain in the positive example set, count all attribute data items in the remaining samples and return to step 2.2, repeating all the preceding steps until every sample in the positive example set has been deleted and the extraction of classification rules for this type is complete. Every classification rule that deletes positive examples can serve as a classification rule for samples of this type; the consequent of these rules is the network connection type of these samples, and the rules are stored in the classification rule library;
Step 2.5, return to step 2.1 and extract the classification rules for the next class of samples, until the classification rules of all sample types have been found; the process of extracting classification rules from the training set then ends.
An example is given below to illustrate the above process. The following records were randomly selected from KDDCup99:
Table 3: Five records randomly selected from the KDDCup99 data set
0 tcp http SF 54540 8314 …… 0.04 0.04 back
14 tcp http RSTR 33580 7300 …… 1 1 back
0 icmp eco_i SF 18 0 …… 0 0 ipsweep
0 tcp http SF 321 480 …… 1 1 normal
0 tcp http SF 277 3410 …… 0 0 normal
These five samples are used as the training set. After the ordinal parameters are added, the samples whose connection type is back are taken as positive examples and the samples of all other types as negative examples. The classification rules of the positive examples are trained first. The distinct attribute data items in the positive examples form the attribute data item set Vset(back): {(1,0), (2,tcp), (3,http), (4,SF), (5,54540), (6,8314), (40,0.04), (41,0.04), (1,14), (4,RSTR), (5,33580), (6,7300), (40,1), (41,1)}. The antecedent of classification rule r is set to empty and the gain of each attribute data item is calculated; the gain values of (5,54540), (6,8314), (40,0.04), (41,0.04), (1,14), (4,RSTR), (5,33580), (6,7300), (40,1) and (41,1) are found to be the largest. If (40,1) is now added to the antecedent of r, and the negative examples not covered by the antecedent of r: {(40,1)} → back are deleted, the negative example set cannot be emptied completely.
The gains of all attribute data items in the samples whose connection type is back and that are covered by r are then calculated again. If the gain of (41,1) is the largest, this attribute data item should be deleted as redundant, because adding it to the antecedent of r does not satisfy the restriction that the new classification rule r' must cover fewer samples in the negative example set, and because its attribute value is identical to that of an attribute data item already in r. The attribute data items with the largest gain are then (1,14), (4,RSTR), (5,33580) and (6,7300). One of them is selected and added to the antecedent, giving the classification rule for connection type back: r: {(40,1), (1,14)} → back. All negative examples not covered by the antecedent of r are deleted; at this point all negative examples have been deleted, so one classification rule for the positive examples has been found. The positive examples covered by the antecedent of r are then deleted. Since the positive example set is not yet empty, another classification rule for the positive examples is generated by the same steps: {(40,0.04)} → back. All classification rules for the positive examples have now been found. The samples of each other connection type are then taken in turn as positive examples and the corresponding classification rules are generated. The following classification rules are finally obtained: {(40,1), (1,14)} → back; {(40,0.04)} → back; {(2,icmp)} → ipsweep; {(5,321)} → normal; {(6,3410)} → normal. These classification rules are stored in the classification rule library.
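The covering loop of steps 2.1 to 2.5 can be sketched in Python on the five records of Table 3. This is a minimal sketch under stated assumptions, not the patented implementation: the gain is assumed to be the standard FOIL gain, P* · (log2(P*/(P*+N*)) − log2(P/(P+N))), because the gain formula is not reproduced in this text; the restriction N* < N is applied at every step, ties are broken by sorted order, only the columns shown in Table 3 are used, and the names record, TRAIN, foil_gain and learn_rules are hypothetical. Because of these choices the rules it finds need not coincide with the rules listed above.

```python
import math

POSITIONS = (1, 2, 3, 4, 5, 6, 40, 41)  # ordinal parameters of the columns shown


def record(values, label):
    """Turn one table row into a set of (ordinal parameter, value) items."""
    return frozenset(zip(POSITIONS, values)), label


TRAIN = [
    record(("0", "tcp", "http", "SF", "54540", "8314", "0.04", "0.04"), "back"),
    record(("14", "tcp", "http", "RSTR", "33580", "7300", "1", "1"), "back"),
    record(("0", "icmp", "eco_i", "SF", "18", "0", "0", "0"), "ipsweep"),
    record(("0", "tcp", "http", "SF", "321", "480", "1", "1"), "normal"),
    record(("0", "tcp", "http", "SF", "277", "3410", "0", "0"), "normal"),
]


def foil_gain(p, n, p_new, n_new):
    """Standard FOIL gain (an assumption; the source formula is not shown)."""
    if p_new == 0:
        return float("-inf")
    return p_new * (math.log2(p_new / (p_new + n_new)) - math.log2(p / (p + n)))


def learn_rules(train):
    rules = []
    for label in sorted({lab for _, lab in train}):       # steps 2.1 and 2.5
        positives = [s for s, lab in train if lab == label]
        negatives = [s for s, lab in train if lab != label]
        while positives:                                   # step 2.4
            antecedent, pos, neg = frozenset(), positives, negatives
            while neg:                                     # steps 2.2 and 2.3
                best, best_gain = None, float("-inf")
                for item in sorted(set().union(*pos) - antecedent):
                    p_new = sum(item in s for s in pos)
                    n_new = sum(item in s for s in neg)
                    if n_new >= len(neg):                  # restriction N* < N
                        continue
                    gain = foil_gain(len(pos), len(neg), p_new, n_new)
                    if gain > best_gain:
                        best, best_gain = item, gain
                if best is None:                           # nothing left to add
                    break
                antecedent |= {best}
                pos = [s for s in pos if best in s]
                neg = [s for s in neg if best in s]
            if neg:   # remaining positives cannot be separated from negatives
                break
            rules.append((antecedent, label))
            positives = [s for s in positives if not antecedent <= s]
    return rules


RULES = learn_rules(TRAIN)
for antecedent, label in RULES:
    print(sorted(antecedent), "->", label)
```

Representing each sample as a set of (ordinal parameter, value) pairs makes the coverage test a plain subset check, which is the role the ordinal parameter plays in the text: the value 0 in position 6 and the value 0 in position 40 remain different items.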
Step 3, the network connection data in the test set are matched one by one against the classification rules in the classification rule library. Based on how the different matched rules cover the samples in the training set, the average accuracy of each matched classification rule is calculated, and the average accuracies are then accumulated separately for each connection type appearing in the rule consequents.
Step 3.1, read the test set data, compare each network connection record in the test set with each classification rule in the classification rule library, and record the rules that match. Each network connection record in the KDDCup99 data set has many data items, and the antecedent of each of the many classification rules extracted in step 2 may contain several attribute data items. When a network connection record of unknown type in the test set is classified with the extracted rules, it may therefore match several classification rules; all matching rules are recorded.
Step 3.2, for the m matched classification rules: if their consequents are all the same, the network connection record of unknown type is assigned the connection type in those consequents. If the consequents differ, the average accuracy of each matched rule is calculated separately; the average accuracy of classification rule R_i is calculated as follows:
where k is the number of distinct network connection types in the training set, n is the number of samples in the training set that contain the antecedent of classification rule R_i, and e is the number of samples in the training set whose connection type is the consequent of R_i and that contain the antecedent of R_i. Once the average accuracy of every matched classification rule has been obtained, these average accuracies are accumulated separately by connection type, giving the accuracy of connection type t_i for this network connection test record:
which denotes the accuracy Accuracy(t_i) accumulated over the s classification rules, among the m matched rules, whose consequent connection type is t_i.
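The two formulas of step 3.2 are not reproduced in this text. One reading that uses exactly the quantities k, n and e defined above is the Laplace accuracy estimate common in FOIL-based rule classifiers. It is given here only as an assumption, with the hypothetical names LaplaceAccuracy and consequent, and should not be taken as the formula of the original publication:

```latex
\mathrm{LaplaceAccuracy}(R_i) = \frac{e + 1}{n + k},
\qquad
\mathrm{Accuracy}(t_i) = \sum_{s \,:\, \mathrm{consequent}(R_s) = t_i} \mathrm{LaplaceAccuracy}(R_s)
```

Under this reading, a rule that covers few training samples is pulled towards 1/k, so a rule supported by a single sample does not outweigh several better-supported rules.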
Step 4, save the accuracies of the different connection types from step 3 and compare them; the connection type with the largest accuracy is the classification result for the network connection test record. In addition, to give the method good self-learning properties, once the test set data have been classified by the classification rules, they are added to the training set together with their classification results, providing a new source of training data for subsequent rule extraction and ensuring that the classification rules are updated dynamically.
Step 4.1, save the accuracies of the different network connection types calculated in step 3 and compare them; the connection type with the largest accuracy is the final classification result of the network connection test record.
Step 4.2, to ensure the self-learning property of the method and the dynamic updating of the classification rules: because real network conditions change dynamically, the classification rules obtained from a single round of training may not adapt to constantly changing network data. In this method the classified test data are therefore added to the training set together with their classification results and training is repeated, generating new classification rules and updating the classification rule library.
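The update loop of step 4.2 can be sketched as follows. The function self_learning_round and the two stand-ins toy_extract and toy_classify are hypothetical and deliberately simplistic; they only show how classified test records flow back into the training set and change the rule library, and they do not implement the improved FOIL algorithm.

```python
def self_learning_round(train, test_records, extract_rules, classify):
    """Classify the test records, add them to the training set, retrain."""
    rules = extract_rules(train)
    labelled = [(rec, classify(rec, rules)) for rec in test_records]
    # Step 4.2: classified test data join the training set with their results.
    new_train = train + [(rec, lab) for rec, lab in labelled if lab is not None]
    return new_train, extract_rules(new_train)


def toy_extract(train):
    """Stand-in extractor: one single-item rule per item seen with one label."""
    labels_by_item = {}
    for sample, label in train:
        for item in sample:
            labels_by_item.setdefault(item, set()).add(label)
    return {item: next(iter(labels))
            for item, labels in labels_by_item.items() if len(labels) == 1}


def toy_classify(rec, rules):
    """Stand-in classifier: label of the first matching single-item rule."""
    for item in sorted(rec):
        if item in rules:
            return rules[item]
    return None


train = [
    (frozenset({(2, "icmp"), (4, "SF")}), "ipsweep"),
    (frozenset({(2, "tcp"), (4, "SF")}), "normal"),
]
test_records = [frozenset({(2, "icmp"), (3, "eco_i")})]

new_train, new_rules = self_learning_round(train, test_records,
                                           toy_extract, toy_classify)
print(len(new_train), new_rules.get((3, "eco_i")))
```

In this toy run the item (3, eco_i) is absent from the first rule library and appears in the second one, which is the dynamic update the step describes.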
To illustrate steps 3 and 4, one record is selected from the network connection data in the KDDCup99 data set whose connection type is back, ipsweep or normal, as follows:
Table 4: A record selected from the KDDCup99 data whose connection type is one of the above three
14 tcp http SF 321 3410 ...... 1 0 normal
This record is matched against the classification rules in the classification rule library. Three rules match this network connection test record: {(5,321)} → normal; {(6,3410)} → normal; {(40,1), (1,14)} → back. Because the consequents of the three matched rules are not all the same, the accuracies of the two connection types corresponding to the three rules must be calculated separately. The connection type with the larger accuracy is normal, so the classification result for this network connection test record is normal, that is, a normal network connection.
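Steps 3.1 to 4.1 can be traced on this example in Python. This is a sketch under assumptions: the average accuracy formula is not reproduced in this text, so the Laplace estimate (e + 1) / (n + k) is assumed because it uses exactly the quantities k, n and e defined in step 3.2. The training records are the five rows of Table 3 restricted to the columns shown, and the names items, TRAIN, RULES, rule_accuracy and classify are hypothetical.

```python
from collections import defaultdict

POSITIONS = (1, 2, 3, 4, 5, 6, 40, 41)  # ordinal parameters of the columns shown


def items(values):
    """Turn one table row into a set of (ordinal parameter, value) items."""
    return frozenset(zip(POSITIONS, values))


TRAIN = [
    (items(("0", "tcp", "http", "SF", "54540", "8314", "0.04", "0.04")), "back"),
    (items(("14", "tcp", "http", "RSTR", "33580", "7300", "1", "1")), "back"),
    (items(("0", "icmp", "eco_i", "SF", "18", "0", "0", "0")), "ipsweep"),
    (items(("0", "tcp", "http", "SF", "321", "480", "1", "1")), "normal"),
    (items(("0", "tcp", "http", "SF", "277", "3410", "0", "0")), "normal"),
]

# Classification rules obtained in the worked example of step 2.
RULES = [
    (frozenset({(40, "1"), (1, "14")}), "back"),
    (frozenset({(40, "0.04")}), "back"),
    (frozenset({(2, "icmp")}), "ipsweep"),
    (frozenset({(5, "321")}), "normal"),
    (frozenset({(6, "3410")}), "normal"),
]


def rule_accuracy(antecedent, consequent, train):
    """Assumed Laplace estimate (e + 1) / (n + k); not taken from the source."""
    k = len({label for _, label in train})
    n = sum(antecedent <= sample for sample, _ in train)
    e = sum(antecedent <= sample and label == consequent
            for sample, label in train)
    return (e + 1) / (n + k)


def classify(rec, rules, train):
    matched = [(a, c) for a, c in rules if a <= rec]         # step 3.1
    totals = defaultdict(float)
    for antecedent, consequent in matched:                   # step 3.2
        totals[consequent] += rule_accuracy(antecedent, consequent, train)
    best = max(totals, key=totals.get) if totals else None   # step 4.1
    return best, dict(totals)


TEST_RECORD = items(("14", "tcp", "http", "SF", "321", "3410", "1", "0"))
PREDICTED, ACCURACY = classify(TEST_RECORD, RULES, TRAIN)
print(PREDICTED, ACCURACY)
```

Under this assumption each of the three matched rules scores 0.5, so normal accumulates 1.0 against 0.5 for back, which agrees with the classification result stated above. When all matched rules share one consequent, the same code returns that consequent directly, as step 3.2 requires.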
To verify the performance of the improved FOIL algorithm relative to the original FOIL algorithm when applied to a network intrusion detection system, the following verification experiment was carried out. Experimental environment: one PC with an Intel Core i7-4770 3.4 GHz CPU, 8 GB of memory, a 1 TB hard disk, and Visual Studio 2013. Experimental data: records were randomly selected from the KDDCup99 data set according to the proportions of the different network connection types, ensuring that no more than 50 records were taken for each connection type; 2150 records were selected in total. Using cross-validation, 60% of the records were used as training data and the other 40% as test data. Five experiments were run on the FOIL algorithm before and after improvement, with the results shown in Table 5. The average accuracy of current network intrusion detection algorithms was obtained from the published literature, and the comparison is shown in Table 6.
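The 60/40 partition used in the experiment can be sketched as a seeded random split, repeated once per experiment. The function name split_60_40, the seeds and the placeholder records are assumptions; the text does not say how the partition was drawn.

```python
import random


def split_60_40(records, seed):
    """Shuffle a copy of the records and cut it 60% / 40%."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 60 // 100   # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


records = list(range(2150))   # placeholders for the 2150 selected records
splits = [split_60_40(records, seed) for seed in range(5)]   # five experiments
train, test = splits[0]
print(len(train), len(test))
```

With 2150 records this gives 1290 training records and 860 test records in each experiment.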
Table 5: Comparison of verification results for the FOIL algorithm before and after improvement, using the international standard data set KDDCup99
Table 6: Comparison of the average classification accuracy of current network intrusion detection algorithms on KDDCup99 network connection data
The experimental results show that the intrusion detection method of the invention is a large improvement over the original FOIL algorithm in terms of execution time, and that it performs well in terms of the average accuracy of its classification results compared with other algorithms.

Claims (5)

1. A network intrusion detection method based on a learning rule set, characterized in that the method comprises the following steps:
Step 1, divide the network connection data selected from the international standard data set KDDCup99 into a training set and a test set, and then preprocess each data item in the training set and the test set so as to make each data item in the network connection data specific;
Step 2, using the improved FOIL algorithm, that is, the learning rule set algorithm, remove the redundant attribute data items from the training set data, train on each remaining attribute data item, extract classification rules, and store the resulting classification rules in the classification rule library;
Step 3, match the network connection data in the test set one by one against the classification rules in the classification rule library; based on how the different matched rules cover the samples in the training set, calculate the average accuracy of each matched classification rule, and accumulate the average accuracies separately for each connection type in the classification rules;
Step 4, save the accuracies of the different connection types from step 3 and find the connection type with the largest accuracy, which is the classification result for this network connection test record and the final detection result; after the data in the test set have been classified, add the correctly classified test set data together with their detection results to the training set data, as a source of training data for subsequent rule extraction, so that the classification rules can be updated dynamically to adapt to changes in different network connections.
2. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method of preprocessing the data set in step 1 is:
Step 1.1, using a cross-validation method, take 60% of the network connection data in the KDDCup99 data set as the training set and the remaining 40% as the test set;
Step 1.2, add an ordinal parameter to each data item in every network connection record, so as to make each data item in the network connection data specific, enhance the distinguishability of the data, and preserve the concrete meaning of each data item.
3. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 2 of using the improved FOIL algorithm to remove redundant data items from the training set and extract classification rules is:
Step 2.1, divide the preprocessed training set data into two classes, positive examples and negative examples, according to network connection type, and count the attribute data items in the positive example set: count all network connection types in the training set, take the network connection data of one connection type as positive examples and the data of all other connection types as negative examples, add the distinct attribute data items in the positive example set to the positive example attribute data item set Vset, and set the antecedent of classification rule r to empty;
Step 2.2, calculate the gain of each attribute data item v in the positive example attribute data item set Vset, remove the redundant attribute data items that do not satisfy the restriction, and add the attribute data item with the largest gain that satisfies the restriction to the antecedent of classification rule r to obtain a new classification rule r'; the gain of attribute data item v is calculated as follows:
When the antecedent of classification rule r is empty, P and N in the formula are the numbers of samples in the positive example set and the negative example set respectively, and P* and N* are the numbers of samples in the positive example set and the negative example set covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r; in this case, the gains of all attribute data items are calculated and compared, and the attribute data item with the largest gain is added to the antecedent of r;
When the antecedent of classification rule r is not empty, P and N in the formula are the numbers of samples in the positive example set and the negative example set covered by r, and P* and N* are the numbers of samples in the positive example set and the negative example set covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r; in this case, the attribute data item with the largest gain may be added to the antecedent of r only if the following restriction is satisfied: the new classification rule r' obtained by adding it must cover fewer samples in the negative example set, that is, N* < N; if the new rule r' obtained by adding the attribute data item with the largest gain covers the same negative examples as before, and that attribute data item has the same attribute value as an item already in the antecedent of r, the data item is considered redundant and may be deleted from the positive example attribute data item set Vset; the attribute data item with the largest gain among the remaining items is then sought and added to the antecedent of r according to the above requirement, so as to specialize the classification rule;
Step 2.3, save the classification rule r' obtained in step 2.2 and delete from the negative example set all samples not covered by r'; that is, traverse every sample in the negative example set and delete those that do not contain the antecedent of r'; if all samples in the negative example set have been deleted, r' can serve as a classification rule whose consequent is the network connection type of the positive examples; if samples remain in the negative example set, count the other attribute data items in the positive examples that contain the antecedent of r', then return to step 2.2, add the qualifying attribute data item with the largest gain to the antecedent of r' to obtain a new classification rule R, and continue deleting from the negative example set all samples not covered by the antecedent of R; repeat this process until all samples in the negative example set have been deleted;
Step 2.4, save the classification rule R or r' obtained in step 2.3 and delete from the positive example set all samples covered by the antecedent of R or r'; that is, traverse every sample in the positive example set, check one by one whether it contains the antecedent of R or r', and delete every sample that does; if all samples in the positive example set have been deleted, the extraction of classification rules for this type is complete; if samples remain in the positive example set, count all attribute data items in the remaining samples and return to step 2.2, repeating all the preceding steps until every sample in the positive example set has been deleted and the extraction of classification rules for this type is complete; every classification rule that deletes positive examples can serve as a classification rule for network connections of this type, the consequent of these rules is the network connection type of these samples, and the rules are stored in the classification rule library;
Step 2.5, return to step 2.1 and extract the classification rules for the next class of samples, until the classification rules of all sample types have been found; the process of extracting classification rules from the training set then ends.
4. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 3 of calculating the average accuracy of the different matched classification rules is:
Step 3.1, read the test set data, compare each network connection record in the test set with each classification rule in the classification rule library, and record the rules that match; each network connection record in the KDDCup99 data set has many data items, and the antecedent of each of the many classification rules extracted in step 2 may contain several attribute data items, so when a network connection record of unknown type in the test set is classified with the extracted rules it may match several classification rules, and all matching rules are recorded;
Step 3.2, for the m matched classification rules: if their consequents are all the same, the network connection record of unknown type is assigned the connection type in those consequents; if the consequents differ, the average accuracy of each matched rule is calculated separately, and the average accuracy of classification rule R_i is calculated as follows:
where k is the number of distinct network connection types in the training set, n is the number of samples in the training set that contain the antecedent of classification rule R_i, and e is the number of samples in the training set whose connection type is the consequent of R_i and that contain the antecedent of R_i; once the average accuracy of every matched classification rule has been obtained, these average accuracies are accumulated separately by connection type, giving the accuracy of connection type t_i for this network connection test record:
which denotes the accuracy Accuracy(t_i) accumulated over the s classification rules, among the m matched rules, whose consequent connection type is t_i.
5. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 4 of obtaining the classification result from the accuracies of the different connection types of the matched classification rules, and of adding the classified test data to the training set, is:
Step 4.1, save the accuracies of the different network connection types calculated in step 3 and compare them; the connection type with the largest accuracy is the final classification result of the network connection test record;
Step 4.2, to ensure the self-learning property of the method and the dynamic updating of the classification rules: because real network conditions change dynamically, the classification rules obtained from a single round of training may not adapt to constantly changing network data; in this method the classified test data are therefore added to the training set together with their classification results and training is repeated, generating new classification rules and updating the classification rule library.
CN201811122445.1A 2018-09-26 2018-09-26 Network intrusion detection method based on learning rule set Active CN109286622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811122445.1A CN109286622B (en) 2018-09-26 2018-09-26 Network intrusion detection method based on learning rule set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811122445.1A CN109286622B (en) 2018-09-26 2018-09-26 Network intrusion detection method based on learning rule set

Publications (2)

Publication Number Publication Date
CN109286622A true CN109286622A (en) 2019-01-29
CN109286622B CN109286622B (en) 2021-04-20

Family

ID=65182225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811122445.1A Active CN109286622B (en) 2018-09-26 2018-09-26 Network intrusion detection method based on learning rule set

Country Status (1)

Country Link
CN (1) CN109286622B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708850A (en) * 2020-07-16 2020-09-25 国网北京市电力公司 Processing method and device for power industry expansion metering rule base
CN113342799A (en) * 2021-08-09 2021-09-03 明品云(北京)数据科技有限公司 Data correction method and system
WO2023051228A1 (en) * 2021-09-28 2023-04-06 阿里巴巴(中国)有限公司 Method and apparatus for sample data processing, and device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120096551A1 (en) * 2010-10-13 2012-04-19 National Taiwan University Of Science And Technology Intrusion detecting system and method for establishing classifying rules thereof
CN105204487A (en) * 2014-12-26 2015-12-30 北京邮电大学 Intrusion detection method and intrusion detection system for industrial control system based on communication model
US9230102B2 (en) * 2012-04-26 2016-01-05 Electronics And Telecommunications Research Institute Apparatus and method for detecting traffic flooding attack and conducting in-depth analysis using data mining
CN105306475A (en) * 2015-11-05 2016-02-03 天津理工大学 Network intrusion detection method based on association rule classification
CN107835201A (en) * 2017-12-14 2018-03-23 华中师范大学 Network attack detecting method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
J. R. QUINLAN: ""FOIL: A Midterm Report"", 《BASSER DEPARTMENT OF COMPUTER SCIENCE UNIVERSITY OF SYDNEY》 *
郭建龙: ""应用机器学习制定的入侵检测专家***规则集"", 《计算机工程》 *
陈志雄: ""基于信息增益的中文文本关联分类"", 《中文信息学报》 *

Also Published As

Publication number Publication date
CN109286622B (en) 2021-04-20

Similar Documents

Publication Publication Date Title
CN105306475B (en) A kind of network inbreak detection method based on Classification of Association Rules
CN110457404B (en) Social media account classification method based on complex heterogeneous network
CN107835087B (en) Automatic extraction method of alarm rule of safety equipment based on frequent pattern mining
CN105550583B (en) Android platform malicious application detection method based on random forest classification method
CN104660594B (en) A kind of virtual malicious node and its Network Recognition method towards social networks
CN104281674B (en) It is a kind of based on the adaptive clustering scheme and system that gather coefficient
US20210026909A1 (en) System and method for identifying contacts of a target user in a social network
CN107992746A (en) Malicious act method for digging and device
CN103412888B (en) A kind of point of interest recognition methods and device
CN104699755B (en) A kind of intelligent multiple target integrated recognition method based on data mining
CN103970733B (en) A kind of Chinese new word identification method based on graph structure
CN109286622A (en) A kind of network inbreak detection method based on learning rules collection
CN106228398A (en) Specific user's digging system based on C4.5 decision Tree algorithms and method thereof
CN108897842A (en) Computer readable storage medium and computer system
CN101582817A (en) Method for extracting network interactive behavioral pattern and analyzing similarity
CN103927398A (en) Microblog hype group discovering method based on maximum frequent item set mining
CN103886030B (en) Cost-sensitive decision-making tree based physical information fusion system data classification method
CN107679135A (en) The topic detection of network-oriented text big data and tracking, device
CN108268460A (en) A kind of method for automatically selecting optimal models based on big data
CN113011889A (en) Account abnormity identification method, system, device, equipment and medium
CN106650446A (en) Identification method and system of malicious program behavior, based on system call
CN106603538A (en) Invasion detection method and system
CN116910283A (en) Graph storage method and system for network behavior data
CN107832611B (en) Zombie program detection and classification method combining dynamic and static characteristics
CN117807245A (en) Node characteristic extraction method and similar node searching method in network asset map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant