CN109286622A - A network intrusion detection method based on rule set learning - Google Patents
A network intrusion detection method based on rule set learning
- Publication number
- CN109286622A, CN201811122445.1A
- Authority
- CN
- China
- Prior art keywords
- classifying rules
- data
- data item
- rules
- sample
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A network intrusion detection method based on rule set learning. The method first preprocesses the network connection data in the international standard data set KDDCup99, then uses an improved FOIL algorithm to remove redundant data items and extract classification rules, and finally classifies network connection test data according to those rules, determining whether each connection is an attack and, if so, its specific attack type. The method is validated experimentally on network connection data from KDDCup99, and the original FOIL algorithm is modified to suit the characteristics of this data set. Experimental results show that the improved algorithm raises the efficiency of both classification rule extraction and test data classification, and that detection accuracy also improves to some extent, which addresses the low classification efficiency and high false alarm rate of traditional intrusion detection systems.
Description
Technical field
This method relates to the field of network intrusion detection systems, and in particular to a network intrusion detection method based on rule set learning.
Background
As an important complement to the firewall, an intrusion detection system collects and analyzes information from several key points in a computer network or computer system without affecting network performance, and looks for signs that the network or system has been intruded upon, thereby protecting the network system. It plays a very important role in network system security.
Intrusion detection based on data mining has become a research hotspot and has produced many results in China and abroad, but some problems remain widespread: data-mining-based intrusion detection methods still need improvement in detection accuracy, false alarm rate, and real-time performance. In particular, models that couple data mining closely with intrusion detection are needed to improve the accuracy and timeliness of intrusion detection.
Summary of the invention
To address the low classification efficiency and high false alarm rate of traditional intrusion detection systems, the present invention proposes a network intrusion detection method based on rule set learning. It processes network connection data with an improved FOIL algorithm to improve the timeliness and accuracy of intrusion detection. Experiments on the KDDCup99 data set, compared against the original FOIL algorithm, indicate that the improved algorithm is feasible for intrusion detection.
Applying a rule set learning algorithm to intrusion detection is mainly a view centred on data mining and processing; the acquisition of network connection data is outside the scope of the present invention. Taking the international standard network connection data set KDDCup99 as an example, the invention classifies intrusive network connections using data mining as its theoretical basis.
Technical solution of the present invention:
A network intrusion detection method based on rule set learning, comprising the following steps:
Step 1: Divide the network connection data selected from the international standard data set KDDCup99 into a training set and a test set, then preprocess every data item in both sets so that each data item in the network connection data is made distinct.
Step 2: Using the improved FOIL algorithm (the rule set learning algorithm), remove redundant attribute data items from the training set, train on the remaining attribute data items, extract classification rules, and store the resulting rules in a classification rule base.
Step 3: Match each network connection record in the test set against the rules in the classification rule base. For the different matched rules, compute the average accuracy of each rule from the training samples it covers, then accumulate these accuracies separately for each connection type named in the rules.
Step 4: Save the accuracies of the different connection types from Step 3 and take the connection type with the highest accuracy as the classification result, and hence the final detection result, for that test record. After the test set has been classified, add the correctly classified test data, together with their detection results, to the training set as a data source for subsequent rule extraction, so that the classification rules can be updated dynamically to adapt to changing network connections.
The data set preprocessing in Step 1 comprises the following steps:
Step 1.1: Using cross-validation, take 60% of the network connection data in the KDDCup99 data set as the training set and the remaining 40% as the test set.
Step 1.2: Add a sequence parameter to each data item in every network connection record, making each data item distinct and making the data easier to tell apart. The KDDCup99 data set contains many identical data items; for example, one record may contain several "0" values, yet each column has its own meaning. When the original FOIL algorithm processes a record, it treats equal data items as the same data, so applying it directly to this data set harms both the speed of rule extraction and the accuracy of the classification results. To remedy this, a sequence parameter is added to the data item of each column during preprocessing. Identical values within one record can then be distinguished, and the specific meaning of each value is preserved.
Removing redundant data items from the training set and extracting classification rules with the improved FOIL algorithm, as described in Step 2, comprises the following steps:
Step 2.1: Divide the preprocessed training set into positive examples and negative examples according to network connection type, and collect the attribute data items in the positive example set. Enumerate all network connection types in the training set; treat the records of one connection type as positive examples and the records of all other connection types as negative examples. Collect the distinct attribute data items in the positive example set into the positive-example attribute data item set Vset, and set the antecedent of classification rule r to empty.
Step 2.2: Compute the gain of every attribute data item v in Vset, remove redundant data items that do not satisfy the restriction, and add the attribute data item with the largest gain that satisfies the restriction to the antecedent of rule r, giving a new classification rule r'. The gain of attribute data item v is calculated as follows:
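The formula itself does not appear in this text. Assuming the method uses the standard FOIL gain (an assumption, since the original expression is unavailable here), it can be written with the symbols P, N, P* and N* defined in the following paragraphs as:

```latex
\[
\mathrm{Gain}(v) = P^{*}\left(\log\frac{P^{*}}{P^{*}+N^{*}} - \log\frac{P}{P+N}\right)
\]
```

The first logarithm scores the purity of the extended rule r' and the second that of the current rule r, so the gain is positive only when adding v raises the share of positive samples among those covered. The base of the logarithm only rescales the gain and does not change which item ranks highest.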
When the antecedent of rule r is empty, P and N in the formula are the numbers of samples in the positive and negative example sets, and P* and N* are the numbers of samples in the positive and negative example sets covered by the new rule r' obtained by adding attribute data item v to the antecedent of r. In this case, the gains of all attribute data items are computed and compared, and the item with the largest gain is added to the antecedent of r.
When the antecedent of rule r is not empty, P and N are the numbers of samples covered by r in the positive and negative example sets, and P* and N* are the numbers of samples covered by the new rule r' in the positive and negative example sets. In this case, adding the maximum-gain item to the antecedent of r is subject to the following restriction: the new rule r' must cover fewer samples in the negative example set, i.e. N* < N. If adding the maximum-gain item leaves the covered negative samples unchanged, and that item has the same attribute value as some item already in the antecedent of r, the item is considered redundant. It is deleted from Vset, and the maximum-gain item among the remaining attribute data items is then sought and added to the antecedent of r under the same requirement, so as to specialize the rule.
Step 2.3: Save the rule r' from Step 2.2 and delete from the negative example set every sample that r' does not cover. That is, traverse all samples in the negative example set and delete those that do not contain the antecedent of r'. If every sample in the negative example set has been deleted, r' can serve as a classification rule. If samples remain, collect the other attribute data items in the positive examples that contain the antecedent of r', return to Step 2.2, and add the qualifying maximum-gain item to the antecedent of r' to obtain a new rule R. Then delete from the negative example set all samples not covered by R, and repeat until every sample in the negative example set has been deleted.
Step 2.4: Save the rule R (or r') obtained in Step 2.3 and delete from the positive example set every sample that R (or r') covers. That is, traverse all samples in the positive example set, check one by one whether each contains the antecedent of R (or r'), and delete every sample that does. If all samples in the positive example set have been deleted, rule extraction for this connection type is complete. If samples remain, collect all attribute data items in the remaining samples and return to Step 2.2, repeating the preceding steps until every sample in the positive example set has been deleted. Every rule that deleted positive examples serves as a classification rule for this type of sample; the consequent of these rules is the network connection type of the samples, and the rules are stored in the classification rule base.
Step 2.5: Return to Step 2.1 and extract classification rules for the next class of samples, until rules have been found for all sample types; extraction of classification rules from the training set then ends.
Computing the average accuracy of the matched classification rules, as described in Step 3, comprises the following steps:
Step 3.1: Read the test set and compare every network connection record in it with every rule in the classification rule base, recording the rules that match. Each record in KDDCup99 has many data items, and the antecedent of each of the many rules extracted in Step 2 may contain several attribute data items, so a record of unknown type may match multiple rules when classified. All matched rules are recorded.
Step 3.2: For the m matched rules, if all of their consequents are the same, the record of unknown type is assigned the connection type in those consequents. If the consequents differ, compute the average accuracy of each rule. The average accuracy of rule R_i is calculated as follows:
where k is the number of distinct network connection types in the training set, n is the number of training samples that contain the antecedent of R_i, and e is the number of training samples whose connection type is the consequent of R_i and that contain the antecedent of R_i. Once the average accuracy of every matched rule is known, the accuracies are accumulated by connection type to give the accuracy of connection type t_i for this test record:
Accuracy(t_i) denotes the accumulated accuracy over the s rules, among the m matched rules, whose consequent connection type is t_i.
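The two formulas referred to above do not appear in this text. Since the source names the Laplace accuracy estimate and defines k, n and e, a plausible reconstruction (an assumption, not the original expressions) is:

```latex
\[
\mathrm{LaplaceAccuracy}(R_i) = \frac{e + 1}{n + k},
\qquad
\mathrm{Accuracy}(t_i) = \sum_{j=1}^{s} \mathrm{LaplaceAccuracy}(R_j)
\]
```

Here the sum runs over the s matched rules whose consequent is t_i. The added 1 and k keep the estimate away from 0 and 1 for rules that cover very few training samples.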
Obtaining the classification result from the accuracies of the different connection types of the matched rules and adding the classified test data to the training set, as described in Step 4, comprises the following steps:
Step 4.1: Save the accuracies of the different network connection types computed in Step 3 and compare them; the connection type with the highest accuracy is the final classification result for the test record.
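Steps 3.1 to 4.1 can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the function names `laplace_accuracy` and `classify` are hypothetical, the accuracy uses the assumed Laplace form (e + 1) / (n + k), samples are sets of (place, value) pairs, and the toy training data and rules are invented for the demo.

```python
from collections import defaultdict

def laplace_accuracy(antecedent, label, training, k):
    """(e + 1) / (n + k), with n and e counted over the training set."""
    n = sum(antecedent <= items for items, _ in training)
    e = sum(antecedent <= items and lab == label for items, lab in training)
    return (e + 1) / (n + k)

def classify(sample, rules, training):
    """Sum the Laplace accuracies of matched rules per connection type; return the best type."""
    k = len({lab for _, lab in training})
    totals = defaultdict(float)
    for antecedent, label in rules:
        if antecedent <= sample:          # Step 3.1: the rule matches the record
            totals[label] += laplace_accuracy(antecedent, label, training, k)
    if not totals:
        return None   # assumption: the source does not cover records that match no rule
    return max(totals, key=totals.get)    # Step 4.1: highest accumulated accuracy

# Toy data invented for this demo.
training = [
    (frozenset({(1, "0"), (2, "tcp")}), "normal"),
    (frozenset({(1, "0"), (2, "udp")}), "normal"),
    (frozenset({(1, "0"), (2, "icmp")}), "back"),
    (frozenset({(1, "14"), (2, "icmp")}), "back"),
]
rules = [
    (frozenset({(1, "0")}), "normal"),    # n = 3, e = 2 -> 3/5
    (frozenset({(2, "icmp")}), "back"),   # n = 2, e = 2 -> 3/4
]
result = classify(frozenset({(1, "0"), (2, "icmp")}), rules, training)
```

When all matched rules share one consequent, `totals` holds a single key, so the same code also handles the case in which no accuracy comparison is needed.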
Step 4.2: To preserve the method's self-learning property and keep the classification rules up to date, and because real network conditions change dynamically, rules obtained from a single training run may fail to fit continually changing network data. The method therefore adds the classified test data, together with the corresponding classification results, to the training set and trains again, generating new classification rules and updating the classification rule base.
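The feedback loop of Step 4.2 amounts to extending the training set and re-running rule extraction. A minimal Python sketch, assuming samples are sets of (place, value) pairs; `retrain_with_feedback` is a hypothetical name, and `one_rule_per_sample` is only a stand-in learner for the demo, not the improved FOIL algorithm:

```python
def retrain_with_feedback(training, classified, extract_rules):
    """Return the extended training set and the rules re-extracted from it.

    training      -- list of (sample, label) pairs
    classified    -- list of (sample, predicted_label) pairs from the test set
    extract_rules -- any rule learner taking such a list (hypothetical callback)
    """
    extended = list(training) + list(classified)
    return extended, extract_rules(extended)

def one_rule_per_sample(samples):
    """Stand-in learner used only for this demo: memorise each distinct sample."""
    return sorted({(tuple(sorted(items)), label) for items, label in samples})

training = [(frozenset({(1, "0"), (2, "tcp")}), "normal")]
classified = [(frozenset({(1, "14"), (2, "tcp")}), "back")]
training, rule_base = retrain_with_feedback(training, classified, one_rule_per_sample)
```

Passing the learner in as a callback keeps the update step independent of how rules are extracted.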
The invention has the following advantages that
The present invention is divided into training by taking KDDCup99 international standard data set as an example, first, in accordance with the method for crosscheck
Collection and test set add sequential parameter to 41 attribute data items in training set and test set.Then pass through improved FOIL
Algorithm removes the data item of redundancy in training set and classifying rules is extracted in training.Finally by Laplce's accuracy estimation formulas
The bat that matching test concentrates the network connection data classifying rules of UNKNOWN TYPE is calculated, most by the comparison of accuracy
Eventually obtain classification results, while by test set data and corresponding classification results be added in training set and instructed with real-time update
Practice collection data, generate new classifying rules, makes this method that there is good adaptivity and self-learning property.The invention, which uses, to be changed
Into FOIL algorithm, a large amount of when efficiently avoiding original FOIL algorithm process KDDCup99 data set repeat traversal and meter
It calculates, reduces the time complexity of algorithm, greatly accelerate the efficiency for extracting classifying rules and classification, improve number of network connections
According to the accuracy of classification and Detection result, the characteristic of adaptivity and self study is but also this method has stronger stability.
Description of the drawings
Fig. 1 is a flow chart of the network intrusion detection method based on rule set learning according to the present invention.
Specific embodiments
Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawing.
Applying a rule set learning algorithm to intrusion detection is mainly a data-centred view; the acquisition of network connection data is outside the scope of the present invention. Taking the international standard network connection data set KDDCup99 as an example, the invention classifies network connection data using data mining as its theoretical basis.
Fig. 1 illustrates the steps of the network intrusion detection method based on rule set learning in detail. The method provided by the invention comprises the following steps:
Step 1: Divide the network connection data selected from the international standard data set KDDCup99 into a training set and a test set, then preprocess every data item in both sets so that each data item in the network connection data is made distinct.
Step 1.1: Using cross-validation, take 60% of the network connection data in the KDDCup99 data set as the training set and the remaining 40% as the test set. Specifically, randomly select 10 network connection records from the KDDCup99 data set as one group, arbitrarily choose 6 records from each group for the training set, and add the remaining 4 records to the test set.
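The grouping scheme in Step 1.1 can be sketched as follows. This is a minimal Python illustration of the 6-of-10 split as described, not the patent's implementation; the function name `split_train_test`, the fixed seed, and the handling of a short final group are assumptions.

```python
import random

def split_train_test(records, group_size=10, train_per_group=6, seed=0):
    """Shuffle the records, cut them into groups, and send 6 of every 10 to training."""
    rng = random.Random(seed)            # fixed seed only to make the sketch repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    train, test = [], []
    for start in range(0, len(shuffled), group_size):
        group = shuffled[start:start + group_size]
        # Assumption: scale the training share down for a short final group.
        take = round(len(group) * train_per_group / group_size)
        picked = set(rng.sample(range(len(group)), take))
        for i, rec in enumerate(group):
            (train if i in picked else test).append(rec)
    return train, test

# Integers stand in for network connection records.
train, test = split_train_test(range(100))
```

Splitting inside each group of 10 keeps the 60/40 ratio exact whenever the number of records is a multiple of the group size.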
Step 1.2: Add a sequence parameter to each data item in every network connection record, making each data item distinct and making the data easier to tell apart. The KDDCup99 data set contains many identical data items; for example, one record may contain several "0" or "1" values, yet each column has its own meaning. The original FOIL algorithm treats equal data items within one record as the same data, so applying it directly to this data set harms both the speed of rule extraction and the accuracy of the classification results. To remedy this, a sequence parameter place, i.e. the column in which the data item sits, is added to the data item of each column during preprocessing. Each data item in the data set can be represented by the structure DataItem { int place, string data }. Consider, for example, the following randomly selected network connection record:
Table 1: A randomly selected network connection record
0 | tcp | http | SF | 279 | 1129 | ...... | 0 | 0 | normal |
Adding sequence parameters during preprocessing gives: (1,0), (2,tcp), (3,http), (4,SF), (5,279), ..., (40,0), (41,0). This preserves the position of each value, so the meaning of the data cannot be confused, and during subsequent rule extraction only the data in the corresponding column needs to be traversed, which greatly reduces the amount of data traversed and markedly improves the efficiency of rule extraction.
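A minimal Python sketch of this preprocessing step. `DataItem` mirrors the structure named above; the helper `add_sequence_params` and the choice to keep every value as a string are assumptions made for illustration.

```python
from typing import List, NamedTuple

class DataItem(NamedTuple):
    place: int   # 1-based column index (the sequence parameter)
    data: str    # raw field value, kept as text

def add_sequence_params(fields: List[str]) -> List[DataItem]:
    """Pair every attribute field with its column position."""
    return [DataItem(place, value) for place, value in enumerate(fields, start=1)]

record = ["0", "tcp", "http", "SF", "279", "1129"]   # first six fields of Table 1
items = add_sequence_params(record)
```

Because the position is part of each item, a "0" in column 1 and a "0" in column 40 no longer compare equal.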
Step 2: Use the improved FOIL algorithm to remove redundant data items from the training set, train on the remaining network connection data, extract classification rules, and store them in the classification rule base.
The following terms are explained first:
Network connection: each row of the international standard data set KDDCup99 is one network connection. Each connection has 42 data items: the first 41 are attributes of the connection and the last is its connection type. A connection of type normal is a normal network connection; all other connection types are attack types.
Sample: the training set selected from the international standard data set is a collection of network connection records of several connection types; each record is one sample.
Attribute data item: each data item in a sample, after the sequence parameter has been added, is called an attribute data item. It contains the sequence parameter and the data value at that position.
Classification rule: a classification rule consists of an antecedent and a consequent. The antecedent is made up of several attribute data items, and the consequent is a connection type.
Covering: if the data items of a sample include every data item in the antecedent of a classification rule, the sample is said to be covered by that rule.
Gain: while extracting classification rules, the FOIL algorithm must select attribute data items to specialize the rule antecedent. The gain is the difference in information encoding before and after an attribute data item is added to the antecedent. The larger the gain, the more the item reduces the information encoding; gain is the main criterion by which FOIL selects attribute data items to add to the antecedent.
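The definitions above map naturally onto sets of (place, value) pairs. The sketch below shows one possible Python representation; the names `Item`, `Rule` and `covers` are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, NamedTuple, Tuple

Item = Tuple[int, str]            # attribute data item: (sequence parameter, value)

class Rule(NamedTuple):
    antecedent: FrozenSet[Item]   # attribute data items that must all be present
    consequent: str               # connection type predicted by the rule

def covers(rule: Rule, sample: FrozenSet[Item]) -> bool:
    """A sample is covered when it contains every item of the rule antecedent."""
    return rule.antecedent <= sample

sample = frozenset({(1, "0"), (2, "tcp"), (3, "http"), (4, "SF")})
rule = Rule(frozenset({(2, "tcp"), (3, "http")}), "normal")
is_covered = covers(rule, sample)   # True: both antecedent items occur in the sample
```

With this representation, covering is a plain subset test, and a rule with an empty antecedent covers every sample.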
Step 2.1: Divide the preprocessed training set into two classes according to network connection type and collect the attribute data items in the positive example set. Enumerate all network connection types in the training set; treat the records of one connection type as positive examples and the records of all other connection types as negative examples. Collect the distinct attribute data items in the positive example set into the positive-example attribute data item set Vset, and set the antecedent of classification rule r to empty.
Step 2.2: Compute the gain of every attribute data item v in Vset, remove redundant data items that do not satisfy the restriction, and add the attribute data item with the largest gain that satisfies the restriction to the antecedent of rule r, giving a new classification rule r'. The gain of attribute data item v is calculated as follows:
When the antecedent of rule r is empty, P and N in the formula are the numbers of samples in the positive and negative example sets, and P* and N* are the numbers of samples in the positive and negative example sets covered by the new rule r' obtained by adding attribute data item v to the antecedent of r. In this case, after the gains of all attribute data items have been computed and compared, the item with the largest gain can be added directly to the rule antecedent.
When the antecedent of rule r is not empty, P and N are the numbers of samples covered by r in the positive and negative example sets, and P* and N* are the numbers of samples covered by the new rule r' in the positive and negative example sets. In this case, adding the maximum-gain item to the antecedent of r is subject to the following restriction: the new rule r' must cover fewer samples in the negative example set, i.e. N* < N. If adding the maximum-gain item leaves the covered negative samples unchanged, and that item has the same attribute value as some item already in the antecedent of r, the item is considered redundant. It is deleted from Vset, and the maximum-gain item among the remaining attribute data items is then sought and added to the antecedent of r under the same requirement, so as to specialize the rule.
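The gain formula is not reproduced in this text. The Python sketch below assumes the standard FOIL gain, P* (log2(P*/(P*+N*)) - log2(P/(P+N))), and pairs it with the N* < N restriction described above; the function names and the sample counts in the usage lines are illustrative assumptions.

```python
import math

def foil_gain(p: int, n: int, p_new: int, n_new: int) -> float:
    """Standard FOIL gain for extending a rule (assumed form).

    p, n         -- positives / negatives covered by the current rule r
    p_new, n_new -- positives / negatives covered by the extended rule r'
    """
    if p_new == 0:
        return float("-inf")      # an extension that covers no positives is useless
    return p_new * (math.log2(p_new / (p_new + n_new)) - math.log2(p / (p + n)))

def satisfies_restriction(n: int, n_new: int, antecedent_empty: bool) -> bool:
    """The N* < N restriction applies only once the antecedent is non-empty."""
    return antecedent_empty or n_new < n

# Hypothetical counts: 2 positives and 3 negatives before the extension.
gain_pure = foil_gain(2, 3, 1, 0)    # extension covers 1 positive, 0 negatives
gain_mixed = foil_gain(2, 3, 2, 2)   # extension covers 2 positives, 2 negatives
```

An item is added only if it has the largest gain among the candidates that pass `satisfies_restriction`; an item that fails the check is skipped rather than added.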
Redundant attribute data items are removed in this step mainly because, when the training set is small, a redundant item may be added to the rule antecedent. A redundant item contributes nothing to classification and can also harm classification accuracy. For example:
Table 2: Eight records chosen to illustrate the removal of redundant attribute data items
... | F1 | F2 | F3 | F4 | ... | Type |
... | 0 | 0 | 0 | 0 | ... | normal |
... | 0 | 0 | 1 | 0 | ... | land |
... | 0 | 1 | 0 | 0 | ... | ipsweep |
... | 0 | 1 | 1 | 0 | ... | normal |
... | 1 | 0 | 0 | 1 | ... | teardrop |
... | 1 | 0 | 1 | 1 | ... | normal |
... | 1 | 1 | 0 | 1 | ... | normal |
... | 1 | 1 | 1 | 1 | ... | back |
For these 8 samples, the number of samples of type normal (positive examples) equals the number of non-normal samples (negative examples), and every attribute column takes the same values the same number of times. All values of attribute F4 are identical to those of F1, so F4 is a redundant attribute. When extracting the rules for connections of type normal, every attribute data item in the normal samples has the same gain. If (F1,0) is added to the antecedent first and the second item is chosen by gain alone, the redundant item (F4,0) may be introduced into the antecedent, which contributes nothing useful to later classification. Adding the restriction N* < N guarantees that the maximum-gain item chosen each time is not redundant.
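The effect of the restriction on the Table 2 data can be checked directly. The Python sketch below uses the attribute names F1 to F4 in place of numeric sequence parameters; the helper names are hypothetical.

```python
ROWS = [  # (F1, F2, F3, F4, Type) as listed in Table 2
    (0, 0, 0, 0, "normal"), (0, 0, 1, 0, "land"),
    (0, 1, 0, 0, "ipsweep"), (0, 1, 1, 0, "normal"),
    (1, 0, 0, 1, "teardrop"), (1, 0, 1, 1, "normal"),
    (1, 1, 0, 1, "normal"), (1, 1, 1, 1, "back"),
]
NAMES = ("F1", "F2", "F3", "F4")

def to_items(row):
    """Turn a row into a set of (attribute, value) pairs; zip drops the Type column."""
    return {(name, value) for name, value in zip(NAMES, row)}

# Negative examples are all records whose type is not normal.
negatives = [to_items(r) for r in ROWS if r[4] != "normal"]

def negatives_covered(antecedent):
    return sum(antecedent <= s for s in negatives)

def passes_restriction(antecedent, item):
    """N* < N: the extended rule must cover strictly fewer negative samples."""
    return negatives_covered(antecedent | {item}) < negatives_covered(antecedent)

rule = {("F1", 0)}                                      # covers the land and ipsweep records
f4_allowed = passes_restriction(rule, ("F4", 0))        # False: still covers both negatives
f2_allowed = passes_restriction(rule, ("F2", 0))        # True: only the land record remains
```

Because F4 duplicates F1, adding (F4,0) cannot shrink the set of covered negatives, so the check rejects it regardless of its gain.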
Step 2.3: Save the rule r' from Step 2.2 and delete from the negative example set every sample that r' does not cover. That is, traverse all samples in the negative example set and delete those that do not contain the antecedent of r'. If every sample in the negative example set has been deleted, r' can serve as a classification rule. If samples remain, collect the other attribute data items in the positive examples that contain the antecedent of r', return to Step 2.2, and add the qualifying maximum-gain item to the antecedent of r' to obtain a new rule R. Then delete from the negative example set all samples not covered by R, and repeat until every sample in the negative example set has been deleted.
Step 2.4: Save the rule R (or r') obtained in Step 2.3 and delete from the positive example set every sample that R (or r') covers. That is, traverse all samples in the positive example set, check one by one whether each contains the antecedent of R (or r'), and delete every sample that does. If all samples in the positive example set have been deleted, rule extraction for this connection type is complete. If samples remain, collect all attribute data items in the remaining samples and return to Step 2.2, repeating the preceding steps until every sample in the positive example set has been deleted. Every rule that deleted positive examples serves as a classification rule for this type of sample; the consequent of these rules is the network connection type of the samples, and the rules are stored in the classification rule base.
Step 2.5: Return to Step 2.1 and extract classification rules for the next class of samples, until rules have been found for all sample types; extraction of classification rules from the training set then ends.
An example illustrates the above process. The following records were randomly selected from KDDCup99:
Table 3: Five records randomly selected from the KDDCup99 data set
0 | tcp | http | SF | 54540 | 8314 | …… | 0.04 | 0.04 | back |
14 | tcp | http | RSTR | 33580 | 7300 | …… | 1 | 1 | back |
0 | icmp | eco_i | SF | 18 | 0 | …… | 0 | 0 | ipsweep |
0 | tcp | http | SF | 321 | 480 | …… | 1 | 1 | normal |
0 | tcp | http | SF | 277 | 3410 | …… | 0 | 0 | normal |
These five samples are used as the training set. After the sequence parameters are added, the samples whose connection type is back are taken as positive examples and the samples of all other types as negative examples. The classification rules of the positive examples are trained first. The distinct attribute data items in the positive examples form the attribute data item set Vset(back): {(1,0), (2,tcp), (3,http), (4,SF), (5,54540), (6,8314), (40,0.04), (41,0.04), (1,14), (4,RSTR), (5,33580), (6,7300), (40,1), (41,1)}. The antecedent of classification rule r is set to empty and the gain of each attribute data item is calculated; the gains of (5,54540), (6,8314), (40,0.04), (41,0.04), (1,14), (4,RSTR), (5,33580), (6,7300), (40,1) and (41,1) are the largest. If (40,1) is now added to the antecedent of classification rule r, and the samples in the negative examples not covered by the antecedent of r: {(40,1)} → back are deleted, the negative example set cannot be deleted completely.
The gains of all attribute data items in the samples whose connection type is back and which are covered by classification rule r are then calculated again. If the gain of (41,1) is the largest, this attribute data item should be deleted as a redundant data item, because adding it to the antecedent of r does not satisfy the restriction that the new classification rule r' must cover fewer samples in the negative example set, and its attribute value is the same as that of an attribute data item already in r. The attribute data items with the largest gain are then (1,14), (4,RSTR), (5,33580) and (6,7300). One of them is selected and added to the antecedent, giving the classification rule for connection type back: r: {(40,1), (1,14)} → back. All samples in the negative examples not covered by the antecedent of r are deleted; all negative examples have now been deleted, so one classification rule for the positive examples has been found. The samples in the positive examples covered by the antecedent of r are deleted; since the positive examples have not all been deleted, another classification rule for the positive examples is generated by the same steps: {(40,0.04)} → back. At this point all classification rules for the positive examples have been found. The samples of the other connection types are then taken as positive examples in turn and the corresponding classification rules are generated. The following classification rules are finally obtained: {(40,1), (1,14)} → back; {(40,0.04)} → back; {(2,icmp)} → ipsweep; {(5,321)} → normal; {(6,3410)} → normal. These classification rules are stored in the classification rule library.
Step 3, the network connection data in the test set are matched one by one against the classification rules in the classification rule library. The average accuracy of each matched classification rule is calculated from the samples it covers in the training set, and the accuracies of the classification rules are then accumulated separately for each connection type appearing in the rules.
Step 3.1, read the test set data and compare every network connection record in the test set with every classification rule in the classification rule library, recording the rules that match. Each network connection record in the KDDCup99 data set has many data items, and the antecedent of each of the many classification rules extracted in step 2 may contain several attribute data items, so when a network connection record of unknown type in the test set is classified with the extracted rules it may match several classification rules; all matched rules are recorded.
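The matching test in step 3.1 can be sketched as follows. This is a minimal illustration, assuming a record is a dict from position to value and a rule is an (antecedent, consequent) pair; `matches` and `matching_rules` are hypothetical names. The rule base is the one obtained in the step 2 example, and the record uses the visible fields of the Table 4 test record.

```python
def matches(antecedent, record):
    """A rule matches when every (position, value) item of its antecedent occurs in the record."""
    return all(record.get(position) == value for position, value in antecedent)

def matching_rules(rule_base, record):
    """Return every rule in the rule base whose antecedent the record contains."""
    return [(antecedent, consequent)
            for antecedent, consequent in rule_base
            if matches(antecedent, record)]

# Rule base from the step 2 example; values are kept as strings.
rule_base = [
    ({(40, "1"), (1, "14")}, "back"),
    ({(40, "0.04")}, "back"),
    ({(2, "icmp")}, "ipsweep"),
    ({(5, "321")}, "normal"),
    ({(6, "3410")}, "normal"),
]

# Visible fields of the Table 4 record, keyed by position.
record = {1: "14", 2: "tcp", 3: "http", 4: "SF", 5: "321", 6: "3410", 40: "1", 41: "0"}

matched = matching_rules(rule_base, record)
```

Because a record can contain the antecedents of several rules at once, the function returns a list rather than the first hit; the conflict between differing consequents is resolved in step 3.2.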
Step 3.2, for the m matched classification rules, if their consequents are all the same, the network connection record of unknown type has the connection type given by these consequents. If the consequents of the matched rules differ, the average accuracy of each of these rules is calculated; the average accuracy of classification rule R_i is calculated as follows:
where k is the number of distinct network connection types in the training set, n is the number of samples in the training set that contain the antecedent of classification rule R_i, and e is the number of samples in the training set whose connection type is the consequent of R_i and which contain the antecedent of R_i. After the average accuracy of each matched classification rule has been obtained, these average accuracies are accumulated by connection type to give the accuracy of connection type t_i for this network connection test record:
where the accumulated term is the accuracy of the s-th classification rule, among the m matched rules, whose consequent connection type is t_i, and the result is Accuracy(t_i).
Step 4, the accuracies of the different connection types obtained in step 3 are saved and compared, and the connection type with the largest accuracy is taken as the classification result of the network connection test record. In addition, to give the method good self-learning properties, after the test set data have been classified with the classification rules they are added to the training set together with their classification results, providing a new source of training data for subsequent rule extraction and ensuring that the classification rules are updated dynamically.
Step 4.1, save the accuracies of the different network connection types calculated in step 3 and compare them; the connection type with the largest accuracy is the final classification result of the network connection test record.
Step 4.2, to ensure the self-learning property of the method and the dynamic updating of the classification rules, and considering that real network conditions change dynamically so that classification rules obtained from a single round of training may not adapt to constantly changing network data, the method adds the classified test data, together with the corresponding classification results, to the training set and trains again, generating new classification rules and updating the classification rule library.
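The update in step 4.2 can be sketched as a small loop body. This is only an illustration under assumptions: `classify_and_learn` is a hypothetical name, the two callables stand in for steps 3 and 2, and rules are re-extracted after every record, although the text does not say how often retraining happens. Claim 1 restricts the update to correctly classified test data; that check is not modeled here.

```python
def classify_and_learn(record, training, rule_base, classify, extract_rules):
    """Classify one test record, add it to the training set, and rebuild the rule base.

    classify(rule_base, record) -> label and extract_rules(training) -> rule_base
    are supplied by the caller; both are placeholders for steps 3 and 2.
    """
    label = classify(rule_base, record)
    training.append((record, label))        # the record joins the training set with its predicted type
    return label, extract_rules(training)   # rules are re-extracted from the enlarged training set

# Toy stand-ins so that the sketch runs on its own.
def toy_extract(training):
    # One (protocol, label) pair per distinct combination seen in training.
    return sorted({(rec[2], label) for rec, label in training})

def toy_classify(rule_base, record):
    for protocol, label in rule_base:
        if record[2] == protocol:
            return label
    return "normal"

training = [({2: "icmp"}, "ipsweep"), ({2: "tcp"}, "normal")]
rule_base = toy_extract(training)
label, rule_base = classify_and_learn({2: "icmp"}, training, rule_base, toy_classify, toy_extract)
```

Passing the classifier and the rule extractor in as parameters keeps the update logic independent of how the rules are actually learned.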
To illustrate steps 3 and 4, one record is selected from the network connection data in the KDDCup99 data set whose connection type is back, ipsweep or normal, as follows:
Table 4: One record selected from the KDDCup99 data whose connection type is one of the above three
14 | tcp | http | SF | 321 | 3410 | ...... | 1 | 0 | normal |
This record is matched against the classification rules in the classification rule library. Three rules match this network connection test record: {(5,321)} → normal; {(6,3410)} → normal; {(40,1), (1,14)} → back. Because the consequents of the three matched rules are not all the same, the accuracies of the two connection types corresponding to the three rules must be calculated separately. The connection type with the larger accuracy is normal, so the classification result for this network connection test record is normal, i.e. a normal network connection.
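The comparison in this example can be reproduced numerically. The sketch below assumes the per-rule accuracy is the Laplace estimate (e + 1) / (n + k), which is consistent with the definitions of k, n and e in step 3.2 but is not confirmed by this text. It uses only the fields visible in Table 3, and `contains` and `laplace_accuracy` are hypothetical names.

```python
from collections import defaultdict

# Visible fields of the five Table 3 training records, keyed by position.
training = [
    ({1: "0", 2: "tcp", 3: "http", 4: "SF", 5: "54540", 6: "8314", 40: "0.04", 41: "0.04"}, "back"),
    ({1: "14", 2: "tcp", 3: "http", 4: "RSTR", 5: "33580", 6: "7300", 40: "1", 41: "1"}, "back"),
    ({1: "0", 2: "icmp", 3: "eco_i", 4: "SF", 5: "18", 6: "0", 40: "0", 41: "0"}, "ipsweep"),
    ({1: "0", 2: "tcp", 3: "http", 4: "SF", 5: "321", 6: "480", 40: "1", 41: "1"}, "normal"),
    ({1: "0", 2: "tcp", 3: "http", 4: "SF", 5: "277", 6: "3410", 40: "0", 41: "0"}, "normal"),
]

def contains(record, antecedent):
    """True when the record holds every (position, value) item of the antecedent."""
    return all(record.get(position) == value for position, value in antecedent)

def laplace_accuracy(antecedent, consequent, training, k):
    """Assumed form of the per-rule accuracy: (e + 1) / (n + k)."""
    n = sum(contains(rec, antecedent) for rec, _ in training)
    e = sum(contains(rec, antecedent) and label == consequent for rec, label in training)
    return (e + 1) / (n + k)

# The three rules matched by the Table 4 record.
matched = [
    ({(5, "321")}, "normal"),
    ({(6, "3410")}, "normal"),
    ({(40, "1"), (1, "14")}, "back"),
]

k = len({label for _, label in training})  # number of distinct connection types
totals = defaultdict(float)
for antecedent, consequent in matched:
    totals[consequent] += laplace_accuracy(antecedent, consequent, training, k)

predicted = max(totals, key=totals.get)  # step 4.1: type with the largest accumulated accuracy
```

Under this assumption each matched rule covers one training sample of its own type, so each scores 0.5; normal accumulates 1.0 against 0.5 for back, which agrees with the classification result stated above.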
To verify the performance of the improved FOIL algorithm relative to the original FOIL algorithm when applied to a network intrusion detection system, the following verification experiment was carried out. Experimental environment: one PC with an Intel Core i7-4770 3.4 GHz CPU, 8 GB of memory and a 1 TB hard disk, with Visual Studio 2013 as the software environment. Experimental data: records were randomly selected from the KDDCup99 data set according to the proportions of the different network connection types, ensuring that no more than 50 records were taken for each connection type; 2150 records were selected in total. Using cross-validation, 60% of them were used as training set data and the other 40% as test set data, and five experiments were carried out with the FOIL algorithm before and after improvement; the experimental results are shown in Table 5. The average accuracies of current network intrusion detection algorithms were obtained from the professional literature, and the comparison is shown in Table 6.
Table 5: Comparison of verification results for the FOIL algorithm before and after improvement on the international standard data set KDDCup99
Table 6: Comparison of the average classification accuracy of current network intrusion detection algorithms on KDDCup99 network connection data
The experimental results show that the intrusion detection method of the invention greatly improves on the original FOIL algorithm in execution time and, compared with other algorithms, performs well in terms of the average accuracy of its classification results.
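The 60%/40% division used in the experiment can be sketched as a repeated random split. This is a minimal illustration: the function name `split_60_40`, the use of seeds 0 to 4 for the five runs, and the integer stand-ins for the 2150 selected records are all assumptions, since the text does not describe how the partitions were drawn.

```python
import random

def split_60_40(records, seed):
    """Shuffle a copy of the records and cut it into 60% training and 40% test data."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 60 // 100  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

records = list(range(2150))  # stand-ins for the 2150 selected records
runs = [split_60_40(records, seed) for seed in range(5)]  # one split per experiment
```

With 2150 records each run yields 1290 training records and 860 test records, and fixing the seed makes a run repeatable.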
Claims (5)
1. A network intrusion detection method based on a learning rule set, characterized in that the method comprises the following steps:
Step 1, network connection data selected from the international standard data set KDDCup99 are divided into training set data and test set data, and each data item in the training set and the test set is then preprocessed so as to specialize each data item in the network connection data;
Step 2, using the improved FOIL algorithm, i.e. the learning rule set algorithm, redundant attribute data items in the training set data are removed, the remaining attribute data items are trained to extract classification rules, and the resulting classification rules are stored in the classification rule library;
Step 3, the network connection data in the test set are matched one by one against the classification rules in the classification rule library; the average accuracy of each matched classification rule is calculated from the samples it covers in the training set, and the average accuracies of the classification rules are accumulated separately for each connection type in the rules;
Step 4, the accuracies of the different connection types from step 3 are saved, and the connection type with the largest accuracy is found and taken as the classification result of the network connection test record, i.e. the final detection result; after the data in the test set have been classified, the correctly classified test set data are added to the training set data together with the detection results, as a source of training data for subsequent extraction of classification rules, so that the classification rules can be updated dynamically to adapt to changes in different network connections.
2. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method of preprocessing the data set in step 1 is:
Step 1.1, using a cross-validation method, 60% of the network connection data in the KDDCup99 data set are taken as the training set and the remaining 40% as the test set;
Step 1.2, a sequence parameter is added to each data item in every network connection record so as to specialize each data item in the network connection data, enhance the distinguishability of the data and preserve the specific meaning of each data item.
3. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 2 of removing redundant data items in the training set with the improved FOIL algorithm and extracting classification rules is:
Step 2.1, the preprocessed training set data are divided into two classes, positive examples and negative examples, according to network connection type, and the attribute data items in the positive example set are counted; specifically, all network connection types in the training set are counted, the network connection data of one connection type are taken as positive examples and the data of all other connection types as negative examples, the distinct attribute data items in the positive example set are added to the positive example attribute data item set Vset, and the antecedent of classification rule r is set to empty;
Step 2.2, the gain of each attribute data item v in the positive example attribute data item set Vset is calculated, redundant attribute data items that do not satisfy the restriction are removed, and the attribute data item with the largest gain that satisfies the restriction is added to the antecedent of classification rule r to obtain a new classification rule r'; the gain of attribute data item v is calculated by the following formula:
When the antecedent of classification rule r is empty, P and N in the formula denote the numbers of samples in the positive example set and the negative example set respectively, and P* and N* denote the numbers of samples in the positive example set and the negative example set covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r; in this case the gains of all attribute data items are calculated and compared, and the attribute data item with the largest gain is added to the antecedent of r;
When the antecedent of classification rule r is not empty, P and N in the formula denote the numbers of samples in the positive example set and the negative example set covered by r, and P* and N* denote the numbers of samples in the positive example set and the negative example set covered by the new classification rule r' obtained by adding attribute data item v to the antecedent of r; in this case, adding the attribute data item with the largest gain to the antecedent of r must satisfy the following restriction: the new classification rule r' obtained must cover fewer samples in the negative example set, i.e. N* < N; if the new classification rule r' obtained by adding the attribute data item with the largest gain covers the same samples of the negative example set as before, and the attribute value of that data item is identical to that of some item in the antecedent of r, the data item is considered redundant and may be deleted from the positive example attribute data item set Vset; the attribute data item with the largest gain is then sought among the remaining attribute data items and added to the antecedent of r according to the above requirements, so as to specialize the classification rule;
Step 2.3, the classification rule r' from step 2.2 is saved and all samples in the negative example set not covered by r' are deleted, i.e. all samples in the negative example set are traversed and every sample that does not contain the antecedent of r' is deleted; if all samples in the negative example set have been deleted, r' can serve as a classification rule whose consequent is the network connection type of the positive examples; if samples remain in the negative example set, the other attribute data items in the positive examples containing the antecedent of r' are counted, the process returns to step 2.2, the attribute data item with the largest gain that satisfies the requirements is added to the antecedent of r' to obtain a new classification rule R, and all samples in the negative example set not covered by the antecedent of R are deleted; this procedure is repeated until all samples in the negative example set have been deleted;
Step 2.4, the classification rule R or r' obtained in step 2.3 is saved and all samples in the positive example set covered by the antecedent of R or r' are deleted, i.e. all samples in the positive example set are traversed, each sample is checked for whether it contains the antecedent of R or r', and every sample that does is deleted; if all samples in the positive example set have been deleted, the extraction of classification rules for this type is complete; if samples remain in the positive example set, all attribute data items in the remaining samples are counted and the preceding steps are repeated starting from step 2.2 until every sample in the positive example set has been deleted, at which point rule extraction for this type is complete; every classification rule that deletes positive examples can serve as a classification rule for network connections of this type, the consequent of these rules is the network connection type of the samples, and the rules are stored in the classification rule library;
Step 2.5, return to step 2.1 and extract the classification rules for the next type of sample, until classification rules have been found for all types of samples; the process of extracting classification rules from the training set then ends.
4. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 3 of calculating the average accuracy of the different matched classification rules is:
Step 3.1, the test set data are read, and every network connection record in the test set is compared with every classification rule in the classification rule library, the matched rules being recorded; each network connection record in the KDDCup99 data set has many data items, and the antecedent of each of the many classification rules extracted in step 2 may contain several attribute data items, so when a network connection record of unknown type in the test set is classified with the extracted rules it may match several classification rules, and all matched rules are recorded;
Step 3.2, for the m matched classification rules, if their consequents are all the same, the network connection record of unknown type has the connection type given by these consequents; if the consequents of the matched rules differ, the average accuracy of each of these rules is calculated, the average accuracy of classification rule R_i being calculated as follows:
where k is the number of distinct network connection types in the training set, n is the number of samples in the training set that contain the antecedent of classification rule R_i, and e is the number of samples in the training set whose connection type is the consequent of R_i and which contain the antecedent of R_i; after the average accuracy of each matched classification rule has been obtained, these average accuracies are accumulated by connection type to give the accuracy of connection type t_i for this network connection test record:
where the accumulated term is the accuracy of the s-th classification rule, among the m matched rules, whose consequent connection type is t_i, and the result is Accuracy(t_i).
5. The network intrusion detection method based on a learning rule set according to claim 1, characterized in that the method in step 4 of obtaining the classification result from the accuracies of the different connection types of the matched classification rules and adding the classified test data to the training set is:
Step 4.1, the accuracies of the different network connection types calculated in step 3 are saved and compared, and the connection type with the largest accuracy is the final classification result of the network connection test record;
Step 4.2, to ensure the self-learning property of the method and the dynamic updating of the classification rules, and considering that real network conditions change dynamically so that classification rules obtained from a single round of training may not adapt to constantly changing network data, the method adds the classified test data, together with the corresponding classification results, to the training set and trains again, generating new classification rules and updating the classification rule library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811122445.1A CN109286622B (en) | 2018-09-26 | 2018-09-26 | Network intrusion detection method based on learning rule set |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109286622A (en) | 2019-01-29 |
CN109286622B (en) | 2021-04-20 |
Family
ID=65182225
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811122445.1A Active CN109286622B (en) | 2018-09-26 | 2018-09-26 | Network intrusion detection method based on learning rule set |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109286622B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708850A (en) * | 2020-07-16 | 2020-09-25 | 国网北京市电力公司 | Processing method and device for power industry expansion metering rule base |
CN113342799A (en) * | 2021-08-09 | 2021-09-03 | 明品云(北京)数据科技有限公司 | Data correction method and system |
WO2023051228A1 (en) * | 2021-09-28 | 2023-04-06 | 阿里巴巴(中国)有限公司 | Method and apparatus for sample data processing, and device and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120096551A1 (en) * | 2010-10-13 | 2012-04-19 | National Taiwan University Of Science And Technology | Intrusion detecting system and method for establishing classifying rules thereof |
CN105204487A (en) * | 2014-12-26 | 2015-12-30 | 北京邮电大学 | Intrusion detection method and intrusion detection system for industrial control system based on communication model |
US9230102B2 (en) * | 2012-04-26 | 2016-01-05 | Electronics And Telecommunications Research Institute | Apparatus and method for detecting traffic flooding attack and conducting in-depth analysis using data mining |
CN105306475A (en) * | 2015-11-05 | 2016-02-03 | 天津理工大学 | Network intrusion detection method based on association rule classification |
CN107835201A (en) * | 2017-12-14 | 2018-03-23 | 华中师范大学 | Network attack detecting method and device |
Non-Patent Citations (3)
Title |
---|
J. R. QUINLAN: "FOIL: A Midterm Report", 《Basser Department of Computer Science, University of Sydney》 * |
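Guo Jianlong (郭建龙): "Rule set of an intrusion detection expert system formulated by applying machine learning", 《Computer Engineering》 * |
Chen Zhixiong (陈志雄): "Associative classification of Chinese text based on information gain", 《Journal of Chinese Information Processing》 * |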
Also Published As
Publication number | Publication date |
---|---|
CN109286622B (en) | 2021-04-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105306475B (en) | Network intrusion detection method based on association rule classification | |
CN110457404B (en) | Social media account classification method based on complex heterogeneous network | |
CN107835087B (en) | Automatic extraction method of alarm rule of safety equipment based on frequent pattern mining | |
CN105550583B (en) | Android platform malicious application detection method based on random forest classification method | |
CN104660594B (en) | Virtual malicious node and network identification method thereof for social networks | |
CN104281674B (en) | Adaptive clustering method and system based on clustering coefficient | |
US20210026909A1 (en) | System and method for identifying contacts of a target user in a social network | |
CN107992746A (en) | Malicious behavior mining method and device | |
CN103412888B (en) | Point-of-interest recognition method and device | |
CN104699755B (en) | Intelligent multi-target comprehensive recognition method based on data mining | |
CN103970733B (en) | Chinese new word recognition method based on graph structure | |
CN109286622A (en) | Network intrusion detection method based on learning rule set | |
CN106228398A (en) | Specific-user mining system and method based on the C4.5 decision tree algorithm | |
CN108897842A (en) | Computer readable storage medium and computer system | |
CN101582817A (en) | Method for extracting network interactive behavioral pattern and analyzing similarity | |
CN103927398A (en) | Microblog hype group discovering method based on maximum frequent item set mining | |
CN103886030B (en) | Cost-sensitive decision-making tree based physical information fusion system data classification method | |
CN107679135A (en) | Topic detection and tracking method and device for network text big data | |
CN108268460A (en) | Method for automatically selecting an optimal model based on big data | |
CN113011889A (en) | Account abnormity identification method, system, device, equipment and medium | |
CN106650446A (en) | Identification method and system of malicious program behavior, based on system call | |
CN106603538A (en) | Invasion detection method and system | |
CN116910283A (en) | Graph storage method and system for network behavior data | |
CN107832611B (en) | Zombie program detection and classification method combining dynamic and static characteristics | |
CN117807245A (en) | Node characteristic extraction method and similar node searching method in network asset map |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||