CN109462610A

CN109462610A - A network intrusion detection method based on active learning and transfer learning

Info

Publication number: CN109462610A
Application number: CN201811582916.7A
Authority: CN
Inventors: 李静梅; 吴伟飞; 汪家祥
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2018-12-24
Filing date: 2018-12-24
Publication date: 2019-03-12

Abstract

The invention belongs to the field of network security and specifically relates to a network intrusion detection method based on active learning and transfer learning, comprising the following steps: give a source-domain intrusion detection sample set S_a and a target-domain intrusion detection sample set S_b; using active learning, compute the weight of each sample in the source domain S_a, and let the query function select for labeling, according to these weights, the samples of S_a that are most similar to the target domain S_b; call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain several weak classifier models; and combine the weak classifiers according to their weights into a strong classifier. Compared with the baseline algorithms, the method considerably improves detection, especially of the U2R and R2L attacks, for which few samples are available; it balances detection accuracy across attack types better; and it also detects faster. The invention therefore has good application prospects in network intrusion detection.

Description

A network intrusion detection method based on active learning and transfer learning

Technical field

The invention belongs to the field of network security and specifically relates to a network intrusion detection method based on active learning and transfer learning.

Background art

With the rapid development of networks, the network plays an increasingly important role in both national life and people's daily activities, so network security technology is becoming ever more important. Network security currently faces growing challenges from viruses, system vulnerabilities, and hacker attacks, and identifying these various attacks is an important technical means of protecting it. Intrusion detection, one of the core technologies of network security, can promptly discover malicious attacks that are in progress or have already occurred. An intrusion detection system built on this technology is an active network security defense: it compensates for the shortcomings of firewalls, effectively detects attacks, and proposes corresponding defensive measures. Traditional intrusion detection systems, however, suffer from high false-positive and false-negative rates; they can only detect known attacks and are increasingly inadequate against novel attacks and massive attack volumes.

In recent years, with the rise of machine learning, intrusion detection methods based on machine learning algorithms have made intelligent detection of network attacks possible. Compared with traditional intrusion detection methods, they improve detection efficiency on the one hand and reduce false-negative and false-positive rates on the other, so machine learning has opened a new direction for intrusion detection technology. Although traditional machine learning is widely used in intrusion detection, most of these algorithms treat the different kinds of attack collectively as "attack" and detect them with a single algorithm without distinguishing among them. As a result, detection success rates are uneven across attack types: a classifier trained by some algorithm may detect one attack type well while another type is hard to detect, and attack types with scarce samples in particular are often ignored. In addition, traditional machine learning algorithms usually rely on two assumptions: (1) training samples and new test samples are independent and identically distributed; and (2) a large number of training samples is available for learning a good model. In practice, however, test and training data are rarely identically distributed, and some sample resources are very scarce. For example, in biological data classification, labeling a single training sample often requires extensive, lengthy, and expensive experiments; in text classification, the existing training samples are often far from sufficient for a reliable classification model, and labeling large volumes of documents usually requires hiring many highly paid experts, which makes labeled training samples very costly. In short, a large number of training samples is needed to build a highly accurate classification model, yet obtaining them is nearly impossible in many practical applications.

To address sample scarcity, researchers proposed transfer learning, a machine learning approach that uses existing knowledge to solve problems in related fields. It relaxes the two basic assumptions of conventional machine learning, with the aim of using existing knowledge to solve learning problems in a target domain where only a few samples, or none, are available. Transfer learning represents a future direction of machine learning. Applied to intrusion detection, it can, unlike other machine learning algorithms, reuse the knowledge in existing historical data to save data-collection cost, and it allows training and test data to follow different distributions. Transfer learning also offers a new way to tackle the scarcity of samples for some attack types and the unbalanced detection across attack types, which gives it a considerable advantage over traditional machine learning algorithms.

In summary, building on transfer learning, the invention can effectively use a large number of historical attack samples to address the scarcity of samples for some attack types and the unbalanced detection across attack types; in addition, selecting samples for labeling with a suitable active learning strategy reduces classifier training time. Overall, compared with previous intrusion detection methods, the method achieves higher accuracy, a lower false-positive rate, and better time efficiency.

Summary of the invention

To address the problems described in the background art, the invention proposes an intrusion detection method based on active learning and transfer learning.

A network intrusion detection method based on active learning and transfer learning comprises the following steps:

(1) Give a source-domain intrusion detection sample set S_a and a target-domain intrusion detection sample set S_b;

(2) Using active learning, compute the weight of each sample in the source domain S_a; according to these weights, the query function selects for labeling the samples of S_a that are most similar to the target domain S_b;

(3) Call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain several weak classifier models;

(4) Combine the weak classifiers according to their respective weights to obtain a strong classifier.

Giving the source-domain intrusion detection sample set S_a and the target-domain intrusion detection sample set S_b comprises:

Give a large unlabeled source-domain intrusion detection sample set S_a and a small labeled target-domain intrusion detection sample set S_b, where n is the number of samples in S_a and m is the number of samples in S_b; the merged training data set is T = S_a ∪ S_b, with x_i ∈ T.

Using active learning to compute the weight of each sample in the source domain S_a, with the query function selecting for labeling, according to these weights, the samples of S_a that are most similar to the target domain S_b, comprises:

(2.1) Use the active learning query strategy Q to select samples of the training data set S_a for labeling by computing:

s.t. 0 ≤ β_i ≤ 1, i = 1, 2, ..., n

where n is the number of source-domain samples and K = K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel matrix.

(2.2) Screen the source-domain samples according to the result of (2.1), filtering out the samples of S_a that differ greatly from the target domain S_b, to obtain a new training data set S'_a; the labeled subset of S_a is S'_a, containing n' samples;

(2.3) Initialize the weight vector.
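The objective minimized in step (2.1) did not survive extraction; only the box constraint on β and the kernel K remain. Because the query strategy is described as MMD-based, one plausible reading is kernel mean matching (KMM), which chooses β to minimize ½ β^T K β − κ^T β with κ_i = (n/m) Σ_j k(x_i, z_j), the sum running over the m target samples z_j. The sketch below assumes that objective (omitting KMM's usual constraint on Σ β_i), a Gaussian kernel with a hypothetical bandwidth `gamma`, and plain projected gradient descent; it is an illustration, not the patent's exact formula. Source samples close to the target data end up with β_i near 1, distant ones near 0.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf(x: Vector, y: Vector, gamma: float) -> float:
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kmm_weights(source: List[Vector], target: List[Vector], gamma: float = 1.0,
                lr: float = 0.01, iters: int = 500) -> List[float]:
    """Approximately minimize 0.5 * b^T K b - kappa^T b subject to 0 <= b_i <= 1.

    Assumed KMM-style objective, solved by projected gradient descent.
    `lr` should stay below 2 / lambda_max(K); lambda_max(K) <= n for this kernel.
    """
    n, m = len(source), len(target)
    K = [[rbf(xi, xj, gamma) for xj in source] for xi in source]
    # kappa_i = (n / m) * sum_j k(x_i, z_j) over the target samples z_j
    kappa = [(n / m) * sum(rbf(xi, z, gamma) for z in target) for xi in source]
    beta = [0.5] * n
    for _ in range(iters):
        grad = [sum(K[i][j] * beta[j] for j in range(n)) - kappa[i] for i in range(n)]
        # gradient step, then projection back onto the box [0, 1]
        beta = [min(1.0, max(0.0, b - lr * g)) for b, g in zip(beta, grad)]
    return beta
```

For example, `kmm_weights([[0.0], [0.1], [5.0]], [[0.0], [0.05]])` drives the first two weights to 1 and the outlier at 5.0 toward 0.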

Calling the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain several weak classifier models comprises:

Set the required parameters;

(3.1) Set p^t to satisfy:

where w_i is the weight of the i-th source-domain sample;

(3.2) Call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain a classifier h_t on S;

(3.3) Compute the error rate ε_t of h_t on the data set S_b:

(3.4) Set τ_t = ε_t/(1 − ε_t); the weight coefficient α_t of the classifier is:

(3.5) Update the weight vector as follows:

The loop ends after N iterations.
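Steps (3.1) to (3.5) follow the structure of TrAdaBoost, but the formulas for p^t, ε_t, α_t and the weight update are missing here. The sketch below fills them in with the standard TrAdaBoost rules of Dai et al. (2007), adapted to labels in {−1, +1}: normalize the weights, train a weak learner, measure the weighted error on S_b only, then shrink the weights of misclassified source samples and grow those of misclassified target samples. It assumes α_t = ln(1/τ_t), a uniform initial weight vector, and a clamp on ε_t; it also replaces BA_SVM with a hypothetical weighted decision stump `train_stump`, which ignores the unlabeled set S, so it shows only the reweighting logic.

```python
import math
from typing import List, Sequence, Tuple, Callable

Sample = Sequence[float]
Classifier = Callable[[Sample], int]


def train_stump(X: List[Sample], y: List[int], p: List[float]) -> Classifier:
    """Weighted decision stump (hypothetical stand-in for BA_SVM)."""
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(pi for xi, yi, pi in zip(X, y, p)
                          if (sign if xi[f] <= thr else -sign) != yi)
                if err < best[0]:
                    best = (err, f, thr, sign)
    _, f, thr, sign = best
    return lambda x: sign if x[f] <= thr else -sign


def tradaboost(Xs: List[Sample], ys: List[int], Xt: List[Sample], yt: List[int],
               N: int = 10, learner=train_stump) -> List[Tuple[Classifier, float]]:
    """TrAdaBoost-style loop over T = S'_a ∪ S_b with labels in {-1, +1}.

    Returns the weak classifiers h_t with weights alpha_t = ln(1 / tau_t) (assumed).
    """
    n, m = len(Xs), len(Xt)
    X, y = list(Xs) + list(Xt), list(ys) + list(yt)
    w = [1.0] * (n + m)  # uniform initial weights (assumption)
    beta_src = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n) / N))  # source down-weighting factor
    models: List[Tuple[Classifier, float]] = []
    for _ in range(N):
        total = sum(w)
        p = [wi / total for wi in w]                      # (3.1) p^t = w^t / sum(w^t)
        h = learner(X, y, p)                              # (3.2) weak classifier h_t
        miss = [0 if h(x) == yi else 1 for x, yi in zip(X, y)]
        eps = sum(w[i] * miss[i] for i in range(n, n + m)) / sum(w[n:])  # (3.3) error on S_b
        eps = min(max(eps, 1e-10), 0.499)  # clamp so that 0 < tau_t < 1 (assumption)
        tau = eps / (1.0 - eps)                           # (3.4)
        models.append((h, math.log(1.0 / tau)))
        for i in range(n + m):                            # (3.5) weight update
            if i < n:
                w[i] *= beta_src ** miss[i]   # misclassified source samples lose weight
            else:
                w[i] *= tau ** (-miss[i])     # misclassified target samples gain weight
    return models
```

Down-weighting source samples that the current model gets wrong is what lets the method discount source data that does not transfer, which complements the MMD screening done before the loop.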

Combining the weak classifiers according to their respective weights to obtain a strong classifier comprises:

After the training in the two preceding steps, several weak classifier models are obtained; these weak classifiers are then combined according to their respective weights into the final classifier as follows:

where h_f(x) is the resulting strong classifier and α_t is the weight of each weak classifier h_t(x).
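The combination formula is also missing. In TrAdaBoost the final classifier is a weighted vote of the weak classifiers from the second half of the iterations; with labels in {−1, +1} and α_t = ln(1/τ_t), the original product rule Π τ_t^(−h_t(x)) ≥ Π τ_t^(−1/2) (stated there for labels in {0, 1}) reduces to taking the sign of Σ α_t h_t(x). The sketch assumes that form; `last_half` is a hypothetical switch for letting all N weak classifiers vote instead.

```python
from typing import Callable, List, Sequence, Tuple

Classifier = Callable[[Sequence[float]], int]


def strong_classifier(models: List[Tuple[Classifier, float]],
                      last_half: bool = True) -> Classifier:
    """Weighted vote h_f(x) = sign(sum_t alpha_t * h_t(x)), ties going to +1.

    With last_half=True only iterations t = ceil(N/2), ..., N vote, as in TrAdaBoost.
    """
    start = max(0, (len(models) + 1) // 2 - 1) if last_half else 0
    voters = models[start:]

    def h_f(x: Sequence[float]) -> int:
        score = sum(alpha * h(x) for h, alpha in voters)
        return 1 if score >= 0 else -1

    return h_f
```

With three weak classifiers weighted 5.0, 1.0 and 1.0 voting −1, +1, +1, the full vote gives −1, while the last-half vote (iterations 2 and 3) gives +1.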

The beneficial effects of the invention are as follows:

The detection that the method for the present invention is especially the U2R and R2L less to sample size has much relative to benchmark algorithm Promotion；The Detection accuracy balance of each attack type is more preferable compared with benchmark algorithm；In the time efficiency of algorithm detection Also there is greater advantage.Therefore, the present invention has better application prospect in network invasion monitoring.

Brief description of the drawings

Fig. 1 is a schematic diagram of the method of the invention;

Fig. 2 is a schematic diagram of the intrusion detection process performed by the strong classifier.

Detailed description of the embodiments

The invention is further described below with reference to the accompanying drawings.

In the method, active learning first uses the query function Q with a selective maximum mean discrepancy (MMD) query strategy to screen the intrusion detection samples in the source domain, selecting from the large unlabeled intrusion detection sample set the next samples to be labeled and adding them to the training set. A classifier with transfer learning ability (TrAdaBoost) is then trained iteratively on the screened training set until the stopping condition is met. The base classifier in TrAdaBoost is a support vector machine optimized by the bat algorithm (BA_SVM); BA_SVM searches for the best SVM parameter combination and avoids local optima. In the method, active learning on the one hand reduces the size of the source-domain intrusion detection sample set; on the other hand, the MMD query strategy of Q filters out source-domain samples with low similarity to the target domain, which helps to avoid negative transfer. Compared with other traditional machine learning methods, the method therefore not only shrinks the training set and improves training efficiency but also effectively suppresses negative transfer. As a result, it offers good detection speed and improves the accuracy, real-time performance, and balance of intrusion detection.
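The paragraph above describes BA_SVM only as an SVM whose parameter combination is found by the bat algorithm. Below is a minimal sketch of the bat algorithm in the form given by Yang (2010): each bat draws a random frequency, updates its velocity relative to the current best solution, sometimes performs a local random walk around the best solution scaled by the mean loudness, and becomes quieter and pulses more often after each accepted improvement. For BA_SVM, `obj` would be something like the cross-validation error of an SVM evaluated at (log2 C, log2 γ); that objective and all default parameter values here are assumptions, so the test minimizes a toy quadratic instead.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def bat_minimize(obj: Callable[[List[float]], float],
                 lo: Sequence[float], hi: Sequence[float],
                 n_bats: int = 20, iters: int = 200,
                 f_min: float = 0.0, f_max: float = 2.0,
                 alpha: float = 0.9, gamma: float = 0.9, r0: float = 0.5,
                 seed: int = 0) -> Tuple[List[float], float]:
    """Minimize `obj` over the box [lo, hi] with the bat algorithm."""
    rng = random.Random(seed)
    dim = len(lo)

    def clip(x: List[float]) -> List[float]:
        return [min(h, max(l, v)) for v, l, h in zip(x, lo, hi)]

    pos = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(n_bats)]
    vel = [[0.0] * dim for _ in range(n_bats)]
    fit = [obj(p) for p in pos]
    loud = [1.0] * n_bats   # loudness A_i
    pulse = [0.0] * n_bats  # pulse emission rate r_i
    b = min(range(n_bats), key=fit.__getitem__)
    best, best_fit = list(pos[b]), fit[b]
    for t in range(1, iters + 1):
        mean_loud = sum(loud) / n_bats
        for i in range(n_bats):
            freq = f_min + (f_max - f_min) * rng.random()    # frequency f_i
            vel[i] = [v + (x - g) * freq for v, x, g in zip(vel[i], pos[i], best)]
            cand = clip([x + v for x, v in zip(pos[i], vel[i])])
            if rng.random() > pulse[i]:
                # local random walk around the current best solution
                cand = clip([g + rng.uniform(-1.0, 1.0) * mean_loud for g in best])
            f_new = obj(cand)
            if f_new <= fit[i] and rng.random() < loud[i]:
                pos[i], fit[i] = cand, f_new
                loud[i] *= alpha                              # quieter after success
                pulse[i] = r0 * (1.0 - math.exp(-gamma * t))  # more pulses over time
            if f_new <= best_fit:
                best, best_fit = list(cand), f_new
    return best, best_fit
```

Searching over log-scaled SVM parameters keeps the box compact while still covering several orders of magnitude of C and γ.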

The implementation of the method is described in detail below.

For convenience of description, the basic symbols and concepts used below are defined as follows:

Definition 1 (basic symbols):

(1) Let SD be the source-domain space and TD the target-domain space;

(2) Let Y be the label space. Without loss of generality, only the binary problem Y = {−1, 1} is considered here; multi-class problems can be handled by decomposing them into binary classification problems;

(3) the training intrusion detection data set;

(4) a labeling function maps each sample x to its class label f(x) ∈ Y.
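Item (2) states that multi-class problems can be handled through binary ones. One common decomposition is one-vs-rest: train one binary scorer per class (that class as +1, all others as −1) and predict the class whose scorer is most confident. The sketch assumes that scheme and uses a toy nearest-centroid scorer, `centroid_trainer`, as a hypothetical stand-in for the binary TrAdaBoost-BA_SVM model; the class names in the test (normal, dos, probe) are illustrative only.

```python
import math
from typing import Callable, Dict, List, Sequence

Sample = Sequence[float]
ScoreFn = Callable[[Sample], float]  # larger score = more confident "positive class"


def centroid_trainer(X: List[Sample], y: List[int]) -> ScoreFn:
    """Toy binary learner (labels in {-1, +1}).

    Score = distance to the negative centroid minus distance to the positive one.
    """
    def centroid(label: int) -> List[float]:
        pts = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(pts) for col in zip(*pts)]

    pos, neg = centroid(1), centroid(-1)
    return lambda x: math.dist(x, neg) - math.dist(x, pos)


def one_vs_rest(X: List[Sample], labels: List[str],
                trainer: Callable[[List[Sample], List[int]], ScoreFn] = centroid_trainer
                ) -> Callable[[Sample], str]:
    """Reduce a multi-class problem to one binary problem per class."""
    classes = sorted(set(labels))
    scorers: Dict[str, ScoreFn] = {
        c: trainer(X, [1 if l == c else -1 for l in labels]) for c in classes
    }
    return lambda x: max(classes, key=lambda c: scorers[c](x))
```

Each binary sub-problem sees every other class as negative, so rare classes such as U2R or R2L yield strongly imbalanced binary tasks; this is the setting in which the sample-scarcity handling described above matters.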

Definition 2 (test data set): S, the unlabeled test intrusion detection data set, drawn from the target domain.

Definition 3 (training data sets): the source-domain data set S_a and the target-domain data set S_b.

Here f(x) is the true class label; S_a is the auxiliary intrusion detection data from the source domain; S_b is the intrusion detection data set from the target domain; and n and m are the numbers of samples in S_a and S_b, respectively. S_a and S_b together form the training data set, which is therefore defined as T = S_a ∪ S_b.

Within the training data, S_a and S_b follow different distributions, whereas the test set S and S_b follow the same distribution; that is, P((x, y) | x ∈ S_a) ≠ P((x, y) | x ∈ S).
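The mismatch P((x, y) | x ∈ S_a) ≠ P((x, y) | x ∈ S) is the kind of difference the MMD query strategy measures. A minimal sketch of the biased empirical squared MMD with a Gaussian kernel, where the bandwidth `gamma` is an assumed parameter: MMD² = mean k(a, a') + mean k(b, b') − 2 mean k(a, b) over the two samples. It is zero when the two samples coincide and grows as they separate.

```python
import math
from typing import List, Sequence


def rbf(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def mmd2(X: List[Sequence[float]], Y: List[Sequence[float]], gamma: float = 1.0) -> float:
    """Biased empirical estimate of the squared MMD between samples X and Y."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

For two well-separated clusters such as `[[0.0], [0.1]]` and `[[3.0], [3.1]]` the cross term almost vanishes and the estimate approaches 2, its maximum for this kernel.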

The transfer learning problem addressed by the method can thus be defined as follows: given a small intrusion detection data set as the target data set S_b, a large unlabeled intrusion detection data set S_a, and a test intrusion detection data set S, train a classifier that minimizes the classification error on S and thus improves the prediction accuracy for intrusion behavior. The inputs and outputs of the problem are:

Input:

Two training data sets S_a and S_b, and one test data set S;

One base classifier BA_SVM.

Output:

Classifier.

Fig. 1 shows a schematic diagram of the method, which consists mainly of the query function Q and the transfer classifier model TrAdaBoost-BA_SVM. Its inputs are the source-domain training set S_a, the target-domain training set S_b, and the test data set S.

The quantities used in the implementation are: a large unlabeled source-domain sample set S_a; a small labeled target-domain sample set S_b; n, the number of samples in S_a; m, the number of samples in S_b; T = S_a ∪ S_b, x_i ∈ T, with true class label f(x_i); an unlabeled test data set S; a base classifier BA_SVM; and the number of iterations N.

The method comprises the following steps:

1. Initialization

(1) Use the active learning query strategy Q to select samples of the training data set S_a for labeling by computing:

s.t. 0 ≤ β_i ≤ 1, i = 1, 2, ..., n

In the formula above, n is the number of source-domain samples and K = K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel matrix.

(2) Screen the source-domain samples according to the result of (1), filtering out the samples of S_a that differ greatly from the target domain S_b, to obtain a new training data set S'_a:

for i = 1, ..., n
    if (β_i < 0)
        discard the sample
    else
        keep the sample and pass it to an expert for labeling

The labeled subset of the source sample set S_a is S'_a, containing n' samples.

(3) Initialize the weight vector.
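The screening loop in step (2) can be written as a small function that builds S'_a. Two details are assumptions: the loop index is i throughout (the original pseudocode mixes j with β_i), and because the constraint 0 ≤ β_i ≤ 1 means β_i < 0 can never hold, the cut-off is a hypothetical parameter `threshold` with an illustrative default. `label_oracle` is a hypothetical stand-in for the human expert.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Sequence[float]


def screen_source(source: List[Sample], beta: List[float],
                  label_oracle: Callable[[Sample], int],
                  threshold: float = 0.5) -> List[Tuple[Sample, int]]:
    """Build S'_a from S_a using the weights beta computed in step (1).

    A sample with beta_i < threshold is discarded; every other sample is kept
    and labeled by `label_oracle`. The default threshold is illustrative only.
    """
    labeled: List[Tuple[Sample, int]] = []
    for x_i, beta_i in zip(source, beta):  # loop over i = 1, ..., n
        if beta_i < threshold:
            continue  # differs too much from the target domain: discard
        labeled.append((x_i, label_oracle(x_i)))  # keep and send to the expert
    return labeled
```

The length of the returned list is n', and only these n' expert-labeled samples enter the merged training set together with S_b.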

2. Core procedure

Set the required parameters; the core procedure of the method is as follows:

For t = 1, ..., N:

(1) Set p^t to satisfy:

(2) Call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain a classifier h_t on S;

(3) Compute the error rate ε_t of h_t on the data set S_b:

(4) Set τ_t = ε_t/(1 − ε_t); the weight coefficient α_t of the classifier is:

(5) Update the weight vector as follows:

The loop ends after N iterations.

3. Strong classifier

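The weak classifiers are combined according to their weights into the final classifier, where h_f(x) is the resulting strong classifier and α_t is the weight of each weak classifier h_t(x).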

Steps 1 to 3 above constitute the network intrusion detection method based on active learning and transfer learning. In terms of detection accuracy and time efficiency, the method achieves better detection accuracy and detection speed than traditional machine-learning-based intrusion detection methods, and it improves the accuracy, real-time performance, and balance of detection.

The above is a preferred embodiment of the invention, but the protection scope of the invention is not limited to it. Any transformation or replacement made by a person skilled in the art according to the technical solution of the invention within the technical scope disclosed herein shall fall within the protection scope of the invention. The protection scope of the invention is therefore defined by the claims.

1. A network intrusion detection method based on active learning and transfer learning, the method comprising the following:

Given a large unlabeled source-domain intrusion detection sample set S_a and a small labeled target-domain intrusion detection sample set S_b, where n is the number of samples in S_a and m is the number of samples in S_b; T = S_a ∪ S_b, x_i ∈ T, with true class label f(x_i); an unlabeled test intrusion detection data set S; a base classifier BA_SVM; and the number of iterations N. The method comprises three steps: initialization, the core procedure, and the strong classifier, described as follows:

(1) Initialization: using active learning, compute the weight of each sample in the source domain S_a; according to these weights, the query function selects for labeling the samples of S_a that are most similar to the target domain S_b.

(2) Core procedure: the main execution process of the method, which yields a set of weak classifiers and their corresponding weights;

(3) Strong classifier: the weak classifiers are combined according to their respective weights into a strong classifier.

The initialization is as follows:

s.t. 0 ≤ β_i ≤ 1, i = 1, 2, ..., n

In the formula above, n is the number of source-domain samples and K = K(x_i, x_j) = φ(x_i)^T φ(x_j) is the kernel matrix.

for i = 1, ..., n
    if (β_i < 0)
        discard the sample;
    else
        keep the sample and pass it to an expert for labeling.

(3) Initialize the weight vector.

Core procedure of the method:

Set the required parameters; the core procedure of the method is as follows:

For t = 1, ..., N:

(1) Set p^t to satisfy:

(3) Compute the error rate ε_t of h_t on the data set S_b:

(4) Set τ_t = ε_t/(1 − ε_t); the weight coefficient α_t of the classifier is:

(5) Update the weight vector as follows:

The loop ends after N iterations.

The strong classifier:

In summary, building on transfer learning, the invention can effectively use a large number of historical attack samples to address the scarcity of samples for some attack types and the unbalanced detection across attack types in intrusion detection; in addition, selecting samples for labeling with a suitable active learning strategy reduces classifier training time. Overall, compared with previous intrusion detection methods, the method achieves higher accuracy, a lower false-positive rate, and better time efficiency.

The invention relates to a method for improving the accuracy and time efficiency of network intrusion detection, namely an efficient intrusion detection method based on active learning and transfer learning. The method comprises the following steps:

Given a large unlabeled intrusion detection sample set S_a and a small labeled target-domain intrusion detection sample set S_b, where n is the number of samples in S_a and m is the number of samples in S_b; T = S_a ∪ S_b, x_i ∈ T, with true class label f(x_i); an unlabeled test intrusion detection data set S; a base classifier BA_SVM; and the number of iterations N.

1. Initialization: using active learning, compute the weight of each sample in the source domain S_a; according to these weights, the query function selects for labeling the samples of S_a that are most similar to the target domain S_b.

(1) Use the active learning query strategy Q to select samples of the training data set S_a for labeling by computing:

s.t. 0 ≤ β_i ≤ 1, i = 1, 2, ..., n

for i = 1, ..., n
    if (β_i < 0)
        discard the sample;
    else
        keep the sample and pass it to an expert for labeling.

(3) Initialize the weight vector.

2. Core procedure: the main execution process of the method, which yields a set of weak classifiers and their corresponding weights;

Set the required parameters; the core procedure of the method is as follows:

For t = 1, ..., N:

(1) Set p^t to satisfy:

(3) Compute the error rate ε_t of h_t on the data set S_b:

(4) Set τ_t = ε_t/(1 − ε_t); the weight coefficient α_t of the classifier is:

(5) Update the weight vector as follows:

The loop ends after N iterations.

3. Strong classifier: the weak classifiers are combined according to their respective weights into a strong classifier.

Claims

1. A network intrusion detection method based on active learning and transfer learning, characterized by comprising the following steps:

(3) Call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain several weak classifier models;

2. The network intrusion detection method based on active learning and transfer learning according to claim 1, characterized in that giving the source-domain intrusion detection sample set S_a and the target-domain intrusion detection sample set S_b comprises:

3. The network intrusion detection method based on active learning and transfer learning according to claim 1, characterized in that using active learning to compute the weight of each sample in the source domain S_a, with the query function selecting for labeling, according to these weights, the samples of S_a that are most similar to the target domain S_b, comprises:

s.t. 0 ≤ β_i ≤ 1, i = 1, 2, ..., n

(2.2) Screen the source-domain samples according to the result of (2.1), filtering out the samples of S_a that differ greatly from the target domain S_b, to obtain a new training data set S'_a; the labeled subset of S_a is S'_a, containing n' samples;

(2.3) Initialize the weight vector.

4. The network intrusion detection method based on active learning and transfer learning according to claim 1, characterized in that calling the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain several weak classifier models comprises:

Set the required parameters;

(3.1) Set p^t to satisfy:

where w_i is the weight of the i-th source-domain sample;

(3.2) Call the base classifier BA_SVM on the merged training set T with its weight distribution p^t and the unlabeled data set S to obtain a classifier h_t on S;

(3.3) Compute the error rate ε_t of h_t on the data set S_b:

(3.4) Set τ_t = ε_t/(1 − ε_t); the weight coefficient α_t of the classifier is:

(3.5) Update the weight vector as follows:

The loop ends after N iterations.

5. The network intrusion detection method based on active learning and transfer learning according to claim 1, characterized in that combining the weak classifiers according to their respective weights to obtain a strong classifier comprises:

After the training in the two preceding steps, several weak classifier models are obtained; these weak classifiers are then combined according to their respective weights into the final classifier as follows: