CN112653751B - Distributed intrusion detection method based on multilayer extreme learning machine in Internet of things environment - Google Patents


Info

Publication number
CN112653751B
Authority
CN
China
Prior art keywords: hidden, layer, learning machine, extreme learning, data
Prior art date
Legal status: Active
Application number
CN202011503520.6A
Other languages
Chinese (zh)
Other versions
CN112653751A (en)
Inventor
付兴兵
吴炳金
焦利彬
索宏泽
章坚武
唐向宏
Current Assignee
Hangzhou Dianzi University
CETC 54 Research Institute
Original Assignee
Hangzhou Dianzi University
CETC 54 Research Institute
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University, CETC 54 Research Institute filed Critical Hangzhou Dianzi University
Priority to CN202011503520.6A priority Critical patent/CN112653751B/en
Publication of CN112653751A publication Critical patent/CN112653751A/en
Application granted granted Critical
Publication of CN112653751B publication Critical patent/CN112653751B/en

Classifications

    • H04L67/12 — Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks
    • G06N3/045 — Combinations of networks (computing arrangements based on biological models; neural networks)
    • G06N3/08 — Learning methods
    • H04L41/145 — Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L63/1416 — Event detection, e.g. attack signature detection (network security; detecting or protecting against malicious traffic)

Abstract

Because the devices involved are resource-constrained, the computation-heavy task of automatic attack detection is moved to edge devices so that processing happens close to the data source. These edge devices can run a preset classification model, but they lack the storage and processing power to build and upgrade such models when faced with large amounts of training data. To solve this problem, the computation-intensive, storage-heavy training work is moved to a cloud server, where a single-hidden-layer extreme learning machine and a multi-hidden-layer extreme learning machine model are trained; the edge device then performs traffic classification, labeling traffic as normal or as a network attack, based on the deep learning model preset from the cloud server. Experimental analysis shows that the multi-hidden-layer extreme learning machine performs better.

Description

Distributed intrusion detection method based on multilayer extreme learning machine under Internet of things environment
Technical Field
The invention belongs to the field of intrusion detection and deep learning, and particularly relates to a distributed intrusion detection method based on a multilayer extreme learning machine in an Internet of things environment.
Background
The rapidly developing Internet of things connects Internet-based remote control equipment through increasingly sophisticated sensing devices, quickly closing the gap between traditional information services and the surrounding physical environment. Many potential Internet of things application services have emerged, such as environment monitoring, traffic monitoring, and health and medical monitoring, and these applications greatly improve the interaction between humans and computing devices. As Internet of things applications and cyber-physical information services place ever greater demands on network security, intrusion detection for the Internet of things has become a mainstream trend in the current development of Internet of things technology.
Anderson first introduced the concept of intrusion detection in 1980, opening the field of intrusion detection research. Intrusion detection based on deep learning has since developed rapidly, benefiting from the excellent performance of deep-learning classification models. Because the devices involved are resource-constrained, one popular solution today is to allocate some storage and computing power on edge devices near the data source, i.e., "edge computing," which shifts certain applications and services from a centralized point to the edge. The edge devices can run preset classification models, relieving the core computing devices of much of the heavy work of automatic attack detection. However, when faced with a large amount of data, the preset model lacks the storage and processing capacity to cope, and hardware constraints prevent it from being substantially upgraded. It is therefore desirable to separate out the computation-intensive, storage-heavy training work and let the edge device execute only lightweight computation, further improving efficiency while maintaining accuracy.
Disclosure of Invention
To solve the problem that, in prior-art distributed intrusion detection in the Internet of things environment, edge devices cannot process oversized training data, the invention provides a distributed intrusion detection method based on a multilayer extreme learning machine in the Internet of things environment. The method is based on a multi-hidden-layer extreme learning machine model: the computation-intensive, storage-heavy training is transferred to a cloud server, so that the edge device can perform traffic classification, labeling traffic as normal or as a network attack, using a model preset from the cloud server. The technical problem to be solved by the invention is addressed by the following technical scheme:
the invention provides a distributed intrusion detection method based on a multilayer extreme learning machine in an Internet of things environment, which comprises the following steps:
step 1: preprocessing network flow data;
step 2: building a single hidden layer extreme learning machine classification model;
Step 2.1, the single-hidden-layer extreme learning machine has three layers: the first is the input layer, the second the hidden layer, and the third the output layer; the connection weight w between the input layer and the hidden layer and the bias b are set randomly, with one row of w per hidden node and one column per input feature column; the bias b is realized by appending a column of all-1 values to the input feature matrix;
Step 2.2, for M arbitrary input samples, a linear operation with the random weight w and bias b first yields a feature matrix, and the Sigmoid activation function is then applied to obtain the nonlinear feature matrix H; the Sigmoid activation function maps each feature value into [0,1]:

S(x) = \frac{1}{1 + e^{-x}}

The nonlinear feature matrix H is given by H_{ij} = S(w_i \cdot X_j + b_i), where X is the input sample data, w_i is the weight of the ith hidden node, b_i is the bias of the ith hidden node, and X_j is the jth feature column of the input data;
Step 2.3, on the basis of step 2.2, a single hidden layer neural network with L hidden nodes is expressed as:

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = o_j, \quad j = 1, \ldots, N

where S is the Sigmoid activation function, N is the number of columns of the input feature data matrix, b_i is the bias of the ith hidden node, \beta_i is the output weight of the ith hidden node, and o_j is the jth output feature value of the extreme learning machine;
Step 2.4, the output of the single hidden layer neural network is made to differ minimally from the target values, which is expressed as

\sum_{j=1}^{N} \| o_j - t_j \| = 0

where t_j is the target value; hence there exist \beta_i, w_i and b_i such that

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, N

i.e., H\beta = T, where H is the output of the hidden nodes, \beta is the output weight, and T is the desired output;
Step 2.5, training the single hidden layer extreme learning machine model is equivalent to finding \hat{w}_i, \hat{b}_i and \hat{\beta}_i such that

\| H(\hat{w}_i, \hat{b}_i)\hat{\beta} - T \| = \min_{w, b, \beta} \| H(w_i, b_i)\beta - T \|

i.e., minimizing the loss function

E = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) - t_j \right)^2
Step 2.6, once the input weight w and the hidden layer bias b are randomly determined, the output matrix H is also uniquely determined; thus solving for simple hidingThe layer neural network is H beta-T; the output weight may be confirmed as:
Figure BDA0002844316740000031
wherein
Figure BDA0002844316740000032
Is a Moore-Penrose generalized inverse,
Figure BDA0002844316740000033
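The least-squares property of the Moore-Penrose solution above can be checked numerically. A minimal NumPy sketch with arbitrary small shapes (the patent's H is [80000, 50]; the 8x5 matrix here is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small shapes for illustration (the patent's H is [80000, 50]).
H = rng.standard_normal((8, 5))   # hidden-layer output matrix
T = rng.standard_normal((8, 1))   # desired output

# beta = H^+ T: the minimum-norm least-squares solution of H beta = T
beta = np.linalg.pinv(H) @ T
residual = np.linalg.norm(H @ beta - T)

# Perturbing beta cannot reduce the residual, since the least-squares
# residual H beta - T is orthogonal to the column space of H.
other = beta + 0.1 * rng.standard_normal(beta.shape)
worse = np.linalg.norm(H @ other - T)
```

Any perturbed weight vector fits no better, which is exactly why the closed-form \hat{\beta} = H^{\dagger} T needs no iterative tuning.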
and step 3: building a multi-hidden-layer extreme learning machine classification model;
step 3.1, the multi-hidden-layer extreme learning machine comprises a plurality of hidden layers, and the specific number of layers is determined by the length n of a hidden node number list L given by a user;
Step 3.2, assume the list L = [a_1, a_2, a_3, \ldots, a_n], where a_i is the number of hidden nodes of the ith hidden layer;
Step 3.3, for a_1, the output weight \beta_1 is solved following the single hidden layer model solving procedure of step 2, and \beta_1 is stored in the output weight list M;
Step 3.4, for a_2, \ldots, a_{n-1}, the input feature matrix of each hidden layer is obtained by applying the output weights of all preceding hidden layers to the original input feature matrix in sequence:
X2=β1·X
X3=β2·(β1·X)=β2·X2
……
Xn-1=βn-2·Xn-2
Step 3.5, for a_n, i.e. the last hidden layer, the input matrix is

X_n = \beta_{n-1} \cdot X_{n-1}

which is then substituted into the single hidden layer extreme learning machine to obtain the final \beta_n;
Step 4: input the training set into the single-hidden-layer extreme learning machine and the multi-hidden-layer extreme learning machine respectively to obtain the output weight \beta;
Step 5: classify the test set using the trained model.
Preferably, the step 1 comprises the following steps:
step 1.1, taking network flow data as a data set, converting character characteristic data of the data set into numerical values, and then carrying out one-hot coding on the characteristics;
Step 1.2, normalize each feature column separately:

x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}

where x_{\min} is the minimum value of the feature and x_{\max} is the maximum value of the feature;
Step 1.3, split the whole data set into a training set and a test set in the proportion 8:2;
Step 1.4, extract the Label field from the training set and the test set respectively and judge its value: if the value is 0, mark it as -1, otherwise mark it as 1, i.e., normal-sample traffic is labeled -1 and attack samples are labeled 1;
step 1.5, obtaining training data Train _ X and training data label Train _ Y; test data Test _ X, Test data tag Test _ Y.
Preferably, the connection weight w in step 2.2 is uniformly distributed on [-1, 1], with density function

f(x) = \begin{cases} \frac{1}{m - n}, & n \le x \le m \\ 0, & \text{otherwise} \end{cases}

where m is the maximum of the random value range of w and n is the minimum.
Preferably, the step 5 comprises the following steps:
Step 5.1, with the output weight \beta, input weight w and hidden layer bias b known, classify the input test set by the following formula and output the discriminant array Predicts:

\mathrm{Predicts}_j = \mathrm{sign}\left( \sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) \right)

where \beta_i is the output weight, X is the test data, and sign() is the sign function:

\mathrm{sign}(x) = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}
preferably, the method further comprises model evaluation; the method specifically comprises the following steps:
step 6.1, calculating the difference between the judgment array Predicts obtained in the step 5 and the Test data real label Test _ Y;
step 6.2, drawing ROC curves of the two models according to Predicts and Test _ Y;
and 6.3, calculating each evaluation index according to Predicts and Test _ Y and recording the time consumed by training and classification.
Compared with the background technology, the invention has the advantages that:
the method comprises the following steps of utilizing a multi-hidden-layer extreme learning machine model, randomly acquiring the connection weight of an input layer and a hidden layer and the bias of the hidden layer, and not adjusting after the acquisition. Compared with BP neural network, it does not need to reverse adjust weight and bias, so the efficiency is increased greatly.
Second, classification only requires matrix operations on the input data, so the computation is small; the computation-intensive training process runs on the cloud server, which improves efficiency while lowering the performance requirements on the edge device.
Third, by constructing multiple hidden layers, the multi-hidden-layer extreme learning machine can extract deeper feature relations in the data than the single-hidden-layer extreme learning machine, improving the precision of the classifier.
The present invention will be described in further detail with reference to the drawings and examples.
Drawings
FIG. 1 is an overall architecture of the present invention;
FIG. 2 is an experimental procedure of the present invention;
FIG. 3 is a network structure of a single hidden layer limit learning machine according to the present invention;
FIG. 4 is a network structure of a multi-hidden layer extreme learning machine according to the present invention;
FIG. 5 is a ROC graph in the present invention.
Detailed Description
The invention is further described below with reference to the following examples, which are set forth in detail:
the whole intrusion detection architecture of the invention is shown in figure 1, and the architecture is divided into two parts, namely boundary equipment and a cloud server. The boundary equipment is preset with a model trained by the cloud server, and has the functions of preprocessing and classifying original network data, and if the classification result is abnormal, the boundary equipment informs an administrator; and the cloud server receives the network data transmitted by the boundary equipment, performs model training by using the data, distributes the newly trained model to the boundary equipment and updates the model. The following is a detailed description of the experimental part, and the experimental flow chart can refer to the attached FIG. 2.
Step 1, preprocessing network flow data.
Step 1 of the present invention comprises the following steps:
step 1.1, converting character characteristic data of the data set into numerical values, and then carrying out one-hot coding on the characteristics.
The method uses a network intrusion data set (CSE-CIC-IDS2017) collected in a collaboration between the Communications Security Establishment and the Canadian Institute for Cybersecurity; character values under the Protocol field in CSE-CIC-IDS2017 are mapped to numerical values by category and then one-hot coded.
Step 1.2, respectively normalizing the characteristic data of each field, wherein the formula is as follows:
Figure BDA0002844316740000051
wherein xminIs the minimum value of this field, xmaxIs the maximum value of this field. E.g. x for a certain column of characteristic datamax=10,xminWhen 0, then some eigenvalue of the column is normalized to 0.6, at [0,1 |]。
Step 1.3, split the data set into a training set and a test set in the ratio 8:2.
100,000 sample data were selected in the original Dataset, labeled as Dataset, and the Dataset was partitioned, with 80,000 in the training set and 20,000 in the test set.
And 1.4, respectively segmenting the Label fields in the training set and the data set, judging, marking as-1 if the value is 0, otherwise marking as 1, namely marking the flow of the normal sample as-1 and marking the attack sample as 1.
Data labeled BENIGN under the Label field are changed to -1 (normal network traffic); non-BENIGN data are changed to 1 (network attack traffic).
Step 1.5, obtaining training data Train _ X and training data label Train _ Y; test data Test _ X, Test data tag Test _ Y.
The Train_X dataset shape is [80000, 74] and the Train_Y dataset shape is [80000]; the Test_X dataset shape is [20000, 74] and the Test_Y dataset shape is [20000].
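Steps 1.1–1.5 above can be sketched compactly on a tiny synthetic table (the real experiment uses 100,000 CSE-CIC-IDS2017 samples with 74 feature columns; the array sizes and values below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 10.0, size=(10, 4))           # numeric feature columns
labels = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 1])  # 0 stands in for BENIGN

# Step 1.2: per-column min-max normalization into [0, 1]
x_min, x_max = X.min(axis=0), X.max(axis=0)
X_norm = (X - x_min) / (x_max - x_min)

# Step 1.4: BENIGN -> -1 (normal traffic), everything else -> 1 (attack)
Y = np.where(labels == 0, -1, 1)

# Step 1.3: split the whole data set 8:2 into training and test sets
split = int(0.8 * len(X_norm))
Train_X, Test_X = X_norm[:split], X_norm[split:]
Train_Y, Test_Y = Y[:split], Y[split:]
```

On the real data the same operations produce the [80000, 74] / [20000, 74] splits stated above.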
Step 2, build the single hidden layer extreme learning machine (ELM) classification model (for the network structure see FIG. 3).
Step 2 of the present invention comprises the following steps:
and 2.1, the single-hidden-layer extreme learning machine has three layers, wherein the first layer is an input layer, the second layer is a hidden layer, and the third layer is an output layer. In the method, the connection weight w is uniform distribution of [ -1,1], the number of characteristic columns is the number of hidden nodes of the hidden layer, and the distribution density function is as follows:
Figure BDA0002844316740000061
wherein m is the maximum value of the random value range of w, and n is the minimum value of the random value range of w; the bias b is realized by adding a column of all 1 values to the input feature matrix;
in this experiment, the shape of the characteristic data matrix of the input layer is [80000,74], the number of hidden nodes is 50, the number of characteristic data columns is 74, and therefore the shape of the weight w is [50,74], and the shape of the characteristic data matrix after adding the offset b is [80000,74 ].
Step 2.2, for M arbitrary input samples, a linear operation with the random weight w and bias b first yields a feature matrix, to which the Sigmoid activation function is applied to obtain the nonlinear feature matrix H. The Sigmoid activation function maps each feature value into [0,1]:

S(x) = \frac{1}{1 + e^{-x}}

The nonlinear feature matrix H is given by H_{ij} = S(w_i \cdot X_j + b_i).
After the activation function, the elements of the feature data matrix lie in [0, 1].
Step 2.3, based on step 2.2, a single hidden layer neural network with L hidden nodes can be expressed as

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = o_j, \quad j = 1, \ldots, N

where S is the Sigmoid activation function, N is the number of columns of the input feature data matrix, b_i is the bias of the ith hidden node, \beta_i is the output weight of the ith hidden node, and o_j is the jth output feature value of the extreme learning machine.
Step 2.4, the output of the single hidden layer neural network is made to differ minimally from the target values, which can be expressed as

\sum_{j=1}^{N} \| o_j - t_j \| = 0

where t_j is the target value; hence there exist \beta_i, w_i and b_i such that

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, N

that is, H\beta = T, where H is the output of the hidden nodes, \beta is the output weight, and T is the desired output.
The output weight \beta has shape [50], the hidden node output H has shape [80000, 50], and the desired output T has shape [80000].
Step 2.5, training the single hidden layer extreme learning machine model is equivalent to finding \hat{w}_i, \hat{b}_i and \hat{\beta}_i such that

\| H(\hat{w}_i, \hat{b}_i)\hat{\beta} - T \| = \min_{w, b, \beta} \| H(w_i, b_i)\beta - T \|

i.e. minimizing the loss function

E = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) - t_j \right)^2
Step 2.6, once the input weights w and the hidden layer bias b are randomly determined, the output matrix H is also uniquely determined. Solving the single hidden layer neural network therefore reduces to H\beta = T, and the output weight is

\hat{\beta} = H^{\dagger} T

where H^{\dagger} is the Moore-Penrose (MP) generalized inverse of H, H^{\dagger} = (H^{\mathrm{T}} H)^{-1} H^{\mathrm{T}}.
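Steps 2.1–2.6 condense into a short NumPy sketch. The toy data and the trivially learnable target below are illustrative assumptions, not the patent's traffic data; the shapes (200 samples, 4 features, 20 hidden nodes) are arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def elm_train(X, T, n_hidden, seed=0):
    """Single-hidden-layer ELM: random w and b (step 2.1), hidden output
    H (step 2.2), then beta = H^+ T via the Moore-Penrose inverse (step 2.6)."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-1.0, 1.0, size=(n_hidden, X.shape[1]))  # U[-1, 1]
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = sigmoid(X @ w.T + b)
    beta = np.linalg.pinv(H) @ T
    return w, b, beta

# Toy problem: the label is the sign of the first feature.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 4))
T = np.where(X[:, 0] > 0, 1.0, -1.0)

w, b, beta = elm_train(X, T, n_hidden=20)
preds = np.sign(sigmoid(X @ w.T + b) @ beta)
accuracy = float((preds == T).mean())
```

The whole "training" is one random initialization plus one pseudoinverse, which is why the patent can push it to the cloud server and leave the edge device only the matrix multiplications of classification.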
and 3, building a multi-hidden-layer extreme learning machine classification model (MLELM). (the network structure of the multi-hidden-layer extreme learning machine refers to the attached figure 4).
Step 3 of the present invention comprises the following steps:
and 3.1, the multi-hidden-layer extreme learning machine comprises a plurality of hidden layers, and the specific number of the layers is determined by the length n of a hidden node number list L given by a user.
Step 3.2, assume the list L = [a_1, a_2, a_3, \ldots, a_n], where a_i is the number of hidden nodes of the ith hidden layer.
In the experiment the list is L = [50, 100, 150].
Step 3.3, for a_1, the output weight \beta_1 is solved following the single hidden layer model solving procedure of step 2, and \beta_1 is stored in the output weight list M.
Thus \beta_1 has shape [50].
Step 3.4, for a_2, \ldots, a_{n-1}, the input feature matrix of each hidden layer is obtained by applying the output weights of all preceding hidden layers to the original input feature matrix in sequence:
X2=β1·X
X3=β2·(β1·X)=β2·X2
……
Xn-1=βn-2·Xn-2
\beta_2 has shape [100].
Step 3.5, for a_n, i.e. the last hidden layer, the input matrix is

X_n = \beta_{n-1} \cdot X_{n-1}

which is then substituted into the single hidden layer extreme learning machine to obtain the final \beta_n.
\beta_3 has shape [150].
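The layer-stacking recursion of steps 3.3–3.5 leaves the matrix shapes underspecified in this text (a \beta_1 of shape [50] cannot be inner-multiplied directly with the [80000, 74] matrix X), so the sketch below takes one plausible reading: each hidden layer's activations become the next layer's input, and only the final layer's \beta is solved against the targets, as in step 3.5. Treat it as an illustrative stacked-ELM sketch with assumed toy data, not the patent's exact bookkeeping:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mlelm_train(X, T, hidden_sizes, seed=0):
    """Stacked-ELM sketch: random (w, b) per hidden layer; the final beta
    is obtained by the single-hidden-layer solve applied to the last layer."""
    rng = np.random.default_rng(seed)
    layers, A = [], X
    for n_hidden in hidden_sizes:           # e.g. L = [50, 100, 150]
        w = rng.uniform(-1.0, 1.0, size=(n_hidden, A.shape[1]))
        b = rng.uniform(-1.0, 1.0, size=n_hidden)
        A = sigmoid(A @ w.T + b)            # this layer's hidden output
        layers.append((w, b))
    beta = np.linalg.pinv(A) @ T            # final beta_n (step 3.5)
    return layers, beta

def mlelm_predict(X, layers, beta):
    A = X
    for w, b in layers:
        A = sigmoid(A @ w.T + b)
    return np.sign(A @ beta)

# Toy data with a linearly separable target (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(300, 4))
T = np.where(X[:, 0] + X[:, 1] > 0, 1.0, -1.0)
layers, beta = mlelm_train(X, T, hidden_sizes=[50, 100, 150])
acc = float((mlelm_predict(X, layers, beta) == T).mean())
```

With L = [50, 100, 150] the final \beta has shape [150], matching the shape stated above.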
And 4, training the model and calculating the output weight.
Step 4 of the present invention comprises the following steps:
and 4.1, inputting the training set into the single-hidden-layer extreme learning machine and the multi-hidden-layer extreme learning machine respectively to obtain an output weight beta.
In the experiment, the output weight \beta of the single-hidden-layer extreme learning machine model has shape [50], and that of the multi-hidden-layer extreme learning machine model has shape [150].
And 5, classifying the test set by using the trained model.
Step 5 of the present invention comprises the following steps:
and 5.1, under the condition that the output weight beta, the input weight w and the hidden layer bias b are known, classifying the input test set through the following formula, and outputting a discriminant array Predicts.
Figure BDA0002844316740000081
Wherein, betaiFor output weights, X is test data, sign () is a sign function, expressed as follows:
Figure BDA0002844316740000082
the input data shape is [20000,74], and the output discriminant array Predicts shape is [20000], and its value is [ -1,1, -1, -1, …,1,1, -1 ].
And 6, evaluating the model.
Step 6 of the present invention comprises the following steps:
and 6.1, calculating the difference between the judgment array Predicts obtained in the step 5 and the Test data real label Test _ Y.
And 6.2, drawing ROC curves of the two models according to Predicts and Test _ Y.
Step 6.3, calculate the evaluation indexes Accuracy (ACC), False Alarm Rate (FAR), Detection Rate (DR), Precision, Recall and F1-Measure from Predicts and Test_Y, and record the time consumed by training and classification.
The method is a binary classification test, i.e. normal or abnormal, and therefore four outcomes are predicted, i.e. True Positives (TP): detecting as abnormal, in fact abnormal; false Positive (FP): detected as abnormal, in fact normal; true Negative (TN): detected as normal, and actually normal; false Negative (FN): detected as normal, and in fact abnormal. According to the standard, the Accuracy (ACC), the False Alarm Rate (FAR) and the Detection Rate (DR), the Precision (Precision), the Recall rate (Recall) and the F1-Measure of the model prediction result are calculated.
ACC = \frac{TP + TN}{TP + TN + FP + FN}

FAR = \frac{FP}{FP + TN}

DR = \frac{TP}{TP + FN}

Precision = \frac{TP}{TP + FP}

Recall = \frac{TP}{TP + FN}

F1\text{-}Measure = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}
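The six indexes can be computed directly from the four confusion counts. A small sketch on hypothetical prediction arrays (abnormal = 1, normal = -1, matching the labeling of step 1.4; the arrays are invented for illustration):

```python
import numpy as np

def evaluate(predicts, test_y):
    """Step 6.3 metrics: ACC, FAR, DR, Precision, Recall, F1 from TP/FP/TN/FN."""
    tp = int(np.sum((predicts == 1) & (test_y == 1)))    # abnormal, detected
    fp = int(np.sum((predicts == 1) & (test_y == -1)))   # normal, flagged
    tn = int(np.sum((predicts == -1) & (test_y == -1)))  # normal, passed
    fn = int(np.sum((predicts == -1) & (test_y == 1)))   # abnormal, missed
    acc = (tp + tn) / (tp + tn + fp + fn)
    far = fp / (fp + tn)
    dr = tp / (tp + fn)                    # detection rate (= recall here)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, far, dr, precision, recall, f1

# Hypothetical discriminant array and ground-truth labels (10 flows).
predicts = np.array([1, 1, 1, -1, 1, 1, -1, -1, -1, 1])
test_y = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1, 1])
acc, far, dr, precision, recall, f1 = evaluate(predicts, test_y)
```

Here TP = 5, FP = 1, TN = 3, FN = 1, so ACC = 0.8, FAR = 0.25 and DR = Precision = Recall = F1 = 5/6.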
The ROC curve is suitable for evaluating the overall performance of a classifier and does not change significantly as the ratio of positive to negative samples changes. The evaluation indexes of the two models are shown in Table 1.

[Table 1: evaluation indexes of the ELM and MLELM models — rendered as an image in the source]
The MLELM model is superior to the ELM in all six indexes (ACC, FAR, DR, Precision, Recall and F1-Measure), but its test time is longer than the ELM's because of the additional hidden layers.

The ROC curves of the two models are shown in FIG. 5. Analysis of the experimental results shows that the MLELM model outperforms the ELM.

Claims (5)

1. The distributed intrusion detection method based on the multilayer extreme learning machine under the environment of the Internet of things is characterized by comprising the following steps:
step 1: preprocessing network flow data;
step 2: building a single hidden layer extreme learning machine classification model;
Step 2.1, the single-hidden-layer extreme learning machine has three layers: the first is the input layer, the second the hidden layer, and the third the output layer; the connection weight w between the input layer and the hidden layer and the bias b are randomly set, with one row of w per hidden node and one column per input feature column; the bias b is realized by appending a column of all-1 values to the input feature matrix;
Step 2.2, for any input sample, a linear operation with the random weight w and bias b first yields a feature matrix, and the Sigmoid activation function is then applied to obtain the nonlinear feature matrix H; the Sigmoid activation function maps each feature value into [0,1]:

S(x) = \frac{1}{1 + e^{-x}}

The nonlinear feature matrix H is given by H_{ij} = S(w_i \cdot X_j + b_i), where X is the input sample data, w_i is the weight of the ith hidden node, b_i is the bias of the ith hidden node, and X_j is the jth feature column of the input data;
Step 2.3, on the basis of step 2.2, a single hidden layer neural network with L hidden nodes is expressed as:

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = o_j, \quad j = 1, \ldots, N

where S is the Sigmoid activation function, N is the number of columns of the input feature data matrix, b_i is the bias of the ith hidden node, \beta_i is the output weight of the ith hidden node, and o_j is the jth output feature value of the extreme learning machine;
Step 2.4, the output of the single hidden layer neural network is made to differ minimally from the target values, expressed as

\sum_{j=1}^{N} \| o_j - t_j \| = 0

where t_j is the target value; hence there exist \beta_i, w_i and b_i such that

\sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) = t_j, \quad j = 1, \ldots, N

i.e., H\beta = T, where H is the output of the hidden nodes, \beta is the output weight, and T is the desired output;
Step 2.5, training the single hidden layer extreme learning machine model is equivalent to finding \hat{w}_i, \hat{b}_i and \hat{\beta}_i such that

\| H(\hat{w}_i, \hat{b}_i)\hat{\beta} - T \| = \min_{w, b, \beta} \| H(w_i, b_i)\beta - T \|

i.e. minimizing the loss function

E = \sum_{j=1}^{N} \left( \sum_{i=1}^{L} \beta_i S(w_i \cdot X_j + b_i) - t_j \right)^2
Step 2.6, once the input weight w and the hidden layer bias b are randomly determined, the output matrix H is also uniquely determined; therefore, solving the single hidden layer neural network is H β ═ T; the output weight may be confirmed as:
Figure FDA0003574716960000023
wherein
Figure FDA0003574716960000024
Is a Moore-Penrose generalized inverse,
Figure FDA0003574716960000025
and step 3: building a multi-hidden-layer extreme learning machine classification model;
step 3.1, the multi-hidden-layer extreme learning machine comprises a plurality of hidden layers, and the specific number of the layers is determined by the length n of a hidden node number list L given by a user;
Step 3.2, assume the list L = [a_1, a_2, a_3, \ldots, a_n], where a_i is the number of hidden nodes of the ith hidden layer;
Step 3.3, for a_1, the output weight \beta_1 is solved following the single hidden layer model solving procedure of step 2, and \beta_1 is stored in the output weight list M;
Step 3.4, for a_2, \ldots, a_{n-1}, the input feature matrix of each hidden layer is obtained by applying the output weights of all preceding hidden layers to the original input feature matrix in sequence:
X2=β1·X
X3=β2·(β1·X)=β2·X2
……
Xn-1=βn-2·Xn-2
Step 3.5, for a_n, i.e. the last hidden layer, the input matrix is

X_n = \beta_{n-1} \cdot X_{n-1}

which is then substituted into the single hidden layer extreme learning machine to obtain the final \beta_n;
Step 4: input the training set into the single-hidden-layer extreme learning machine and the multi-hidden-layer extreme learning machine respectively to obtain the output weight \beta;
Step 5: classify the test set using the trained multi-hidden-layer extreme learning machine.
2. The distributed intrusion detection method based on the multi-layer extreme learning machine in the environment of the internet of things according to claim 1, characterized in that: the step 1 comprises the following steps:
step 1.1, taking network flow data as a data set, converting character characteristic data of the data set into numerical values, and then carrying out one-hot coding on the characteristic values;
step 1.2, respectively normalizing each characteristic value, wherein the formula is as follows:
x' = (x - xmin) / (xmax - xmin)
wherein xmin is the minimum value of the feature and xmax is the maximum value of the feature;
step 1.3, dividing the data set: the whole data set is divided into a training set and a test set in the ratio 8:2;
step 1.4, extracting the Label field from the training set and the test set respectively and judging it: if the value is 0 it is marked as -1, otherwise it is marked as 1, namely the traffic of a normal sample is marked as -1 and an attack sample is marked as 1;
step 1.5, obtaining training data Train_X and training data labels Train_Y, and test data Test_X and test data labels Test_Y.
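Steps 1.2–1.5 can be sketched as below (not part of the claim; the claim fixes only the min-max formula, the 8:2 split, and the -1/1 relabeling — the function names and the shuffling seed are illustrative assumptions):

```python
import numpy as np

def min_max_normalize(col):
    """Step 1.2: x' = (x - x_min) / (x_max - x_min) per feature column."""
    x_min, x_max = col.min(), col.max()
    return (col - x_min) / (x_max - x_min)

def relabel(labels):
    """Step 1.4: 0 (normal) -> -1, any other value (attack) -> 1."""
    return np.where(labels == 0, -1, 1)

def split_8_2(X, y, seed=0):
    """Step 1.3: shuffle and split the data set 80/20 into train/test."""
    idx = np.random.default_rng(seed).permutation(len(X))
    cut = int(0.8 * len(X))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]  # Train_X, Train_Y, Test_X, Test_Y
```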
3. The distributed intrusion detection method based on the multi-layer extreme learning machine in the environment of the internet of things according to claim 1, characterized in that: the connection weight w in step 2.2 is uniformly distributed on [-1, 1], with the distribution density function:
f(x) = 1/(m - n) for n ≤ x ≤ m, and f(x) = 0 otherwise
wherein m is the maximum value of the random value range of w, and n is the minimum value of the random value range of w.
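For the range [-1, 1] stated above, m = 1 and n = -1, so the density inside the interval is 1/(m - n) = 0.5. A minimal sketch of drawing w accordingly (the matrix dimensions are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
# Example: 41 input features x 100 hidden nodes, each weight uniform on [-1, 1]
w = rng.uniform(-1.0, 1.0, size=(41, 100))
density = 1.0 / (1.0 - (-1.0))  # f(x) = 1/(m - n) = 0.5 inside [n, m]
```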
4. The distributed intrusion detection method based on the multi-layer extreme learning machine in the environment of the internet of things according to claim 1, characterized in that: the step 5 comprises the following steps:
step 5.1, under the condition that the output weight beta, the input weight w and the hidden layer bias b are known, classifying the input test set through the following formula, and outputting a discriminant array Predicts;
Predicts = sign(H(X)·β)
wherein β is the output weight, X is the test data, H(X) is the hidden-layer output matrix computed from X with the input weight w and bias b, and sign() is the sign function, expressed as follows:
sign(x) = 1 for x ≥ 0, and sign(x) = -1 for x < 0
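Step 5.1 might look like the following sketch (not part of the claim; the sigmoid hidden map and the convention sign(0) = +1 are assumptions carried over from training):

```python
import numpy as np

def elm_predict(X, w, b, beta):
    """Classify test data: Predicts = sign(H @ beta), labels in {-1, 1}."""
    H = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # hidden-layer output for the test set
    scores = H @ beta
    return np.where(scores >= 0, 1, -1)     # sign(), with 0 mapped to +1 (attack)
```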
5. The distributed intrusion detection method based on the multi-layer extreme learning machine in the environment of the internet of things according to claim 1, characterized by further comprising model evaluation, which specifically comprises the following steps:
step 6.1, calculating the difference between the judgment array Predicts obtained in the step 5 and the real Test _ Y label of the Test data;
step 6.2, drawing ROC curves of the two models according to Predicts and Test _ Y;
step 6.3, calculating each evaluation index according to Predicts and Test_Y, and recording the time consumed by training and classification.
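Steps 6.1 and 6.3 might be sketched as below (the specific indices are assumptions, since the claim says only "each evaluation index"; ROC plotting from step 6.2 is omitted):

```python
import numpy as np

def evaluate(predicts, test_y):
    """Compare Predicts with Test_Y (labels in {-1, 1}) and report basic indices."""
    tp = np.sum((predicts == 1) & (test_y == 1))    # attacks correctly flagged
    fp = np.sum((predicts == 1) & (test_y == -1))   # normal traffic flagged as attack
    fn = np.sum((predicts == -1) & (test_y == 1))   # attacks missed
    acc = np.mean(predicts == test_y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": acc, "precision": precision, "recall": recall}
```

Training and classification time (step 6.3) would be measured by bracketing the corresponding calls with a wall-clock timer.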
CN202011503520.6A 2020-12-18 2020-12-18 Distributed intrusion detection method based on multilayer extreme learning machine in Internet of things environment Active CN112653751B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011503520.6A CN112653751B (en) 2020-12-18 2020-12-18 Distributed intrusion detection method based on multilayer extreme learning machine in Internet of things environment


Publications (2)

Publication Number Publication Date
CN112653751A CN112653751A (en) 2021-04-13
CN112653751B true CN112653751B (en) 2022-05-13

Family

ID=75355205

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011503520.6A Active CN112653751B (en) 2020-12-18 2020-12-18 Distributed intrusion detection method based on multilayer extreme learning machine in Internet of things environment

Country Status (1)

Country Link
CN (1) CN112653751B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598925A (en) * 2015-01-23 2015-05-06 湖州师范学院 Multiclass Adaboost integrated studying method based on ELM
WO2017053329A1 (en) * 2015-09-21 2017-03-30 Monolithic 3D Inc 3d semiconductor device and structure
WO2017197626A1 (en) * 2016-05-19 2017-11-23 江南大学 Extreme learning machine method for improving artificial bee colony optimization
CN108805346A (en) * 2018-06-04 2018-11-13 东北大学 A kind of hot continuous rolling force forecasting method based on more hidden layer extreme learning machines
CN109191407A (en) * 2018-09-20 2019-01-11 湘潭大学 A kind of a scrap of paper splicing restored method and system based on extreme learning machine


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Emmanuel Dare Alalade; "Intrusion Detection System in Smart Home Network Using Artificial Immune System and Extreme Learning Machine Hybrid Approach"; IEEE; 2020-10-13; full text *

Also Published As

Publication number Publication date
CN112653751A (en) 2021-04-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant