CN110995692A - Network security intrusion detection method based on factor analysis and subspace collaborative representation - Google Patents

Info

Publication number
CN110995692A
Authority
CN
China
Prior art keywords
connections
current connection
factor
percentage
subspace
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911192193.4A
Other languages
Chinese (zh)
Inventor
张明明
李萌
陈咏秋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co Ltd
Jiangsu Electric Power Information Technology Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co Ltd
Jiangsu Electric Power Information Technology Co Ltd
Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co Ltd, Jiangsu Electric Power Information Technology Co Ltd, Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co Ltd
Priority to CN201911192193.4A priority Critical patent/CN110995692A/en
Publication of CN110995692A publication Critical patent/CN110995692A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/147 Network analysis or design for predicting network behaviour
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a network security intrusion detection method based on factor analysis and subspace collaborative representation. The method obtains security factors that reflect the security state of a network at a given moment or within a given time period; the factors comprise basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features. The factors are analyzed with 4 algorithms to obtain the contribution weight of each factor to network intrusion detection, the factors are ranked by the mean of the contribution weights obtained by the 4 algorithms, and the N factors with the largest contribution weights are extracted. A subspace collaborative representation classification algorithm is then used to detect the network security state. The method is fast and effective, ingenious in method and novel in concept, and has good application prospects.

Description

Network security intrusion detection method based on factor analysis and subspace collaborative representation
Technical Field
The invention relates to the technical field of network security, in particular to a network security intrusion detection method based on factor analysis and subspace collaborative representation.
Background
The wide application of information technology and the rapid development of cyberspace have greatly promoted social prosperity and progress, but information security problems such as virus infection, illegal intrusion, brute-force cracking and denial-of-service attacks have become increasingly prominent in the course of informatization. To prevent such incidents, network security is predicted, judged and analyzed in advance, and protective measures are taken according to the degree of the security hazard, which can effectively reduce asset losses.
Operation and maintenance work, which is as important as technology construction work, has gradually received attention. How to reduce operation and maintenance costs, improve operation and maintenance efficiency and ensure operation and maintenance safety is a broad subject. Network security analysis is an essential link in operation and maintenance; because it bears on the stable operation of the most critical systems, it receives particular attention.
Network security is an indispensable part of management in enterprises large and small. Current network security prediction, judgment and analysis cannot assess the network security situation and provide no method for quantitative analysis of network security; this is the problem to be solved.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention aims to provide a network security intrusion detection method based on factor analysis and subspace collaborative representation. The method obtains security factors that reflect the security state of a network at a given moment or within a given time period, comprising basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features. The factors are analyzed with 4 algorithms to obtain the contribution weight of each factor to network intrusion detection, the factors are ranked by the mean of the contribution weights obtained by the 4 algorithms, the N factors with the largest contribution weights are extracted, and a subspace collaborative representation classification algorithm is used to detect the security state of the network.
In order to achieve the purpose, the invention adopts the technical scheme that:
A network security intrusion detection method based on factor analysis and subspace collaborative representation comprises the following steps:
step (A), standardizing and normalizing the network data;
step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor by using four factor analysis methods, and calculating the mean contribution weight of each factor;
step (C), extracting the N factors with the largest contribution weights and detecting the network data by using a subspace collaborative representation classification algorithm; if the detection result is normal, returning to step (A) and evaluating the network data at the next moment or in the next time period; if the detection result is abnormal, treating it as an abnormal event and outputting an alarm.
In step (A), the network data comprise basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features.
The basic TCP connection features comprise the following factors:
network connection duration, network protocol type, network service type of the target host, number of bytes of data from the source host to the target host, and number of bytes of data from the target host to the source host.
The TCP connection content features comprise the following factors:
number of failed login attempts, whether the login was successful, and number of "compromised" conditions that occurred.
The time-based network traffic statistical features comprise the following factors, each computed over the past two seconds:
the number of connections having the same target host as the current connection;
the number of connections having the same service as the current connection;
the percentage of connections with a "SYN" error among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same service as the current connection;
the percentage of connections with a "REJ" error among connections having the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same service as the current connection;
the percentage of connections having the same service as the current connection among connections having the same target host as the current connection;
the percentage of connections having a different service from the current connection among connections having the same target host as the current connection;
the percentage of connections having a different target host from the current connection among connections having the same service as the current connection.
The host-based network traffic statistical features comprise the following factors, each computed over the preceding 100 connections:
the number of connections having the same target host as the current connection;
the percentage of connections having the same service and the same target host as the current connection;
the percentage of connections having a different service from the current connection and the same source port as the current connection;
the percentage of connections having a different source host from the current connection among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same service and the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same service and the same target host as the current connection.
In step (A), standardizing the data comprises the following steps:
(A1) calculating the mean of the samples,
Figure BDA0002293850040000031
wherein X is the sample data;
(A2) calculating the standard deviation of the samples,
Figure BDA0002293850040000032
(A3) standardizing the data according to the mean and standard deviation of the samples,
Figure BDA0002293850040000033
Normalizing the data comprises the following steps:
(A4) calculating the minimum value of the samples, X_min = min{X'_ij};
(A5) calculating the maximum value of the samples, X_max = max{X'_ij};
(A6) normalizing the data according to the minimum and maximum values of the samples,
Figure BDA0002293850040000034
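The formula images for steps (A1) to (A6) are not reproduced in this text. Under the usual definitions of z-score standardization and min-max normalization, they would plausibly read as below. The per-factor index j, the sample count n, the 1/n form of the standard deviation and the symbol X'' are assumptions, not taken from the source.

```latex
\bar{X}_j = \frac{1}{n}\sum_{i=1}^{n} X_{ij}, \qquad
S_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_{ij}-\bar{X}_j\right)^2}, \qquad
X'_{ij} = \frac{X_{ij}-\bar{X}_j}{S_j}

X''_{ij} = \frac{X'_{ij}-X_{\min}}{X_{\max}-X_{\min}}
```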
Step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor by using four factor analysis methods and calculating the mean contribution weight of each factor, comprises the following steps:
(B1) calculating the contribution weight of each factor by using a variance threshold filtering method;
(B2) calculating the contribution weight of each factor by using a feature selection method based on mutual information;
(B3) calculating the contribution weight of each factor by using a feature selection method based on Lasso regression;
(B4) calculating the contribution weight of each factor by using a feature selection method based on the ReliefF algorithm;
(B5) averaging the 4 contribution weights of each factor.
The variance threshold filtering method of (B1) comprises the following steps:
(1) calculating the variance Var(i) of each factor;
(2) sorting the obtained variances in descending order;
(3) retaining the factors whose variance is greater than the threshold T as the filtering result.
The mutual-information-based feature selection method of (B2) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000035
and the class (label) vector is
Figure BDA0002293850040000036
where n is the number of samples, s is the number of features, and x_i is the i-th feature vector (i = 1, …, s);
(2) calculating the mutual information MI(i) between each feature vector x_i and Y;
(3) sorting the obtained mutual information values in descending order;
(4) retaining the factors whose mutual information is greater than the threshold T as the feature selection result.
The mutual information MI is calculated as follows:
Figure BDA0002293850040000041
where X and Y are two discrete random variables, p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
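The formula image is not reproduced here. The textbook definition of mutual information for two discrete random variables, which matches the symbols described above, is given below as a likely reading; the logarithm base is not stated in the source.

```latex
I(X;Y) = \sum_{y \in Y}\sum_{x \in X} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)}
```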
The Lasso-regression-based feature selection method of (B3) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000042
and the class (label) vector is
Figure BDA0002293850040000043
where n is the number of samples and s is the number of features;
(2) performing Lasso regression on X and Y;
(3) sorting the obtained Lasso regression results in descending order;
(4) retaining the factors whose regression result is greater than the threshold T as the feature selection result.
The ReliefF-based feature selection method of (B4) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000044
and the class (label) vector is
Figure BDA0002293850040000045
where n is the number of samples and s is the number of features;
(2) calculating the weight W of each factor by using the ReliefF algorithm;
(3) sorting the obtained factor weights in descending order;
(4) retaining the factors whose weight is greater than the threshold T as the feature selection result.
The weight W in step (2) is calculated as follows:
(1) randomly selecting a sample point R_i in X;
(2) finding the k nearest-neighbor samples H_j of the same class as R_i;
(3) for each class C ≠ class(R_i), finding the k nearest-neighbor samples M_j(C) of R_i in that class;
(4) looping p times and updating the contribution weight of each factor, the update formula being:
Figure BDA0002293850040000046
(5) repeating steps (1), (2), (3) and (4) m times.
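The update formula image is not reproduced here. The commonly published ReliefF update for a factor A, written with the symbols R_i, H_j, M_j(C), k and m used above, is shown below as a plausible reading. The function diff and the class prior P(·) come from the standard algorithm and are assumptions with respect to this document.

```latex
W(A) \leftarrow W(A)
  - \sum_{j=1}^{k} \frac{\operatorname{diff}(A, R_i, H_j)}{m\,k}
  + \sum_{C \neq \operatorname{class}(R_i)}
    \frac{P(C)}{1 - P(\operatorname{class}(R_i))}
    \sum_{j=1}^{k} \frac{\operatorname{diff}(A, R_i, M_j(C))}{m\,k}
```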
Step (C), extracting the N factors with the largest contribution weights and detecting the network data by using a subspace collaborative representation classification algorithm, comprises the following steps:
(C1) inputting the training samples X (with C classes) and the corresponding labels B, the sample y to be tested and the parameter λ, and extracting the N factors with the largest contribution weights;
(C2) dividing the training samples X into C subsets by class;
(C3) calculating the biased Tikhonov matrix Γ_{l,y} of the l-th class and the test sample y;
(C4) calculating the approximation of the test sample for the l-th class,
Figure BDA0002293850040000051
(C5) repeating steps (C3) and (C4) C times;
(C6) calculating the distance r_l between each class and y, and obtaining the classification of y by
Figure BDA0002293850040000052
(C7) if the classification result is normal, monitoring the next piece of network data; if the classification result is abnormal, issuing a warning.
The biased Tikhonov matrix Γ_{l,y} is calculated as follows:
Figure BDA0002293850040000053
where x_1, x_2, …, x_n form the subspace X_l of the l-th class.
The approximation of the test sample,
Figure BDA0002293850040000054
is calculated as follows:
Figure BDA0002293850040000055
The distance between the approximation
Figure BDA0002293850040000056
and the sample y is calculated as follows:
Figure BDA0002293850040000057
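The formula images for step (C) are not reproduced here. In the published nearest-regularized-subspace form of collaborative representation, which uses a distance-weighted Tikhonov matrix of this kind, the four quantities read as below. This is offered as a plausible reconstruction; the exact expressions in the original images may differ.

```latex
\Gamma_{l,y} = \operatorname{diag}\!\left(\lVert y - x_1\rVert_2,\ \ldots,\ \lVert y - x_n\rVert_2\right)

\hat{y}_l = X_l\left(X_l^{\top}X_l + \lambda\,\Gamma_{l,y}^{\top}\Gamma_{l,y}\right)^{-1}X_l^{\top}y

r_l = \lVert \hat{y}_l - y\rVert_2, \qquad
\operatorname{class}(y) = \arg\min_{l=1,\ldots,C} r_l
```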
The invention has the following beneficial effects: the network security intrusion detection method based on factor analysis and subspace collaborative representation obtains security factors that reflect the security state of a network at a given moment or within a given time period, comprising basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features. The factors are analyzed with 4 algorithms to obtain the contribution weight of each factor to network intrusion detection, the factors are ranked by the mean of the contribution weights obtained by the 4 algorithms, the N factors with the largest contribution weights are extracted, and the subspace collaborative representation classification algorithm is used to detect the security state of the network.
Drawings
FIG. 1 is a flow chart of a network security intrusion detection method based on factor analysis and subspace collaborative representation according to the present invention;
FIG. 2 is a block diagram of the intrusion detection method for network security based on factor analysis and subspace collaborative representation according to the present invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
As shown in FIG. 1, the network security intrusion detection method based on factor analysis and subspace collaborative representation of the present invention obtains security factors that reflect the security state of a network at a given moment or within a given time period, comprising basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features. The factors are analyzed with 4 algorithms to obtain the contribution weight of each factor to network intrusion detection, the factors are ranked by the mean of the contribution weights obtained by the 4 algorithms, the N factors with the largest contribution weights are extracted, and a subspace collaborative representation classification algorithm is used to detect the security state of the network. The method is fast and effective, ingenious in method and novel in concept, and comprises the following steps:
step (A), standardizing and normalizing the network data;
step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor by using four factor analysis methods, and calculating the mean contribution weight of each factor;
step (C), extracting the N factors with the largest contribution weights and detecting the network data by using a subspace collaborative representation classification algorithm; if the detection result is normal, returning to step (A) and evaluating the network data at the next moment or in the next time period; if the detection result is abnormal, treating it as an abnormal event and outputting an alarm.
In step (A), the network data comprise basic TCP connection features, TCP connection content features, time-based network traffic statistical features and host-based network traffic statistical features.
The basic TCP connection features comprise the following factors:
network connection duration, network protocol type, network service type of the target host, number of bytes of data from the source host to the target host, and number of bytes of data from the target host to the source host.
The TCP connection content features comprise the following factors:
number of failed login attempts, whether the login was successful, and number of "compromised" conditions that occurred.
The time-based network traffic statistical features comprise the following factors, each computed over the past two seconds:
the number of connections having the same target host as the current connection;
the number of connections having the same service as the current connection;
the percentage of connections with a "SYN" error among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same service as the current connection;
the percentage of connections with a "REJ" error among connections having the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same service as the current connection;
the percentage of connections having the same service as the current connection among connections having the same target host as the current connection;
the percentage of connections having a different service from the current connection among connections having the same target host as the current connection;
the percentage of connections having a different target host from the current connection among connections having the same service as the current connection.
The host-based network traffic statistical features comprise the following factors, each computed over the preceding 100 connections:
the number of connections having the same target host as the current connection;
the percentage of connections having the same service and the same target host as the current connection;
the percentage of connections having a different service from the current connection and the same source port as the current connection;
the percentage of connections having a different source host from the current connection among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same target host as the current connection;
the percentage of connections with a "SYN" error among connections having the same service and the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same target host as the current connection;
the percentage of connections with a "REJ" error among connections having the same service and the same target host as the current connection.
In step (A), standardizing the data comprises the following steps:
(A1) calculating the mean of the samples,
Figure BDA0002293850040000071
wherein X is the sample data;
(A2) calculating the standard deviation of the samples,
Figure BDA0002293850040000072
(A3) standardizing the data according to the mean and standard deviation of the samples,
Figure BDA0002293850040000073
Normalizing the data comprises the following steps:
(A4) calculating the minimum value of the samples, X_min = min{X'_ij};
(A5) calculating the maximum value of the samples, X_max = max{X'_ij};
(A6) normalizing the data according to the minimum and maximum values of the samples,
Figure BDA0002293850040000074
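Steps (A1) to (A6) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: it assumes each factor (column) is processed independently, uses the population standard deviation, and maps constant columns to zero. The function names `standardize`, `min_max` and `preprocess` are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one factor: subtract the mean, divide by the standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)  # assumption: population form (divide by n)
    if sigma == 0:
        return [0.0 for _ in column]  # a constant factor carries no information
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Rescale one factor to the range [0, 1] using its minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(rows):
    """Apply standardization, then min-max normalization, column by column."""
    columns = [min_max(standardize(list(col))) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]
```

For example, `preprocess([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])` maps the smallest value of each factor to 0 and the largest to 1.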
Step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor by using four factor analysis methods and calculating the mean contribution weight of each factor, comprises the following steps:
(B1) calculating the contribution weight of each factor by using a variance threshold filtering method;
(B2) calculating the contribution weight of each factor by using a feature selection method based on mutual information;
(B3) calculating the contribution weight of each factor by using a feature selection method based on Lasso regression;
(B4) calculating the contribution weight of each factor by using a feature selection method based on the ReliefF algorithm;
(B5) averaging the 4 contribution weights of each factor.
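Step (B5), together with the later extraction of the N highest-weighted factors, might look like the sketch below. It takes the four per-factor weight lists as given and averages them directly, as the text describes. The names `mean_weights` and `top_n` are hypothetical.

```python
def mean_weights(*weight_lists):
    """Average the per-factor weights produced by several scoring methods."""
    n_factors = len(weight_lists[0])
    if any(len(w) != n_factors for w in weight_lists):
        raise ValueError("every method must score the same factors")
    return [sum(ws) / len(ws) for ws in zip(*weight_lists)]


def top_n(weights, n):
    """Return the indices of the n factors with the largest mean weight."""
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return order[:n]
```

The four methods produce scores on different scales (variances, nats, regression coefficients, ReliefF weights). The source does not say whether they are rescaled before averaging, so the sketch leaves them as they are.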
The variance threshold filtering method of (B1) comprises the following steps:
(1) calculating the variance Var(i) of each factor;
(2) sorting the obtained variances in descending order;
(3) retaining the factors whose variance is greater than the threshold T as the filtering result.
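The three steps above can be sketched as follows, assuming the data are supplied as one list per factor and that the population variance is intended (the source does not say which form is used). The name `variance_filter` is hypothetical.

```python
from statistics import pvariance


def variance_filter(columns, threshold):
    """Return (factor index, variance) pairs above the threshold, largest first."""
    scored = [(i, pvariance(col)) for i, col in enumerate(columns)]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # descending order
    return [(i, v) for i, v in scored if v > threshold]
```

Because the data were min-max normalized in step (A), the variances of different factors are on a comparable scale, which is what makes a single threshold T meaningful.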
The mutual-information-based feature selection method of (B2) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000081
and the class (label) vector is
Figure BDA0002293850040000082
where n is the number of samples, s is the number of features, and x_i is the i-th feature vector (i = 1, …, s);
(2) calculating the mutual information MI(i) between each feature vector x_i and Y;
(3) sorting the obtained mutual information values in descending order;
(4) retaining the factors whose mutual information is greater than the threshold T as the feature selection result.
The mutual information MI is calculated as follows:
Figure BDA0002293850040000083
where X and Y are two discrete random variables, p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
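A minimal sketch of the mutual information computation for discrete values follows, estimating the probabilities from observed frequencies. It assumes continuous factors have already been discretized (for example by binning), which the source does not describe, and it uses the natural logarithm. The name `mutual_information` is hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of each (x, y) pair
    px, py = Counter(xs), Counter(ys)  # marginal counts
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )
```

A factor that determines the label exactly scores the entropy of the label, while a factor independent of the label scores zero.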
The Lasso-regression-based feature selection method of (B3) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000084
and the class (label) vector is
Figure BDA0002293850040000085
where n is the number of samples and s is the number of features;
(2) performing Lasso regression on X and Y;
(3) sorting the obtained Lasso regression results in descending order;
(4) retaining the factors whose regression result is greater than the threshold T as the feature selection result.
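One way to obtain a per-factor "regression result" is to take the absolute value of each Lasso coefficient. The sketch below fits Lasso by cyclic coordinate descent on the objective (1/2n)·||y − Xw||² + alpha·||w||₁. The choice of solver, the absence of an intercept, and the parameters `alpha` and `sweeps` are all assumptions made for illustration; the source only says that Lasso regression is performed.

```python
def soft_threshold(value, alpha):
    """Shrink value toward zero by alpha (the proximal operator of the L1 norm)."""
    if value > alpha:
        return value - alpha
    if value < -alpha:
        return value + alpha
    return 0.0


def lasso_weights(rows, labels, alpha=0.1, sweeps=100):
    """Absolute Lasso coefficients per factor, fitted by coordinate descent.

    Assumes the columns and labels are already centred; no intercept is fitted.
    """
    n, s = len(rows), len(rows[0])
    w = [0.0] * s
    residual = list(labels)  # equals y - Xw while w is all zeros
    for _ in range(sweeps):
        for j in range(s):
            col = [row[j] for row in rows]
            z = sum(x * x for x in col) / n
            if z == 0:
                continue  # an all-zero factor keeps weight 0
            # correlation of factor j with the residual, with its own effect added back
            rho = sum(x * (r + w[j] * x) for x, r in zip(col, residual)) / n
            new_wj = soft_threshold(rho, alpha) / z
            residual = [r - (new_wj - w[j]) * x for r, x in zip(residual, col)]
            w[j] = new_wj
    return [abs(c) for c in w]
```

Factors whose coefficient is shrunk exactly to zero receive weight 0, which is why Lasso is commonly used for feature selection.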
The ReliefF-based feature selection method of (B4) comprises the following steps:
(1) the feature matrix is denoted as
Figure BDA0002293850040000086
and the class (label) vector is
Figure BDA0002293850040000087
where n is the number of samples and s is the number of features;
(2) calculating the weight W of each factor by using the ReliefF algorithm;
(3) sorting the obtained factor weights in descending order;
(4) retaining the factors whose weight is greater than the threshold T as the feature selection result.
The weight W in step (2) is calculated as follows:
(1) randomly selecting a sample point R_i in X;
(2) finding the k nearest-neighbor samples H_j of the same class as R_i;
(3) for each class C ≠ class(R_i), finding the k nearest-neighbor samples M_j(C) of R_i in that class;
(4) looping p times and updating the contribution weight of each factor, the update formula being:
Figure BDA0002293850040000091
(5) repeating steps (1), (2), (3) and (4) m times.
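The procedure above can be sketched as follows, using the commonly published ReliefF update in which near hits lower a factor's weight and near misses raise it in proportion to the class prior. The sketch assumes numeric factors already scaled to [0, 1], so the per-factor difference is an absolute difference, and it uses Manhattan distance to find neighbors. The name `relieff` and the `seed` parameter are hypothetical.

```python
import random


def relieff(rows, labels, k=1, m=None, seed=0):
    """ReliefF weights for numeric factors already scaled to [0, 1]."""
    rng = random.Random(seed)
    n, s = len(rows), len(rows[0])
    m = n if m is None else m
    prior = {c: labels.count(c) / n for c in set(labels)}
    weights = [0.0] * s

    def dist(a, b):  # Manhattan distance between two samples
        return sum(abs(x - y) for x, y in zip(a, b))

    for _ in range(m):
        i = rng.randrange(n)  # randomly selected sample R_i
        r, cls = rows[i], labels[i]
        for c, p in prior.items():
            pool = [rows[j] for j in range(n) if j != i and labels[j] == c]
            nearest = sorted(pool, key=lambda row: dist(r, row))[:k]
            if not nearest:
                continue
            # hits (same class) subtract; misses add, scaled by the class prior
            scale = -1.0 if c == cls else p / (1.0 - prior[cls])
            for row in nearest:
                for a in range(s):
                    weights[a] += scale * abs(r[a] - row[a]) / (m * len(nearest))
    return weights
```

When a class has fewer than k other members, the sketch averages over the neighbors actually found rather than over k; this is a simplification.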
Step (C), extracting the N factors with the largest contribution weights and detecting the network data by using a subspace collaborative representation classification algorithm, comprises the following steps:
(C1) inputting the training samples X (with C classes) and the corresponding labels B, the sample y to be tested and the parameter λ, and extracting the N factors with the largest contribution weights;
(C2) dividing the training samples X into C subsets by class;
(C3) calculating the biased Tikhonov matrix Γ_{l,y} of the l-th class and the test sample y;
(C4) calculating the approximation of the test sample for the l-th class,
Figure BDA0002293850040000092
(C5) repeating steps (C3) and (C4) C times;
(C6) calculating the distance r_l between each class and y, and obtaining the classification of y by
Figure BDA0002293850040000093
(C7) if the classification result is normal, monitoring the next piece of network data; if the classification result is abnormal, issuing a warning.
The biased Tikhonov matrix Γ_{l,y} is calculated as follows:
Figure BDA0002293850040000094
where x_1, x_2, …, x_n form the subspace X_l of the l-th class.
The approximation of the test sample,
Figure BDA0002293850040000095
is calculated as follows:
Figure BDA0002293850040000101
The distance between the approximation
Figure BDA0002293850040000102
and the sample y is calculated as follows:
Figure BDA0002293850040000103
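Steps (C2) to (C6) can be sketched as a small classifier. The sketch assumes the nearest-regularized-subspace form of collaborative representation, in which the Tikhonov matrix is diagonal and holds the Euclidean distances from y to each training sample of the class; the formula images are not available, so this is an interpretation rather than the patented formulas. The function names are hypothetical, and the linear system is solved with plain Gaussian elimination to keep the example dependency-free.

```python
from math import sqrt


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual(samples, y, lam):
    """Distance r_l between y and its approximation from one class's samples."""
    gram = [[dot(a, b) for b in samples] for a in samples]  # X_l^T X_l
    for i, x in enumerate(samples):
        diff = [p - q for p, q in zip(y, x)]
        gram[i][i] += lam * dot(diff, diff)  # adds lam * Gamma^T Gamma (diagonal)
    alpha = solve(gram, [dot(x, y) for x in samples])
    y_hat = [sum(a * x[d] for a, x in zip(alpha, samples)) for d in range(len(y))]
    return sqrt(sum((p - q) ** 2 for p, q in zip(y, y_hat)))


def classify(train, y, lam=0.1):
    """train maps each class label to its list of training vectors."""
    return min(train, key=lambda label: residual(train[label], y, lam))
```

The distance weighting penalizes coefficients on training samples far from y, so a class whose samples merely span the space but lie far away is not favored. The sketch does not guard against a singular system, which can occur when training samples within a class are linearly dependent.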
With only a limited set of network data factors available, the network security intrusion detection method based on factor analysis and subspace collaborative representation obtains the mean contribution weight of each factor by using four factor analysis methods, and then uses the N factors with the largest contribution weights together with the subspace collaborative representation classification algorithm to predict whether the network is under attack. In testing, the detection accuracy of the method was no lower than 97.6%.
The foregoing illustrates and describes the principles, general features, and advantages of the present invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. A network security intrusion detection method based on factor analysis and subspace collaborative representation, characterized by comprising the following steps:
step (A), standardizing and normalizing network data;
step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor by using four factor analysis algorithms, and calculating the contribution weight mean value of each factor;
step (C), extracting the N factors with the largest contribution weights and detecting the network data with a subspace collaborative representation classification algorithm; if the detection result is normal, returning to step (A) and evaluating the network data at the next moment or in the next time period; if the detection result is abnormal, treating it as an abnormal event and outputting an alarm.
2. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 1, wherein: in step (A), the network data comprises TCP connection basic characteristics, TCP connection content characteristics, time-based network traffic statistical characteristics and host-based network traffic statistical characteristics;
the TCP connection basic characteristics include the following factors: network connection duration, network protocol type, network service type of the target host, number of bytes of data from the source host to the target host, and number of bytes of data from the target host to the source host;
the TCP connection content characteristics include the following factors: the number of failed login attempts, whether login was successful, and the number of occurrences of compromised conditions;
the time-based network traffic statistics include the following factors: the number of connections in the past two seconds having the same target host as the current connection; the number of connections in the past two seconds having the same service as the current connection; among the connections in the past two seconds having the same target host as the current connection, the percentage of connections with "SYN" errors; among the connections in the past two seconds having the same service as the current connection, the percentage of connections with "SYN" errors; among the connections in the past two seconds having the same target host as the current connection, the percentage of connections with "REJ" errors; among the connections in the past two seconds having the same service as the current connection, the percentage of connections with "REJ" errors; among the connections in the past two seconds having the same target host as the current connection, the percentage of connections having the same service as the current connection; among the connections in the past two seconds having the same target host as the current connection, the percentage of connections having a different service from the current connection; and among the connections in the past two seconds having the same service as the current connection, the percentage of connections having a different target host from the current connection;
the host-based network traffic statistics include the following factors: among the previous 100 connections, the number of connections having the same target host as the current connection; among the previous 100 connections, the percentage of connections having the same target host and the same service as the current connection; among the previous 100 connections, the percentage of connections having a different service from the current connection and the same source port as the current connection; among the previous 100 connections having the same target host as the current connection, the percentage of connections having a different source host from the current connection; among the previous 100 connections having the same target host as the current connection, the percentage of connections with "SYN" errors; among the previous 100 connections having the same target host and the same service as the current connection, the percentage of connections with "SYN" errors; among the previous 100 connections having the same target host as the current connection, the percentage of connections with "REJ" errors; and among the previous 100 connections having the same target host and the same service as the current connection, the percentage of connections with "REJ" errors.
3. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 1, wherein: in step (A), standardizing the data comprises the following steps:
(A1) calculating the mean of the samples,
Figure FDA0002293850030000021
wherein X is a data sample;
(A2) calculating the standard deviation of the samples,
Figure FDA0002293850030000022
(A3) standardizing the data according to the mean and standard deviation of the samples,
Figure FDA0002293850030000023
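Steps (A1) to (A3) correspond to ordinary z-score standardization. A minimal sketch follows, assuming each factor (column) is standardized independently and that the standard deviation divides by n; the figures may instead use n - 1, so that choice is an assumption. The function name `standardize` is hypothetical.

```python
import math

def standardize(column):
    """Return the z-scores of one factor: (x - mean) / standard deviation."""
    n = len(column)
    mean = sum(column) / n                                     # (A1) sample mean
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)  # (A2) standard deviation (divides by n, assumed)
    if std == 0:
        # A constant factor carries no information; map it to zeros.
        return [0.0] * n
    return [(x - mean) / std for x in column]                  # (A3) standardized values

z = standardize([1.0, 2.0, 3.0])
print(z)
```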
4. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 1, wherein: in step (A), normalizing the data comprises the following steps:
(A4) calculating the minimum value of the samples, X_min = min{X'_{ij}};
(A5) calculating the maximum value of the samples, X_max = max{X'_{ij}};
(A6) normalizing the data according to the minimum and maximum values of the samples,
Figure FDA0002293850030000024
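Steps (A4) to (A6) describe min-max normalization. A minimal sketch, assuming the usual form (x - X_min) / (X_max - X_min) applied per factor to the already standardized values; the exact formula is only available as a figure, and the name `min_max_normalize` is hypothetical.

```python
def min_max_normalize(column):
    """Scale one factor into the range [0, 1]."""
    lo = min(column)   # (A4) minimum value
    hi = max(column)   # (A5) maximum value
    if hi == lo:
        # Constant factor: avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]  # (A6) normalized values

print(min_max_normalize([2.0, 4.0, 6.0]))
```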
5. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 1, wherein: in step (B), performing factor analysis on the network data, obtaining the contribution weight of each factor with four factor analysis algorithms, and calculating the mean contribution weight of each factor comprises the following steps:
(B1) calculating the contribution weight of each factor using a variance threshold filtering method;
(B2) calculating the contribution weight of each factor using a feature selection method based on mutual information;
(B3) calculating the contribution weight of each factor using a feature selection method based on Lasso regression;
(B4) calculating the contribution weight of each factor using a feature selection method based on the ReliefF algorithm;
(B5) averaging the four contribution weights of each factor.
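Step (B5), together with the top-N selection used in step (C), can be illustrated as follows. This is a sketch only: it assumes the four methods produce weights on comparable scales (the text does not say whether they are rescaled before averaging), and the names `mean_contribution_weights` and `top_n_factors` and the sample weights are hypothetical.

```python
def mean_contribution_weights(weight_sets):
    """Average the weights that each method assigned to every factor."""
    n_factors = len(weight_sets[0])
    return [sum(w[i] for w in weight_sets) / len(weight_sets)
            for i in range(n_factors)]

def top_n_factors(mean_weights, n):
    """Return the indices of the n factors with the largest mean weight."""
    order = sorted(range(len(mean_weights)),
                   key=lambda i: mean_weights[i], reverse=True)
    return order[:n]

# One weight list per method (B1)-(B4), three factors each (made-up values).
weights = [
    [0.1, 0.5, 0.4],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.0, 0.6, 0.4],
]
means = mean_contribution_weights(weights)
print(means, top_n_factors(means, 2))
```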
6. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 5, wherein: in step (B1), the variance threshold filtering method comprises the following steps:
(1) calculating the variance var(i) of each factor;
(2) sorting the obtained variances in descending order;
(3) retaining the factors whose variance is greater than the threshold T as the filtering result.
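The three steps of the variance threshold filter can be sketched as below. The sketch assumes the population variance (dividing by the number of samples) and a dictionary mapping factor names to value lists; the function name `variance_filter` and the sample factors are hypothetical.

```python
def variance_filter(columns, threshold):
    """Rank factors by variance and keep those above the threshold."""
    variances = {}
    for name, values in columns.items():
        mean = sum(values) / len(values)
        # (1) variance of each factor
        variances[name] = sum((v - mean) ** 2 for v in values) / len(values)
    # (2) sort in descending order of variance
    ranked = sorted(variances.items(), key=lambda kv: kv[1], reverse=True)
    # (3) keep only factors whose variance exceeds the threshold T
    return [(name, var) for name, var in ranked if var > threshold]

columns = {
    "duration": [0.0, 10.0, 20.0],
    "flag": [1.0, 1.0, 1.0],
    "bytes": [1.0, 2.0, 3.0],
}
kept = variance_filter(columns, threshold=0.5)
print(kept)
```

A constant factor such as `flag` has zero variance and is always removed, which is the main purpose of this filter.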
7. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 5, wherein: in step (B2), the feature selection method based on mutual information comprises the following steps:
(1) denoting the feature matrix as
Figure FDA0002293850030000031
and the class (label) vector as
Figure FDA0002293850030000032
where n is the number of samples, s is the number of features, and x_i is the i-th feature vector (i = 1, …, s);
(2) calculating the mutual information MI(i) between each feature vector x_i and Y;
(3) sorting the obtained mutual information values in descending order;
(4) retaining the factors whose mutual information is greater than the threshold T as the result of feature selection;
the mutual information MI is calculated as follows:
Figure FDA0002293850030000033
where X and Y are two discrete random variables, p(x, y) is the joint probability distribution of X and Y, and p(x) and p(y) are the marginal probability distributions of X and Y, respectively.
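For discrete variables, the mutual information described here is conventionally the sum over all value pairs of p(x, y) · log(p(x, y) / (p(x) p(y))). The sketch below estimates it from empirical frequencies; it assumes the natural logarithm (the base is not stated in the text) and already discretized factor values. The function name `mutual_information` is hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete sequences (in nats)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts for p(x, y)
    px = Counter(xs)               # counts for p(x)
    py = Counter(ys)               # counts for p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

labels = [0, 0, 1, 1]
print(mutual_information([0, 0, 1, 1], labels))  # factor identical to the label
print(mutual_information([0, 1, 0, 1], labels))  # factor independent of the label
```

A factor that determines the label reaches the label entropy (log 2 for two balanced classes), while an independent factor scores zero.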
8. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 5, wherein: in step (B3), the feature selection method based on Lasso regression comprises the following steps:
(1) denoting the feature matrix as
Figure FDA0002293850030000034
and the class (label) vector as
Figure FDA0002293850030000035
where n is the number of samples and s is the number of features;
(2) performing Lasso regression on X and Y;
(3) sorting the obtained Lasso regression results in descending order;
(4) retaining the factors whose regression result is greater than the threshold T as the result of feature selection.
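The Lasso step can be sketched with a small coordinate-descent solver. Several points are assumptions, since the text does not specify them: the objective is taken as (1/2n)·‖Y − Xw‖² + α·‖w‖₁ with no intercept (the data are assumed to be centered already), and the "regression result" of a factor is taken to be the absolute value of its coefficient. The names `soft_threshold` and `lasso_weights` are hypothetical.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha (the L1 proximal step)."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_weights(rows, targets, alpha=0.1, n_iter=100):
    """Absolute Lasso coefficients obtained by cyclic coordinate descent."""
    n, s = len(rows), len(rows[0])
    w = [0.0] * s
    for _ in range(n_iter):
        for j in range(s):
            rho = 0.0  # correlation of factor j with the partial residual
            z = 0.0    # mean squared value of factor j
            for row, t in zip(rows, targets):
                pred_without_j = sum(w[k] * row[k] for k in range(s) if k != j)
                rho += row[j] * (t - pred_without_j)
                z += row[j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha) / z if z > 0 else 0.0
    return [abs(v) for v in w]

# Factor 0 determines the target (target = 2 * factor 0); factor 1 is unrelated.
rows = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
targets = [2.0, -2.0, 2.0, -2.0]
print(lasso_weights(rows, targets, alpha=0.1))
```

The L1 penalty drives the coefficient of the unrelated factor to exactly zero, which is why Lasso coefficients can serve as a selection criterion.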
9. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 5, wherein: in step (B4), the feature selection method based on the ReliefF algorithm comprises the following steps:
(1) denoting the feature matrix as
Figure FDA0002293850030000036
and the class (label) vector as
Figure FDA0002293850030000037
where n is the number of samples and s is the number of features;
(2) calculating the weight W of each factor using the ReliefF algorithm;
(3) sorting the obtained factor weights in descending order;
(4) retaining the factors whose weight is greater than the threshold T as the result of feature selection;
the weight W in step (2) is calculated as follows:
(1) randomly selecting a sample point R_i in X;
(2) finding the k nearest neighbor samples H_j of the same class as R_i;
(3) for each class C ≠ class(R_i), finding the k nearest neighbor samples M_j(C) of R_i in that class;
(4) looping p times to update the contribution weight of each factor, with the following update formula:
Figure FDA0002293850030000041
(5) repeating steps (1) to (4) m times.
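The weight calculation can be sketched as follows. Because the update formula is only available as a figure, the sketch assumes the standard ReliefF update: subtract the average difference to the nearest same-class neighbors, and add the prior-weighted average difference to the nearest neighbors of every other class, each divided by the number of iterations m. The function name `relieff`, the range-normalized difference, and the toy data are hypothetical.

```python
import random

def relieff(rows, labels, k=2, m=20, seed=0):
    """ReliefF-style factor weights (assumed standard update rule)."""
    rng = random.Random(seed)
    n, s = len(rows), len(rows[0])
    # Value range of each factor, used to normalize differences to [0, 1].
    spans = [(max(r[a] for r in rows) - min(r[a] for r in rows)) or 1.0
             for a in range(s)]

    def diff(a, u, v):
        return abs(u[a] - v[a]) / spans[a]

    def dist(u, v):
        return sum(diff(a, u, v) for a in range(s))

    priors = {c: labels.count(c) / n for c in set(labels)}
    w = [0.0] * s
    for _ in range(m):
        i = rng.randrange(n)                  # randomly selected sample R_i
        r, c_r = rows[i], labels[i]
        for c in priors:
            candidates = [rows[j] for j in range(n) if j != i and labels[j] == c]
            neighbors = sorted(candidates, key=lambda v: dist(r, v))[:k]
            if not neighbors:
                continue
            for a in range(s):
                avg = sum(diff(a, r, v) for v in neighbors) / (m * len(neighbors))
                if c == c_r:
                    w[a] -= avg               # nearest hits H_j lower the weight
                else:                         # nearest misses M_j(C) raise it
                    w[a] += priors[c] / (1 - priors[c_r]) * avg
    return w

# Factor 0 separates the two classes; factor 1 is distributed identically in both.
rows = [[0.0, 0.5], [0.1, 0.4], [0.2, 0.6], [0.9, 0.5], [1.0, 0.4], [0.8, 0.6]]
labels = [0, 0, 0, 1, 1, 1]
print(relieff(rows, labels, k=2, m=20))
```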
10. The method for detecting network security intrusion based on factor analysis and subspace collaborative representation according to claim 1, wherein: in step (C), extracting the N factors with the largest contribution weights and detecting the network data with a subspace collaborative representation classification algorithm comprises the following steps:
(C1) inputting the training samples X with C classes and the corresponding labels B, the sample y to be tested and the parameter λ, and extracting the N factors with the largest contribution weights;
(C2) dividing the training samples X into C subsets by class;
(C3) calculating the offset Tikhonov matrix Γ_{l,y} of the l-th class and the test sample y;
(C4) calculating the approximation of the test sample for the l-th class
Figure FDA0002293850030000042
(C5) repeating steps (C3) and (C4) C times, once for each class;
(C6) calculating the distance r_l between the approximation of each class and y, and obtaining the classification of y by
Figure FDA0002293850030000043
(C7) if the classification result is normal, monitoring the next piece of network data; if the classification result is abnormal, issuing a warning;
the offset Tikhonov matrix Γ_{l,y} is calculated as follows:
Figure FDA0002293850030000044
where x_1, x_2, …, x_n form the subspace X_l of the l-th class;
the approximation of the test sample (the symbol shown in Figure FDA0002293850030000045) is calculated as follows:
Figure FDA0002293850030000051
the distance between this approximation (Figure FDA0002293850030000052) and the sample y is calculated as follows:
Figure FDA0002293850030000053
CN201911192193.4A 2019-11-28 2019-11-28 Network security intrusion detection method based on factor analysis and subspace collaborative representation Pending CN110995692A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911192193.4A CN110995692A (en) 2019-11-28 2019-11-28 Network security intrusion detection method based on factor analysis and subspace collaborative representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911192193.4A CN110995692A (en) 2019-11-28 2019-11-28 Network security intrusion detection method based on factor analysis and subspace collaborative representation

Publications (1)

Publication Number Publication Date
CN110995692A true CN110995692A (en) 2020-04-10

Family

ID=70088137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911192193.4A Pending CN110995692A (en) 2019-11-28 2019-11-28 Network security intrusion detection method based on factor analysis and subspace collaborative representation

Country Status (1)

Country Link
CN (1) CN110995692A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112597835A (en) * 2020-12-11 2021-04-02 国汽(北京)智能网联汽车研究院有限公司 Driving behavior evaluation method and device, electronic equipment and readable storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101081875B1 (en) * 2010-08-09 2011-11-09 국방과학연구소 Prealarm system and method for danger of information system
CN104079452A (en) * 2014-06-30 2014-10-01 电子科技大学 Data monitoring technology and network traffic abnormality classifying method
US20150135318A1 (en) * 2013-11-12 2015-05-14 Macau University Of Science And Technology Method of detecting intrusion based on improved support vector machine
CN104899507A (en) * 2015-06-08 2015-09-09 桂林电子科技大学 Detecting method for abnormal intrusion of large high-dimensional data of network
CN105376260A (en) * 2015-12-18 2016-03-02 重庆邮电大学 Network abnormity flow monitoring system based on density peak value cluster
CN106411854A (en) * 2016-09-06 2017-02-15 中国电子技术标准化研究院 Network security risk assessment method based on fuzzy Bayes
CN110061868A (en) * 2019-04-04 2019-07-26 中国航天***科学与工程研究院 A kind of network security Measure Indexes system discrimination evaluation model

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
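WEI LI ET AL.: "Nearest Regularized Subspace for Hyperspectral Classification", 《IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING》 *
WU XINXIN: "Hybrid Bayesian intrusion detection algorithm based on factor analysis", 《Journal of Hunan University of Technology》 *
GUO TONG: "Research on network anomaly detection technology based on adaptive flow sampling measurement", 《China Doctoral Dissertations Full-text Database》 *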

Similar Documents

Publication Publication Date Title
CN108289088B (en) Abnormal flow detection system and method based on business model
CN111669375B (en) Online safety situation assessment method and system for power industrial control terminal
US20070226803A1 (en) System and method for detecting internet worm traffics through classification of traffic characteristics by types
CN107040517B (en) Cognitive intrusion detection method oriented to cloud computing environment
CN106973038B (en) Network intrusion detection method based on genetic algorithm oversampling support vector machine
CN111107102A (en) Real-time network flow abnormity detection method based on big data
CN111885059B (en) Method for detecting and positioning abnormal industrial network flow
WO2016082284A1 (en) Modbus tcp communication behaviour anomaly detection method based on ocsvm dual-profile model
CN110572413A (en) Low-rate denial of service attack detection method based on Elman neural network
WO2007055222A1 (en) Network failure detection method and network failure detection system
EP2415229A1 (en) Method and system for alert classification in a computer network
CN110351291B (en) DDoS attack detection method and device based on multi-scale convolutional neural network
CN109784668B (en) Sample feature dimension reduction processing method for detecting abnormal behaviors of power monitoring system
CN109067722A (en) A kind of LDoS detection method based on two steps cluster and detection lug analysis joint algorithm
Dhakar et al. A novel data mining based hybrid intrusion detection framework
CN115987615A (en) Network behavior safety early warning method and system
CN114422184A (en) Network security attack type and threat level prediction method based on machine learning
Aiello et al. A similarity based approach for application DoS attacks detection
CN113225358A (en) Network security risk assessment system
CN110719270A (en) FCM algorithm-based slow denial of service attack detection method
CN115021997A (en) Network intrusion detection system based on machine learning
CN114826770A (en) Big data management platform for intelligent analysis of computer network
CN117336055A (en) Network abnormal behavior detection method and device, electronic equipment and storage medium
CN110995692A (en) Network security intrusion detection method based on factor analysis and subspace collaborative representation
CN113902052A (en) Distributed denial of service attack network anomaly detection method based on AE-SVM model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200410)