CN113315747A - Computer network anomaly detection method - Google Patents

Computer network anomaly detection method

Info

Publication number
CN113315747A
Authority
CN
China
Prior art keywords
value
model
time
week
abnormal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011237830.8A
Other languages
Chinese (zh)
Inventor
肖守柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanchang Institute of Technology
Original Assignee
Nanchang Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanchang Institute of Technology
Priority to CN202011237830.8A
Publication of CN113315747A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H04L63/145 Countermeasures against malicious traffic the attack involving the propagation of malware through the network, e.g. viruses, trojans or worms

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Pure & Applied Mathematics (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to the technical field of computers, in particular to a computer network anomaly detection method comprising the following steps: S1, preprocessing; S2, zero-averaging; S3, fitting an AR model; S4, anomaly detection. The method establishes the normal behavior pattern of the traffic by analysis of variance, removes the influence of work and rest schedules on network traffic, builds an AR(2) model within a sliding window, and detects anomalies as the window slides.

Description

Computer network anomaly detection method
Technical Field
The invention relates to the technical field of computers, in particular to a computer network anomaly detection method.
Background
With the rapid development of computer networks, network threats and other network-related problems, such as network attacks, data theft, viruses, worms, and malicious port scanning, are increasing; they spread faster, change faster, and are more complex. At present, despite perimeter defenses, cyber threats still penetrate computer networks directly. Many threat detection tools have therefore emerged, and a computer network anomaly detection method is proposed here.
Disclosure of Invention
The object of the present invention is to provide a computer network anomaly detection method that solves the problems raised in the Background.
To achieve this object, the invention provides the following technical solution:
A computer network anomaly detection method comprising the following steps:
S1, preprocessing: network traffic data covering 8 weeks are collected in advance, sampled at 5-minute intervals, 24 h per day, on the 5 working days of each week (rest days are not considered). Taking the week as the unit, the 8 weeks of data are averaged. Because uncertain factors produce some bad values, these are first removed according to the Grubbs criterion. Specifically, let $x_1, x_2, \ldots, x_8$ be the eight weekly observations for the same time slot, let
$$\bar{x} = \frac{1}{8}\sum_{i=1}^{8} x_i$$
denote the mean of $x_1, x_2, \ldots, x_8$, and let $v$ denote their standard deviation, i.e.
$$v = \sqrt{\frac{1}{8-1}\sum_{i=1}^{8}\left(x_i - \bar{x}\right)^2}.$$
If $x_i$ satisfies $|x_i - \bar{x}| > kv$,
then $x_i$ is a bad value and should be eliminated, and the mean of the remaining values among $x_1, x_2, \ldots, x_8$ is used, where $k$ is the Grubbs criterion coefficient; $k = 2.03$ corresponds to a 95% confidence level.
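The bad-value removal for a single time slot can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `robust_slot_mean` is hypothetical, the sample standard deviation (denominator n - 1) is an assumption, and the filter is applied in a single pass rather than repeated until no bad value remains.

```python
from statistics import mean, stdev

# Grubbs coefficient quoted in the text for 8 samples at the 95% level.
K_95_N8 = 2.03


def robust_slot_mean(samples, k=K_95_N8):
    """Average one time slot's weekly samples after dropping bad values.

    A sample x_i is treated as bad when |x_i - mean| > k * v, where v is
    the sample standard deviation (n - 1 denominator, an assumption).
    """
    x_bar = mean(samples)
    v = stdev(samples)
    if v == 0:
        # All samples are identical, so there is nothing to reject.
        return x_bar
    kept = [x for x in samples if abs(x - x_bar) <= k * v]
    return mean(kept)


# Eight weekly readings for one 5-minute slot; the last one is a spike.
weekly = [100, 102, 98, 101, 99, 100, 103, 500]
baseline = robust_slot_mean(weekly)
```

Applying this to each of the 288 × 5 slots would yield the 1440 baseline values described next.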
The 1440 values obtained above (288 time slots per day × 5 working days) are the averages of the 8 weeks of historical data and can be written $s_{ij}$ ($j = 1, 2, 3, 4, 5$; $i = 1, 2, \ldots, 288$). $s_{ij}$ is called the normal behavior pattern of the traffic parameter at the $i$-th time slot of the $j$-th working day of the week. Intuitively, if the current observation deviates significantly from this pattern, the current observation is considered abnormal.
Let $\mu$ denote the overall mean of the observations over the week, let $\alpha_i$ denote the deviation from $\mu$ of the mean at the $i$-th time slot of the day, and let $\beta_j$ denote the deviation from $\mu$ of the mean on the $j$-th working day of the week, i.e.
$$\mu = \frac{1}{288 \times 5}\sum_{i=1}^{288}\sum_{j=1}^{5} s_{ij}, \qquad \alpha_i = \frac{1}{5}\sum_{j=1}^{5} s_{ij} - \mu, \qquad \beta_j = \frac{1}{288}\sum_{i=1}^{288} s_{ij} - \mu.$$
Then
$$\sum_{i=1}^{288}\alpha_i = 0, \qquad \sum_{j=1}^{5}\beta_j = 0.$$
Each observation $s_{ij}$ is then decomposed into four parts, i.e.
$$s_{ij} = \mu + \alpha_i + \beta_j + y_{ij},$$
that is, $y_{ij} = s_{ij} - \mu - \alpha_i - \beta_j$.
In other words, the original observation sequence is converted into $y_{ij}$. Network usage differs greatly between working days and between times of the same day; this processing removes the influence of the working day of the week and of the time of day from the original observations $s_{ij}$. For ease of notation, the converted sequence $y_{ij}$ is arranged in chronological order and denoted $y_t$ ($t = 1, 2, 3, \ldots$). For networks whose behavior changes continuously, $\mu$, $\alpha_i$ ($i = 1, 2, \ldots, 288$), and $\beta_j$ ($j = 1, 2, 3, 4, 5$) can in practice be updated once a week.
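The decomposition above is a two-way analysis of variance without interaction, and it can be sketched in a few lines. The function names `decompose` and `chronological` and the nested-list layout `s[i][j]` are assumptions made for illustration only.

```python
def decompose(s):
    """Split s[i][j] into mu + alpha[i] + beta[j] + y[i][j].

    s[i][j] is the baseline for time slot i on working day j
    (288 x 5 in the text; any rectangular shape works here).
    """
    n_slots, n_days = len(s), len(s[0])
    mu = sum(sum(row) for row in s) / (n_slots * n_days)
    # Time-of-day effect: mean of slot i across days, minus overall mean.
    alpha = [sum(row) / n_days - mu for row in s]
    # Day-of-week effect: mean of day j across slots, minus overall mean.
    beta = [sum(s[i][j] for i in range(n_slots)) / n_slots - mu
            for j in range(n_days)]
    y = [[s[i][j] - mu - alpha[i] - beta[j] for j in range(n_days)]
         for i in range(n_slots)]
    return mu, alpha, beta, y


def chronological(y):
    """Flatten y[i][j] day by day, slot by slot, into one series."""
    n_slots, n_days = len(y), len(y[0])
    return [y[i][j] for j in range(n_days) for i in range(n_slots)]


# Tiny 2-slot x 2-day example.
mu, alpha, beta, y = decompose([[1.0, 2.0], [3.0, 6.0]])
series = chronological(y)
```

By construction the entries of `alpha` and of `beta` each sum to zero, which matches the two constraints stated above.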
S2, zero-averaging of the sliding window: although the sequence of network traffic observations is generally non-stationary as a whole, it can be regarded as approximately stationary locally. The local segment is taken as a sliding time window of size $N+1$; each time, $N+1$ values are taken out and written $y_1, y_2, \ldots, y_N, y_{N+1}$. An autoregressive (AR) model is built from the first $N$ values and used to judge whether the $(N+1)$-th value is abnormal (in real-time application the time window slides forward one step at a time). To build the AR model, the first $N$ values are zero-averaged. Let
$\bar{y}$
denote the mean of $y_1, y_2, \ldots, y_N$, i.e.
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$
$$x_i = y_i - \bar{y}, \qquad i = 1, 2, \ldots, N+1.$$
Then $x_1, x_2, \ldots, x_N, x_{N+1}$ is a zero-mean time series.
S3, fitting the AR model: a suitable model order $\rho$ is selected first. The observation sequence $\{x_t\}$, $t = 1, 2, 3, \ldots$, is generally non-stationary and is only assumed to be approximately stationary within the sliding time window, so the window size $N$ should not be too large. On the other hand, when an autoregressive model AR($\rho$) is fitted to a time series, its accuracy can be measured by Akaike's FPE (Final Prediction Error); the order $\rho$ that gives the minimum FPE is the best model order, subject to the constraint on $N$ and $\rho$: $0 \le \rho \le 0.1N$.
Furthermore, too large an order $\rho$ leads to a large amount of computation, and since real-time detection is desired, $\rho$ should not be chosen too large. The algorithm therefore uses the commonly used second-order autoregressive model AR(2) and takes $N = 20$, which satisfies the above condition.
Fitting a second-order autoregressive model AR(2) to the time series $x_1, x_2, \ldots, x_N$ can be computed directly with linear formulas. The AR(2) model is:
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t.$$
Here $\varphi_1$ and $\varphi_2$ are the coefficients of AR(2), and $e_t$ is white noise, i.e. independent, identically distributed Gaussian random variables with mean zero and variance $\sigma_e^2$.
From $x_1, x_2, \ldots, x_N$, the parameters of the AR(2) model, $\varphi_i$ ($i = 1, 2$) and $\sigma_e^2$, are estimated. In the calculation, $T$ denotes the matrix transpose; the coefficients $\varphi_1, \varphi_2$ are given by estimate (1) and the variance $\sigma_e^2$ of the white noise $e_t$ by estimate (2), together with four auxiliary expressions.
[Equations (1) and (2) and the auxiliary expressions are images in the original publication and are not reproduced in this text.]
Equations (1) and (2) are the estimates of the AR(2) parameters; as they show, the AR(2) parameters can be obtained by linear estimation from the time series data.
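One common way to obtain such a linear estimate is ordinary least squares on the lagged values, sketched below. This is an assumption about the estimator, not a reproduction of the patent's equations (1) and (2): the function name `fit_ar2`, the use of the 2 × 2 normal equations, and the residual-variance denominator (the number of available residuals, N - 2) are all illustrative choices.

```python
def fit_ar2(window):
    """Least-squares AR(2) fit on a window of N raw values y_1..y_N.

    Returns (y_bar, phi1, phi2, sigma2), where sigma2 is the mean of the
    squared one-step residuals over the window.
    """
    n = len(window)
    y_bar = sum(window) / n
    x = [y - y_bar for y in window]  # zero-averaged window, as in S2

    # Normal equations for x_t = phi1*x_{t-1} + phi2*x_{t-2} + e_t,
    # using every t for which both lags exist.
    idx = range(2, n)
    a11 = sum(x[t - 1] * x[t - 1] for t in idx)
    a12 = sum(x[t - 1] * x[t - 2] for t in idx)
    a22 = sum(x[t - 2] * x[t - 2] for t in idx)
    b1 = sum(x[t] * x[t - 1] for t in idx)
    b2 = sum(x[t] * x[t - 2] for t in idx)

    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("window is degenerate; AR(2) fit is not unique")
    phi1 = (a22 * b1 - a12 * b2) / det
    phi2 = (a11 * b2 - a12 * b1) / det

    residuals = [x[t] - phi1 * x[t - 1] - phi2 * x[t - 2] for t in idx]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return y_bar, phi1, phi2, sigma2


# Example with a short, made-up window (N = 10 here; the text uses N = 20).
y_bar, phi1, phi2, sigma2 = fit_ar2(
    [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0])
```

Because only a 2 × 2 system is solved, the cost per window is linear in N, which is in keeping with the text's emphasis on real-time detection.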
S4, anomaly detection: in the last step, detection is performed with the AR(2) model. The AR(2) model is
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t,$$
that is,
$$x_t - \varphi_1 x_{t-1} - \varphi_2 x_{t-2} = e_t.$$
If $B$ is the backward shift operator, i.e. $x_{t-1} = Bx_t$, then
$$\varphi(B)\,x_t = e_t,$$
where
$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2.$$
Therefore
$$e_t = \varphi(B)\,x_t.$$
Let
$$\hat{e}_{N+1} = x_{N+1} - \hat{\varphi}_1 x_N - \hat{\varphi}_2 x_{N-1}$$
and define
$$\lambda = \frac{\hat{e}_{N+1}}{\sigma}.$$
Preferably, in S4, $\sigma^2$ is the mean of the squares of the $N$ residuals $e_t$ preceding the current time in the time series, and $\lambda$ is the ratio of the residual of the current observation to $\sigma$. $\lambda$ is then used as the statistic for deciding whether $x_{N+1}$ is abnormal: when $\lambda < -L$ or $\lambda > U$, $x_{N+1}$ is an outlier, where $L$ and $U$ are preset constants greater than zero. The case $\lambda > U$ means that the abnormal value is larger than normal, and the magnitude of $\lambda$ indicates how far the abnormal point deviates from normal: the larger $\lambda$ is, the further the value lies above the normal range. The case $\lambda < -L$ means that the abnormal value is smaller than normal, and the smaller $\lambda$ is, the further the value lies below the normal range.
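The decision rule can be illustrated with a short sketch. The function names `anomaly_score` and `classify` are hypothetical, and the default thresholds of 3.0 are placeholders only: the text says merely that L and U are preset positive constants.

```python
def anomaly_score(x_prev2, x_prev1, x_new, phi1, phi2, sigma):
    """Return lambda: the one-step AR(2) residual divided by sigma.

    x_prev2, x_prev1 and x_new stand for the zero-averaged values
    x_{N-1}, x_N and x_{N+1}; phi1, phi2 and sigma come from the fit
    on the first N values of the window.
    """
    residual = x_new - phi1 * x_prev1 - phi2 * x_prev2
    return residual / sigma


def classify(lam, lower=3.0, upper=3.0):
    """Map lambda to a label; lower and upper play the roles of L and U."""
    if lam > upper:
        return "high"    # observation well above the predicted value
    if lam < -lower:
        return "low"     # observation well below the predicted value
    return "normal"


lam = anomaly_score(1.0, 2.0, 10.0, phi1=0.5, phi2=0.25, sigma=2.0)
label = classify(lam)
```

Keeping separate lower and upper thresholds mirrors the text, which allows surges and drops in traffic to be judged with different sensitivity.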
The invention establishes the normal behavior pattern of the traffic by analysis of variance, which removes the influence of work and rest schedules on network traffic, builds an AR(2) model within a sliding window, and detects anomalies as the window slides. The algorithm was applied to detecting the number of non-unicast packets in an actual network, where it detected broadcast data generated experimentally in the network. It was also applied to detecting anomalies in 3 different traffic parameters in a simulated experimental network, where it detected the traffic anomalies that occur when a server fails. The detection rate is high.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
The invention provides the following technical solution:
A computer network anomaly detection method comprising the following steps:
S1, preprocessing: network traffic data covering 8 weeks are collected in advance, sampled at 5-minute intervals, 24 h per day, on the 5 working days of each week (rest days are not considered). Taking the week as the unit, the 8 weeks of data are averaged. Because uncertain factors produce some bad values, these are first removed according to the Grubbs criterion. Specifically, let $x_1, x_2, \ldots, x_8$ be the eight weekly observations for the same time slot, let
$$\bar{x} = \frac{1}{8}\sum_{i=1}^{8} x_i$$
denote the mean of $x_1, x_2, \ldots, x_8$, and let $v$ denote their standard deviation, i.e.
$$v = \sqrt{\frac{1}{8-1}\sum_{i=1}^{8}\left(x_i - \bar{x}\right)^2}.$$
If $x_i$ satisfies $|x_i - \bar{x}| > kv$,
then $x_i$ is a bad value and should be eliminated, and the mean of the remaining values among $x_1, x_2, \ldots, x_8$ is used, where $k$ is the Grubbs criterion coefficient; $k = 2.03$ corresponds to a 95% confidence level.
The 1440 values obtained above (288 time slots per day × 5 working days) are the averages of the 8 weeks of historical data and can be written $s_{ij}$ ($j = 1, 2, 3, 4, 5$; $i = 1, 2, \ldots, 288$). $s_{ij}$ is called the normal behavior pattern of the traffic parameter at the $i$-th time slot of the $j$-th working day of the week. Intuitively, if the current observation deviates significantly from this pattern, the current observation is considered abnormal.
Let $\mu$ denote the overall mean of the observations over the week, let $\alpha_i$ denote the deviation from $\mu$ of the mean at the $i$-th time slot of the day, and let $\beta_j$ denote the deviation from $\mu$ of the mean on the $j$-th working day of the week, i.e.
$$\mu = \frac{1}{288 \times 5}\sum_{i=1}^{288}\sum_{j=1}^{5} s_{ij}, \qquad \alpha_i = \frac{1}{5}\sum_{j=1}^{5} s_{ij} - \mu, \qquad \beta_j = \frac{1}{288}\sum_{i=1}^{288} s_{ij} - \mu.$$
Then
$$\sum_{i=1}^{288}\alpha_i = 0, \qquad \sum_{j=1}^{5}\beta_j = 0.$$
Each observation $s_{ij}$ is then decomposed into four parts, i.e.
$$s_{ij} = \mu + \alpha_i + \beta_j + y_{ij},$$
that is, $y_{ij} = s_{ij} - \mu - \alpha_i - \beta_j$.
In other words, the original observation sequence is converted into $y_{ij}$. Network usage differs greatly between working days and between times of the same day; this processing removes the influence of the working day of the week and of the time of day from the original observations $s_{ij}$. For ease of notation, the converted sequence $y_{ij}$ is arranged in chronological order and denoted $y_t$ ($t = 1, 2, 3, \ldots$). For networks whose behavior changes continuously, $\mu$, $\alpha_i$ ($i = 1, 2, \ldots, 288$), and $\beta_j$ ($j = 1, 2, 3, 4, 5$) can in practice be updated once a week.
S2, zero-averaging of the sliding window: although the sequence of network traffic observations is generally non-stationary as a whole, it can be regarded as approximately stationary locally. The local segment is taken as a sliding time window of size $N+1$; each time, $N+1$ values are taken out and written $y_1, y_2, \ldots, y_N, y_{N+1}$. An autoregressive (AR) model is built from the first $N$ values and used to judge whether the $(N+1)$-th value is abnormal (in real-time application the time window slides forward one step at a time). To build the AR model, the first $N$ values are zero-averaged. Let $\bar{y}$ denote the mean of $y_1, y_2, \ldots, y_N$, i.e.
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$
$$x_i = y_i - \bar{y}, \qquad i = 1, 2, \ldots, N+1.$$
Then $x_1, x_2, \ldots, x_N, x_{N+1}$ is a zero-mean time series.
S3, fitting the AR model: a suitable model order $\rho$ is selected first. The observation sequence $\{x_t\}$, $t = 1, 2, 3, \ldots$, is generally non-stationary and is only assumed to be approximately stationary within the sliding time window, so the window size $N$ should not be too large. On the other hand, when an autoregressive model AR($\rho$) is fitted to a time series, its accuracy can be measured by Akaike's FPE (Final Prediction Error); the order $\rho$ that gives the minimum FPE is the best model order, subject to the constraint on $N$ and $\rho$: $0 \le \rho \le 0.1N$.
Furthermore, too large an order $\rho$ leads to a large amount of computation, and since real-time detection is desired, $\rho$ should not be chosen too large. The algorithm therefore uses the commonly used second-order autoregressive model AR(2) and takes $N = 20$, which satisfies the above condition.
Fitting a second-order autoregressive model AR(2) to the time series $x_1, x_2, \ldots, x_N$ can be computed directly with linear formulas. The AR(2) model is:
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t.$$
Here $\varphi_1$ and $\varphi_2$ are the coefficients of AR(2), and $e_t$ is white noise, i.e. independent, identically distributed Gaussian random variables with mean zero and variance $\sigma_e^2$.
From $x_1, x_2, \ldots, x_N$, the parameters of the AR(2) model, $\varphi_i$ ($i = 1, 2$) and $\sigma_e^2$, are estimated. In the calculation, $T$ denotes the matrix transpose; the coefficients $\varphi_1, \varphi_2$ are given by estimate (1) and the variance $\sigma_e^2$ of the white noise $e_t$ by estimate (2), together with four auxiliary expressions.
[Equations (1) and (2) and the auxiliary expressions are images in the original publication and are not reproduced in this text.]
Equations (1) and (2) are the estimates of the AR(2) parameters; as they show, the AR(2) parameters can be obtained by linear estimation from the time series data.
S4, anomaly detection: in the last step, detection is performed with the AR(2) model. The AR(2) model is
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t,$$
that is,
$$x_t - \varphi_1 x_{t-1} - \varphi_2 x_{t-2} = e_t.$$
If $B$ is the backward shift operator, i.e. $x_{t-1} = Bx_t$, then
$$\varphi(B)\,x_t = e_t,$$
where
$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2.$$
Therefore
$$e_t = \varphi(B)\,x_t.$$
Let
$$\hat{e}_{N+1} = x_{N+1} - \hat{\varphi}_1 x_N - \hat{\varphi}_2 x_{N-1}$$
and define
$$\lambda = \frac{\hat{e}_{N+1}}{\sigma}.$$
In S4, $\sigma^2$ is the mean of the squares of the $N$ residuals $e_t$ preceding the current time in the time series, and $\lambda$ is the ratio of the residual of the current observation to $\sigma$. $\lambda$ is then used as the statistic for deciding whether $x_{N+1}$ is abnormal: when $\lambda < -L$ or $\lambda > U$, $x_{N+1}$ is an outlier, where $L$ and $U$ are preset constants greater than zero. The case $\lambda > U$ means that the abnormal value is larger than normal, and the magnitude of $\lambda$ indicates how far the abnormal point deviates from normal: the larger $\lambda$ is, the further the value lies above the normal range. The case $\lambda < -L$ means that the abnormal value is smaller than normal, and the smaller $\lambda$ is, the further the value lies below the normal range.
Example: preprocessing. Network traffic data covering 8 weeks are collected in advance, sampled at 5-minute intervals, 24 h per day, on the 5 working days of each week (rest days are not considered). Taking the week as the unit, the 8 weeks of data are averaged. Because uncertain factors produce some bad values, these are first removed according to the Grubbs criterion. Specifically, let $x_1, x_2, \ldots, x_8$ be the eight weekly observations for the same time slot, let
$$\bar{x} = \frac{1}{8}\sum_{i=1}^{8} x_i$$
denote the mean of $x_1, x_2, \ldots, x_8$, and let $v$ denote their standard deviation, i.e.
$$v = \sqrt{\frac{1}{8-1}\sum_{i=1}^{8}\left(x_i - \bar{x}\right)^2}.$$
If $x_i$ satisfies $|x_i - \bar{x}| > kv$,
then $x_i$ is a bad value and should be eliminated, and the mean of the remaining values among $x_1, x_2, \ldots, x_8$ is used, where $k$ is the Grubbs criterion coefficient; $k = 2.03$ corresponds to a 95% confidence level.
The 1440 values obtained above (288 time slots per day × 5 working days) are the averages of the 8 weeks of historical data and can be written $s_{ij}$ ($j = 1, 2, 3, 4, 5$; $i = 1, 2, \ldots, 288$). $s_{ij}$ is called the normal behavior pattern of the traffic parameter at the $i$-th time slot of the $j$-th working day of the week. Intuitively, if the current observation deviates significantly from this pattern, the current observation is considered abnormal.
Let $\mu$ denote the overall mean of the observations over the week, let $\alpha_i$ denote the deviation from $\mu$ of the mean at the $i$-th time slot of the day, and let $\beta_j$ denote the deviation from $\mu$ of the mean on the $j$-th working day of the week, i.e.
$$\mu = \frac{1}{288 \times 5}\sum_{i=1}^{288}\sum_{j=1}^{5} s_{ij}, \qquad \alpha_i = \frac{1}{5}\sum_{j=1}^{5} s_{ij} - \mu, \qquad \beta_j = \frac{1}{288}\sum_{i=1}^{288} s_{ij} - \mu.$$
Then
$$\sum_{i=1}^{288}\alpha_i = 0, \qquad \sum_{j=1}^{5}\beta_j = 0.$$
Each observation $s_{ij}$ is then decomposed into four parts, i.e.
$$s_{ij} = \mu + \alpha_i + \beta_j + y_{ij},$$
that is, $y_{ij} = s_{ij} - \mu - \alpha_i - \beta_j$.
In other words, the original observation sequence is converted into $y_{ij}$. Network usage differs greatly between working days and between times of the same day; this processing removes the influence of the working day of the week and of the time of day from the original observations $s_{ij}$. For ease of notation, the converted sequence $y_{ij}$ is arranged in chronological order and denoted $y_t$ ($t = 1, 2, 3, \ldots$). For networks whose behavior changes continuously, $\mu$, $\alpha_i$ ($i = 1, 2, \ldots, 288$), and $\beta_j$ ($j = 1, 2, 3, 4, 5$) can in practice be updated once a week.
Zero-averaging of the sliding window: although the sequence of network traffic observations is generally non-stationary as a whole, it can be regarded as approximately stationary locally. The local segment is taken as a sliding time window of size $N+1$; each time, $N+1$ values are taken out and written $y_1, y_2, \ldots, y_N, y_{N+1}$. An autoregressive (AR) model is built from the first $N$ values and used to judge whether the $(N+1)$-th value is abnormal (in real-time application the time window slides forward one step at a time). To build the AR model, the first $N$ values are zero-averaged. Let $\bar{y}$ denote the mean of $y_1, y_2, \ldots, y_N$, i.e.
$$\bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$
$$x_i = y_i - \bar{y}, \qquad i = 1, 2, \ldots, N+1.$$
Then $x_1, x_2, \ldots, x_N, x_{N+1}$ is a zero-mean time series.
Fitting the AR model: a suitable model order $\rho$ is selected first. The observation sequence $\{x_t\}$, $t = 1, 2, 3, \ldots$, is generally non-stationary and is only assumed to be approximately stationary within the sliding time window, so the window size $N$ should not be too large. On the other hand, when an autoregressive model AR($\rho$) is fitted to a time series, its accuracy can be measured by Akaike's FPE (Final Prediction Error); the order $\rho$ that gives the minimum FPE is the best model order, subject to the constraint on $N$ and $\rho$: $0 \le \rho \le 0.1N$.
Furthermore, too large an order $\rho$ leads to a large amount of computation, and since real-time detection is desired, $\rho$ should not be chosen too large. The algorithm therefore uses the commonly used second-order autoregressive model AR(2) and takes $N = 20$, which satisfies the above condition.
Fitting a second-order autoregressive model AR(2) to the time series $x_1, x_2, \ldots, x_N$ can be computed directly with linear formulas. The AR(2) model is:
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t.$$
Here $\varphi_1$ and $\varphi_2$ are the coefficients of AR(2), and $e_t$ is white noise, i.e. independent, identically distributed Gaussian random variables with mean zero and variance $\sigma_e^2$.
From $x_1, x_2, \ldots, x_N$, the parameters of the AR(2) model, $\varphi_i$ ($i = 1, 2$) and $\sigma_e^2$, are estimated. In the calculation, $T$ denotes the matrix transpose; the coefficients $\varphi_1, \varphi_2$ are given by estimate (1) and the variance $\sigma_e^2$ of the white noise $e_t$ by estimate (2), together with four auxiliary expressions.
[Equations (1) and (2) and the auxiliary expressions are images in the original publication and are not reproduced in this text.]
Equations (1) and (2) are the estimates of the AR(2) parameters; as they show, the AR(2) parameters can be obtained by linear estimation from the time series data.
Anomaly detection: in the last step, detection is performed with the AR(2) model. The AR(2) model is
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t,$$
that is,
$$x_t - \varphi_1 x_{t-1} - \varphi_2 x_{t-2} = e_t.$$
If $B$ is the backward shift operator, i.e. $x_{t-1} = Bx_t$, then
$$\varphi(B)\,x_t = e_t,$$
where
$$\varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2.$$
Therefore
$$e_t = \varphi(B)\,x_t.$$
Let
$$\hat{e}_{N+1} = x_{N+1} - \hat{\varphi}_1 x_N - \hat{\varphi}_2 x_{N-1}$$
and define
$$\lambda = \frac{\hat{e}_{N+1}}{\sigma}.$$
Here $\sigma^2$ is the mean of the squares of the $N$ residuals $e_t$ preceding the current time in the time series.
When $\lambda < -L$ or $\lambda > U$, $x_{N+1}$ is an outlier, where $L$ and $U$ are preset constants greater than zero. The case $\lambda > U$ means that the abnormal value is larger than normal, and the magnitude of $\lambda$ indicates how far the abnormal point deviates from the normal range: the larger $\lambda$ is, the further the value lies above the normal range. The case $\lambda < -L$ means that the abnormal value is smaller than normal, and the smaller $\lambda$ is, the further the value lies below the normal range.
The above is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solutions and inventive concept, shall fall within the scope of protection of the present invention.

Claims (2)

1. A computer network anomaly detection method is characterized by comprising the following steps:
s1, preprocessing, collecting network force data with time of 8 weeks in advance, averaging the data of 8 weeks with the interval time of 5min, 5 working days per week (without considering rest days) and 24h per day in units of weeks, deleting some bad values due to influence of uncertain factors, and removing the bad values according to the Grabbs criterion, specifically, if the network force data is collected in advance for 8 weeks
Figure RE-FDA0003057476140000011
Denotes x1,x2,……x8V represents their standard deviation, i.e.
Figure RE-FDA0003057476140000012
If xiSatisfy | xi|>kv,
X is theniFor bad value, it should be eliminated and used x1,x2,……x8Wherein k is a grabbs criterion coefficient, k corresponding to a confidence interval of 95% being 2.03;
the 1440 data obtained above were the average of 8-week-history data and can be expressedIs Sij(j 1,2,3, 4, 5; i 1,2, … …, 288), which is called the normal behavior pattern of the flow parameter at the ith moment of the jth working day of the week, and intuitively considers that if the current observation value has a significant deviation from the current observation value, the current observation value is considered as abnormal;
the overall mean of the observations over the week is denoted by μ, α i represents the deviation of the mean from the overall mean μ at the ith time of day, β j is the deviation of the mean from the overall mean μ on the jth day of the week, i.e.
Figure RE-FDA0003057476140000013
Then
Figure RE-FDA0003057476140000021
jβj=0,
Then observing the value S for each timeijDecomposing S as followsijDivided into four parts, i.e.
sij=μ+αi+βj+yij
That is, yij=sij-μ-αi-βj。
I.e. converting the original observation sequence into yijThe conditions of the networks used on different working days and at different times of the same day are greatly different, and the influences of the different working days in a week and the different times of the day are all obtained from the original observed value S through the processingijWhere similarly, for those networks that are constantly changing, the converted sequence y is represented for ease of representationijArranged in chronological order, denoted as yiIn practical applications, μ, α i (i ═ 1,2, …, 288), β j (j ═ 1,2,3, 4, 5) can be updated once a week.
S2, zero-averaging, for sliding window zero-averaging, although the observed sequence of network traffic is not stationary in general, we can consider its local part as a statistically similar stationary partMaking a sliding time window, setting the size of the window as N +1, taking out N +1 numbers each time, and expressing the N +1 number of the sliding window as y1,y2...,yN,yN+1Establishing an Autoregressive (AR) model by using the first N numbers to judge whether the (N + 1) th number is abnormal (in the real-time application, the time window is continuously slid once); to establish the AR model, the first N numbers are zero-averaged, and
Figure RE-FDA0003057476140000022
denotes y1,y2...,yNAverage value of (i), i.e.
Figure RE-FDA0003057476140000023
Figure RE-FDA0003057476140000024
Then x1,x2,...xN,xN+1Is a zero mean time series.
S3, fitting the AR model, and firstly selecting a proper model order rho due to the observation value sequence { x }iGenerally speaking, t (1, 2,3, …) is unstable, but it is assumed that the sliding time window is approximately stable, so the size N of the window should not be too large, on the other hand, when the autoregressive model AR (ρ) fits the time series, its accuracy can be measured by Akaike's FPE (final Prediction error), and the order ρ of AR corresponding to the minimum FPE is the optimal model order, but there are constraints on N and ρ: p is more than or equal to 0 and less than or equal to 0.1N.
Furthermore, too large an order ρ makes the autoregressive model AR(ρ) expensive to compute, and since real-time detection is desired, a large ρ should not be chosen. The algorithm therefore uses the commonly used second-order autoregressive model AR(2) with N = 20, which satisfies the above constraint (2 ≤ 0.1 × 20).
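Order selection by FPE can be sketched as below. This is an illustration under stated assumptions, not the procedure of the text: the AR(ρ) models are fitted with the Levinson-Durbin recursion on sample autocovariances, and the FPE is taken in the form $\mathrm{FPE}(\rho) = \hat{\sigma}_\rho^2 (N + \rho)/(N - \rho)$, one of the common variants for a zero-mean series. The function names are hypothetical.

```python
def autocovariance(x, max_lag):
    """Biased sample autocovariances r[0..max_lag] of a zero-mean series."""
    n = len(x)
    return [sum(x[t] * x[t - k] for t in range(k, n)) / n
            for k in range(max_lag + 1)]


def fpe_by_order(x, max_order):
    """Return [FPE(0), ..., FPE(max_order)] via the Levinson-Durbin recursion."""
    n = len(x)
    r = autocovariance(x, max_order)
    err = r[0]   # prediction error variance of the order-0 model
    phi = []     # AR coefficients of the current order
    fpe = [err]  # FPE(0) = err * (n + 0) / (n - 0)
    for p in range(1, max_order + 1):
        # Reflection coefficient for order p.
        k_p = (r[p] - sum(phi[k] * r[p - 1 - k] for k in range(p - 1))) / err
        phi = [phi[k] - k_p * phi[p - 2 - k] for k in range(p - 1)] + [k_p]
        err *= 1.0 - k_p * k_p
        fpe.append(err * (n + p) / (n - p))
    return fpe


def best_order(x):
    """Order with minimum FPE, subject to 0 <= order <= 0.1 * N."""
    fpe = fpe_by_order(x, len(x) // 10)
    return min(range(len(fpe)), key=lambda p: fpe[p])
```

With N = 20 the constraint limits the search to orders 0, 1 and 2, which is consistent with the fixed choice of AR(2) made above.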
For fitting a second-order autoregressive model AR(2) to the time series $x_1, x_2, \ldots, x_N$ there are linear formulas that can be evaluated directly. The AR(2) model is:
$$x_t = \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t$$
Here $\varphi_1$ and $\varphi_2$ denote the coefficients of AR(2), and $e_t$ is white noise, i.e. independent and identically distributed Gaussian random variables with mean zero and variance $\sigma^2$.
The parameters $\varphi_1$ and $\varphi_2$ of the AR(2) model are estimated from $x_1, x_2, \ldots, x_N$. The calculation is as follows:
Figure RE-FDA0003057476140000037
where T denotes the matrix transpose. The coefficients $\varphi_1$ and $\varphi_2$ are then estimated as:
Figure RE-FDA0003057476140000039
The variance $\sigma^2$ of the white noise $e_t$ is
Figure RE-FDA0003057476140000041
Furthermore,
Figure RE-FDA0003057476140000042
Figure RE-FDA0003057476140000043
Figure RE-FDA0003057476140000044
Figure RE-FDA0003057476140000045
Equations (1) and (2) above are the estimates of the AR(2) parameters; they show that the AR(2) parameters can be obtained by linear estimation from the time series data.
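Since the estimation formulas themselves are only present as images, the following is a hedged sketch of one standard way to obtain such a linear estimate: ordinary least squares on the zero-mean window, solved through the 2×2 normal equations. Whether this coincides term by term with equations (1) and (2) is an assumption, as is the choice to average the squared residuals over the N − 2 fitted points; the name `fit_ar2` is hypothetical.

```python
def fit_ar2(x):
    """Least-squares fit of x[t] = phi1 * x[t-1] + phi2 * x[t-2] + e[t].

    x is the zero-mean window x_1..x_N (N >= 4).
    Returns (phi1, phi2, sigma2), where sigma2 is the mean squared residual.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for t in range(2, len(x)):
        a11 += x[t - 1] * x[t - 1]
        a12 += x[t - 1] * x[t - 2]
        a22 += x[t - 2] * x[t - 2]
        b1 += x[t] * x[t - 1]
        b2 += x[t] * x[t - 2]
    det = a11 * a22 - a12 * a12
    if det == 0.0:
        raise ValueError("window is degenerate; AR(2) fit is not unique")
    # Solve [[a11, a12], [a12, a22]] @ [phi1, phi2] = [b1, b2] by Cramer's rule.
    phi1 = (b1 * a22 - b2 * a12) / det
    phi2 = (a11 * b2 - a12 * b1) / det
    residuals = [x[t] - phi1 * x[t - 1] - phi2 * x[t - 2]
                 for t in range(2, len(x))]
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    return phi1, phi2, sigma2
```

The closed-form 2×2 solve keeps the per-window cost small, which fits the real-time motivation given for choosing AR(2).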
S4, anomaly detection. In the last step, detection is performed with the AR(2) model. The AR(2) model is
Figure RE-FDA0003057476140000046
that is,
Figure RE-FDA0003057476140000047
If B is the backshift operator, i.e. $B x_t = x_{t-1}$, then
Figure RE-FDA0003057476140000048
where
Figure RE-FDA0003057476140000049
Thus
Figure RE-FDA0003057476140000051
Let
Figure RE-FDA0003057476140000052
and define
Figure RE-FDA0003057476140000053
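The equation images of step S4 are not reproduced in this text. As a hedged sketch only, assuming the usual AR(2) notation ($\varphi_1$, $\varphi_2$ for the coefficients and hats for their estimates, which are assumed symbols rather than confirmed ones), the step would conventionally read:

```latex
\begin{align*}
x_t &= \varphi_1 x_{t-1} + \varphi_2 x_{t-2} + e_t \\
e_t &= x_t - \varphi_1 x_{t-1} - \varphi_2 x_{t-2} \\
B x_t &= x_{t-1}, \qquad \varphi(B) = 1 - \varphi_1 B - \varphi_2 B^2,
        \qquad \varphi(B)\, x_t = e_t \\
\hat{e}_{N+1} &= x_{N+1} - \hat{\varphi}_1 x_N - \hat{\varphi}_2 x_{N-1} \\
\lambda &= \frac{\hat{e}_{N+1}}{\sigma}
\end{align*}
```

The last line follows the verbal definition of λ as the ratio of the residual of the current observation to σ.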
2. The method according to claim 1, wherein in S4, $\sigma^2$ is the mean of the sum of squares of the N residuals $e_t$ counted backward from the current time in the time series, and λ is the ratio of the residual of the current observation to σ; λ is used as the statistic for detecting whether $x_{N+1}$ is abnormal. When λ < −L or λ > U, $x_{N+1}$ is an outlier, where L and U are preset constants greater than zero. The case λ > U means that the abnormal value is larger than the normal value, and the magnitude of the statistic λ indicates how far the abnormal point deviates from the normal value: the larger λ is, the further the abnormal value deviates from the normal range. The case λ < −L means that the abnormal value is smaller than the normal value, and the smaller λ is, the further the abnormal value deviates from the normal range.
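The decision rule can be sketched as follows. The residual is computed as the one-step AR(2) prediction error of the newest value, which is an assumption about the missing equation images; the function name `detect`, the argument names and the threshold values in the usage note are hypothetical.

```python
def detect(x, phi1, phi2, sigma, lower, upper):
    """Classify the newest value x[-1] of a zero-mean window.

    phi1, phi2 and sigma come from an AR(2) fit on the earlier values;
    lower and upper are the preset positive constants L and U.
    Returns (lam, label) with label in {"high", "low", "normal"}.
    """
    # One-step prediction error of the newest observation.
    residual = x[-1] - phi1 * x[-2] - phi2 * x[-3]
    lam = residual / sigma
    if lam > upper:
        return lam, "high"    # abnormal value above the normal range
    if lam < -lower:
        return lam, "low"     # abnormal value below the normal range
    return lam, "normal"
```

For instance, with `phi1 = 0.5`, `phi2 = 0.25`, `sigma = 1.0` and thresholds `lower = upper = 3.0`, a window ending in `1.0, 2.0, 10.0` gives a prediction of 1.25, a residual of 8.75 and therefore λ = 8.75, which is flagged as "high".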
CN202011237830.8A 2020-11-09 2020-11-09 Computer network anomaly detection method Pending CN113315747A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011237830.8A CN113315747A (en) 2020-11-09 2020-11-09 Computer network anomaly detection method

Publications (1)

Publication Number Publication Date
CN113315747A true CN113315747A (en) 2021-08-27

Family

ID=77370273

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011237830.8A Pending CN113315747A (en) 2020-11-09 2020-11-09 Computer network anomaly detection method

Country Status (1)

Country Link
CN (1) CN113315747A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709173A (en) * 2021-09-02 2021-11-26 南方电网数字电网研究院有限公司 Method for external non-interference monitoring aiming at network service of power system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101286897A (en) * 2008-05-16 2008-10-15 华中科技大学 Network flow rate abnormality detecting method based on super stochastic theory
CN103384215A (en) * 2012-12-21 2013-11-06 北京安天电子设备有限公司 Virus situation anomaly detection method and system based on join AR model
CN104994539A (en) * 2015-06-30 2015-10-21 电子科技大学 Wireless sensor network traffic abnormality detection method based on ARIMA model
CN107070683A (en) * 2016-12-12 2017-08-18 国网北京市电力公司 The method and apparatus of data prediction

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
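Hu Hailong et al.: "Network traffic anomaly detection algorithm based on residual noise prediction", Computer Security (《计算机安全》) *
Zou Boxian: "A real-time detection method for network anomalies", Chinese Journal of Computers (《计算机学报》) *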

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709173A (en) * 2021-09-02 2021-11-26 南方电网数字电网研究院有限公司 Method for external non-interference monitoring aiming at network service of power system
CN113709173B (en) * 2021-09-02 2023-02-10 南方电网数字电网研究院有限公司 Method for external non-interference monitoring aiming at network service of power system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210827