CN105808368A - Information security abnormity detection method and system based on random probability distribution - Google Patents

Info

Publication number
CN105808368A
Authority
CN
China
Prior art keywords
alarm
module
time
distribution
anomaly detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610145685.8A
Other languages
Chinese (zh)
Other versions
CN105808368B (en)
Inventor
李木金
凌飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Liancheng Technology Development Co Ltd
Original Assignee
Nanjing Liancheng Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Liancheng Technology Development Co Ltd filed Critical Nanjing Liancheng Technology Development Co Ltd
Priority to CN201610145685.8A priority Critical patent/CN105808368B/en
Publication of CN105808368A publication Critical patent/CN105808368A/en
Application granted granted Critical
Publication of CN105808368B publication Critical patent/CN105808368B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/30Monitoring
    • G06F11/32Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324Display of status information
    • G06F11/327Alarm or error message display

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Alarm Systems (AREA)

Abstract

The invention discloses an information security abnormity detection method and system based on random probability distribution. The system comprises a real-time warning module, a historical warning module, an off-line abnormity detection modeling module, an on-line abnormality detection module and a knowledge base. By using the method and the system, heterogeneous, dynamic and complicated security attack events in mass security warning generated by IT (Information Technology) enterprise network equipment can be detected; the traceability or the event playback can be realized; the roots or the sources of the fault can be discovered; the method and the system can help an IT enterprise to fast recover the business; and the normal operation of the IT enterprise is ensured.

Description

Information security anomaly detection method and system based on random probability distribution
Technical Field
The invention relates to the technical field of information security applications, and in particular to an anomaly detection method and system for the massive volumes of alarms reported via SNMP, syslog and similar protocols.
Background
The English abbreviations used in the invention are as follows:
SMA: Simple Moving Average
ACF: AutoCorrelation Function
MAD: Median Absolute Deviation
LR: Linear Regression
OLS: Ordinary Least Squares
MA: Moving Average
WMA: Weighted Moving Average
EWMA: Exponentially Weighted Moving Average
AR: AutoRegressive
ARMA: AutoRegressive Moving Average
ARIMA: AutoRegressive Integrated Moving Average
CUSUM: Cumulative Sum (test)
SOC: Security Operation Center
IDS: Intrusion Detection System
SNMP: Simple Network Management Protocol
HDFS: Hadoop Distributed File System
MQ: Message Queue
Safe production underpins the orderly progress of all work and is also a negative indicator used in assessing leaders and cadres at all levels. The network and information security operation and maintenance system is an important component of the safe production work of every enterprise. Ensuring that the network operates efficiently and stably is the basis for all market activities and for the normal operation of the enterprise.
The construction and improvement of enterprise information systems has effectively raised labor productivity and reduced operating costs. However, once a business system of an enterprise suffers a security event or a fault, or develops a performance bottleneck, and this cannot be discovered, handled and recovered from in time, the operation of all services carried by that business system is directly affected, the normal operating order of the enterprise is disturbed, and the enterprise's business cannot proceed normally. Securing the IT infrastructure of governments and enterprises is therefore particularly important.
As governments and enterprises become increasingly informatized, business systems are ever more closely connected and data exchange ever more frequent. The systems have complex network or logical connections and exchange large amounts of data, so that a single fault can bring down the whole enterprise network; if a single point or business system has a vulnerability and is infected by a virus or attacked, the infection can quickly spread to other business systems and networks and may even paralyze the whole enterprise network.
Enterprise IT systems generate a large number of alarms, and as the cost of storage equipment falls there is no reason to discard these data. At present, however, there is no corresponding method or analysis tool that can, from this mass of alarms, prevent or discover security attacks, trace their source or find the root of a fault; nor is it possible to help information security engineers restore IT services as soon as possible.
Therefore, how to use informatization to improve the efficiency of enterprise security management operation and maintenance and to optimize the operation and maintenance service, so that the information security management system can provide professional, high-performance operation and maintenance services to enterprises of all kinds, has become an important problem to be solved, in particular in the design of information security management operation and maintenance.
Disclosure of Invention
After analyzing the defects and shortcomings of various enterprise information security management operation and maintenance service platforms, the invention provides a method and a system for detecting information security abnormity based on random probability distribution.
The core idea of the invention is as follows: a distribution-based method and system for security anomaly detection are constructed. The method and the system can establish an information security anomaly detection model through the alarm time sequence, the model is established in an off-line mode, and a method guide is provided for an on-line security anomaly detection system.
Further, the method and the system comprise a real-time alarm module, a historical alarm module, an offline anomaly detection modeling module, an online anomaly detection module and a knowledge base.
The real-time alarm module receives, in real time, the alarms reported by the various security devices through protocols such as SNMP and syslog, and sends them to both the historical alarm module and the distribution-based online anomaly detection module.
The historical alarm module can be used as a backup of an alarm time sequence and can also provide alarm data for the offline security attack anomaly detection modeling module.
The offline anomaly detection modeling module models the alarm time series and provides guidance on choosing among threshold-based, regression-based and random-distribution-based anomaly detection methods. To this end it calculates the median m, the interquartile range iqr, the interval k, the period T, and the stability indices of m and iqr, determines whether the distribution-based information security anomaly detection method should be selected, and feeds the result back to the online distribution-based anomaly detection module in real time.
The online anomaly detection module detects the anomalies of the alarm time sequence reported by the real-time alarm module in real time online by adopting a random probability distribution-based method, and reports the detection results to a related display module or a safety analyst for further processing.
The knowledge base stores various statistical parameters, anomaly detection methods, application scenarios thereof and the like.
Preferably, the median m, the interquartile range iqr, the interval k, the period T and the stability indices of m and iqr are calculated, and when the conditions on the stability indices, on k (compared with 1) and on T (T != 0) are satisfied, the distribution-based security anomaly detection method is adopted.
Aiming at massive security alarms generated by heterogeneous and dynamic IT enterprise network equipment, the invention can detect security attack events, trace back or replay the events, discover the root or source of the fault, help IT enterprises to quickly recover services and ensure normal operation of the IT enterprises.
Drawings
FIG. 1 is a schematic view of an internal flow of an offline anomaly detection modeling module according to the present invention;
FIG. 2 shows an embodiment of the percentage of alarms of each type actually detected by the distribution-based information security anomaly detection according to the present invention;
FIG. 3 shows an embodiment of 3 alarm time series (wired Trojan, wifi Trojan and external Trojan; alarms per hour) for the distribution-based information security anomaly detection according to the present invention;
FIG. 4 shows an embodiment of box plots of the alarm statistics distribution for the distribution-based information security anomaly detection according to the present invention;
FIG. 5 shows an embodiment of the coefficient of variation statistic for the distribution-based information security anomaly detection according to the present invention;
FIG. 6 shows 3 embodiments of alarm timing dependence for the distribution-based information security anomaly detection according to the present invention;
FIG. 7 illustrates the stability of the descriptive statistics for the distribution-based information security anomaly detection according to the present invention;
FIG. 8 illustrates the stability index values for the distribution-based information security anomaly detection according to the present invention;
FIG. 9 is a schematic diagram of the selection of an anomaly detection method for the distribution-based information security anomaly detection according to the present invention;
FIG. 10 is a diagram illustrating the distribution-based information security anomaly detection according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and examples:
FIG. 1 is a schematic view of the internal flow of the offline anomaly detection modeling module according to the present invention, which includes: ① preprocessing and grouping; ② time-based segmentation; ③ descriptive statistics; ④ descriptive statistical analysis; ⑤ possible recombination. The double circles represent the input and output of the offline security anomaly detection. The original input is the alarms from security devices (e.g., firewall, intrusion detection device, router, etc.). The final output is guidance for selecting a security anomaly detection algorithm. The grey boxes are parameters input by a security analyst; different parameters adapt the analysis to different application scenarios and security analysis purposes. The application scenario determines the quantity of alarms required for the security analysis (e.g., 1 year of alarms), the network topology (e.g., node, subnet) and the number of nodes (the larger the number of hosts and network devices, the larger the number of alarms).
The ① preprocessing and grouping depends mainly on the network topology and the purpose of the security analyst, for example when only one subnet or one type of alarm needs to be monitored. If the alarms are generated by different security devices, standardization of the security alarm attributes and a preliminary alarm correlation analysis are required.
The ② time-based segmentation computes the alarm time series and segments them by time (e.g., dividing a day into day and night).
The ③ descriptive statistics extract, for each alarm time series, descriptive statistics of its distribution and of its timing dependence. The distribution is represented by the central tendency (mean, median) and the dispersion of the data (variance, quartiles, coefficient of variation). Furthermore, the stability of the alarm distribution statistics can also be evaluated. An alarm time series exhibits timing dependence if it has a trend, is periodic or seasonal, or can be predicted. Thus, the timing dependence may be expressed as the predictability and/or periodicity of the alarm time series.
The ④ descriptive statistical analysis analyzes the extracted descriptive statistics to deduce the applicability and effectiveness of each anomaly detection method.
The ⑤ possible recombination suggests to the security analyst possible recombinations of the alarm time series so as to establish a more effective security anomaly detection algorithm. For example, if the number of alarms depends on working hours, descriptive statistics of different time distributions (e.g., day, night) may be extracted. The thresholds for timing-dependent anomaly detection can be determined at this point.
Further, the alarms received by the preprocessing and grouping module may be of any kind, for example original alarms, super alarms or meta alarms reported by the security devices. Without loss of generality, the invention mainly considers original alarms.
The preprocessing comprises the standardization of alarm information, the elimination of repeated alarms, and so on. Alarm grouping is realized by setting the initial grouping parameters. The grouping method depends on the goals of the security analyst. For example:
⑴ alarm source: the source address of the alarm;
⑵ alarm type: either the normal alarm type or the super alarm type.
The alarm source can be internal or external. Internal alarms mainly reflect behavior during working hours and user behavior, while external alarms mainly consist of changes and noise. Internal alarms can be grouped at a finer granularity based on the network topology and the security analysis objectives. For example, the security analyst may group alarms according to different network and firewall policies, such as different subnets, organizational departments, and wired or wireless access.
The second grouping criterion is the alarm type, which reveals different behaviors; treating all alarms as a single group would likely hinder security anomaly detection. For example, one alarm type typically generates a large number of alarms and may mask alarms of other types.
The output of the ① preprocessing and grouping is N alarm groups, i.e. $A_1$, …, $A_N$. For example, considering the alarms generated by an enterprise IT network over 5 months, the alarms may be classified according to the previously defined criteria:
alarm source: wired device alarms, wifi alarms and external alarms;
alarm type: Trojan, etc.
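The preprocessing and grouping step can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the alarm record fields (`time`, `source`, `type`, `host`) and the function name `group_alarms` are assumptions introduced here, and repeated alarms are taken to mean exact duplicates.

```python
from collections import defaultdict


def group_alarms(alarms):
    """Drop exact repeats, then group alarms by (source, type).

    Each alarm is assumed to be a dict with the hypothetical keys
    'time', 'source', 'type' and 'host' (already standardized).
    """
    groups = defaultdict(list)
    seen = set()
    for alarm in alarms:
        key = (alarm["time"], alarm["source"], alarm["type"], alarm["host"])
        if key in seen:  # eliminate repeated alarms
            continue
        seen.add(key)
        groups[(alarm["source"], alarm["type"])].append(alarm)
    return dict(groups)


example_alarms = [
    {"time": 0, "source": "wired", "type": "trojan", "host": "10.0.0.5"},
    {"time": 0, "source": "wired", "type": "trojan", "host": "10.0.0.5"},  # repeat
    {"time": 1, "source": "wifi", "type": "trojan", "host": "10.0.1.7"},
    {"time": 2, "source": "external", "type": "trojan", "host": "203.0.113.9"},
]
alarm_groups = group_alarms(example_alarms)
```

Here the three resulting groups correspond to the wired, wifi and external Trojan groups of the example; a finer grouping (per subnet or department) would only change the grouping key.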
Wired alarms and wifi alarms are monitored separately because the PC clients of most internal employees, and all servers, are connected by wire, whereas most internal employees (and guests) using notebook computers and smartphones connect wirelessly. In addition, most network and wifi devices are subject to policy restrictions, so that some PCs (or notebooks) can only access Web and mail applications. For these reasons, a security alarm analysis system should be able to derive different historical behaviors from the alarms of wired hosts and those of wireless hosts.
The extraction of alarm types is related to the number of alarms of each type. FIG. 2 gives the percentage of alarms generated for each type (types accounting for less than 1% of alarms are disregarded). As seen from FIG. 2, 80% of the alarms generated are of the Trojan alarm type. This result is credible because the enterprise does not directly monitor most host devices. The method of FIG. 1 applies to every alarm group independently of its number of alarms; however, it is particularly useful for the automatic analysis of a large number of alarm groups. Therefore, the next steps mainly consider the three most active alarm groups: wired Trojan, wireless Trojan and external Trojan.
Further, the input of the ② time-based segmentation is the alarm groups $A_1$, …, $A_N$. It comprises three operations that prepare the extraction of descriptive statistics: alarm time series calculation, active/inactive alarm sequence tagging, and time-based segmentation.
For each alarm group $A_i$, calculating its alarm time series $X_i$ requires two input parameters:
⑴ the time window w, which determines the quantity of alarms to be analyzed;
⑵ the time granularity g, the minimum time unit over which alarms are counted (e.g., an alarm time series per day, hour or minute).
The above parameters are input by the security analyst, depending on the scenario and the analysis objective. For example, if the objective is to find on which day an alarm anomaly occurred, or to gain situational awareness, the time granularity may be one day ($X_i$ is then the number of alarms per day) and the time window w six months or more. On the other hand, if the objective is to evaluate whether day and night have different alarm distributions, the time granularity may be one hour or less and the time window w one month or more. In the context of security analysis, too fine a granularity g (e.g., seconds) should be avoided.
The ② time-based segmentation then evaluates whether $X_i$ is active within the time window w. The purpose of this step is to remove inactive time series $X_i$, because they are of no use for further analysis. As a criterion, an alarm time series is considered active if alarms are generated in 50% or more of the time intervals, i.e. median($X_i$) > 0. Other criteria and thresholds for filtering inactive alarm sequences depend on the security analysis objectives and the conditions of the enterprise IT system.
After the alarm time series $X_i$ has been calculated, if $X_i$ is active it is further divided according to the input time combination parameter $\Theta = \{\theta_1, \dots, \theta_M\}$, where each $\theta_j$ is defined as some time interval (e.g. day, night): the alarm time series $X_i$ is divided into M subsequences $X_i^{j}$, j ∈ {1, 2, …, M}. On the other hand, if the security analyst has no particular expectation of the temporal behavior of the alarms, all alarms may be kept together and $\Theta$ defined at a fine time granularity (e.g., generally hourly segments). This is possible because the ⑤ possible recombination can automatically suggest coarse-grained temporal recombinations by analyzing the descriptive statistics extracted in the ③ descriptive statistics.
The output of the ② time-based segmentation is the M subsequences $X_i^{j}$ and the sequence $X_i$ itself, i.e. M + 1 alarm sequences are output for each alarm group $A_i$.
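A minimal sketch of the three operations of the time-based segmentation (time series calculation, activity check, segmentation) is given below. The function names, the use of timestamps in seconds and the day/night boundary of 08:00 to 20:00 are assumptions made for illustration only.

```python
from collections import Counter
from statistics import median

HOUR = 3600  # seconds; timestamps are assumed to be seconds from the window start


def alarm_time_series(timestamps, w, g):
    """Count alarms per bin of width g over a time window of length w."""
    n_bins = w // g
    counts = Counter(int(t // g) for t in timestamps if 0 <= t < n_bins * g)
    return [counts[b] for b in range(n_bins)]


def is_active(series):
    """Activity criterion from the text: the median of the series is > 0."""
    return bool(series) and median(series) > 0


def segment(series, g, label_of):
    """Split a series into subsequences keyed by a time-interval label."""
    parts = {}
    for b, count in enumerate(series):
        parts.setdefault(label_of(b * g), []).append(count)
    return parts


def day_or_night(t):
    """Hypothetical time combination: 'day' is 08:00-20:00, otherwise 'night'."""
    hour = (t // HOUR) % 24
    return "day" if 8 <= hour < 20 else "night"


timestamps = [9 * HOUR + 10, 9 * HOUR + 20, 10 * HOUR, 22 * HOUR + 5]
series = alarm_time_series(timestamps, w=24 * HOUR, g=HOUR)
parts = segment(series, HOUR, day_or_night)
```

With only four alarms in 24 hourly bins the example series is inactive under the median criterion, so in the described flow it would be filtered out before segmentation; `segment` is called here only to show the resulting subsequences.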
Now, considering the previous example again, the 3 most active alarm groups are of interest: wired Trojan, wifi Trojan and external Trojan. The time window w examined is 5 months and the time granularity g is 1 hour. This time granularity allows the temporal behavior of different time intervals to be examined. FIG. 3 shows the time series of wired, wifi and external Trojan alarms for each hour. The X-axis represents time (hours) and the Y-axis the number of alarms reported (0-800 alarms/hour). Because the median of each of these three alarm sequences is greater than zero (median($X_i$) > 0, i = 1, 2, 3), they are all active. As can be seen from FIG. 3, the wifi Trojan sequence is the most active, the wired Trojan sequence comes next, and the external Trojan sequence is the weakest.
Further, the inputs of the ③ descriptive statistics are the alarm time series $X_i$ and its M subsequences $X_i^{j}$. This module extracts 3 sets of related descriptive statistics, concerning the random distribution, the timing dependence and the stability.
The random distribution has 2 main characteristic attributes: central tendency and dispersion. For highly dynamic application scenarios, the following statistics are examined; they can be represented visually by box plots.
⑴ the median m, which represents the central tendency of the data;
⑵ the interquartile range iqr, which represents the dispersion around the central tendency.
To capture the influence of outliers on the dispersion of the data, the coefficient of variation $c_v = \sigma / \mu$ is examined, where $\mu$ and $\sigma$ are respectively the mean and standard deviation of the distribution to which the alarm sequence belongs. A high $c_v$ indicates that the alarm sequence is dispersed and/or contains outliers; a small $c_v$ indicates a concentrated distribution.
Again, for the most active alarm sequences (wired Trojan, WIFI Trojan, external Trojan), consider the time combination $\Theta$ = {working day (day), working day (night), holiday (day), holiday (night)}. FIG. 4 gives the box plots for each time combination, where the X-axis represents the time division (day, night) and the Y-axis the number of alarms per time unit (e.g., number of alarms reported per hour). Each box plot gives the following statistical properties: lower quartile (q1), median (m), upper quartile (q3), interquartile range (iqr = q3 - q1), lower whisker and upper whisker (conventionally q1 - 1.5·iqr and q3 + 1.5·iqr). All values above the upper whisker or below the lower whisker can be considered outliers.
FIG. 5 shows the value of the coefficient of variation $c_v$ for the different time combinations $\Theta$. This statistic is useful for capturing the variability of the data.
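The distribution statistics described above can be computed with the Python standard library, as in the following sketch. The whisker factor of 1.5 is the usual box-plot convention and is an assumption here, as are the function name `box_stats`, the "inclusive" quantile method and the use of the population standard deviation.

```python
from statistics import mean, pstdev, quantiles


def box_stats(series, whisker=1.5):
    """Box-plot statistics and coefficient of variation of an alarm sequence."""
    q1, med, q3 = quantiles(series, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - whisker * iqr   # lower whisker (assumed convention)
    upper = q3 + whisker * iqr   # upper whisker (assumed convention)
    mu = mean(series)
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": lower,
        "upper_whisker": upper,
        # coefficient of variation: standard deviation divided by mean
        "cv": pstdev(series) / mu if mu else float("inf"),
        "outliers": [x for x in series if x < lower or x > upper],
    }


stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In this toy sequence the single value 100 lies above the upper whisker and also drives the coefficient of variation well above 1, which is the pattern the text associates with a noisy sequence or one containing outliers.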
As seen in FIG. 4, most of the alarms during the daytime of working days are generated by wifi Trojans. On the other hand, in the daytime of holidays the wifi Trojan alarms decrease, and at night on holidays there are almost no alarms. As can be seen from FIG. 5, in the daytime of working days the coefficient of variation of the wifi Trojan alarms is low, while it is higher for the other time combinations, which indicates that the alarm sequence is noisy and/or contains some outliers.
In all four time combinations of FIG. 4 (a) to (d), the wired Trojan alarms exhibit a similar central tendency (m) and dispersion (iqr), with somewhat higher values during the daytime of working days. However, on working days, whether by day or by night, there are high outliers. These outliers are almost an order of magnitude higher than the central tendency; as can be seen from FIG. 5, the coefficient of variation is also high.
On the other hand, external Trojan alarms are almost equally distributed between day and night, with somewhat lower values in the daytime of working days, which may be related to attacks originating from different time zones. The dispersion of the external Trojan alarms is low and the coefficient of variation approaches 1.5 in all time combinations. This suggests that the external Trojan alarm sequence is independent of the time of detection and can be merged into a single time combination (no difference between working days and holidays, or between day and night).
Timing dependence. Descriptive statistics related to timing dependence are useful for regression-based anomaly detection. An alarm sequence exhibits timing dependence if it is trending, periodic or seasonal. The trend is a general systematic component, and over sufficiently long time frames a time series may show periodic or seasonal patterns.
To extract timing-dependent descriptive statistics, filtering and autocorrelation time series analysis techniques are employed. Filtering reduces the noise of the time series; such noise may hide trends and temporal patterns that are useful for modeling anomaly detection. A simple filtering technique is employed here, since more advanced filtering techniques may change the nature of the data. For this reason, the present invention employs SMA filtering based on a centered window of radius r hours. For the sake of clarity, let $X$ be an alarm time series and $x_t$ the number of alarms at time t (e.g., if the time granularity g equals 1 day, then $x_t$ is the number of alarms on day t). The SMA filtering generates a new sequence SMA(t), in which each value $x_t$ of the alarm sequence is replaced by the average of $x_t$ and its 2r neighbors:
$$\mathrm{SMA}(t) = \frac{1}{2r+1} \sum_{j=-r}^{r} x_{t+j}$$
where $x_t$ is the number of alarms at time t, and 2r + 1 is the size of the moving average window. The invention proposes weak smoothing with radius r = 1 or strong smoothing with radius r = 5.
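A centered SMA filter with a window of size 2r + 1 can be sketched as below. The text does not say how the ends of the series are handled; averaging over the neighbors that actually exist is an assumption of this sketch, as is the function name `sma_filter`.

```python
def sma_filter(series, r):
    """Centered simple moving average with window size 2r + 1.

    Near the ends of the series the window is truncated to the values
    that exist (an assumed edge rule, not stated in the text).
    """
    n = len(series)
    smoothed = []
    for t in range(n):
        window = series[max(0, t - r): min(n, t + r + 1)]
        smoothed.append(sum(window) / len(window))
    return smoothed


spike = [0, 0, 9, 0, 0]
weak = sma_filter(spike, r=1)
```

A single spike of 9 alarms is spread over its three-bin neighborhood, which is the noise-reducing effect the filter is used for before the autocorrelation is computed.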
After filtering, the following autocorrelation function (ACF) is calculated:
$$\mathrm{ACF}(\tau) = \frac{E\left[(x_t - \mu)(x_{t+\tau} - \mu)\right]}{\sigma^2}$$
where $\tau$ is the time lag of the autocorrelation, $x_t$ is the alarm time series, E is the mathematical expectation operator, and $\mu$ and $\sigma^2$ are the mean and variance of $x_t$. When the autocorrelation is high and decays slowly, future values are correlated with historical values; the opposite holds when the autocorrelation between two values tends to zero. If $\mathrm{ACF}(k)$ reaches the predictability threshold (0.3 in the embodiment of FIG. 6), the time series is considered predictable, with sufficient prediction accuracy, within the k-th window. When this condition is satisfied, a regression-based anomaly detection algorithm can be used effectively.
Unlike the descriptive statistics of the random distribution, the timing-dependence statistics are extracted only from the entire alarm time series $X_i$, because the autocorrelation function requires a temporally continuous alarm series in order to identify predictability, trends and periodicity.
In particular, with respect to timing dependence, the present invention extracts the following descriptive statistics:
⑴ the value k of the predictable interval;
⑵ the main period T of the time series $X_i$ (if any).
There may be multiple periods (e.g., 24 hours, 7 days), or there may be no period (in this case, T = 0). Note also that each statistic can be extracted regardless of whether filtering is applied to the alarm sequence $X_i$. That is, there are 3 configurations (no SMA filtering, weak SMA filtering, strong SMA filtering), and accordingly 3 pairs of values (k, T).
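The extraction of the two timing-dependence statistics can be sketched as follows. The text does not give exact extraction rules, so the rules used here are assumptions: the predictable interval is taken as the largest lag up to which the ACF stays at or above the threshold, and the main period as the lag of the highest ACF local peak at or above the threshold (0 if there is none). The function names, the default threshold of 0.3 and the maximum lag of 48 are likewise illustrative.

```python
from statistics import mean


def acf(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    if n == 0 or lag >= n:
        return 0.0
    mu = mean(series)
    var = sum((x - mu) ** 2 for x in series) / n
    if var == 0:
        return 0.0
    cov = sum((series[t] - mu) * (series[t + lag] - mu)
              for t in range(n - lag)) / n
    return cov / var


def predictable_interval(series, threshold=0.3, max_lag=48):
    """Largest k such that ACF(lag) >= threshold for every lag in 1..k."""
    k = 0
    for lag in range(1, min(max_lag, len(series) - 1) + 1):
        if acf(series, lag) < threshold:
            break
        k = lag
    return k


def main_period(series, threshold=0.3, max_lag=48):
    """Lag of the highest ACF local peak at or above threshold, else 0."""
    top = min(max_lag, len(series) - 2)
    values = [acf(series, lag) for lag in range(0, top + 2)]
    best_lag, best_val = 0, threshold
    for lag in range(2, top + 1):
        is_peak = values[lag] > values[lag - 1] and values[lag] >= values[lag + 1]
        if is_peak and values[lag] >= best_val:
            best_lag, best_val = lag, values[lag]
    return best_lag


# Two weeks of hourly counts: 10 alarms/hour from 08:00 to 20:00, none otherwise.
daily = [10 if 8 <= h % 24 < 20 else 0 for h in range(24 * 14)]
k = predictable_interval(daily)
T = main_period(daily)
```

For this strictly daily pattern the main period found is 24 hours, while the predictable interval is only a few hours because the ACF falls below the threshold well before the half-day lag.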
FIG. 6 shows the ACF values of the wired Trojan, the WIFI Trojan and the external Trojan. The X-axis represents the time lag $\tau$ (hours) and the Y-axis the ACF value. The vertical dashed lines indicate 24-hour lags, the horizontal dashed line indicates the threshold of 0.3 used to determine whether an alarm sequence is predictable, and results are given for three configurations: no filtering, r = 1 and r = 5.
FIG. 6 (a) shows that the wired Trojan alarms have a weak 24-hour period, which is slightly enhanced by SMA filtering but still remains below the 0.3 threshold (hence, period T = 0). The filtering slightly improves the predictable interval k, in particular for r = 5; however, the alarm sequence remains weakly correlated. On the other hand, the WIFI Trojan alarms exhibit a strong 24-hour period, which is evident even without filtering. This means that the value of a given hour is most likely to recur every 24 hours. The ACF of the external Trojan alarm sequence shows a trend component which is enhanced by filtering, achieving a predictable interval k of more than 24 hours for r = 5.
Stability of the descriptive statistics. To show the stability of the descriptive statistics of the distribution of each alarm time series $X_i$, the median (m) and the interquartile range (iqr) are considered. In the invention, w is defined as the time window over which the alarm time series is analyzed, and it is verified how the distribution statistics evolve within w. For this purpose, two parameters are considered: the size S of a sliding window (e.g., 1 month) and the time shift $\Delta$ (e.g., 1 week), where the sliding window must fit within w. By assigning different values to these parameters, the information security analyst can assess the stability of the descriptive statistics over different periods. This information is also useful for determining how often the anomaly detection parameters should be re-evaluated. The invention calculates the values of the median and the interquartile range starting from the time interval [0, S], then [$\Delta$, S + $\Delta$], then [2$\Delta$, S + 2$\Delta$], and so on until the entire time window w is covered. This process yields the series of descriptive statistics $m_t$ and $iqr_t$.
FIG. 7 gives the descriptive statistics for the alarm data set. The X-axis represents the time shift and the Y-axis the values of m and iqr (number of alarms/hour). In this example, w = 5 months, S = 1 month and $\Delta$ = 1 week. For example, X = 0 denotes m and iqr over the first month; X = 1 denotes m and iqr over the one-month window shifted by one week; and so on. This enables the evaluation of how the descriptive statistics evolve on a weekly basis.
As can be seen from FIG. 7, the daytime statistics of the wired Trojan are unstable during the initial period and stable afterwards; on the other hand, the WIFI Trojan produces almost no alarms at night, but its alarms increase sharply in the daytime. The external Trojan is stable throughout the period.
Here, a criterion is given for automatically verifying whether the descriptive statistics of an alarm distribution are stable. Let d be a descriptive statistic (e.g., iqr) and $d_t$ its value at time shift t (e.g., $iqr_5$ is the value of iqr at the fifth time shift). To assess the stability of d, a popular measure of dispersion is used: the median absolute deviation (MAD). Specifically, for each descriptive statistic d, the stability index, written here as $\sigma_d$, is calculated by the following formula:

$$\sigma_d = \frac{\mathrm{MAD}(d_t)}{m(d)}$$

where $\mathrm{MAD}(d_t)$ denotes the MAD of the values $d_t$, and the denominator $m(d) = \mathrm{mean}(d_t)$ is a normalization factor that allows descriptive statistics of different scales to be compared. A small $\sigma_d$ (almost zero) means that the descriptive statistic d is stable, and vice versa. In particular, a time series is considered stable when its central tendency and dispersion satisfy the following relation:

$$0 \le \max(\sigma_m, \sigma_{iqr}) \le \theta$$

where $\theta$ (notation chosen here) is a stability threshold that can be adjusted by the security analyst based on the IT network environment. In the application scenario of the invention, a threshold of $\theta = 0.2$ was found heuristically to be sufficient for automatically distinguishing stable from unstable descriptive statistics. The relation above considers the maximum of the two stability indices, since a significant variation in either statistic is sufficient to regard the distribution as unstable. In fig. 8, both the wired trojan and the wireless trojan alarms during the day are unstable, while the stability index for the other four distributions is below the threshold.
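The stability criterion (MAD of a descriptive statistic divided by its mean, compared against a threshold of 0.2) can be sketched as below. The function names and the handling of an all-zero series are assumptions made for this example, not details given in the source.

```python
from statistics import mean, median

def mad(values):
    """Median absolute deviation around the median."""
    centre = median(values)
    return median([abs(v - centre) for v in values])

def stability_index(series):
    """MAD of the series normalised by its mean."""
    m = mean(series)
    if m == 0:
        # Assumption: an all-zero series of non-negative counts is treated as stable.
        return 0.0
    return mad(series) / m

def is_stable(medians, iqrs, threshold=0.2):
    """Stable when the larger of the two stability indices is within the threshold."""
    return max(stability_index(medians), stability_index(iqrs)) <= threshold

steady_m, steady_iqr = [10, 10, 11, 10, 9], [4, 4, 5, 4, 3]
growing_m = [2, 4, 8, 16, 32]

print(is_stable(steady_m, steady_iqr))   # both statistics barely move
print(is_stable(growing_m, steady_iqr))  # the median keeps growing
```

Taking the maximum means that a single drifting statistic is enough to mark the whole alarm sequence as unstable.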
In the distribution-based anomaly detection method, the alarm sequence is modeled by a parametric or non-parametric distribution (e.g., a Gaussian distribution or a Gamma ($\Gamma$) distribution), and abnormal events are those that fall in a low-probability region of the stochastic model or that cause a large change in the distribution. These algorithms are only useful when there is no timing dependence, so that regression-based methods are not applicable. An alarm sequence can only be modeled by a distribution if its central tendency and dispersion remain stable: if the median m and the interquartile range iqr are both stable, a distribution-based anomaly detection method may be employed. The distribution-based algorithm may be parametric or non-parametric.
The parametric technique is only useful when there is some evidence or knowledge of the alarm sequence distribution. For example, if the median is stable and the values are concentrated within the interquartile range, the alarm sequence can be modeled by a Gaussian distribution, although further analysis is required, e.g., a chi-square test. Other commonly used parametric distributions are the gamma distribution and long-tail distributions. More complex distributions may be approximated by a mixture of distributions, such as a mixture of Gaussians (MoG).
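A parametric check of this kind might look like the following sketch. It assumes a Gaussian model whose location is the median and whose scale is derived from the interquartile range (for a normal distribution, IQR ≈ 1.349 standard deviations); the 3-sigma cutoff and the function names are illustrative choices, not values taken from the source.

```python
from statistics import median, quantiles

def fit_gaussian(history):
    """Robust Gaussian parameters: location = median, scale = IQR / 1.349."""
    q1, _, q3 = quantiles(history, n=4, method="inclusive")
    return median(history), (q3 - q1) / 1.349

def is_anomalous(value, mu, sigma, z=3.0):
    """Flag values in the low-probability tails of the fitted Gaussian."""
    if sigma == 0:
        return value != mu
    return abs(value - mu) > z * sigma

history = [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]  # hypothetical alarms/hour
mu, sigma = fit_gaussian(history)

print(is_anomalous(12, mu, sigma))
print(is_anomalous(40, mu, sigma))
```

In practice a goodness-of-fit test, such as the chi-square test mentioned above, would be run before trusting the Gaussian assumption.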
Non-parametric techniques are useful when there is no a priori knowledge of the alarm sequence distribution. Common examples are histogram-based techniques and kernel-based techniques (e.g., Parzen window estimation).
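As an illustration of the kernel-based variant, the sketch below estimates the density at a new observation with a Gaussian Parzen window and flags it when the density is low. The bandwidth and the minimum-density cutoff are hypothetical values picked for the example; the source does not specify them.

```python
import math

def parzen_density(x, samples, bandwidth):
    """Parzen window (kernel) density estimate at x with a Gaussian kernel."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)
    total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return total / norm

def low_density(x, samples, bandwidth=1.0, min_density=0.01):
    """True when x lies in a low-probability region of the estimated density."""
    return parzen_density(x, samples, bandwidth) < min_density

samples = [9, 10, 10, 11, 11, 11, 12, 12, 13, 10]  # hypothetical alarms/hour

print(low_density(11, samples))
print(low_density(40, samples))
```

No distributional shape is assumed here; the price is that the bandwidth has to be tuned to the data.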
In addition, if one of the two descriptive statistics is unstable while the other is stable, a CUSUM-like method that uses the median as its descriptive statistic is effective for anomaly detection.
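A CUSUM-like detector that uses the median as its reference level could be sketched as follows. This is a standard two-sided tabular CUSUM, shown only as one possible reading of "CUSUM-like"; the slack and threshold values are hypothetical tuning parameters.

```python
from statistics import median

def cusum_alarms(series, reference, slack, threshold):
    """Return indices where the upper or lower cumulative sum exceeds threshold."""
    upper = lower = 0.0
    alarms = []
    for i, x in enumerate(series):
        # Accumulate deviations above / below the reference, minus the slack.
        upper = max(0.0, upper + (x - reference - slack))
        lower = max(0.0, lower + (reference - x - slack))
        if upper > threshold or lower > threshold:
            alarms.append(i)
            upper = lower = 0.0  # restart after signalling
    return alarms

baseline = [10, 11, 9, 10, 12, 10, 9, 11]   # hypothetical historical alarms/hour
reference = median(baseline)                # median used as the reference level
series = [10, 11, 10, 14, 15, 16, 15, 10]

alarms = cusum_alarms(series, reference, slack=1.0, threshold=8.0)
print(alarms)
```

Because deviations accumulate over time, a sustained moderate shift is detected even when no single observation is extreme.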
Note that the external trojan (day, night), wireless trojan (night), and wired trojan (night) alarms can be modeled by distributions, whereas the mean and variance of the wireless trojan (day) alarms grow continuously, which makes such alarm sequences more complex to model by a distribution. The wired trojan is unstable only during an initial period and stable thereafter; that is, the distribution-based method is effective for the wired trojan once the initial unstable period has passed.
As shown in fig. 9, the decision flow chart evaluates the convergence index in its first step: if the alarm sequence is not convergent but has no timing dependence, the distribution-based approach is also effective for anomaly detection.
FIG. 10 is a diagram of the distribution-based information security anomaly detection according to the present invention, including a real-time alarm module, a historical alarm module, an offline anomaly detection modeling module, an online anomaly detection module, and a knowledge base.
The real-time alarm module receives, in real time, alarms reported by various security devices through protocols such as SNMP and syslog, and sends them to both the historical alarm module and the distribution-based online anomaly detection module.
The historical alarm module serves as a backup of the alarm time series and also provides alarm data to the offline anomaly detection modeling module.
The offline anomaly detection modeling module models the alarm time series and provides guidance for the threshold-based anomaly detection method, the regression-based anomaly detection method, and the random-distribution-based anomaly detection method. For the distribution-based anomaly detection method, it calculates the median m, the interquartile range iqr, the event interval k, the period T, and the stability indices of m and iqr, and determines whether to select the distribution-based information security anomaly detection method.
The online anomaly detection module uses a distribution-based method to detect, online and in real time, anomalies in the alarm time series reported by the real-time alarm module, and reports the detection results to a related display module or to a security analyst for further processing.
The knowledge base stores various statistical parameters, anomaly detection methods, application scenarios thereof and the like.
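The data flow between the five modules can be sketched as below. All class and method names are hypothetical, alarms are reduced to hourly counts, and the detection rule (deviation from the median larger than a multiple of the interquartile range) is only a stand-in for whichever distribution-based method the modeling step selects.

```python
from statistics import median, quantiles

class KnowledgeBase:
    """Stores the statistical parameters produced by offline modeling."""
    def __init__(self):
        self.params = {}

class HistoricalAlarms:
    """Backup of the alarm time series; feeds the offline modeling module."""
    def __init__(self):
        self.counts = []
    def store(self, count):
        self.counts.append(count)

class OfflineModeling:
    """Computes median and interquartile range from the stored history."""
    def fit(self, history, kb):
        if len(history.counts) < 2:
            raise ValueError("need at least two historical observations")
        q1, _, q3 = quantiles(history.counts, n=4, method="inclusive")
        kb.params = {"m": median(history.counts), "iqr": q3 - q1}

class OnlineDetector:
    """Checks each incoming count against the parameters in the knowledge base."""
    def __init__(self, kb, factor=3.0):  # factor is a hypothetical tuning value
        self.kb, self.factor = kb, factor
    def check(self, count):
        p = self.kb.params
        if not p:
            return False  # no model yet, nothing to compare against
        return abs(count - p["m"]) > self.factor * p["iqr"]

class RealTimeAlarms:
    """Forwards every alarm count to both the history and the online detector."""
    def __init__(self, history, detector):
        self.history, self.detector = history, detector
    def receive(self, count):
        self.history.store(count)
        return self.detector.check(count)

kb, history = KnowledgeBase(), HistoricalAlarms()
feed = RealTimeAlarms(history, OnlineDetector(kb))
for c in [10, 12, 11, 9, 10, 13, 11, 10, 12, 11]:
    feed.receive(c)
OfflineModeling().fit(history, kb)
print(feed.receive(11), feed.receive(50))
```

The point of the sketch is the separation of concerns: modeling runs offline on stored history, while the online path only reads parameters from the knowledge base.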
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention; all equivalent changes and modifications made according to the present invention are considered to be covered by the scope of the present invention.

Claims (2)

1. The invention provides a method and a system for detecting information security anomalies based on random probability distribution, which comprise a real-time alarm module, a historical alarm module, an offline anomaly detection modeling module, an online (distribution-based) anomaly detection module, and a knowledge base;
1) the real-time alarm module receives alarms from various security devices in real time and reports them to both the historical alarm module and the online distribution-based anomaly detection module;
2) the historical alarm module serves as a backup of the alarm data and also provides the alarm data to the offline anomaly detection modeling module;
3) the offline anomaly detection modeling module models the alarm time series and provides guidance for the regression-based security attack anomaly detection method;
4) the guidance for the distribution-based anomaly detection method calculates the median m, the interquartile range iqr, the event interval k, the period T, and the stability indices of m and iqr, determines whether to select the distribution-based information security anomaly detection method, and feeds the result back to the online distribution-based anomaly detection module in real time;
5) the online anomaly detection module uses a distribution-based method to detect, in real time, anomalies in the alarms reported by the real-time alarm module, and reports the detection results to a related display module or to a security analyst for further processing;
6) the knowledge base stores various statistical parameters, anomaly detection methods, their application scenarios, and the like.
2. The method and system for distribution-based information security anomaly detection as recited in claim 1, wherein, when the median m, the interquartile range iqr, and the coefficient of variation … satisfy …, k … 1, T ≠ 0, and …, the distribution-based information security anomaly detection method is adopted.
CN201610145685.8A 2016-03-15 2016-03-15 A kind of method and system of the information security abnormality detection based on random probability distribution Active CN105808368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610145685.8A CN105808368B (en) 2016-03-15 2016-03-15 A kind of method and system of the information security abnormality detection based on random probability distribution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610145685.8A CN105808368B (en) 2016-03-15 2016-03-15 A kind of method and system of the information security abnormality detection based on random probability distribution

Publications (2)

Publication Number Publication Date
CN105808368A true CN105808368A (en) 2016-07-27
CN105808368B CN105808368B (en) 2019-04-30

Family

ID=56468365

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610145685.8A Active CN105808368B (en) 2016-03-15 2016-03-15 A kind of method and system of the information security abnormality detection based on random probability distribution

Country Status (1)

Country Link
CN (1) CN105808368B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102014031A (en) * 2010-12-31 2011-04-13 湖南神州祥网科技有限公司 Method and system for network flow anomaly detection
US20140006325A1 (en) * 2012-06-28 2014-01-02 International Business Machines Corporation Detecting anomalies in real-time in multiple time series data with automated thresholding
CN104113872A (en) * 2013-04-22 2014-10-22 ***通信集团湖北有限公司 Method and system for data service monitoring
CN104915846A (en) * 2015-06-18 2015-09-16 北京京东尚科信息技术有限公司 Electronic commerce time sequence data anomaly detection method and system
CN105071985A (en) * 2015-07-24 2015-11-18 四川大学 Server network behavior description method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
邹柏贤: "网络异常的检测与诊断方法", 《小型微型计算机***》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239388A (en) * 2017-05-27 2017-10-10 郑州云海信息技术有限公司 A kind of monitoring alarm method and system
CN107451040A (en) * 2017-07-07 2017-12-08 深信服科技股份有限公司 Localization method, device and the computer-readable recording medium of failure cause
CN107451040B (en) * 2017-07-07 2020-12-01 深信服科技股份有限公司 Method and device for positioning fault reason and computer readable storage medium
TWI712917B (en) * 2018-07-17 2020-12-11 開曼群島商創新先進技術有限公司 Method and device for detecting safety of data model
CN111245538A (en) * 2018-11-29 2020-06-05 华为技术有限公司 Link quality detection method and related equipment
CN110751371A (en) * 2019-09-20 2020-02-04 苏宁云计算有限公司 Commodity inventory risk early warning method and system based on statistical four-bit distance and computer readable storage medium
CN117113260A (en) * 2023-10-19 2023-11-24 深圳市磐锋精密技术有限公司 Intelligent laminating equipment fault early warning system based on data analysis
CN117113260B (en) * 2023-10-19 2024-01-30 深圳市磐锋精密技术有限公司 Intelligent laminating equipment fault early warning system based on data analysis

Also Published As

Publication number Publication date
CN105808368B (en) 2019-04-30

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: A small road in Yuhuatai District of Nanjing City, Jiangsu province 210012 Building No. 158 Building 1 new ideal

Applicant after: Nanjing Liancheng science and technology development Limited by Share Ltd

Address before: A small road in Yuhuatai District of Nanjing City, Jiangsu province 210012 Building No. 158 Building 1 new ideal

Applicant before: NANJING LIANCHENG TECHNOLOGY DEVELOPMENT CO., LTD.

CB02 Change of applicant information

Address after: 210000, Nanjing high tech Zone, Jiangsu, Nanjing Software Park, No. 99 unity Road, Eagle building, block A, 14 floor

Applicant after: Nanjing Liancheng science and technology development Limited by Share Ltd

Address before: A small road in Yuhuatai District of Nanjing City, Jiangsu province 210012 Building No. 158 Building 1 new ideal

Applicant before: Nanjing Liancheng science and technology development Limited by Share Ltd

CB02 Change of applicant information

Address after: 210000 14F, building A, Eagle building, 99 solidarity Road, Nanjing Software Park, Nanjing hi tech Zone, Jiangsu

Applicant after: Nanjing Liancheng science and technology development Limited by Share Ltd

Address before: 210000, Nanjing high tech Zone, Jiangsu, Nanjing Software Park, No. 99 unity Road, Eagle building, block A, 14 floor

Applicant before: Nanjing Liancheng science and technology development Limited by Share Ltd

GR01 Patent grant