CN116627707A - Detection method and system for abnormal operation behavior of user - Google Patents
- Publication number
- CN116627707A (application number CN202310890239.XA)
- Authority
- CN
- China
- Prior art keywords
- sequence
- user
- data
- trend
- abnormal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application provides a method and a system for detecting abnormal operation behavior of a user, and belongs to the technical field of information security. The method comprises the following steps: acquiring user operation behavior data and generating a time series; smoothing the time series with a locally weighted regression algorithm to generate a seasonal component; generating a trend residual sequence from the time series and the seasonal component, and smoothing the trend residual sequence to generate a trend component; calculating the sum of the seasonal component and the trend component as a baseline component; generating a residual sequence from the time series and the baseline component; performing an interquartile range (IQR) measurement on the residual sequence to generate a metric value; calculating a judgment interval from the metric value; and identifying abnormal points of the residual sequence using the judgment interval so as to determine abnormal time series data. By identifying abnormal user operation data through baseline and residual sequence decomposition, the application effectively improves the accuracy of identifying abnormal user behavior.
Description
Technical Field
The application relates to the technical field of information security, in particular to a method and a system for detecting abnormal operation behaviors of a user.
Background
With the rapid development of network technology, the security of network information is receiving more and more attention. The key to protecting network information security is predicting and identifying abnormal user operation behavior and attack behavior. Currently, anti-virus software is generally used to protect network information. For suspicious behaviors, potential threats, and attacks that traditional anti-virus software cannot detect, a baseline-based UEBA (User and Entity Behavior Analytics) analysis method is used to detect and identify potential security threats and abnormal activities.
In baseline-based UEBA analysis, a baseline model representing the normal behavior patterns of users and entities is first established. The baseline model may be constructed from historical data or predefined rules. Abnormal behavior is then detected by comparing real-time data with the baseline model. Baseline-based UEBA analysis generally includes the following steps:
Data collection: logs, events, and metrics containing user and entity behavior data are collected. Such data may include login activity, file access, network communications, permission changes, and the like.
Feature extraction: meaningful features are extracted from the collected data for describing the behavior of the user and the entity. Features may include time stamps, frequency of behavior, duration of behavior, type of behavior, etc.
Baseline modeling: a baseline model is constructed using historical data or predefined rules to describe normal behavior of users and entities. The baseline model may be constructed based on statistical analysis methods or machine learning algorithms.
Anomaly detection: the real-time data is compared with the baseline model, and behavior that differs significantly from the baseline model is detected and identified. These differences may represent potential security threats or abnormal activities. Common anomaly detection methods include threshold detection, outlier detection, machine learning classification, and the like.
Baseline-based UEBA analysis may help organizations discover potential internal and external threats and provide timely security responses. By monitoring and analyzing the behavior of users and entities, abnormal activities and unusual patterns are identified, which improves security and protects sensitive data. However, this approach still has the following drawbacks:
1. Constructing the baseline model requires considering many factors, such as differences between users and entities and variations across time periods. For complex environments and changing behavior patterns, it may be difficult to construct an appropriate baseline model.
2. UEBA analysis often suffers from false alarms. A high false alarm rate reduces the reliability of the analysis results and increases the workload of verifying and confirming them.
3. The baseline model requires constant maintenance and updating to adapt to changing environments and behavior patterns. However, the prior art lacks a dynamic update strategy for baseline data, so the baseline lags behind service requirements, the baseline threshold becomes inaccurate, and false alarms appear in the analysis results.
Disclosure of Invention
In view of the problems in the prior art, the application aims to provide a method and a system for detecting abnormal operation behavior of a user, which identify abnormal user operation data through baseline and residual sequence decomposition and thereby effectively improve the accuracy of identifying abnormal user operation behavior.
To achieve this aim, the application adopts the following technical solution:
a detection method of abnormal operation behavior of a user comprises the following steps:
acquiring user operation behavior data, and generating a time sequence according to the time characteristics of the data;
smoothing the time sequence by adopting a local weighted regression algorithm to generate seasonal components;
generating a trend residual sequence by using the time sequence and the seasonal component, and smoothing the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component;
calculating a sum of the seasonal component and the trend component as a baseline component;
generating a residual sequence using the time sequence and the baseline component;
performing an interquartile range (IQR) measurement on the residual sequence, and calculating a metric value;
calculating a user abnormal behavior judgment interval according to the metric value;
and identifying abnormal points of the residual sequence by using the user abnormal behavior judgment interval so as to determine abnormal time series data.
Further, the obtaining operation behavior data of the user and generating a time sequence according to time characteristics of the data includes:
and acquiring operation behavior data of the user, and generating a time sequence in a time aggregation mode according to time characteristics of the data.
Further, smoothing the time sequence by adopting a local weighted regression algorithm to generate the seasonal component includes:
smoothing the time sequence data within a time window by adopting a local weighted regression algorithm, while preserving the trend characteristics of the time sequence data;
determining the seasonal component by calculating a moving average of the smoothed time series;
wherein the time window used when calculating the moving average is a time window matched with the seasonal period.
Further, the generating a trend residual sequence by using the time sequence and the seasonal component, and smoothing the trend residual sequence by adopting a local weighted regression algorithm, generating a trend component includes:
subtracting the seasonal component from the time sequence to generate a trend residual sequence;
and smoothing the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component.
Further, the local weighted regression algorithm includes the steps of:
step 1: let the data points in the time series be $(x_i, y_i)$; the weight function of the weighted regression is defined as follows: $w_i(x) = K\left(\frac{|x - x_i|}{h}\right)$;
wherein $w_i(x)$ is the weight function of the i-th data point, $x$ is the position of the point to be smoothed, $x_i$ is the position of the i-th data point, and $h$ is a smoothing parameter, i.e. a bandwidth function, for controlling the distribution of weights; $K$ denotes a kernel function that decreases with distance (its specific form is not recoverable from this text);
step 2: performing least square regression, and fitting a local polynomial model; let the local polynomial model be: $f(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p$;
by minimizing the weighted least-squares objective function $S(\beta) = \sum_{i=1}^{n} w_i(x)\,(y_i - f(x_i))^2$, determining the coefficients $\beta$ of the local polynomial model, wherein $n$ is the total number of data points and $y_i$ is the response value of the i-th data point;
step 3: selecting the bandwidth $h$ and calculating the weights according to the distances of the data points;
the bandwidth function is defined as: $h = k \times \mathrm{median}(|x - x_i|)$;
where $k$ is a bandwidth adjustment factor for controlling the size of the bandwidth;
step 4: by minimizing the objective function $S(\beta)$, calculating a smoothed estimate for each point to be smoothed.
Further, the generating a residual sequence using the time sequence and the baseline component includes:
the baseline component is subtracted from the time series to obtain a residual sequence.
Further, performing the interquartile range measurement on the residual sequence and calculating the metric value includes:
performing an interquartile range measurement on the residual sequence to obtain a lower quartile Q1 and an upper quartile Q3;
calculating the interquartile range IQR using the formula IQR = Q3 - Q1.
Further, calculating the user abnormal behavior judgment interval according to the metric value includes:
calculating a lower limit value A and an upper limit value B of the user abnormal behavior judgment interval according to the formulas A = Q1 - k × IQR and B = Q3 + k × IQR;
and taking [A, B] as the user abnormal behavior judgment interval.
Further, identifying abnormal points of the residual sequence by using the user abnormal behavior judgment interval to determine abnormal time series data includes:
judging whether each observed value of the residual sequence belongs to the interval [A, B];
if yes, marking the corresponding time series data as a normal point; if not, marking the corresponding time series data as an abnormal point;
generating an abnormal marking sequence from the abnormal points marked in the time sequence, wherein the user operation behavior data corresponding to the marked points exhibit abnormal user operation behavior.
Correspondingly, the application also discloses a system for detecting abnormal operation behavior of a user, which comprises:
a data acquisition unit configured to acquire user operation behavior data and generate a time sequence according to the time characteristics of the data;
a seasonal decomposition unit configured to smooth the time sequence by adopting a local weighted regression algorithm to generate a seasonal component;
a trend decomposition unit configured to generate a trend residual sequence by using the time sequence and the seasonal component, and to smooth the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component;
a baseline creation unit configured to calculate the sum of the seasonal component and the trend component as a baseline component;
a residual sequence creation unit configured to generate a residual sequence using the time sequence and the baseline component;
a first calculation unit configured to perform an interquartile range measurement on the residual sequence and calculate a metric value;
a second calculation unit configured to calculate a user abnormal behavior judgment interval according to the metric value;
and an identification unit configured to identify abnormal points of the residual sequence using the user abnormal behavior judgment interval to determine abnormal time series data.
Compared with the prior art, the application has the following beneficial effects: the application provides a method and a system for detecting abnormal operation behavior of a user in which the time series is decomposed and the trend component (Trend), the seasonal component (Seasonal), and the residual component (Residual) are all taken into account in the UEBA analysis, so that the false recognition rate is greatly reduced compared with traditional baseline analysis based on data statistics. In the traditional approach, the generation and updating of the baseline lag behind changes in the service data, and the trend and seasonal components are not considered in the UEBA analysis, so the false recognition rate is high when the service varies with seasons or economic cycles.
It can be seen that the present application has outstanding substantial features and significant advances over the prior art, as well as the benefits of its implementation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method of an embodiment of the present application.
Fig. 2 is a system configuration diagram of an embodiment of the present application.
In the figures: 1, data acquisition unit; 2, seasonal decomposition unit; 3, trend decomposition unit; 4, baseline creation unit; 5, residual sequence creation unit; 6, first calculation unit; 7, second calculation unit; 8, identification unit.
Detailed Description
The following describes specific embodiments of the present application with reference to the drawings.
The method for detecting the abnormal operation behavior of the user shown in fig. 1 comprises the following steps:
S1: Acquire user operation behavior data, and generate a time sequence according to the time characteristics of the data.
Specifically, operation behavior data of a user is obtained, and a time sequence is generated in a time aggregation mode according to time characteristics of the data.
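The time aggregation in S1 can be illustrated with a minimal Python sketch. It assumes that each operation record carries a timestamp and that the time series is the number of operations per fixed-width bucket; the function name aggregate_events, the hourly bucket width, and the sample events are hypothetical and are not taken from the application.

```python
from collections import Counter
from datetime import datetime, timezone


def aggregate_events(timestamps, bucket_seconds=3600):
    """Count events per fixed-width time bucket and return a gap-free series."""
    if not timestamps:
        return []
    # Map each event to the index of the bucket that contains it.
    buckets = Counter(int(ts.timestamp()) // bucket_seconds for ts in timestamps)
    first, last = min(buckets), max(buckets)
    # Emit (bucket_start_epoch, count) pairs; empty buckets are filled with 0
    # so that the resulting series is evenly spaced.
    return [(b * bucket_seconds, buckets.get(b, 0)) for b in range(first, last + 1)]


# Hypothetical operation timestamps (assumed data, for illustration only).
events = [
    datetime(2023, 7, 1, 9, 5, tzinfo=timezone.utc),
    datetime(2023, 7, 1, 9, 40, tzinfo=timezone.utc),
    datetime(2023, 7, 1, 11, 15, tzinfo=timezone.utc),
]
series = aggregate_events(events)
counts = [count for _, count in series]
```

Filling empty buckets with zero keeps the series evenly spaced, which the later smoothing and decomposition steps generally rely on.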
It should be noted that, in the present method, a time series is composed of a trend-cycle component, a seasonal component, and a remainder component (everything else in the time series), where the trend and the cycle are combined into a single trend-cycle component. The additive decomposition of the time series can be expressed as $y_t = S_t + T_t + R_t$, where $y_t$ is the data, $S_t$ is the seasonal component, $T_t$ is the trend-cycle component, and $R_t$ is the remainder component. The multiplicative decomposition of the time series can be expressed as $y_t = S_t \times T_t \times R_t$. Time series decomposition algorithms include moving averages, classical decomposition, X11 decomposition, SEATS decomposition, and STL decomposition.
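As a side note that the application itself does not state, the two forms are related by a logarithm: assuming all components are strictly positive, a multiplicative decomposition of the data is equivalent to an additive decomposition of the logged data, so an additive procedure can in principle be applied to the logged series.

```latex
y_t = S_t \times T_t \times R_t
\quad\Longleftrightarrow\quad
\log y_t = \log S_t + \log T_t + \log R_t ,
\qquad S_t,\ T_t,\ R_t > 0 .
```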
S2: Smooth the time sequence by adopting a local weighted regression algorithm to generate a seasonal component.
Specifically, the objective of this step is to smooth the time series to reduce the effect of noise. In a specific embodiment, a local weighted regression (Loess) algorithm is used for smoothing, so that the data within a time window are smoothed while the overall trend characteristics are preserved. The seasonal component is estimated by calculating a moving average of the smoothed time series. The time window of the moving average is typically matched to the seasonal period in order to capture seasonal variations.
The local weighted regression (Loess) algorithm adopted by the method specifically comprises the following steps:
Step 1: let the data points in the time series be $(x_i, y_i)$; the weight function of the weighted regression is defined as follows: $w_i(x) = K\left(\frac{|x - x_i|}{h}\right)$;
wherein $w_i(x)$ is the weight function of the i-th data point, $x$ is the position of the point to be smoothed, $x_i$ is the position of the i-th data point, and $h$ is a smoothing parameter, i.e. a bandwidth function, for controlling the distribution of weights; $K$ denotes a kernel function that decreases with distance (its specific form is not recoverable from this text).
Step 2: performing least square regression, and fitting a local polynomial model; common polynomial models are linear models (first order polynomials) or quadratic models (second order polynomials). Assume the local polynomial model is: $f(x) = \beta_0 + \beta_1 x + \dots + \beta_p x^p$;
by minimizing the weighted least-squares objective function $S(\beta) = \sum_{i=1}^{n} w_i(x)\,(y_i - f(x_i))^2$, the coefficients $\beta$ of the local polynomial model are determined, wherein $n$ is the total number of data points and $y_i$ is the response value of the i-th data point.
Step 3: select the bandwidth $h$ and calculate the weights based on the distances of the data points.
The bandwidth function is defined as: $h = k \times \mathrm{median}(|x - x_i|)$;
where $k$ is a bandwidth adjustment factor for controlling the size of the bandwidth.
Step 4: by minimizing the objective function $S(\beta)$, calculate a smoothed estimate for each point to be smoothed.
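Steps 1 to 4 can be sketched in plain Python as follows. This is an illustrative sketch, not the application's implementation: it assumes a first-order (linear) local model and a tricube kernel for the weights, neither of which the text fixes, and the names tricube, loess_point, and loess_smooth as well as the default k = 2.0 are hypothetical.

```python
from statistics import median


def tricube(u):
    """Tricube kernel (an assumed choice): positive for |u| < 1, else 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_point(x, xs, ys, k=2.0):
    """Locally weighted linear estimate of y at position x."""
    # Step 3: bandwidth h = k * median(|x - x_i|).
    h = k * median(abs(x - xi) for xi in xs)
    if h <= 0:
        raise ValueError("bandwidth must be positive; check k and the x values")
    # Step 1: weights shrink as data points get farther from x.
    ws = [tricube((xi - x) / h) for xi in xs]
    sw = sum(ws)
    if sw == 0:
        raise ValueError("no data point falls inside the bandwidth")
    # Steps 2 and 4: minimise S = sum w_i * (y_i - b0 - b1 * (x_i - x))^2.
    # Centring the model on x means the fitted value at x is simply b0.
    su = sum(w * (xi - x) for w, xi in zip(ws, xs))
    suu = sum(w * (xi - x) ** 2 for w, xi in zip(ws, xs))
    sy = sum(w * yi for w, yi in zip(ws, ys))
    suy = sum(w * (xi - x) * yi for w, xi, yi in zip(ws, xs, ys))
    denom = sw * suu - su * su
    if abs(denom) < 1e-12:
        return sy / sw  # degenerate neighbourhood: fall back to a weighted mean
    b1 = (sw * suy - su * sy) / denom
    return (sy - b1 * su) / sw


def loess_smooth(xs, ys, k=2.0):
    """Smoothed estimate at every observed position."""
    return [loess_point(x, xs, ys, k) for x in xs]


# Illustrative data: a straight line is reproduced, an isolated spike is damped.
xs = list(range(10))
line = [2 * x + 1 for x in xs]
spike = [0.0] * 10
spike[5] = 10.0
smoothed_line = loess_smooth(xs, line)
smoothed_spike_value = loess_point(5, xs, spike)
```

A larger k widens the neighbourhood and gives a smoother curve; a smaller k follows the data more closely.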
S3: Generate a trend residual sequence by using the time sequence and the seasonal component, and smooth the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component.
On the basis of the seasonal decomposition, the trend residual sequence is obtained by subtracting the seasonal component from the time sequence. Loess smoothing is then applied to the trend residual sequence to estimate the trend component.
S4: the sum of the seasonal component and the trend component is calculated as the baseline component.
The purpose of this step is to create a baseline, which is the sum of the seasonal component and the trend component.
S5: a residual sequence is generated using the time sequence and the baseline component.
Specifically, on the basis of the trend decomposition, the estimated seasonal and trend components are subtracted from the original time sequence X to obtain the residual sequence, i.e., residual sequence = X - baseline component. The residual sequence represents the part of the original data that cannot be explained by trend and seasonality, i.e., random noise and aperiodic variations (e.g., abnormal behavior components).
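Steps S4 and S5 amount to element-wise arithmetic, as the following minimal Python sketch shows. It assumes the seasonal and trend components have already been estimated and are aligned with the series; the function name baseline_and_residual and the sample numbers are hypothetical.

```python
def baseline_and_residual(series, seasonal, trend):
    """Return (baseline, residual) for aligned, equal-length sequences."""
    if not (len(series) == len(seasonal) == len(trend)):
        raise ValueError("series, seasonal and trend must have the same length")
    # S4: baseline = seasonal component + trend component.
    baseline = [s + t for s, t in zip(seasonal, trend)]
    # S5: residual = original series - baseline.
    residual = [x - b for x, b in zip(series, baseline)]
    return baseline, residual


# Made-up values for illustration: the third observation sits far above its baseline.
series = [10, 12, 30, 11]
seasonal = [1, -1, 1, -1]
trend = [9, 12, 10, 12]
baseline, residual = baseline_and_residual(series, seasonal, trend)
```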
As can be seen from the above steps, the method adopts STL (Seasonal and Trend decomposition using Loess) time series decomposition algorithm to decompose the time series. Specifically, based on the idea of local weighted regression (Loess), the time series is decomposed into three parts, trend (Trend), seasonal (Seasonal), residual (Residual) in an iterative manner.
Compared with the moving averages, classical decomposition, X11, and SEATS decomposition algorithms commonly used in the prior art, the STL decomposition algorithm has the following advantages:
(1) In contrast to SEATS and X11, STL can handle any type of seasonality, not just monthly and quarterly data.
(2) The seasonal component is allowed to change over time, and the rate of change can be controlled by the user.
(3) The smoothness of the trend-cycle component can also be controlled by the user.
(4) It is robust to outliers, so occasional anomalous observations do not affect the estimates of the trend-cycle and seasonal components.
S6: Perform an interquartile range measurement on the residual sequence, and calculate a metric value.
First, an interquartile range measurement is performed on the residual sequence to obtain the lower quartile Q1 and the upper quartile Q3; then the interquartile range IQR is calculated using the formula IQR = Q3 - Q1.
In a specific embodiment, outliers in the residual sequence are detected with Tukey's fences, which rely on the interquartile range (IQR), a measure of the spread of the data. The IQR is also known as the midspread, the middle 50%, the fourth spread, or the H-spread. The IQR is defined as the difference between the 75th and 25th percentiles of the data. To calculate the IQR, the data set is divided into quartiles, denoted Q1 (the lower quartile), Q2 (the median), and Q3 (the upper quartile). The lower quartile corresponds to the 25th percentile and the upper quartile corresponds to the 75th percentile, so the calculation formula is: IQR = Q3 - Q1.
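The quartile computation can be sketched with the Python standard library. The application does not say which quartile convention is used, so the "inclusive" interpolation method of statistics.quantiles is an assumption here, and the function name interquartile_range and the sample values are hypothetical.

```python
from statistics import quantiles


def interquartile_range(residuals):
    """Return (Q1, Q3, IQR) of a residual sequence with at least two values."""
    # n=4 yields the three cut points Q1, Q2 (median) and Q3.
    q1, _q2, q3 = quantiles(residuals, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Illustrative residuals (assumed values).
q1, q3, iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9])
```

Other quartile conventions can give slightly different Q1 and Q3 on small samples, which shifts the resulting interval accordingly.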
S7: Calculate a user abnormal behavior judgment interval according to the metric value.
Specifically, the lower limit value A and the upper limit value B of the user abnormal behavior judgment interval are calculated according to the formulas A = Q1 - k × IQR and B = Q3 + k × IQR, and [A, B] is taken as the user abnormal behavior judgment interval.
In a specific embodiment, observations of the residual component are measured by the interquartile range, with Q1 and Q3 being the lower and upper quartiles respectively. An outlier can then be defined as any observation outside the range [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)], where k = 1.5 indicates an "outlier"; that is, an outlier is an observation below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
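A short worked example may make the interval concrete. The quartile values below are made up for illustration and do not come from the application; only the formulas and k = 1.5 are taken from the text.

```latex
\begin{aligned}
Q_1 &= -1, \qquad Q_3 = 1, \qquad k = 1.5 \\
\mathrm{IQR} &= Q_3 - Q_1 = 1 - (-1) = 2 \\
A &= Q_1 - k \cdot \mathrm{IQR} = -1 - 1.5 \times 2 = -4 \\
B &= Q_3 + k \cdot \mathrm{IQR} = 1 + 1.5 \times 2 = 4
\end{aligned}
```

With these values, a residual of 19 lies outside [A, B] = [-4, 4] and would be treated as an outlier, while a residual of 2 lies inside and would be treated as normal.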
S8: Identify abnormal points of the residual sequence by using the user abnormal behavior judgment interval so as to determine abnormal time series data.
Specifically, it is first judged whether each observed value of the residual sequence belongs to the interval [A, B]; if yes, the corresponding time series data is marked as a normal point; if not, the corresponding time series data is marked as an abnormal point. Finally, an abnormal marking sequence is generated from the abnormal points marked in the time sequence, and the user operation behavior data corresponding to the marked points exhibit abnormal user operation behavior.
In a specific embodiment, the judgment interval is first calculated according to the formulas Q1 - k × IQR and Q3 + k × IQR. Depending on the particular needs, outliers may be defined as observations below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR.
When outliers are detected on the residual sequence, observations below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are marked as outliers. Accordingly, an observation within the range [Q1 - k(Q3 - Q1), Q3 + k(Q3 - Q1)] is marked as a normal point. Finally, an abnormal marking sequence is generated from the marked abnormal points.
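The marking logic of S6 to S8 can be sketched in a few lines of Python. As assumptions, the sketch uses the "inclusive" quartile method of statistics.quantiles (the application does not fix a convention) and represents the abnormal marking sequence as a list of booleans; the function name flag_outliers and the residual values are hypothetical.

```python
from statistics import quantiles


def flag_outliers(residuals, k=1.5):
    """Return (A, B, marks); marks[i] is True when residuals[i] lies outside [A, B]."""
    q1, _q2, q3 = quantiles(residuals, n=4, method="inclusive")
    iqr = q3 - q1
    lower = q1 - k * iqr  # A = Q1 - k * IQR
    upper = q3 + k * iqr  # B = Q3 + k * IQR
    marks = [not (lower <= r <= upper) for r in residuals]
    return lower, upper, marks


# Illustrative residual sequence (assumed values) with one clear outlier at the end.
residuals = [1, -1, 0, 2, -2, 0, 1, -1, 19]
lower, upper, marks = flag_outliers(residuals)
abnormal_indices = [i for i, is_abnormal in enumerate(marks) if is_abnormal]
```

The indices in abnormal_indices can then be mapped back to the time buckets of the original series to locate the corresponding user operations.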
Correspondingly, as shown in fig. 2, the application also discloses a system for detecting abnormal operation behavior of a user, which comprises: a data acquisition unit 1, a seasonal decomposition unit 2, a trend decomposition unit 3, a baseline creation unit 4, a residual sequence creation unit 5, a first calculation unit 6, a second calculation unit 7 and an identification unit 8.
The data acquisition unit 1 is configured to acquire user operation behavior data and generate a time sequence according to time characteristics of the data.
In a specific embodiment, the data acquisition unit 1 is specifically configured to: and acquiring operation behavior data of the user, and generating a time sequence in a time aggregation mode according to time characteristics of the data.
And a seasonal decomposition unit 2 configured to smooth the time series by using a local weighted regression algorithm to generate a seasonal component.
In a specific embodiment, the seasonal decomposition unit 2 is specifically configured to: smooth the time sequence data within a time window by adopting a local weighted regression algorithm, while preserving the trend characteristics of the time sequence data; and determine the seasonal component by calculating a moving average of the smoothed time series, wherein the time window used when calculating the moving average is a time window matched with the seasonal period.
And a trend decomposition unit 3 configured to generate a trend residual sequence by using the time sequence and the seasonal component, and to perform smoothing processing on the trend residual sequence by using a local weighted regression algorithm to generate a trend component.
In the specific embodiment, the trend decomposing unit 3 is specifically configured to: subtracting the seasonal component from the time sequence to generate a trend residual sequence; and smoothing the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component.
A base line creating unit 4 configured to calculate the sum of the seasonal component and the trend component as a base line component.
A residual sequence creation unit 5 configured to generate a residual sequence using the time sequence and the baseline component.
In a specific embodiment, the residual sequence creation unit 5 is specifically configured to: the baseline component is subtracted from the time series to obtain a residual sequence.
A first calculation unit 6, configured to perform an interquartile range measurement on the residual sequence and calculate a metric value.
In a specific embodiment, the first calculation unit 6 is specifically configured to: perform the interquartile range measurement on the residual sequence to obtain the lower quartile Q1 and the upper quartile Q3; and calculate the interquartile range IQR using the formula IQR = Q3 - Q1.
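The quartile computation can be sketched with the Python standard library. Several quartile conventions exist and the text does not say which one is intended; the `method="inclusive"` choice (linear interpolation between order statistics) and the name `interquartile_range` are assumptions.

```python
import statistics


def interquartile_range(residuals):
    """Return (Q1, Q3, IQR) for a residual sequence."""
    # method="inclusive" interpolates linearly between order statistics.
    q1, _median, q3 = statistics.quantiles(residuals, n=4, method="inclusive")
    return q1, q3, q3 - q1


q1, q3, iqr = interquartile_range([1, 2, 3, 4, 5, 6, 7, 8, 9])
print(q1, q3, iqr)  # 3.0 7.0 4.0
```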
A second calculation unit 7, configured to calculate a user abnormal behavior determination interval from the metric value.
In a specific embodiment, the second calculation unit 7 is specifically configured to: calculate the lower limit A and the upper limit B of the user abnormal behavior determination interval according to the formulas A = Q1 - k × IQR and B = Q3 + k × IQR; and take [A, B] as the user abnormal behavior determination interval.
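As a worked example with hypothetical numbers, take Q1 = 3, Q3 = 7 and k = 1.5. The value k = 1.5 is the conventional Tukey fence multiplier and is assumed here only for illustration; the passage above does not fix its value.

```latex
\begin{aligned}
\mathrm{IQR} &= Q_3 - Q_1 = 7 - 3 = 4,\\
A &= Q_1 - k\,\mathrm{IQR} = 3 - 1.5 \times 4 = -3,\\
B &= Q_3 + k\,\mathrm{IQR} = 7 + 1.5 \times 4 = 13.
\end{aligned}
```

With these values the determination interval is [-3, 13], so a residual of 17 would fall outside it, while a residual of 1 would not. A larger k widens the interval and flags fewer points.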
An identification unit 8, configured to identify abnormal points of the residual sequence using the user abnormal behavior determination interval, so as to determine abnormal time series data.
In a specific embodiment, the identification unit 8 is specifically configured to: determine whether each observed value of the residual sequence falls within the interval [A, B]; if so, mark the corresponding time series data as a normal point; if not, mark the corresponding time series data as an abnormal point; and generate an abnormal marking sequence from the abnormal points marked in the time series, where the user operation behavior data corresponding to the marked points exhibits abnormal user operation behavior.
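The marking step can be sketched as below. The 0/1 encoding of the marking sequence, the string time labels, and the name `mark_anomalies` are assumptions; the interval is treated as closed, so values equal to A or B count as normal.

```python
def mark_anomalies(timestamps, residuals, lower, upper):
    """Return (marks, abnormal_timestamps).

    marks[i] is 1 when residuals[i] lies outside the closed interval
    [lower, upper], and 0 otherwise.
    """
    marks = [0 if lower <= r <= upper else 1 for r in residuals]
    abnormal = [t for t, m in zip(timestamps, marks) if m == 1]
    return marks, abnormal


hours = ["09:00", "10:00", "11:00", "12:00"]
residuals = [0, 1, 17, -3]
marks, abnormal = mark_anomalies(hours, residuals, lower=-3, upper=13)
print(marks)     # [0, 0, 1, 0]
print(abnormal)  # ['11:00']
```

The abnormal time labels can then be used to look up the underlying user operation records for review.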
It will be apparent to those skilled in the art that the techniques in the embodiments of the present application may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product. The software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or another medium capable of storing program code, and includes several instructions for causing a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application. For identical or similar parts, the embodiments in this specification may be referred to one another. In particular, because the terminal embodiment is substantially similar to the method embodiment, it is described relatively briefly; for relevant details, refer to the description of the method embodiment.
In the several embodiments provided by the present application, it should be understood that the disclosed systems and methods may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, system, or unit, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing unit, each module may exist alone physically, or two or more modules may be integrated into one unit.
Similarly, the processing units in the embodiments of the present application may be integrated into one functional module, each processing unit may exist alone physically, or two or more processing units may be integrated into one functional module.
The application will be further described with reference to the accompanying drawings and specific embodiments. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Further, it will be understood that various changes and modifications may be made by those skilled in the art after reading the teachings of the application, and equivalents thereof fall within the scope of the application as defined by the claims.
Claims (10)
1. A method for detecting abnormal operation behavior of a user, comprising:
acquiring user operation behavior data, and generating a time sequence according to the time characteristics of the data;
smoothing the time sequence by adopting a local weighted regression algorithm to generate seasonal components;
generating a trend residual sequence by using the time sequence and the seasonal component, and smoothing the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component;
calculating a sum of the seasonal component and the trend component as a baseline component;
generating a residual sequence using the time sequence and the baseline component;
performing an interquartile range measurement on the residual sequence, and calculating a metric value;
calculating a user abnormal behavior determination interval according to the metric value;
and identifying abnormal points of the residual sequence by using the user abnormal behavior determination interval, so as to determine abnormal time series data.
2. The method for detecting abnormal operation behavior of a user according to claim 1, wherein the acquiring operation behavior data of the user and generating a time series according to time characteristics of the data comprises:
acquiring the operation behavior data of the user, and generating the time series by time aggregation according to the time characteristics of the data.
3. The method for detecting abnormal operation behavior of a user according to claim 2, wherein smoothing the time series by using a locally weighted regression algorithm to generate seasonal components comprises:
smoothing the time series data within a time window by using a locally weighted regression algorithm, while preserving the trend characteristics of the time series data;
determining the seasonal component by calculating a moving average of the smoothed time series;
wherein the time window used when calculating the moving average is a time window matching the seasonal period.
4. The method for detecting abnormal operation behavior of a user according to claim 3, wherein the generating a trend residual sequence using a time sequence and a seasonal component, and smoothing the trend residual sequence using a local weighted regression algorithm, generating a trend component, comprises:
subtracting the seasonal component from the time sequence to generate a trend residual sequence;
and smoothing the trend residual sequence by adopting a local weighted regression algorithm to generate a trend component.
5. The method for detecting abnormal operation behavior of a user according to claim 4, wherein the local weighted regression algorithm comprises the steps of:
step 1: let the data points in the time series be (x_i, y_i), and define the objective function of the weighted regression as follows: [formula missing in source];
wherein the formula uses the weight function of the i-th data point, x is the position of the point to be smoothed, x_i is the location of the i-th data point, and the smoothing parameter, i.e. the bandwidth function, controls the distribution of the weights;
step 2: performing least squares regression to fit a local polynomial model, the local polynomial model being: [formula missing in source];
determining the coefficients β of the local polynomial model by minimizing the squared objective function [formula missing in source], wherein n is the total number of data points and y_i is the response value of the i-th data point;
step 3: selecting [symbol missing in source] and calculating the weights according to the distances of the data points;
the bandwidth function is defined as: k × median(|… - …|);
where k is a bandwidth adjustment factor for controlling the size of the bandwidth;
step 4: calculating a smoothed estimate for each point to be smoothed by minimizing the objective function S.
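The formulas of claim 5 are not legible in this text. For orientation only, one standard formulation of locally weighted regression that is compatible with the surviving definitions is sketched below. The symbols $w_i(x)$, $h$, and $K$, the polynomial degree $p$, and the reading of the bandwidth operands as $|x - x_i|$ are assumptions; this is not a reconstruction of the claimed formulas.

```latex
\begin{aligned}
w_i(x) &= K\!\left(\frac{|x - x_i|}{h}\right), \qquad h = k \cdot \operatorname{median}_i\,|x - x_i|,\\
f(x_i) &= \beta_0 + \beta_1 (x_i - x) + \dots + \beta_p (x_i - x)^p,\\
S(\beta) &= \sum_{i=1}^{n} w_i(x)\,\bigl(y_i - f(x_i)\bigr)^2,\\
\hat{y}(x) &= \hat{\beta}_0, \qquad \hat{\beta} = \arg\min_{\beta} S(\beta).
\end{aligned}
```

Here $K$ is a kernel such as the tricube or Gaussian function. Because the polynomial is centered at $x$, the smoothed value at $x$ is simply the fitted intercept $\hat{\beta}_0$.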
6. The method for detecting abnormal operation behavior of a user according to claim 4, wherein generating a residual sequence using a time sequence and a baseline component comprises:
the baseline component is subtracted from the time series to obtain a residual sequence.
7. The method for detecting abnormal operation behavior of a user according to claim 6, wherein performing the interquartile range measurement on the residual sequence and calculating a metric value comprises:
performing the interquartile range measurement on the residual sequence to obtain a lower quartile Q1 and an upper quartile Q3;
calculating the interquartile range IQR using the formula IQR = Q3 - Q1.
8. The method for detecting abnormal operation behavior of a user according to claim 7, wherein calculating the user abnormal behavior determination interval according to the metric value comprises:
calculating a lower limit A and an upper limit B of the user abnormal behavior determination interval according to the formulas A = Q1 - k × IQR and B = Q3 + k × IQR;
and taking [A, B] as the user abnormal behavior determination interval.
9. The method for detecting abnormal operation behavior of a user according to claim 8, wherein identifying abnormal points of the residual sequence by using the user abnormal behavior determination interval, so as to determine abnormal time series data, comprises:
determining whether each observed value of the residual sequence falls within the interval [A, B];
if so, marking the corresponding time series data as a normal point; if not, marking the corresponding time series data as an abnormal point;
generating an abnormal marking sequence according to the abnormal points marked in the time series, wherein the corresponding user operation behavior data exhibits abnormal user operation behavior.
10. A system for detecting abnormal operation behavior of a user, comprising:
a data acquisition unit configured to acquire user operation behavior data and generate a time series according to the time characteristics of the data;
a seasonal decomposition unit configured to smooth the time series by using a locally weighted regression algorithm to generate a seasonal component;
a trend decomposition unit configured to generate a trend residual sequence by using the time series and the seasonal component, and to smooth the trend residual sequence by using a locally weighted regression algorithm to generate a trend component;
a baseline creation unit configured to calculate the sum of the seasonal component and the trend component as a baseline component;
a residual sequence creation unit configured to generate a residual sequence by using the time series and the baseline component;
a first calculation unit configured to perform an interquartile range measurement on the residual sequence and calculate a metric value;
a second calculation unit configured to calculate a user abnormal behavior determination interval according to the metric value;
and an identification unit configured to identify abnormal points of the residual sequence by using the user abnormal behavior determination interval, so as to determine abnormal time series data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310890239.XA CN116627707A (en) | 2023-07-20 | 2023-07-20 | Detection method and system for abnormal operation behavior of user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116627707A (en) | 2023-08-22 |
Family
ID=87602877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310890239.XA Pending CN116627707A (en) | 2023-07-20 | 2023-07-20 | Detection method and system for abnormal operation behavior of user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116627707A (en) |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230822 |