CN114090402A - User abnormal access behavior detection method based on isolated forest - Google Patents

User abnormal access behavior detection method based on isolated forest

Info

Publication number
CN114090402A
Authority
CN
China
Prior art keywords
isolated
data
abnormal
isolated forest
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111292478.2A
Other languages
Chinese (zh)
Inventor
廖游
黎臻
张玄
张超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 30 Research Institute
Original Assignee
CETC 30 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 30 Research Institute filed Critical CETC 30 Research Institute
Priority to CN202111292478.2A priority Critical patent/CN114090402A/en
Publication of CN114090402A publication Critical patent/CN114090402A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment; monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for detecting abnormal user access behavior based on an isolated forest, comprising the following steps: step 10, constructing a data set X from historical log data of user access behaviors; step 20, constructing an isolated forest model from the data set X using the isolated forest algorithm; step 30, using the isolated forest model to perform anomaly detection on the log data of user access behaviors to be detected, obtaining an anomaly detection result; and step 40, processing the anomaly detection result. By relying on the isolated forest machine learning algorithm, the invention addresses the prior art's difficulty in describing features manually and its high labor cost, and it can automatically extract new abnormal behavior features for self-learning of the isolated forest model, further improving the model's robustness. In addition, the isolated binary trees in the model are constructed independently of one another, so that, with the trees balanced by their weight values, the isolated forest model achieves higher detection accuracy and efficiency on high-dimensional data.

Description

User abnormal access behavior detection method based on isolated forest
Technical Field
The invention relates to the technical field of user abnormal access behavior detection, in particular to a user abnormal access behavior detection method based on an isolated forest.
Background
With the continuous development of internet information technology, data becomes the core secret of each enterprise, and how to better guarantee data security becomes a new challenge to the new era. Malicious attacks from the outside tend to be very diverse, and for such outside attacks enterprises will often place security barriers at the boundaries of the network to isolate the inside from the outside, thereby combating outside attacks. In practice, the security problem faced by the enterprise comes not only from the outside, but the abnormal behavior of the internal users can even cause more serious loss. Therefore, how to rapidly and accurately monitor and detect the abnormal behavior of the user has become a research hotspot.
Currently, mainstream technology on the market generally detects abnormal user behavior with methods based on manual feature matching or with correlation analysis methods based on data mining, but both have limitations. In manual feature matching, abnormal user behavior is hard to describe accurately by hand, and in real scenarios it changes continuously, so the features of different behaviors cannot be summarized and specific feature description work is difficult to carry out. Once abnormal behavior falls outside the preset description features, it is hard to recognize, and the detection accuracy of the model keeps declining. In addition, manual feature description requires experienced engineers to construct preset scenarios and design the related behavior description features. The workload is very large, the demands on the engineers' experience are high, and the labor cost of the whole process is too high to meet the needs of modern engineering. The method is also poorly portable: once the scenario changes, the description features of the whole model must be redesigned, which is very unfriendly to users. Correlation analysis methods usually need preset user attribute information and a historical user behavior sequence; an algorithm correlates the two internally, and the correlated information is then compared with the current behavior sequence to judge whether the current behavior sequence is abnormal. Such methods depend heavily on the preset information, are inaccurate and slow to respond when judging content beyond the preset categories, and adapt poorly to continuously changing scenarios with high real-time requirements.
Disclosure of Invention
The invention aims to provide a user abnormal access behavior detection method based on an isolated forest so as to solve the existing problems.
The invention provides a user abnormal access behavior detection method based on an isolated forest, which comprises the following steps:
step 10, constructing a data set X based on historical log data of user access behaviors;
step 20, constructing an isolated forest model by utilizing a data set X based on an isolated forest algorithm;
step 30, carrying out anomaly detection on the log data to be detected of the user access behavior by using the isolated forest model to obtain an anomaly detection result;
and step 40, processing the abnormal detection result.
Further, the method for constructing the data set X based on the historical log data of the user access behaviors in step 10 includes:
step 11, collecting historical log data of user access behaviors;
step 12, performing data preprocessing on the collected historical log data of the user access behaviors, and removing redundant data in the historical log data of the user access behaviors;
and step 13, describing the historical log data of the user access behaviors after data preprocessing in a tuple mode, and accordingly sorting and combining the historical log data to form a data set X.
Further, the method for constructing the isolated forest model by using the data set X based on the isolated forest algorithm in step 20 includes:
step 21, randomly selecting m pieces of sample data from the data set X and recording the set of these sample data as a subset $X_i$, the subset $X_i$ being used to generate an isolated binary tree;
step 22, randomly selecting a feature f from the subset $X_i$ and then randomly selecting a cut point p to split the subset $X_i$, wherein the value of p lies between the maximum value and the minimum value of the feature f, and p is used as a hyperplane to divide the sample data in the subset $X_i$ into two parts;
step 23, if the value of the feature f of a piece of sample data in the subset $X_i$ is larger than the value of the cut point p, assigning that sample data to the right child of the node; if the value of the feature f of a piece of sample data in the subset $X_i$ is smaller than the value of the cut point p, assigning that sample data to the left child of the node;
step 24, repeatedly splitting the left child and the right child of the node according to the method in steps 22 and 23, and stopping further generation when a set condition is reached, to obtain all nodes of the isolated binary tree;
step 25, denoting the set of nodes of the isolated binary tree as $Node = \{n_1, n_2, \dots, n_r\}$ and the path lengths of the nodes as $H = \{h_1, h_2, \dots, h_r\}$, and calculating the standard deviation σ of the path lengths of the nodes of the isolated binary tree;
step 26, repeating steps 21 to 25 until a set of n isolated binary trees and the standard deviation of the path length of the corresponding isolated binary tree is generated;
step 27, performing normalization processing on the set of standard deviations of the path lengths of all the isolated binary trees to generate a set of weight values corresponding to each isolated binary tree;
and 28, returning an isolated forest model, wherein the isolated forest model comprises the generated n isolated binary trees and the corresponding weight value sets thereof.
Further, the setting conditions in step 24 include the following three conditions:
(1) the isolated binary tree reaches a set maximum height;
(2) the nodes of the left child and/or the right child have a plurality of same sample data;
(3) there is only one piece of data in the nodes of the left child and/or the right child.
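Steps 21 to 26 and the stopping conditions above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the names `build_tree`, `leaf_depths`, `build_forest` and `max_height` are hypothetical, samples are assumed to be tuples of numbers, ties with the cut point are assumed to go to the right child, and the "path length of a node" is assumed to mean the depth of each leaf node.

```python
import random
import statistics


def build_tree(rows, depth, max_height, rng):
    """Grow one isolated binary tree from a list of numeric tuples."""
    # Stopping conditions: height limit reached, at most one row left,
    # or only identical rows left (nothing more to separate).
    if depth >= max_height or len(rows) <= 1 or all(r == rows[0] for r in rows):
        return {"size": len(rows), "depth": depth}
    # Assumption: choose f only among features that still vary in this node,
    # so that a cut point between the minimum and maximum exists.
    varying = [f for f in range(len(rows[0]))
               if min(r[f] for r in rows) < max(r[f] for r in rows)]
    f = rng.choice(varying)
    lo = min(r[f] for r in rows)
    hi = max(r[f] for r in rows)
    p = rng.uniform(lo, hi)                 # random cut point in [lo, hi]
    left = [r for r in rows if r[f] < p]
    right = [r for r in rows if r[f] >= p]  # assumption: ties go right
    return {"feature": f, "cut": p,
            "left": build_tree(left, depth + 1, max_height, rng),
            "right": build_tree(right, depth + 1, max_height, rng)}


def leaf_depths(node):
    """Collect the depth of every leaf (assumed meaning of 'path length')."""
    if "cut" not in node:
        return [node["depth"]]
    return leaf_depths(node["left"]) + leaf_depths(node["right"])


def build_forest(data, n_trees, m, max_height=8, seed=0):
    """Return n_trees trees and the standard deviation of each tree's depths."""
    rng = random.Random(seed)
    trees, sigmas = [], []
    for _ in range(n_trees):
        subset = rng.sample(data, m)        # step 21: m random samples
        tree = build_tree(subset, 0, max_height, rng)
        trees.append(tree)
        sigmas.append(statistics.pstdev(leaf_depths(tree)))  # step 25
    return trees, sigmas
```

Because each tree is built only from its own subset, the loop in `build_forest` could be parallelized without changing the result of any individual tree.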
Further, the formula for calculating the standard deviation σ of the path lengths of the nodes of the isolated binary tree in step 25 is:
$$\sigma = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left(h_j - \mu\right)^2}$$
where μ is the average of the path lengths of all nodes, r represents the number of nodes in the isolated binary tree, and $h_j$ represents the path length of the j-th node of the isolated binary tree.
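As a worked illustration of the standard deviation of path lengths, assume the population form (dividing by r rather than r − 1, which is an assumption) and a hypothetical tree with r = 4 nodes whose path lengths are H = {1, 2, 2, 3}:

```latex
\mu = \frac{1 + 2 + 2 + 3}{4} = 2, \qquad
\sigma = \sqrt{\frac{(1-2)^2 + (2-2)^2 + (2-2)^2 + (3-2)^2}{4}}
       = \sqrt{\frac{2}{4}} \approx 0.707
```

A tree whose nodes all sit at the same depth would give σ = 0, while a tree that isolates some samples very early and others very late would give a larger σ.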
Further, the method for normalizing the set of standard deviations of the path lengths of all isolated binary trees in step 27 includes:
[Formula image BDA0003335346760000041: normalization of $\sigma_i$ with respect to $\sigma_{\max}$ and $\sigma_{\min}$]
where the resulting value (symbol shown in formula image BDA0003335346760000042) represents the weight value corresponding to the i-th isolated binary tree; the set of standard deviations of the path lengths of all isolated binary trees is $\{\sigma_1, \sigma_2, \dots, \sigma_n\}$; $\sigma_i$ represents the standard deviation of the path lengths of the i-th isolated binary tree; $\sigma_{\max}$ is the maximum value in the set of standard deviations of the path lengths of all isolated binary trees; and $\sigma_{\min}$ is the minimum value in that set.
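One plausible reading of this normalization is ordinary min-max scaling. The sketch below assumes that form; the exact expression and its direction are not recoverable from the text, and the function name `normalize_sigmas` and the equal-weight fallback are hypothetical.

```python
def normalize_sigmas(sigmas):
    """Min-max normalize per-tree standard deviations into weights in [0, 1].

    Assumption: weight_i = (sigma_i - sigma_min) / (sigma_max - sigma_min).
    """
    lo, hi = min(sigmas), max(sigmas)
    if hi == lo:
        # Assumption: if every tree has the same sigma, weight them equally.
        return [1.0] * len(sigmas)
    return [(s - lo) / (hi - lo) for s in sigmas]
```

Under this assumed form the tree with the smallest standard deviation receives weight 0 and so would not contribute to a weighted score; an implementation might add a small offset if that is not desired.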
Further, the method for performing anomaly detection on the log data to be detected of the user access behavior by using the isolated forest model in step 30 includes:
step 31, performing data preprocessing on the log data to be detected of the user access behavior, and removing redundant data in the log data to be detected of the user access behavior;
step 32, describing the log data to be detected of the user access behavior after data preprocessing in a tuple mode, and accordingly sorting and combining the log data to be detected to form a data set to be detected;
step 33, inputting the data set to be detected into the isolated forest model, and calculating, for each piece of sample data in the data set to be detected, the set $H = \{h_1, h_2, \dots, h_n\}$ of its path lengths on each isolated binary tree in the isolated forest model; the set of weight values obtained in step 20 is recorded as
[Formula image BDA0003335346760000043: set of weight values]
and the abnormal value y(a, m) of the sample data a in the data set to be detected is calculated by weighting according to the following formulas:
[Formula image BDA0003335346760000044]
[Formula image BDA0003335346760000045]
$H(x) = \ln x + \xi$
where ξ is the Euler constant;
and step 34, finally determining whether the sample data is abnormal according to the abnormal value of the sample data in the data set to be detected, and outputting the corresponding anomaly detection result.
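The weighted scoring step can be sketched as follows. Only H(x) = ln x + ξ is stated in the text; the rest is an assumption that borrows the standard isolation forest score, 2^(−E[h]/c(m)) with c(m) = 2H(m − 1) − 2(m − 1)/m, and replaces the plain mean path length with a weighted mean. The names `harmonic`, `c_factor` and `anomaly_score` are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler constant, xi in the text


def harmonic(x):
    """H(x) = ln x + xi, as given in the text."""
    return math.log(x) + EULER_GAMMA


def c_factor(m):
    """Assumed normalizer: c(m) = 2 H(m - 1) - 2 (m - 1) / m, for m > 2."""
    if m <= 2:
        raise ValueError("m must be greater than 2 in this sketch")
    return 2.0 * harmonic(m - 1) - 2.0 * (m - 1) / m


def anomaly_score(path_lengths, weights, m):
    """Assumed weighted score y(a, m) = 2 ** (-weighted_mean(h) / c(m))."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean_h = sum(w * h for w, h in zip(weights, path_lengths)) / total
    return 2.0 ** (-mean_h / c_factor(m))
```

With this form, a sample that is isolated after only a few splits gets a score closer to 1, and a sample whose weighted mean path length equals c(m) gets a score of 0.5.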
Further, the method for processing the abnormality detection result in step 40 includes:
step 41, storing the context data of the sample data identified as abnormal in the abnormal detection result; independently storing the sample data identified as abnormal, and keeping the key directivity information in the sample data;
step 42, generating abnormal alarm information according to the abnormal detection result; the abnormal alarm information encapsulates the user object corresponding to the sample data identified as abnormal, the specific content of the sample data and the alarm level; and sending the generated abnormal alarm information to a corresponding alarm center for response, and meanwhile, reserving the abnormal alarm information.
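The alarm information described in step 42 could be represented as a small record that bundles the user, the sample content and the alarm level. The sketch below is illustrative only: the class name `AnomalyAlert`, its fields and the JSON encoding are assumptions, and how the message is delivered to an alarm center is left out.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class AnomalyAlert:
    user: str       # user object the abnormal sample belongs to
    sample: tuple   # content of the sample data identified as abnormal
    score: float    # abnormal value computed by the model
    level: int      # alarm level (1 is assumed to be the most serious)


def encode_alert(alert):
    """Serialize an alert so it can be sent to an alarm center and retained."""
    return json.dumps(asdict(alert), sort_keys=True)
```

Keeping the serialized message alongside the stored sample would give later tracing a single record to start from.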
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. Based on the isolated forest machine learning algorithm, the invention addresses the prior art's difficulty in describing features manually and its high labor cost. Automatic feature extraction replaces manual feature description, which reduces the share of manual intervention in the implementation. It also allows the whole system to keep improving itself when facing continuously changing abnormal behaviors: new abnormal behavior features are extracted automatically for self-learning of the isolated forest model, further improving the model's robustness.
2. From a practical standpoint, the method addresses the slow response and high development cost of traditional methods in real application scenarios. The self-iteration of the isolated forest model within the system further reduces later operation and maintenance costs, and a streaming processing framework gives the whole system a fast response to massive data, so the method can essentially be regarded as a real-time detection method. As the core detection method of a complete abnormal behavior detection system, it is well suited to application scenarios with huge data volumes and high real-time requirements. In addition, the isolated binary trees in the model are constructed independently of one another, so that, with the trees balanced by their weight values, the isolated forest model achieves higher detection accuracy and efficiency on high-dimensional data.
3. The method simplifies the isolation boundary for using log data of user access behaviors: it organizes the log data describing user access behaviors as tuples, moving beyond the traditional use of single log entries. This describes the data features more simply and efficiently, effectively reduces data redundancy, and provides higher-value input for the subsequent algorithm.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention, and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a flowchart of a user abnormal access behavior detection method based on an isolated forest according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an isolated binary tree according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of a construction process of an isolated forest model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
Note: the isolated forest is a machine learning algorithm for anomaly detection that is suited to anomaly analysis of continuous data. The algorithm divides data into two classes, normal data and abnormal data. Abnormal data are usually distributed differently from normal data and more sparsely, so they are also called isolated outliers; they are typically isolated by exploiting the fact that they are few in number and that their feature distribution differs greatly from that of normal data. Unlike many other machine learning algorithms, the isolated forest does not need to compute distances or densities, which avoids a large amount of computation, and it has linear time complexity. Each isolated binary tree grows without being affected by the other trees, and the algorithm responds very well to massive data, so it is well suited to systems with high real-time requirements and huge data volumes.
In a complete user abnormal behavior analysis system, a database cluster forms a data storage layer as a basic support of the system, and bears a data core of the whole system to support an upper application to perform further processing and analysis work of data; the computing layer utilizes stable and rapid stream processing and middleware technology to improve the real-time computing capability and response speed of the system, so that the system can better cope with the massive data scene; the anomaly analysis module is the key content of the whole system, the core of the anomaly analysis module can be realized based on an isolated forest, data such as logs, network flow and the like which are processed in an accelerating way are accessed, automatic extraction modeling of data semantic features is carried out through the isolated forest, then, implicit relation information, sequence information and the like in the data are analyzed by utilizing an isolated binary tree with weight, and whether the data are abnormal behaviors or not is judged according to the final weight result; the uppermost layer is responsible for interacting with the user and displaying the packaged analysis result.
In this embodiment, a method for detecting abnormal user access behavior based on an isolated forest is provided in an abnormality analysis module based on the above contents, and the detection method automatically extracts a hidden distribution pattern of data features through a machine learning model, performs abnormality detection on log data of user access behavior in each isolated binary tree with a weight value, and finally performs weighting calculation on results of all the isolated binary trees to obtain a final analysis detection result. Specifically, as shown in fig. 1, the method for detecting abnormal user access behavior based on the isolated forest includes the following steps:
step 10, constructing a data set X based on historical log data of user access behaviors:
step 11, collecting historical log data of user access behaviors;
step 12, performing data preprocessing on the collected historical log data of the user access behaviors, and removing redundant data in the historical log data of the user access behaviors;
and step 13, describing the historical log data of the user access behaviors after data preprocessing in a tuple mode, and accordingly sorting and combining the historical log data to form a data set X.
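Steps 11 to 13 can be illustrated with a short sketch. It assumes that each log record is a dictionary, that "redundant data" means exact duplicate rows, and that the chosen fields are numeric; the function name `build_dataset` and the field names in the usage example are hypothetical.

```python
def build_dataset(log_records, fields):
    """Turn raw log records into a de-duplicated list of numeric tuples."""
    seen = set()
    dataset = []
    for record in log_records:
        row = tuple(float(record[name]) for name in fields)
        if row not in seen:       # assumption: redundancy = exact duplicates
            seen.add(row)
            dataset.append(row)
    return dataset
```

For example, calling `build_dataset(logs, ("user_id", "hour", "bytes"))` on a list of parsed log entries would return one tuple per distinct combination of those three values, in first-seen order.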
Step 20, constructing an isolated forest model based on an isolated forest algorithm and by using the data set X, as shown in fig. 2, specifically including:
step 21, randomly selecting m pieces of sample data from the data set X and recording the set of these sample data as a subset $X_i$, the subset $X_i$ being used to generate an isolated binary tree; Fig. 3 schematically illustrates this with A, B, C, D, E as the 5 pieces of sample data in a first subset $X_1$;
step 22, randomly selecting a feature f from the subset $X_i$ and then randomly selecting a cut point p to split the subset $X_i$, wherein the value of p lies between the maximum value and the minimum value of the feature f, and p is used as a hyperplane to divide the sample data in the subset $X_i$ into two parts;
step 23, if the value of the feature f of a piece of sample data in the subset $X_i$ is larger than the value of the cut point p, assigning that sample data to the right child of the node; if the value of the feature f of a piece of sample data in the subset $X_i$ is smaller than the value of the cut point p, assigning that sample data to the left child of the node;
step 24, repeatedly carrying out segmentation on the left child and the right child of the node according to the method in the steps 22 and 23, and stopping continuous generation when a set condition is reached to obtain all nodes of the isolated binary tree; wherein the setting conditions include the following three conditions:
(1) the isolated binary tree reaches a set maximum height;
(2) the nodes of the left child and/or the right child have a plurality of same sample data;
(3) there is only one piece of data in the nodes of the left child and/or the right child.
Step 25, denoting the set of nodes of the isolated binary tree as $Node = \{n_1, n_2, \dots, n_r\}$ and the path lengths of the nodes as $H = \{h_1, h_2, \dots, h_r\}$, and calculating the standard deviation σ of the path lengths of the nodes of the isolated binary tree:
$$\sigma = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left(h_j - \mu\right)^2}$$
where μ is the average of the path lengths of all nodes, r represents the number of nodes in the isolated binary tree, and $h_j$ represents the path length of the j-th node of the isolated binary tree.
Step 26, repeating steps 21 to 25 until a set of n isolated binary trees and the standard deviation of the path length of the corresponding isolated binary tree is generated;
step 27, performing normalization processing on the set of standard deviations of the path lengths of all the isolated binary trees to generate a set of weight values corresponding to each isolated binary tree:
[Formula image BDA0003335346760000091: normalization of $\sigma_i$ with respect to $\sigma_{\max}$ and $\sigma_{\min}$]
where the resulting value (symbol shown in formula image BDA0003335346760000092) represents the weight value corresponding to the i-th isolated binary tree; the set of standard deviations of the path lengths of all isolated binary trees is $\{\sigma_1, \sigma_2, \dots, \sigma_n\}$; $\sigma_i$ represents the standard deviation of the path lengths of the i-th isolated binary tree; $\sigma_{\max}$ is the maximum value in the set of standard deviations of the path lengths of all isolated binary trees; and $\sigma_{\min}$ is the minimum value in that set.
And 28, returning an isolated forest model, wherein the isolated forest model comprises the generated n isolated binary trees and the corresponding weight value sets thereof.
step 30, carrying out anomaly detection on the log data to be detected of the user access behavior by using the isolated forest model to obtain an anomaly detection result:
step 31, performing data preprocessing on the log data to be detected of the user access behavior, and removing redundant data in the log data to be detected of the user access behavior;
step 32, describing the log data to be detected of the user access behavior after data preprocessing in a tuple mode, and accordingly sorting and combining the log data to be detected to form a data set to be detected;
step 33, inputting the data set to be detected into the isolated forest model, and calculating, for each piece of sample data in the data set to be detected, the set $H = \{h_1, h_2, \dots, h_n\}$ of its path lengths on each isolated binary tree in the isolated forest model; the set of weight values obtained in step 20 is recorded as
[Formula image BDA0003335346760000093: set of weight values]
and the abnormal value y(a, m) of the sample data a in the data set to be detected is calculated by weighting according to the following formulas:
[Formula image BDA0003335346760000101]
[Formula image BDA0003335346760000102]
$H(x) = \ln x + \xi$
where ξ is the Euler constant;
and step 34, finally determining whether the sample data is abnormal according to the abnormal value of the sample data in the data set to be detected, and outputting the corresponding anomaly detection result. Specifically, an abnormal value threshold may be preset; when the abnormal value of a piece of sample data in the data set to be detected exceeds this threshold, the sample data is considered abnormal. The abnormal value threshold can be set according to the actual application.
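How a sample's path length on one tree might be obtained can be sketched with a small hand-built tree. The dictionary layout (`feature`, `cut`, `left`, `right`, `size`) and the function name `path_length` are hypothetical, ties with the cut point are assumed to go right, and the leaf-size adjustment used in some isolation forest implementations is deliberately omitted.

```python
def path_length(node, sample):
    """Count the splits a sample passes through before reaching a leaf."""
    depth = 0
    while "cut" in node:                       # internal nodes carry a cut point
        if sample[node["feature"]] >= node["cut"]:
            node = node["right"]               # assumption: ties go right
        else:
            node = node["left"]
        depth += 1
    return depth


# Hypothetical tree: split on feature 0 at 5.0, then on feature 1 at 2.0.
EXAMPLE_TREE = {
    "feature": 0, "cut": 5.0,
    "left": {"size": 1},
    "right": {"feature": 1, "cut": 2.0,
              "left": {"size": 1},
              "right": {"size": 2}},
}
```

Running `path_length` for the same sample over every tree in the forest yields the set of path lengths that the weighted abnormal value is computed from.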
Step 40, processing the abnormal detection result:
step 41, storing the context data of the sample data identified as abnormal in the abnormal detection result; the method comprises the steps of performing independent storage on sample data identified as abnormal, and reserving key directivity information in the sample data to ensure that the abnormal data can be traced subsequently; the key directivity information mainly refers to user-related information in log data of user access behavior corresponding to the sample data, such as user basic information, an ip address when the data is generated, what content of which module is accessed, and the like.
Step 42, generating abnormal alarm information according to the anomaly detection result; the abnormal alarm information encapsulates the user object corresponding to the sample data identified as abnormal, the specific content of the sample data, and the alarm level; the generated abnormal alarm information is sent to the corresponding alarm center for response and is also retained. For the alarm level, several level thresholds are set above the abnormal value threshold, for example 3 level thresholds: when the abnormal value of the sample data exceeds the abnormal value threshold and reaches the first level threshold, the alarm level is the third-level alarm level; when it exceeds the abnormal value threshold and reaches the second level threshold, the alarm level is the second-level alarm level; when it exceeds the abnormal value threshold and reaches the third level threshold, the alarm level is the first-level alarm level. The third-level, second-level and first-level alarm levels represent increasingly serious conditions, in that order.
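The mapping from abnormal value to alarm level can be sketched as below. The threshold values in the usage note are made up for illustration, the function name `alarm_level` is hypothetical, and the handling of a sample that exceeds the abnormal value threshold without reaching the first level threshold (reported here at the least serious level) is an assumption the text does not settle.

```python
def alarm_level(score, anomaly_threshold, level_thresholds):
    """Return None for normal samples, else an alarm level (1 = most serious).

    level_thresholds must be in ascending order, e.g. three values.
    """
    if score <= anomaly_threshold:
        return None                              # not abnormal, no alarm
    reached = sum(1 for t in level_thresholds if score >= t)
    # Assumption: an anomaly below the first level threshold is still
    # reported, at the least serious level.
    reached = max(reached, 1)
    return len(level_thresholds) - reached + 1
```

For instance, with an abnormal value threshold of 0.6 and level thresholds (0.7, 0.8, 0.9), a score of 0.75 maps to the third-level alarm and a score of 0.95 maps to the first-level alarm.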
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A user abnormal access behavior detection method based on an isolated forest is characterized by comprising the following steps:
step 10, constructing a data set X based on historical log data of user access behaviors;
step 20, constructing an isolated forest model by utilizing a data set X based on an isolated forest algorithm;
step 30, carrying out anomaly detection on the log data to be detected of the user access behavior by using the isolated forest model to obtain an anomaly detection result;
and step 40, processing the abnormal detection result.
2. The isolated forest-based user abnormal access behavior detection method as claimed in claim 1, wherein the method for constructing the data set X based on historical log data of user access behaviors in step 10 comprises the following steps:
step 11, collecting historical log data of user access behaviors;
step 12, performing data preprocessing on the collected historical log data of the user access behaviors, and removing redundant data in the historical log data of the user access behaviors;
and step 13, describing the historical log data of the user access behaviors after data preprocessing in a tuple mode, and accordingly sorting and combining the historical log data to form a data set X.
3. The isolated forest-based user abnormal access behavior detection method as claimed in claim 2, wherein the method for constructing the isolated forest model based on the isolated forest algorithm and by using the data set X in the step 20 comprises the following steps:
step 21, randomly selecting m pieces of sample data from a data set X, and recording a set of the sample data as a subset XiSubset XiGenerating an isolated binary tree;
step 22, from subset XiRandomly selecting a feature f and then randomly selecting a tangent point p to segment the subset XsWherein the value of p is between the maximum value and the minimum value of the characteristic f, and p is taken as a hyperplane to take the subset XiThe sample data in (1) is divided into two parts;
step 23, if the subset X is not the sameiIf the value of the characteristic f of certain sample data in the node is larger than the value of the tangent point p, dividing the sample data to the right child of the node; if the subset XiIf the value of the characteristic f of certain sample data in the node is smaller than the value of the tangent point p, the sample data is divided into the left children of the node;
step 24, repeatedly carrying out segmentation on the left child and the right child of the node according to the method in the steps 22 and 23, and stopping continuous generation when a set condition is reached to obtain all nodes of the isolated binary tree;
step 25, recording the set of nodes of the isolated binary tree as Node = {n_1, n_2, …, n_r} and the set of path lengths of the nodes as H = {h_1, h_2, …, h_r}, and calculating the standard deviation σ of the path lengths of the nodes of the isolated binary tree;
step 26, repeating steps 21 to 25 until n isolated binary trees and the set of standard deviations of the path lengths of the corresponding isolated binary trees are generated;
step 27, performing normalization processing on the set of standard deviations of the path lengths of all the isolated binary trees to generate a set of weight values corresponding to each isolated binary tree;
and step 28, returning an isolated forest model, wherein the isolated forest model comprises the n generated isolated binary trees and the corresponding set of weight values.
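The tree construction in steps 21 to 26 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names and default parameters are hypothetical, samples are assumed to be equal-length lists of numeric features, values equal to p are sent to the right child (the claim leaves equality open), the split feature is drawn only from features that still vary within the node, and the population standard deviation is used for σ. The weight normalization of step 27 is not shown here.

```python
import random
import statistics


def build_tree(samples, depth, max_depth, rng):
    """Recursively split `samples` into an isolated binary tree (steps 22-24)."""
    # Stop when the height limit is reached, at most one sample is left,
    # or all remaining samples are identical.
    if (depth >= max_depth or len(samples) <= 1
            or all(s == samples[0] for s in samples)):
        return {"leaf": True, "depth": depth, "size": len(samples)}
    # Assumption: choose f only among features whose values differ here.
    candidates = [f for f in range(len(samples[0]))
                  if min(s[f] for s in samples) < max(s[f] for s in samples)]
    f = rng.choice(candidates)
    lo = min(s[f] for s in samples)
    hi = max(s[f] for s in samples)
    p = rng.uniform(lo, hi)  # split point between min and max of feature f
    left = [s for s in samples if s[f] < p]    # smaller than p: left child
    right = [s for s in samples if s[f] >= p]  # otherwise: right child
    return {"leaf": False, "depth": depth, "feature": f, "split": p,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}


def node_depths(node):
    """Path length (depth below the root) of every node in the tree."""
    depths = [node["depth"]]
    if not node["leaf"]:
        depths += node_depths(node["left"]) + node_depths(node["right"])
    return depths


def build_forest(X, n_trees=10, m=16, max_depth=8, seed=0):
    """Return a list of (tree, sigma) pairs, one per isolated binary tree."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        subset = rng.sample(X, min(m, len(X)))        # step 21
        tree = build_tree(subset, 0, max_depth, rng)  # steps 22-24
        sigma = statistics.pstdev(node_depths(tree))  # step 25
        forest.append((tree, sigma))
    return forest
```

A fixed `seed` is used only so that repeated runs of the sketch give the same forest; a production system would normally not fix it.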
4. The isolated forest-based user abnormal access behavior detection method as claimed in claim 3, wherein the set conditions in step 24 comprise the following three conditions:
(1) the isolated binary tree reaches a set maximum height;
(2) the node of the left child and/or the right child contains multiple identical pieces of sample data;
(3) the node of the left child and/or the right child contains only one piece of sample data.
5. The isolated forest-based user abnormal access behavior detection method as claimed in claim 3, wherein the formula for calculating the standard deviation σ of the path lengths of the nodes of the isolated binary tree in step 25 is:
Figure FDA0003335346750000021
where μ is the average of the path lengths of all nodes, r represents the number of nodes in the isolated binary tree, and h_j represents the path length of the j-th node of the isolated binary tree.
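The formula image itself is not reproduced in this text. Assuming it is the ordinary population standard deviation, which matches the symbol definitions above (this is an assumption; a sample form dividing by r − 1 is also conceivable), it would read:

```latex
\mu = \frac{1}{r}\sum_{j=1}^{r} h_j, \qquad
\sigma = \sqrt{\frac{1}{r}\sum_{j=1}^{r}\left(h_j - \mu\right)^{2}}
```

For example, under this assumed form, path lengths H = {0, 1, 1} give μ = 2/3 and σ = √(2/9) ≈ 0.471.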
6. The isolated forest-based user abnormal access behavior detection method as claimed in claim 3, wherein the method for normalizing the set of standard deviations of the path lengths of all isolated binary trees in step 27 comprises:
Figure FDA0003335346750000031
wherein
Figure FDA0003335346750000032
represents the weight value corresponding to the i-th isolated binary tree; {σ_1, σ_2, …, σ_n} is the set of standard deviations of the path lengths of all isolated binary trees; σ_i represents the standard deviation of the path length of the i-th isolated binary tree; σ_max is the maximum value in the set of standard deviations of the path lengths of all isolated binary trees; and σ_min is the minimum value in that set.
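The normalization formula is only available as an image, so the following Python sketch assumes conventional min-max normalization, w_i = (σ_i − σ_min) / (σ_max − σ_min), which uses exactly the symbols defined above. The direction of the mapping (whether a larger σ yields a larger or a smaller weight) and the handling of the case σ_max = σ_min are assumptions; the function name is hypothetical.

```python
def normalize_weights(sigmas):
    """Min-max normalize per-tree standard deviations into weights in [0, 1]."""
    lo, hi = min(sigmas), max(sigmas)
    if hi == lo:
        # Assumption: fall back to equal weights when all trees have the same sigma.
        return [1.0] * len(sigmas)
    return [(s - lo) / (hi - lo) for s in sigmas]
```

Under this assumed form the tree with the smallest σ receives weight 0 and therefore contributes nothing to a weighted sum; a practical variant might add a small offset to avoid discarding that tree.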
7. The isolated forest-based user abnormal access behavior detection method as claimed in claim 6, wherein the method for performing abnormal detection on to-be-detected log data of user access behaviors by using the isolated forest model in step 30 comprises:
step 31, performing data preprocessing on the log data to be detected of the user access behavior, and removing redundant data in the log data to be detected of the user access behavior;
step 32, describing the log data to be detected of the user access behavior after data preprocessing in a tuple mode, and accordingly sorting and combining the log data to be detected to form a data set to be detected;
step 33, inputting the data set to be detected into the isolated forest model, and calculating the set H = {h_1, h_2, …, h_n} of the path lengths of each piece of sample data in the data set to be detected on each isolated binary tree in the isolated forest model; the set of weight values obtained in step 20 is denoted as
Figure FDA0003335346750000033
and the abnormal value y(a, m) of the sample data a in the data set to be detected is calculated by weighting according to the following formulas:
Figure FDA0003335346750000034
Figure FDA0003335346750000035
H(x) = ln x + ξ
wherein ξ is Euler's constant (approximately 0.5772);
and step 34, determining whether each piece of sample data is abnormal according to its abnormal value in the data set to be detected, and outputting the corresponding anomaly detection result.
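Because the two scoring formulas are only available as images, the Python sketch below falls back on the scoring form of the standard isolation forest algorithm, 2^(−E[h]/c(m)) with c(m) = 2H(m − 1) − 2(m − 1)/m, and replaces the plain mean E[h] with a weighted mean of the per-tree path lengths. The weighted-mean form, the function names, and the decision threshold are assumptions for illustration and may differ from the formulas in the patent.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler's constant, the xi in H(x) = ln x + xi


def harmonic(x):
    """H(x) = ln x + xi, as given in the claim."""
    return math.log(x) + EULER_GAMMA


def c(m):
    """Average path length normalizer of the standard isolation forest."""
    if m <= 1:
        raise ValueError("m must be greater than 1")
    return 2.0 * harmonic(m - 1) - 2.0 * (m - 1) / m


def anomaly_score(path_lengths, weights, m):
    """Weighted anomaly score of one sample (assumed form, see lead-in)."""
    total = sum(weights)
    if total == 0:
        # Assumption: use the unweighted mean if every weight is zero.
        mean_h = sum(path_lengths) / len(path_lengths)
    else:
        mean_h = sum(w * h for w, h in zip(weights, path_lengths)) / total
    return 2.0 ** (-mean_h / c(m))


def is_abnormal(score, threshold=0.6):
    """Hypothetical decision rule; the claim does not state a threshold."""
    return score > threshold
```

With this form, a sample that is isolated after very few splits scores close to 1, while a sample whose weighted path length equals c(m) scores 0.5.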
8. The isolated forest-based user abnormal access behavior detection method as claimed in claim 1, wherein the method for processing the abnormal detection result in step 40 comprises the following steps:
step 41, storing the context data of the sample data identified as abnormal in the abnormal detection result; separately storing the sample data identified as abnormal, and retaining the key indicative information in the sample data;
step 42, generating abnormal alarm information according to the abnormal detection result, wherein the abnormal alarm information encapsulates the user object corresponding to the sample data identified as abnormal, the specific content of the sample data, and the alarm level; and sending the generated abnormal alarm information to the corresponding alarm center for response while retaining a copy of the abnormal alarm information.
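The alarm packaging in step 42 can be sketched in Python as follows. The `AnomalyAlert` type, the score-to-level thresholds, and the JSON serialization are hypothetical choices made for illustration; the claim only states that the alarm carries the user object, the sample content, and an alarm level.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AnomalyAlert:
    """Hypothetical alarm record covering the three items named in step 42."""
    user: str      # user object tied to the abnormal sample
    sample: tuple  # specific content of the sample data
    score: float   # abnormal value that triggered the alarm (added for context)
    level: str     # alarm level


def make_alert(user, sample, score):
    """Map an abnormal value to an alarm level (thresholds are assumptions)."""
    if score >= 0.8:
        level = "high"
    elif score >= 0.65:
        level = "medium"
    else:
        level = "low"
    return AnomalyAlert(user=user, sample=tuple(sample), score=score, level=level)


def serialize(alert):
    """Encode the alarm as JSON, e.g. for sending to an alarm center or archiving."""
    return json.dumps(asdict(alert))
```

The same serialized string can be both sent to the alarm center and written to local storage, which covers the requirement to retain the alarm information.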
CN202111292478.2A 2021-11-03 2021-11-03 User abnormal access behavior detection method based on isolated forest Pending CN114090402A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111292478.2A CN114090402A (en) 2021-11-03 2021-11-03 User abnormal access behavior detection method based on isolated forest

Publications (1)

Publication Number Publication Date
CN114090402A (en) 2022-02-25

Family

ID=80298724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111292478.2A Pending CN114090402A (en) 2021-11-03 2021-11-03 User abnormal access behavior detection method based on isolated forest

Country Status (1)

Country Link
CN (1) CN114090402A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114991225A (en) * 2022-04-14 2022-09-02 华中科技大学 Deep foundation pit deformation monitoring method and device and server
CN114991225B (en) * 2022-04-14 2023-12-26 华中科技大学 Deep foundation pit deformation monitoring method, device and server
CN115238779A (en) * 2022-07-12 2022-10-25 中移互联网有限公司 Anomaly detection method, device, equipment and medium for cloud disk
CN115238779B (en) * 2022-07-12 2023-09-19 中移互联网有限公司 Cloud disk abnormality detection method, device, equipment and medium
CN115563616A (en) * 2022-08-19 2023-01-03 广州大学 Defense method for localized differential privacy data virus attack
CN115563616B (en) * 2022-08-19 2024-04-16 广州大学 Defense method for localized differential privacy data poisoning attack
CN116628775A (en) * 2023-07-20 2023-08-22 江苏华存电子科技有限公司 Abnormal access identification method and system for cloud storage data
CN116628775B (en) * 2023-07-20 2023-11-14 江苏华存电子科技有限公司 Abnormal access identification method and system for cloud storage data
CN117592975A (en) * 2024-01-18 2024-02-23 山东通维信息工程有限公司 Operation and maintenance decision processing method and system for electromechanical equipment of expressway based on cloud computing
CN117670067A (en) * 2024-02-01 2024-03-08 青岛博什兰物联技术有限公司 Quality safety management method and platform based on big data

Similar Documents

Publication Publication Date Title
CN114090402A (en) User abnormal access behavior detection method based on isolated forest
CN107622333B (en) Event prediction method, device and system
CN108874927B (en) Intrusion detection method based on hypergraph and random forest
CN107493277B (en) Large data platform online anomaly detection method based on maximum information coefficient
CN107992746A (en) Malicious act method for digging and device
CN111600919B (en) Method and device for constructing intelligent network application protection system model
CN109818961B (en) Network intrusion detection method, device and equipment
Hariharakrishnan et al. Survey of pre-processing techniques for mining big data
CN111340063A (en) Coal mill data anomaly detection method
CN111556016B (en) Network flow abnormal behavior identification method based on automatic encoder
CN111598179B (en) Power monitoring system user abnormal behavior analysis method, storage medium and equipment
WO2023093100A1 (en) Method and apparatus for identifying abnormal calling of api gateway, device, and product
CN111191767A (en) Vectorization-based malicious traffic attack type judgment method
CN111191720B (en) Service scene identification method and device and electronic equipment
CN112613599A (en) Network intrusion detection method based on generation countermeasure network oversampling
CN114817425B (en) Method, device and equipment for classifying cold and hot data and readable storage medium
CN110011990A (en) Intranet security threatens intelligent analysis method
CN117081858A (en) Intrusion behavior detection method, system, equipment and medium based on multi-decision tree
CN115828180A (en) Log anomaly detection method based on analytic optimization and time sequence convolution network
Abinaya et al. Spam detection on social media platforms
CN112882899B (en) Log abnormality detection method and device
CN116841779A (en) Abnormality log detection method, abnormality log detection device, electronic device and readable storage medium
CN111431884A (en) Host computer defect detection method and device based on DNS analysis
CN116582309A (en) GAN-CNN-BiLSTM-based network intrusion detection method
CN111209158B (en) Mining monitoring method and cluster monitoring system for server cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination