CN115879028A - Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium - Google Patents

Publication number
CN115879028A
Authority
CN
China
Prior art keywords
real
time
data
training
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211733419.9A
Other languages
Chinese (zh)
Inventor
李昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Original Assignee
Beijing Topsec Technology Co Ltd
Beijing Topsec Network Security Technology Co Ltd
Beijing Topsec Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Topsec Technology Co Ltd, Beijing Topsec Network Security Technology Co Ltd, Beijing Topsec Software Co Ltd filed Critical Beijing Topsec Technology Co Ltd
Priority to CN202211733419.9A priority Critical patent/CN115879028A/en
Publication of CN115879028A publication Critical patent/CN115879028A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a real-time anomaly detection method based on isolated forest dynamic training, which comprises the following steps: acquiring historical flow data and real-time flow data generated in a current time window based on a historical retroactive time set by a user; generating a data set based on the historical flow data and the real-time flow data, and randomly sampling sample data from the data set based on sampling parameters set by the user; training the sample data based on sample division attributes, division values and training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension; and so on. The method and the device can realize dynamic detection of abnormal behavior events and improve the flexibility of training and detection.

Description

Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium
Technical Field
The application relates to the technical field of computers, in particular to a real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and a storage medium.
Background
As society and the economy develop, network behavior scenarios are increasingly diverse, abnormal behavior events occur more frequently, and the technical means for detecting abnormal individuals or groups have multiplied.
At present, the prior art provides a training method for a random forest model and an abnormal traffic detection device based on that method. However, the baseline obtained by training in that method is fixed: unless the device is restarted and retrained, the training result remains constant throughout the detection process, so the method cannot be applied to real-time analysis scenarios.
In addition, that method has further drawbacks: it is difficult for the user to take part in the training and detection process, and its accuracy is low.
Disclosure of Invention
The embodiment of the application aims to provide a real-time abnormal detection method and device based on isolated forest dynamic training, electronic equipment and a storage medium, which are used for dynamically detecting abnormal behavior events and improving the flexibility of training and detection.
In a first aspect, the present invention provides a real-time anomaly detection method based on isolated forest dynamic training, which is applied to a real-time computation framework, wherein the real-time computation framework executes the method cyclically, each execution corresponding to one time window, and the method includes:
acquiring historical flow data and real-time flow data generated in a current time window based on historical retroactive time set by a user;
generating a data set based on the historical flow data and the real-time flow data, and randomly sampling sample data from the data set based on the sampling parameters set by the user;
training the sample data based on sample partition attributes, partition values and training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
inputting the real-time traffic data into the isolated forest model such that the isolated forest model calculates an outlier score of the real-time traffic data;
and judging whether the real-time flow data is an abnormal behavior event or not based on the score threshold set by the user and the abnormal value score of the real-time flow data.
In the first aspect of the present application, the real-time anomaly detection method based on isolated forest dynamic training is applied to a real-time computation framework, and the real-time computation framework executes the method once every time window, so that along with the rolling of the time window of the real-time computation framework, an isolated forest model can be dynamically trained, and whether real-time traffic data is an abnormal behavior event or not can be dynamically detected through the isolated forest model, that is, the present application can implement dynamic sampling, dynamic training and dynamic anomaly detection, and thus can be applied to a real-time analysis scene.
On the other hand, as the historical retroactive time set by the user is adopted in the historical flow data acquisition process, the sampling parameter set by the user is adopted in the data sampling process, the training parameter set by the user is adopted in the model training process, and the score threshold set by the user is adopted in the abnormal behavior event judgment process, the user can adjust the parameters according to actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of the model training detection cannot be adjusted.
On the other hand, the sample data does not need to be labeled, so the performance loss caused by data labeling can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
In an optional embodiment, the determining whether the real-time traffic data is an abnormal behavior event based on the score threshold set by the user and the abnormal value score of the real-time traffic data includes:
and comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, and determining the real-time traffic data as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In the above optional embodiment, by comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, the real-time traffic data can be determined as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In an optional implementation manner, before training the sample data based on the sample partition attribute and the partition value to obtain the isolated forest model, the method further includes:
generating the sample partition attribute and the partition value based on a random algorithm.
In the above optional embodiment, the sample partition attribute and the partition value may be generated by a random algorithm, which can improve the flexibility and the reliability of the isolated forest model. By contrast, the prior art selects the partition attribute and the partition value according to information gain or based on an index, which is less flexible and less reliable.
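As a non-limiting illustration, the random generation of a partition attribute and a partition value may be sketched as follows. The function name `random_partition`, the attribute names `bytes` and `duration`, and the use of a uniform draw between the minimum and maximum of the chosen attribute are assumptions made for this sketch and are not specified by the application.

```python
import random


def random_partition(samples, rng):
    """Pick a random attribute and a random split value within its range.

    `samples` is a list of dicts mapping attribute name -> numeric value.
    """
    # Randomly choose which attribute (dimension) to split on.
    attribute = rng.choice(sorted(samples[0].keys()))
    values = [sample[attribute] for sample in samples]
    # Randomly choose a split value between the observed minimum and maximum.
    split_value = rng.uniform(min(values), max(values))
    return attribute, split_value


# Hypothetical traffic records used only for illustration.
samples = [
    {"bytes": 120.0, "duration": 0.4},
    {"bytes": 95.0, "duration": 0.6},
    {"bytes": 4000.0, "duration": 9.5},
]
rng = random.Random(42)
attribute, split_value = random_partition(samples, rng)
```

Because no labels and no gain computation are involved, the choice depends only on the observed value range of the selected attribute.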
In an alternative embodiment, the isolated forest model calculates an outlier score of the real-time traffic data, comprising:
calculating the path length of the real-time flow data at each training result tree;
calculating a path length expected value based on the path length of the real-time flow data at each training result tree;
calculating an outlier score for the real-time traffic data based on the path length expected value and a standard path length.
In the above optional embodiment, by calculating the path length of the real-time traffic data at each of the training result trees, the expected path length value can be calculated based on the path length of the real-time traffic data at each of the training result trees, and the abnormal value score of the real-time traffic data can be calculated based on the expected path length value and the standard path length.
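The application does not state the scoring formula. Assuming the commonly used isolation forest definition, in which h(x) is the path length of a data item x in one tree, E[h(x)] is its mean over all trees, n is the number of sample data used to build a tree, and c(n) plays the role of the standard path length, the outlier score may be written as:

```latex
s(x, n) = 2^{-\frac{E[h(x)]}{c(n)}}, \qquad
c(n) = 2H(n-1) - \frac{2(n-1)}{n}, \qquad
H(i) \approx \ln(i) + 0.5772156649
```

Under this assumption the score lies between 0 and 1, and a shorter expected path length gives a score closer to 1, which is consistent with treating scores above the user-set threshold as abnormal.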
In a second aspect, the present invention provides a real-time anomaly detection apparatus based on isolated forest dynamic training, which is applied in a real-time computing framework, wherein the real-time computing framework calls the apparatus cyclically, each call corresponding to one time window, and the apparatus includes:
the acquisition module is used for acquiring historical flow data and real-time flow data generated in a current time window based on historical retroactive time set by a user;
the first generation module is used for generating a data set based on the historical flow data and the real-time flow data and randomly sampling sample data from the data set based on the sampling parameters set by the user;
the training module is used for training the sample data based on sample division attributes, division values and training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
a detection module for inputting the real-time traffic data into the isolated forest model so that the isolated forest model calculates an outlier score of the real-time traffic data;
and the judging module is used for judging whether the real-time flow data is an abnormal behavior event or not based on the score threshold set by the user and the abnormal value score of the real-time flow data.
In the second aspect of the present application, the real-time anomaly detection method based on isolated forest dynamic training is applied to a real-time computation framework, and the real-time computation framework executes the method once in each time window, so that the isolated forest model can be dynamically trained and whether real-time flow data is an abnormal behavior event can be dynamically detected through the isolated forest model along with the rolling of the time window of the real-time computation framework, that is, the present application can realize dynamic sampling, dynamic training and dynamic anomaly detection, and thus can be applied to a real-time analysis scene.
On the other hand, as the historical retroactive time set by the user is adopted in the historical flow data acquisition process, the sampling parameter set by the user is adopted in the data sampling process, the training parameter set by the user is adopted in the model training process, and the score threshold set by the user is adopted in the abnormal behavior event judgment process, the user can adjust the parameters according to actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of the model training detection cannot be adjusted.
On the other hand, the sample data does not need to be labeled, so the performance loss caused by data labeling can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
In an optional embodiment, the specific way for the determining module to determine whether the real-time traffic data is an abnormal behavior event based on the score threshold set by the user and the abnormal value score of the real-time traffic data is as follows:
and comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, and determining the real-time traffic data as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In the above optional embodiment, by comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, the real-time traffic data can be determined as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In an alternative embodiment, the apparatus further comprises:
a second generation module to generate the sample division attribute and the division value based on a random algorithm.
In the above optional embodiment, the sample partition attribute and the partition value may be generated by a random algorithm, which can improve the flexibility and the reliability of the isolated forest model. By contrast, the prior art selects the partition attribute and the partition value according to information gain or based on an index, which is less flexible and less reliable.
In an alternative embodiment, the detection module comprises:
the first calculation submodule is used for calculating the path length of the real-time flow data at each training result tree;
the second calculation submodule is used for calculating a path length expected value based on the path length of the real-time flow data in each training result tree;
and the third calculation submodule is used for calculating the abnormal value score of the real-time flow data based on the expected path length value and the standard path length.
In the above optional embodiment, by calculating the path length of the real-time traffic data at each of the training result trees, the expected path length value can be calculated based on the path length of the real-time traffic data at each of the training result trees, and the abnormal value score of the real-time traffic data can be calculated based on the expected path length value and the standard path length.
In a third aspect, the present invention provides an electronic device comprising:
a processor; and
a memory configured to store machine readable instructions that, when executed by the processor, perform the isolated forest dynamic training based real-time anomaly detection method according to any one of the preceding embodiments.
In the third aspect of the present application, the electronic device can implement dynamic sampling, dynamic training, and dynamic anomaly detection by executing a real-time anomaly detection method based on isolated forest dynamic training, and thus can be applied in a real-time analysis scenario.
On the other hand, as the historical retroactive time set by the user is adopted in the historical flow data acquisition process, the sampling parameter set by the user is adopted in the data sampling process, the training parameter set by the user is adopted in the model training process, and the score threshold set by the user is adopted in the abnormal behavior event judgment process, the user can adjust the parameters according to actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of the model training detection cannot be adjusted.
On the other hand, the sample data does not need to be labeled, so the performance loss caused by data labeling can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
In a fourth aspect, the present invention provides a storage medium storing a computer program, wherein the computer program is executed by a processor to execute the real-time anomaly detection method based on isolated forest dynamic training according to any one of the foregoing embodiments.
In the fourth aspect of the present application, the storage medium can implement dynamic sampling, dynamic training, and dynamic anomaly detection by the executed real-time anomaly detection method based on isolated forest dynamic training, and thus can be applied to a real-time analysis scenario.
On the other hand, the historical flow data acquisition process adopts the historical retroactive time set by the user, the data sampling process adopts the sampling parameters set by the user, the model training process adopts the training parameters set by the user, and the abnormal behavior event judgment process adopts the score threshold set by the user, so that the user can adjust the parameters according to the actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of model training detection cannot be adjusted.
On the other hand, the sample data does not need to be labeled, so the performance loss caused by data labeling can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a schematic flow chart of a real-time anomaly detection method based on isolated forest dynamic training disclosed in an embodiment of the present application;
FIG. 2 is a schematic diagram of a real-time computing framework according to an embodiment of the present disclosure, which performs a real-time anomaly detection method based on isolated forest dynamic training in a loop;
FIG. 3 is a schematic structural diagram of a real-time anomaly detection device based on isolated forest dynamic training disclosed in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a real-time anomaly detection method based on isolated forest dynamic training, which is applied to a real-time computation framework, wherein the real-time computation framework executes the method cyclically, each execution corresponding to one time window, and the method in the embodiment of the present application includes the following steps:
101. acquiring historical flow data and real-time flow data generated in a current time window based on historical retroactive time set by a user;
102. generating a data set based on historical flow data and real-time flow data, and randomly sampling sample data from the data set based on sampling parameters set by a user;
103. training sample data based on sample division attributes, division values and training parameters set by a user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
104. inputting the real-time flow data into the isolated forest model so that the isolated forest model calculates the abnormal value score of the real-time flow data;
105. and judging whether the real-time flow data is an abnormal behavior event or not based on a score threshold set by a user and the abnormal value score of the real-time flow data.
In the embodiment of the application, the real-time anomaly detection method based on isolated forest dynamic training is applied to a real-time computation framework, and the framework executes the method once in every time window. As the time window rolls, the isolated forest model is dynamically retrained and is used to dynamically detect whether the real-time flow data is an abnormal behavior event. The application can therefore realize dynamic sampling, dynamic training and dynamic anomaly detection, and can be applied to real-time analysis scenarios.
On the other hand, the historical flow data acquisition process adopts the historical retroactive time set by the user, the data sampling process adopts the sampling parameters set by the user, the model training process adopts the training parameters set by the user, and the abnormal behavior event judgment process adopts the score threshold set by the user, so that the user can adjust the parameters according to the actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of the model training detection cannot be adjusted.
On the other hand, since the embodiment of the application does not need to perform data annotation on the sample data, performance loss caused by data annotation can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
In the embodiment of the present application, for the manner in which the real-time computing framework cyclically executes the method, please refer to fig. 2, which is a schematic diagram of a real-time computing framework cyclically executing the real-time anomaly detection method based on isolated forest dynamic training. As shown in fig. 2, the real-time computing framework executes the method of the embodiment of the present application once in each time window, for example once in rolling window 1 and once in rolling window 2. Each execution is based on the real-time traffic data acquired in the current time window, so the training and the detection of the isolated forest model are performed on dynamic samples.
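As a non-limiting illustration, the once-per-window execution may be emulated in batch form as follows. The application does not name a specific real-time computing framework, so the helper `tumbling_windows`, the placeholder `run_once` and the 10-second window size are assumptions made for this sketch; a real framework would deliver the windows from a live stream.

```python
from collections import defaultdict


def tumbling_windows(events, window_seconds):
    """Group (timestamp, record) pairs into consecutive fixed-size windows."""
    buckets = defaultdict(list)
    for timestamp, record in events:
        # Records whose timestamps fall in the same interval share a window.
        buckets[int(timestamp // window_seconds)].append(record)
    for index in sorted(buckets):
        yield index, buckets[index]


def run_once(window_records):
    """Placeholder for steps 101 to 105; here it only counts the records."""
    return len(window_records)


# Hypothetical (timestamp in seconds, record) pairs.
events = [(0.5, "a"), (3.2, "b"), (10.1, "c"), (12.9, "d"), (25.0, "e")]
results = [
    (index, run_once(records))
    for index, records in tumbling_windows(events, 10)
]
```

Each window triggers exactly one call to `run_once`, which is where sampling, training and detection would take place for that window.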
In this embodiment of the present application, for step 101, a user may customize the historical retroactive time, for example, set the historical retroactive time to two months, so as to obtain the historical traffic data of the last two months.
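As a non-limiting illustration, selecting historical traffic data by a user-set retroactive time may be sketched as follows. The record layout, the function name `select_history` and the approximation of two months as 60 days are assumptions made for this sketch.

```python
from datetime import datetime, timedelta


def select_history(records, now, lookback):
    """Keep records whose timestamp lies in [now - lookback, now)."""
    start = now - lookback
    return [record for record in records if start <= record["time"] < now]


# Hypothetical stored traffic records.
now = datetime(2022, 12, 30, 12, 0, 0)
records = [
    {"time": datetime(2022, 9, 1), "bytes": 100},
    {"time": datetime(2022, 11, 15), "bytes": 250},
    {"time": datetime(2022, 12, 29), "bytes": 90},
]
# "Two months" is approximated as 60 days in this sketch.
history = select_history(records, now, timedelta(days=60))
```

Changing the `lookback` argument is all that is needed to widen or narrow the historical data used in the next window.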
In the embodiment of the present application, for step 102, a user may customize a sampling parameter, wherein the sampling parameter may include the number of samples.
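As a non-limiting illustration, merging historical and real-time data into one data set and drawing a user-set number of samples may be sketched as follows. The function name `draw_samples`, the `seed` parameter and the sample values are assumptions made for this sketch.

```python
import random


def draw_samples(data_set, sample_count, seed=None):
    """Randomly draw up to `sample_count` items without replacement."""
    rng = random.Random(seed)
    # Never request more items than the data set contains.
    k = min(sample_count, len(data_set))
    return rng.sample(data_set, k)


# Hypothetical historical and real-time records.
history = [{"bytes": b} for b in (100, 110, 95, 105)]
realtime = [{"bytes": b} for b in (98, 5000)]
data_set = history + realtime
samples = draw_samples(data_set, sample_count=4, seed=7)
```

Sampling without replacement is a design choice of this sketch; the application only requires that the sampling be random and governed by the user-set parameter.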
In this embodiment of the present application, for step 103, a dimension may be a domain name or an IP address. Accordingly, for n sample data, whether the traffic data is normal needs to be judged from different angles, for example from the value of the IP address and from the value of the domain name, and each dimension generates one training result tree.
In this embodiment of the present application, for step 103, the sample partition attribute corresponds to one dimension, where the sample partition attribute is randomly generated and corresponds to a random dimension.
In this embodiment, as an example, for step 103 and n sample data, the partition value lies between the maximum value and the minimum value of the specified dimension. The partition value divides the n sample data into 2 subspaces: sample data smaller than the partition value is placed on the left branch, and sample data greater than or equal to the partition value is placed on the right branch. The left and right branch nodes are then cut recursively, and new leaf nodes are constructed continuously until a leaf node holds only one data item or the tree has grown to the set height.
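As a non-limiting illustration, the recursive cutting described above may be sketched for a single numeric dimension as follows. The dictionary-based node layout and the names `build_tree`, `depth` and `leaf_total` are assumptions made for this sketch.

```python
import random


def build_tree(values, height, max_height, rng):
    """Recursively partition one dimension's values into a binary tree."""
    # Stop when at most one item remains, the height limit is reached,
    # or all values are identical and cannot be separated.
    if len(values) <= 1 or height >= max_height or min(values) == max(values):
        return {"size": len(values)}
    split = rng.uniform(min(values), max(values))
    left = [v for v in values if v < split]    # smaller than the partition value
    right = [v for v in values if v >= split]  # greater than or equal to it
    if not left or not right:
        # Degenerate draw on a boundary value; keep this node as a leaf.
        return {"size": len(values)}
    return {
        "split": split,
        "left": build_tree(left, height + 1, max_height, rng),
        "right": build_tree(right, height + 1, max_height, rng),
    }


def depth(node):
    """Number of splits on the longest root-to-leaf path."""
    if "split" not in node:
        return 0
    return 1 + max(depth(node["left"]), depth(node["right"]))


def leaf_total(node):
    """Total number of data items held by all leaves."""
    if "split" not in node:
        return node["size"]
    return leaf_total(node["left"]) + leaf_total(node["right"])


values = [1.0, 2.0, 3.0, 4.0, 100.0]
tree = build_tree(values, 0, 3, random.Random(0))
```

An isolated value such as 100.0 tends to be separated after few cuts, while clustered values need more cuts or end up sharing a leaf when the height limit is reached.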
In an optional embodiment, the determining whether the real-time traffic data is an abnormal behavior event based on a score threshold set by a user and an abnormal value score of the real-time traffic data includes the following sub-steps:
and comparing the abnormal value score of the real-time traffic data with a score threshold set by the user, and determining the real-time traffic data as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In the above alternative embodiment, by comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, the real-time traffic data can be determined as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
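As a non-limiting illustration, the threshold comparison may be sketched as follows. The function name `is_abnormal`, the flow identifiers and the numeric values are assumptions made for this sketch.

```python
def is_abnormal(score, threshold):
    """Return True only when the score is strictly greater than the threshold."""
    return score > threshold


# Hypothetical outlier scores per flow and a user-set threshold.
scores = {"flow-1": 0.42, "flow-2": 0.71, "flow-3": 0.60}
threshold = 0.6
abnormal = [name for name, score in scores.items() if is_abnormal(score, threshold)]
```

Because the comparison is strict, a flow whose score equals the threshold is not reported as an abnormal behavior event.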
In an alternative embodiment, before the step of training the sample data based on the sample partition attribute and the partition value to obtain the isolated forest model, the method of the embodiment of the application further comprises the following step:
the sample partition attributes and partition values are generated based on a random algorithm.
In the optional implementation manner, the sample partition attribute and the partition value can be generated through a random algorithm, which can improve the flexibility and the reliability of the isolated forest model. By contrast, the prior art selects the partition attribute and the partition value according to information gain or based on an index, which is less flexible and less reliable.
In an alternative embodiment, the isolated forest model calculates an outlier score for the real-time traffic data, comprising:
calculating the path length of the real-time flow data in each training result tree;
calculating a path length expected value based on the path length of the real-time flow data in each training result tree;
an outlier score is calculated for the real-time traffic data based on the path length expected value and the standard path length.
In the above alternative embodiment, by calculating the path length of the real-time traffic data at each training result tree, the expected path length value can be calculated based on the path length of the real-time traffic data at each training result tree, and the abnormal value score of the real-time traffic data can be calculated based on the expected path length value and the standard path length.
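As a non-limiting illustration, the three calculations above may be sketched as follows. The application does not give the formulas, so this sketch assumes the commonly used isolation forest definitions: the standard path length is taken as c(n) = 2H(n-1) - 2(n-1)/n, a leaf holding several items contributes an extra c(size) to the path length, and the score is 2 raised to the power -E[h(x)]/c(n). The node layout, the function names and the two hand-built trees are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def standard_path_length(n):
    """Assumed standard path length c(n) for n sample data."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(node, value, depth=0):
    """Path length of `value` in one tree."""
    if "split" not in node:
        # Leaf: add c(size) to account for the part of the tree not grown.
        return depth + standard_path_length(node["size"])
    branch = "left" if value < node["split"] else "right"
    return path_length(node[branch], value, depth + 1)


def outlier_score(trees, value, sample_count):
    """Score from the mean path length over all trees and c(sample_count)."""
    expected = sum(path_length(tree, value) for tree in trees) / len(trees)
    return 2.0 ** (-expected / standard_path_length(sample_count))


# Two hand-built trees over 4 sample data, used only for illustration.
tree_a = {
    "split": 50.0,
    "left": {"split": 2.5, "left": {"size": 2}, "right": {"size": 1}},
    "right": {"size": 1},
}
tree_b = {"split": 10.0, "left": {"size": 3}, "right": {"size": 1}}
trees = [tree_a, tree_b]

far = outlier_score(trees, 100.0, sample_count=4)   # isolated after one cut
near = outlier_score(trees, 1.0, sample_count=4)    # sits among other values
```

In this sketch the value 100.0 reaches a leaf sooner than the value 1.0 in both trees, so `far` is larger than `near`.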
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of a real-time anomaly detection apparatus based on isolated forest dynamic training, which is applied to a real-time computing framework, wherein the real-time computing framework cyclically invokes the apparatus, each invocation corresponding to one time window. As shown in fig. 3, the apparatus of the embodiment of the present application includes the following functional modules:
an obtaining module 201, configured to obtain historical traffic data and real-time traffic data generated in a current time window based on historical retroactive time set by a user;
the first generation module 202 is configured to generate a data set based on historical traffic data and real-time traffic data, and randomly sample data from the data set based on sampling parameters set by a user;
the training module 203 is used for training the sample data based on the sample division attribute, the division value and the training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
the detection module 204 is used for inputting the real-time flow data into the isolated forest model so that the isolated forest model can calculate the abnormal value score of the real-time flow data;
the determining module 205 is configured to determine whether the real-time traffic data is an abnormal behavior event based on a score threshold set by a user and an abnormal value score of the real-time traffic data.
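As a non-limiting illustration, the cooperation of modules 201 to 205 within one time window may be sketched as follows. The class `DetectionDevice`, its field names and the stand-in scorer `toy_train` are assumptions made for this sketch; in particular, `toy_train` is not an isolated forest and only marks where the trained model would be used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DetectionDevice:
    acquire: Callable[[], Tuple[List[float], List[float]]]    # module 201
    sample: Callable[[List[float]], List[float]]              # module 202
    train: Callable[[List[float]], Callable[[float], float]]  # module 203
    threshold: float                                          # used by module 205

    def run_window(self) -> List[bool]:
        """Run one time window: acquire, sample, train, score, judge."""
        history, realtime = self.acquire()
        model = self.train(self.sample(history + realtime))
        scores = [model(x) for x in realtime]                # module 204
        return [score > self.threshold for score in scores]  # module 205


def toy_train(samples):
    """Stand-in scorer: scaled distance from the sample median."""
    ordered = sorted(samples)
    median = ordered[len(ordered) // 2]
    spread = (ordered[-1] - ordered[0]) or 1.0
    return lambda x: abs(x - median) / spread


device = DetectionDevice(
    acquire=lambda: ([100.0, 102.0, 98.0, 101.0], [99.0, 500.0]),
    sample=lambda data: list(data),
    train=toy_train,
    threshold=0.5,
)
flags = device.run_window()
```

Passing the modules in as callables keeps each one replaceable, so a user-adjusted sampling or training parameter only changes the corresponding callable.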
In the embodiment of the application, the real-time anomaly detection device based on isolated forest dynamic training is applied to a real-time computation framework, and the framework invokes the device once in every time window. As the time window rolls, the isolated forest model is dynamically retrained and is used to dynamically detect whether the real-time flow data is an abnormal behavior event. The application can therefore realize dynamic sampling, dynamic training and dynamic anomaly detection, and can be applied to real-time analysis scenarios.
On the other hand, as the historical retroactive time set by the user is adopted in the historical flow data acquisition process, the sampling parameter set by the user is adopted in the data sampling process, the training parameter set by the user is adopted in the model training process, and the score threshold set by the user is adopted in the abnormal behavior event judgment process, the user can adjust the parameters according to actual requirements in the whole training detection process, and the method has better adjustment flexibility. However, in the prior art, the relevant parameters of the model training detection cannot be adjusted.
On the other hand, since the embodiment of the application does not need to perform data annotation on the sample data, performance loss caused by data annotation can be avoided. At the same time, compared with the random forest model of the prior art, the isolated forest model of this application has higher anomaly detection accuracy.
In an optional embodiment, the specific manner of the determining module performing the determination of whether the real-time traffic data is an abnormal behavior event based on the score threshold set by the user and the abnormal value score of the real-time traffic data is as follows:
and comparing the abnormal value score of the real-time traffic data with a score threshold set by the user, and determining the real-time traffic data as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In the above alternative embodiment, by comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, the real-time traffic data can be determined as an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
In an optional implementation manner, the apparatus in the embodiment of the present application further includes:
a second generation module, configured to generate the sample division attribute and the division value based on a random algorithm.
In this optional implementation manner, the sample division attribute and the division value are generated by a random algorithm, which improves the flexibility and reliability of the isolated forest model. The prior art, by contrast, selects the division attribute and the division value according to information gain or an index, which results in lower flexibility and reliability.
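The random choice of division attribute and division value can be illustrated with a single tree. The sketch below follows the usual isolation tree construction (pick an attribute at random, then pick a value uniformly between that attribute's minimum and maximum among the rows at the node); whether the embodiment draws its values in exactly this way is an assumption, and `Node`, `build_tree`, `path_length` and the default `max_depth` are hypothetical names and settings.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    size: int                    # number of training rows that reached this node
    attr: Optional[int] = None   # division attribute index; None marks a leaf
    value: float = 0.0           # division value
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(rows: List[Sequence[float]], rng: random.Random,
               depth: int = 0, max_depth: int = 8) -> Node:
    """Recursively split rows on a random attribute at a random value."""
    if len(rows) <= 1 or depth >= max_depth:
        return Node(size=len(rows))
    spans = [(min(r[a] for r in rows), max(r[a] for r in rows))
             for a in range(len(rows[0]))]
    # Only attributes that still vary at this node can separate the rows.
    candidates = [a for a, (lo, hi) in enumerate(spans) if lo < hi]
    if not candidates:
        return Node(size=len(rows))
    attr = rng.choice(candidates)      # random division attribute
    value = rng.uniform(*spans[attr])  # random division value
    left = [r for r in rows if r[attr] < value]
    right = [r for r in rows if r[attr] >= value]
    return Node(len(rows), attr, value,
                build_tree(left, rng, depth + 1, max_depth),
                build_tree(right, rng, depth + 1, max_depth))


def path_length(tree: Node, x: Sequence[float]) -> int:
    """Number of splits x passes through before reaching a leaf."""
    node, depth = tree, 0
    while node.attr is not None:
        node = node.left if x[node.attr] < node.value else node.right
        depth += 1
    return depth
```

No labels and no purity measure are needed to choose a split, which is what separates this construction from one driven by information gain. A point far from the bulk of the data tends to be cut off after very few random splits, so its path is short.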
In an alternative embodiment, the detection module 204 includes the following sub-modules:
a first calculation submodule, configured to calculate the path length of the real-time traffic data in each training result tree;
a second calculation submodule, configured to calculate the path length expected value based on the path length of the real-time traffic data in each training result tree;
and a third calculation submodule, configured to calculate the abnormal value score of the real-time traffic data based on the path length expected value and the standard path length.
In the above alternative embodiment, by calculating the path length of the real-time traffic data at each training result tree, the expected path length value can be calculated based on the path length of the real-time traffic data at each training result tree, and the abnormal value score of the real-time traffic data can be calculated based on the expected path length value and the standard path length.
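The three calculations above can be sketched as follows, assuming that the "standard path length" corresponds to the normalization term c(n) of the usual isolation forest formulation, where c(n) = 2H(n-1) - 2(n-1)/n and H is the harmonic number, and that the score is 2 raised to the power -E(h)/c(n). The text here does not give the formula, so this mapping is an assumption, as are the function names. The common refinement that adds c(leaf size) to the path length of an unresolved leaf is left out for brevity.

```python
import math
from typing import Sequence

EULER_GAMMA = 0.5772156649015329


def standard_path_length(n: int) -> float:
    """c(n): average path length of an unsuccessful search in a binary search tree of n points."""
    if n < 2:
        raise ValueError("sample size must be at least 2")
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def abnormal_value_score(path_lengths: Sequence[float], sample_size: int) -> float:
    """Score from the per-tree path lengths of one record."""
    if not path_lengths:
        raise ValueError("at least one path length is required")
    # Path length expected value: the mean over all training result trees.
    expected = sum(path_lengths) / len(path_lengths)
    # Short expected paths push the score toward 1, long ones toward 0.
    return 2.0 ** (-expected / standard_path_length(sample_size))
```

Under this formulation a record whose expected path length equals c(n) scores 0.5, and records that are isolated in fewer splits score higher, which is consistent with judging a record abnormal when its score is greater than the threshold.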
Example three
Referring to fig. 4, fig. 4 is a schematic structural diagram of an electronic device disclosed in the embodiment of the present application. As shown in fig. 4, the electronic device of the embodiment of the present application includes:
a processor 301; and
a memory 302 configured to store machine readable instructions that, when executed by the processor 301, perform a method for real-time anomaly detection based on isolated forest dynamic training according to any of the previous embodiments.
In the embodiment of the application, the electronic device can realize dynamic sampling, dynamic training and dynamic anomaly detection through the executed real-time anomaly detection method based on isolated forest dynamic training, so that the electronic device can be applied to a real-time analysis scene.
On the other hand, the historical traffic data is acquired using the historical tracing time set by the user, the data is sampled using the sampling parameters set by the user, the model is trained using the training parameters set by the user, and abnormal behavior events are judged using the score threshold set by the user. The user can therefore adjust these parameters according to actual requirements throughout the training and detection process, which gives the method better adjustment flexibility. In the prior art, by contrast, the parameters related to model training and detection cannot be adjusted.
Furthermore, since the embodiment of the present application does not need to label the sample data, the performance loss caused by data labeling is avoided. In addition, compared with the random forest model of the prior art, the isolated forest model of the present application has higher anomaly detection accuracy.
Example four
The embodiment of the application provides a storage medium, wherein a computer program is stored in the storage medium, and the computer program is executed by a processor to execute the isolated forest dynamic training-based real-time anomaly detection method according to any one of the previous embodiments.
In the embodiment of the application, the storage medium can realize dynamic sampling, dynamic training and dynamic anomaly detection through the executed real-time anomaly detection method based on isolated forest dynamic training, so that the storage medium can be applied to a real-time analysis scene.
On the other hand, the historical traffic data is acquired using the historical tracing time set by the user, the data is sampled using the sampling parameters set by the user, the model is trained using the training parameters set by the user, and abnormal behavior events are judged using the score threshold set by the user. The user can therefore adjust these parameters according to actual requirements throughout the training and detection process, which gives the method better adjustment flexibility. In the prior art, by contrast, the parameters related to model training and detection cannot be adjusted.
Furthermore, since the embodiment of the present application does not need to label the sample data, the performance loss caused by data labeling is avoided. In addition, compared with the random forest model of the prior art, the isolated forest model of the present application has higher anomaly detection accuracy.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of one logic function, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
In addition, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
Furthermore, the functional modules in the embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
It should be noted that the functions, if implemented in the form of software functional modules and sold or used as independent products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that contributes over the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and changes may be made to the present application by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (10)

1. A real-time anomaly detection method based on isolated forest dynamic training, applied to a real-time computing framework, wherein the real-time computing framework executes the method cyclically and each execution of the method corresponds to one time window, the method comprising the following steps:
acquiring, based on a historical tracing time set by a user, historical traffic data and real-time traffic data generated in a current time window;
generating a data set based on the historical traffic data and the real-time traffic data, and randomly sampling sample data from the data set based on sampling parameters set by the user;
training the sample data based on sample division attributes, division values and training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
inputting the real-time traffic data into the isolated forest model such that the isolated forest model calculates an abnormal value score of the real-time traffic data;
and judging whether the real-time traffic data is an abnormal behavior event based on a score threshold set by the user and the abnormal value score of the real-time traffic data.
2. The method of claim 1, wherein the judging whether the real-time traffic data is an abnormal behavior event based on the score threshold set by the user and the abnormal value score of the real-time traffic data comprises:
comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, and determining the real-time traffic data to be an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
3. The method of claim 1, wherein before the training the sample data based on sample division attributes, division values and training parameters set by the user to obtain an isolated forest model, the method further comprises:
generating the sample division attributes and the division values based on a random algorithm.
4. The method of claim 1, wherein the isolated forest model calculating an abnormal value score of the real-time traffic data comprises:
calculating a path length of the real-time traffic data in each training result tree;
calculating a path length expected value based on the path length of the real-time traffic data in each training result tree;
calculating the abnormal value score of the real-time traffic data based on the path length expected value and a standard path length.
5. A real-time anomaly detection apparatus based on isolated forest dynamic training, applied to a real-time computing framework, wherein the real-time computing framework calls the apparatus cyclically and each call of the apparatus corresponds to one time window, the apparatus comprising:
an acquisition module, configured to acquire, based on a historical tracing time set by a user, historical traffic data and real-time traffic data generated in a current time window;
a first generation module, configured to generate a data set based on the historical traffic data and the real-time traffic data, and to randomly sample sample data from the data set based on sampling parameters set by the user;
a training module, configured to train the sample data based on sample division attributes, division values and training parameters set by the user to obtain an isolated forest model, wherein the isolated forest model comprises training result trees of each random dimension;
a detection module, configured to input the real-time traffic data into the isolated forest model so that the isolated forest model calculates an abnormal value score of the real-time traffic data;
and a judging module, configured to judge whether the real-time traffic data is an abnormal behavior event based on a score threshold set by the user and the abnormal value score of the real-time traffic data.
6. The apparatus of claim 5, wherein the judging module judges whether the real-time traffic data is an abnormal behavior event based on the score threshold set by the user and the abnormal value score of the real-time traffic data by:
comparing the abnormal value score of the real-time traffic data with the score threshold set by the user, and determining the real-time traffic data to be an abnormal behavior event if the abnormal value score of the real-time traffic data is greater than the score threshold set by the user.
7. The apparatus of claim 6, wherein the apparatus further comprises:
a second generation module to generate the sample division attribute and the division value based on a random algorithm.
8. The apparatus of claim 5, wherein the detection module comprises:
a first calculation submodule, configured to calculate the path length of the real-time traffic data in each training result tree;
a second calculation submodule, configured to calculate a path length expected value based on the path length of the real-time traffic data in each training result tree;
and a third calculation submodule, configured to calculate the abnormal value score of the real-time traffic data based on the path length expected value and the standard path length.
9. An electronic device, comprising:
a processor; and
a memory configured to store machine readable instructions which, when executed by the processor, perform the isolated forest dynamic training based real-time anomaly detection method of any one of claims 1-4.
10. A storage medium, characterized in that the storage medium stores a computer program, the computer program is executed by a processor to execute the isolated forest dynamic training-based real-time anomaly detection method according to any one of claims 1-7.
CN202211733419.9A 2022-12-30 2022-12-30 Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium Pending CN115879028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211733419.9A CN115879028A (en) 2022-12-30 2022-12-30 Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211733419.9A CN115879028A (en) 2022-12-30 2022-12-30 Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115879028A true CN115879028A (en) 2023-03-31

Family

ID=85757727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211733419.9A Pending CN115879028A (en) 2022-12-30 2022-12-30 Real-time anomaly detection method and device based on isolated forest dynamic training, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115879028A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117741514A (en) * 2024-02-21 2024-03-22 山东中船线缆股份有限公司 State detection method and system for marine cable
CN117741514B (en) * 2024-02-21 2024-05-07 山东中船线缆股份有限公司 State detection method and system for marine cable

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination