CN110933115B - Analysis object behavior abnormity detection method and device based on dynamic session - Google Patents

Analysis object behavior abnormity detection method and device based on dynamic session

Info

Publication number
CN110933115B
Authority
CN
China
Prior art keywords
analysis object
point
interval
session
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911401991.3A
Other languages
Chinese (zh)
Other versions
CN110933115A (en)
Inventor
周晓勇
梁淑云
刘胜
马影
陶景龙
王启凡
魏国富
徐�明
殷钱安
余贤喆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN201911401991.3A
Publication of CN110933115A
Application granted
Publication of CN110933115B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method and a device for detecting abnormal behavior of an analysis object based on dynamic sessions. The method comprises the following steps: collecting operation logs of analysis objects in a service system; based on the operation logs, for each analysis object, obtaining an interval mark for each time interval according to whether the operation time interval of the analysis object is greater than a preset duration, and obtaining the interval features of the analysis object; normalizing the interval features and combining the normalized interval features into a feature vector for the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the anomaly probability value of each analysis object. The invention can reduce the missed-report rate.

Description

Analysis object behavior abnormity detection method and device based on dynamic session
Technical Field
The invention relates to the detection of abnormal behavior of analysis objects, and in particular to a method and a device for detecting abnormal behavior of an analysis object based on dynamic sessions.
Background
With the development of internet technology and enterprise network infrastructure, the number of network users is increasing rapidly, and data security is becoming more and more important for both enterprises and individuals. Among existing data security problems, besides external behaviors that damage data security, such as hacking, virus intrusion and brute-force cracking, behaviors that violate security policies through abnormal operations by internal users also pose a serious threat. The auditing of internal user behavior therefore needs to be strengthened in order to improve the data security protection capability of enterprises.
At present, rule engines and supervised learning models are the common methods for auditing internal user behavior. A rule engine relies on a set of rules summarized manually or semi-manually from business experience; a user behavior that triggers a rule is treated as abnormal, for example, an alarm is raised when a user's operation frequency exceeds a preset threshold. A supervised learning model summarizes user behavior characteristics from historical cases, trains a supervised detection model, and uses it to detect whether user behavior is abnormal. Both approaches have drawbacks. Summarizing rules manually or semi-manually requires familiarity with every system and business, rich rule-configuration experience, and close integration with the user's actual application, so the approach is difficult to apply and operate; the rules are also easy to discover and then bypass, and the rule engine is prone to false alarms when the system has a temporary fault or is affected by other external factors. The supervised learning model is trained on historical cases, but, on the one hand, internal abnormal behavior is usually discovered only after a long time, so cases, that is, labeled data, are scarce; on the other hand, once some abnormal behavior has been discovered, similar behavior is avoided as far as possible to escape detection. A supervised model trained in this way therefore has relatively low accuracy, poor adaptability and a high missed-report rate.
In conclusion, the prior art suffers from the technical problem of a high missed-report rate.
Disclosure of Invention
The technical problem to be solved by the present invention is how to provide a method and a device for detecting abnormal behavior of an analysis object based on dynamic sessions, so as to reduce the missed-report rate.
The invention solves the above technical problem through the following technical means:
An embodiment of the invention provides a method for detecting abnormal behavior of an analysis object based on dynamic sessions, which comprises the following steps:
collecting operation logs of analysis objects in a business system, wherein the business system comprises one or a combination of a CRM system, a decision-making system and a resource system, and the analysis object comprises one or a combination of a user name, an IP address, a MAC address and an IP segment;
based on the operation logs, for each analysis object, obtaining an interval mark for each time interval according to whether the operation time interval of the analysis object is greater than a preset duration, and obtaining the interval features of the analysis object;
normalizing the interval features, and combining the normalized interval features into a feature vector for the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the anomaly probability value of each analysis object.
Optionally, collecting the operation logs of the analysis objects in the business system includes:
when an analysis object has different names in different service systems, associating the operation logs of the analysis object by using key fields;
for each type of analysis object, summarizing the operation logs of the analysis objects of that type in the different service systems into one table to obtain a T_opr table;
and taking the T_opr table as the operation log.
Optionally, obtaining, for each analysis object, the interval mark of each time interval according to whether the operation time interval of the analysis object is greater than the preset duration, and obtaining the interval features of the analysis object, includes:
for each analysis object, obtaining the operation time intervals of the analysis object based on the operation logs of the analysis object;
obtaining the interval mark of each time interval according to the relative size of the operation time interval of the analysis object and the preset duration;
obtaining the interval features of the analysis object from the interval marks by using a statistical method, wherein the interval features comprise one or a combination of: the maximum duration of the same session_id, the ratio of the maximum duration of the same session_id, the maximum number of operations of the same session_id, the maximum number of short-interval operations of the same session_id, the mean value Mean of the number of operations per session_id, the standard deviation Std of the number of operations per session_id, the coefficient of variation CV of the number of operations per session_id, the maximum number of consecutive identical intervals, the number of consecutive identical intervals, the ratio of the number of consecutive identical intervals, the number of consecutive short intervals, the ratio of the number of consecutive short intervals, and the ratio of the number of short intervals.
Optionally, taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the anomaly probability value of each analysis object includes:
mapping the feature vectors of the analysis objects into a high-dimensional space as points;
for each point, calculating the distance from the point to each other point by using a distance algorithm, and constructing an n x n dissimilarity matrix D from the distances, wherein the distance algorithm is one of Manhattan distance, Mahalanobis distance and Euclidean distance;
according to the dissimilarity matrix, using the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^{2}}{2\sigma_i^{2}}\right)$$
calculating the degree of association between the point and the other points, and constructing an association matrix from the degrees of association of the points, wherein
$a_{ij}$ is the degree of association between the i-th point and the j-th point; $d_{ij}$ is the element of the dissimilarity matrix D for points i and j; exp() is the exponential function with the natural constant as its base; $\sigma_i^{2}$ is the variance of the i-th point relative to the other points;
according to the association matrix, using the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
calculating the association probability of the point relative to the other points, and constructing an association probability matrix from the association probabilities, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space;
based on the association probability matrix, using the formula
$$p(x_i) = \prod_{j \neq i} \left(1 - b_{ji}\right)$$
calculating the anomaly probability value of each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; $\prod$ is the product symbol.
Optionally, calculating the distance from the point to each other point by using the distance algorithm includes:
for each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{jk} - x_{ik}\right)^{2}}$$
calculating the distance from the point to each other point to obtain the n x n dissimilarity matrix D, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; $\sum$ is the summation symbol; $x_{jk}$ is the value of the feature of the j-th point in the k-th dimension; $x_{ik}$ is the value of the feature of the i-th point in the k-th dimension; m represents the number of features, and n represents the number of operation logs.
An embodiment of the invention also provides a device for detecting abnormal behavior of an analysis object based on dynamic sessions, which comprises:
a collection module, used for collecting operation logs of analysis objects in a business system, wherein the business system comprises one or a combination of a CRM system, a decision-making system and a resource system, and the analysis object comprises one or a combination of a user name, an IP address, a MAC address and an IP segment;
an acquisition module, used for obtaining, for each analysis object, an interval mark for each time interval according to whether the operation time interval of the analysis object is greater than a preset duration, and obtaining the interval features of the analysis object;
an anomaly identification module, used for normalizing the interval features and combining the normalized interval features into a feature vector for the analysis object, and for taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the anomaly probability value of each analysis object.
Optionally, the collection module is configured to:
when an analysis object has different names in different service systems, associate the operation logs of the analysis object by using key fields;
for each type of analysis object, summarize the operation logs of the analysis objects of that type in the different service systems into one table to obtain a T_opr table;
and take the T_opr table as the operation log.
Optionally, the acquisition module is configured to:
for each analysis object, obtain the operation time intervals of the analysis object based on the operation logs of the analysis object;
obtain the interval mark of each time interval according to the relative size of the operation time interval of the analysis object and the preset duration;
obtain the interval features of the analysis object from the interval marks by using a statistical method, wherein the interval features comprise one or a combination of: the maximum duration of the same session_id, the ratio of the maximum duration of the same session_id, the maximum number of operations of the same session_id, the maximum number of short-interval operations of the same session_id, the mean value Mean of the number of operations per session_id, the standard deviation Std of the number of operations per session_id, the coefficient of variation CV of the number of operations per session_id, the maximum number of consecutive identical intervals, the number of consecutive identical intervals, the ratio of the number of consecutive identical intervals, the number of consecutive short intervals, the ratio of the number of consecutive short intervals, and the ratio of the number of short intervals.
Optionally, the anomaly identification module is configured to:
map the feature vectors of the analysis objects into a high-dimensional space as points;
for each point, calculate the distance from the point to each other point by using a distance algorithm, and construct an n x n dissimilarity matrix D from the distances, wherein the distance algorithm is one of Manhattan distance, Mahalanobis distance and Euclidean distance;
according to the dissimilarity matrix, using the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^{2}}{2\sigma_i^{2}}\right)$$
calculate the degree of association between the point and the other points, and construct an association matrix from the degrees of association of the points, wherein
$a_{ij}$ is the degree of association between the i-th point and the j-th point; $d_{ij}$ is the element of the dissimilarity matrix D for points i and j; exp() is the exponential function with the natural constant as its base; $\sigma_i^{2}$ is the variance of the i-th point relative to the other points;
according to the association matrix, using the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
calculate the association probability of the point relative to the other points, and construct an association probability matrix from the association probabilities, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space;
based on the association probability matrix, using the formula
$$p(x_i) = \prod_{j \neq i} \left(1 - b_{ji}\right)$$
calculate the anomaly probability value of each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; $\prod$ is the product symbol.
Optionally, the anomaly identification module is configured to:
for each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{jk} - x_{ik}\right)^{2}}$$
calculate the distance from the point to each other point to obtain the n x n dissimilarity matrix D, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; $\sum$ is the summation symbol; $x_{jk}$ is the value of the feature of the j-th point in the k-th dimension; $x_{ik}$ is the value of the feature of the i-th point in the k-th dimension; m represents the number of features, and n represents the number of operation logs.
The invention has the advantages that:
Based on the operation log data of the enterprise service system, the invention analyzes the service characteristics that abnormal user behavior is likely to exhibit. An abnormal user has a definite purpose: for example, a data-stealing user aims to steal data, and all of that user's behavior serves this aim; some interfering behavior may be mixed in to avoid discovery, but the purpose is always to "steal" data. The behavior of abnormal users is therefore more targeted, relatively concentrated, relatively long in duration and relatively stable, because machine or script assistance may be needed to achieve the purpose. By establishing a dynamic session behavior sequence, constructing user behavior features for anomaly detection, and then identifying abnormal user behavior, the invention can reduce false alarms.
Drawings
Fig. 1 is a schematic flowchart of a method for detecting abnormal behavior of an analysis object based on dynamic sessions according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the first list, the second list and some of the interval features obtained in an embodiment of the present invention;
Fig. 3 is a schematic diagram of the association matrix in the SOS algorithm according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device for detecting abnormal behavior of an analysis object based on dynamic sessions according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is a schematic flowchart of a method for detecting abnormal behavior of an analysis object based on dynamic session according to an embodiment of the present invention, where as shown in fig. 1, the method includes:
S101: collecting operation log data of enterprise business systems, such as a CRM (Customer Relationship Management) system, a decision-making system, a resource system, and the like, where the log data of each system includes, but is not limited to, the following fields: system identification (RES_kill), system name (RES_NAME), unique user identification (USER_ID), user name (USER_NAME), IP address (IP), MAC address (MAC), operation time (OPR_DATE), analysis object (OPR_OBJECT), operation type (OPR_TYPE), and the like.
Then, when the USER_ID of an analysis object, for example a user, is the same in every service system, that is, the USER_ID of "Zhang San" is "a0001" in every system, the collected log data of the user in each service system is integrated into one table, namely T_opr.
If a user has different user names in different service systems, the different user names can be mapped to the same USER_ID by associating fields such as certificates, mailboxes or mobile phone numbers. This method belongs to the prior art and is not described in detail here.
It should be noted that, in the embodiment of the present invention, other types of analysis objects, such as an IP address, a MAC address and an IP segment, are also processed according to the above method.
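The log integration of step S101 can be sketched as follows. This is only an illustration under stated assumptions: the function name `build_t_opr`, the choice of an `email` field as the association key, and the sample records are all hypothetical and are not part of the original disclosure.

```python
from typing import Dict, List


def build_t_opr(system_logs: Dict[str, List[dict]], key_field: str = "email") -> List[dict]:
    """Merge per-system logs into one table, unifying USER_ID via a shared key field."""
    canonical: Dict[str, str] = {}  # key-field value -> first USER_ID seen for it
    t_opr: List[dict] = []
    for res_name, rows in system_logs.items():
        for row in rows:
            key = row.get(key_field)
            # Rows without the key field keep their own USER_ID unchanged.
            user_id = canonical.setdefault(key, row["USER_ID"]) if key else row["USER_ID"]
            t_opr.append({**row, "RES_NAME": res_name, "USER_ID": user_id})
    return t_opr


# Hypothetical logs: the same person has different user names in two systems.
logs = {
    "CRM": [{"USER_ID": "a0001", "email": "zs@example.com", "OPR_DATE": "2019-12-30 09:00:00"}],
    "resource": [{"USER_ID": "zhangsan", "email": "zs@example.com", "OPR_DATE": "2019-12-30 09:05:00"}],
}
t_opr = build_t_opr(logs)
```

Here both records end up under the USER_ID first seen for the shared mailbox, so later steps can treat them as one analysis object.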
S102: based on the operation logs, for each analysis object, obtaining the interval mark of each time interval according to whether the operation time interval of the analysis object is greater than the preset duration, and obtaining the interval features of the analysis object.
For example, in the embodiment of the present invention, the processing in this step is described by taking the USER_ID as the analysis object; it can be understood that other types of analysis objects, such as an IP address, a MAC address or an IP segment, can also be used in practical applications.
Within a preset period, for example one day, the operation logs of each USER_ID are sorted in ascending order of the operation time OPR_DATE to form a first sequence rd1 (1, 2, 3 … n …) of operation logs.
Then, the operation time of the n-th log in the sequence is subtracted from the operation time of the (n+1)-th log to obtain the time interval OPR_DUR between the n-th operation log and the next adjacent operation log.
Then, a "dynamic session_id" is assigned to each time interval as its interval mark, with an initial value of 1. Hereinafter, the "dynamic session_id" is referred to as session_id. Whether the operation time interval is less than a preset threshold of 20 minutes is judged: if so, the operation log keeps the current session_id, initially 1; if not, the session_id of the operation log is the current value plus 1, for example 2 the first time this happens; by analogy, the session_id of every operation log is obtained. Arranging the session_ids of the operation logs in ascending order of time gives a third sequence, namely the session_id sequence.
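The session_id assignment described above can be sketched as follows. This is a minimal illustration, assuming that the first log of a user starts session 1 and that a gap of 20 minutes or more before a log opens a new session; the function name `assign_session_ids` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List


def assign_session_ids(times: List[datetime],
                       gap: timedelta = timedelta(minutes=20)) -> List[int]:
    """Return one dynamic session_id per operation log, in ascending time order."""
    ordered = sorted(times)  # first sequence rd1: logs sorted by OPR_DATE
    session_ids: List[int] = []
    current = 1  # initial session_id
    for idx, t in enumerate(ordered):
        # OPR_DUR between this log and the previous one; a long gap opens a new session.
        if idx > 0 and t - ordered[idx - 1] >= gap:
            current += 1
        session_ids.append(current)
    return session_ids


base = datetime(2019, 12, 30, 9, 0, 0)
offsets_min = [0, 5, 10, 45, 50, 120]  # hypothetical operation times, in minutes
ids = assign_session_ids([base + timedelta(minutes=m) for m in offsets_min])
```

With these sample times the gaps are 5, 5, 35, 5 and 70 minutes, so the logs fall into three dynamic sessions.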
It should be emphasized that, in the embodiment of the present invention, the "dynamic session" is not the same as a session in a network application. In an existing network application, the web server determines whether the user already has a session; if so, it continues to use the original session, and if not, it creates a session for the user; after the session expires or is abandoned, the server terminates it. The dynamic session in the embodiment of the present invention is not a session generated automatically by a web script in a network application; it is a virtual session constructed from the user's operation time intervals, which reflects the persistence and continuity of the user's operations.
Then, based on the session _ id sequence, the following method is used for constructing interval characteristics:
Fig. 2 is a schematic diagram of the first list, the second list and some of the interval features obtained in the embodiment of the present invention. As shown in Fig. 2,
the total number of operations is the cumulative sum of the number of operation logs of the user;
the total number of session_ids is the count of the session_ids of each user;
the sum of session_id durations is obtained as follows: the duration of each session_id of a user is the time of the last log in a run of two or more consecutive identical session_ids minus the time of the first log in that run, which gives the duration of every consecutive session_id of the user; the durations of the session_ids are then summed;
the longest duration of the same session_id is the maximum of the durations of the session_ids of the user;
the ratio of the longest duration of the same session_id is the longest duration of the same session_id / the sum of session_id durations;
the maximum number of operations of the same session_id is obtained by counting, for each user, the number of operations in each session_id sequence that has a duration, and then taking the maximum of these counts;
the maximum number of short-interval operations of the same session_id is obtained by counting, for each session_id sequence of each user, the number of operation logs whose interval between two adjacent operations is less than a preset threshold of 30 seconds, and taking the maximum of these counts;
the mean value Mean of the number of session_id operations is the total number of operations / the total number of session_id sequences of the user;
the standard deviation Std of the number of session_id operations is the mean square error of the number of operations of each session_id sequence of each user;
the coefficient of variation CV of the number of session_id operations is the standard deviation Std of the number of session_id operations / the mean value Mean of the number of session_id operations;
the maximum number of consecutive identical intervals is constructed as follows: group by USER_ID and operation time interval OPR_DUR and sort by operation time OPR_DATE in ascending order to generate a second sequence rd2 (1, 2, 3 … n …); subtract the second sequence rd2 from the first sequence rd1 to generate the sequence difference field rd_minus; then group by USER_ID, operation time interval OPR_DUR and sequence difference rd_minus and count the operation logs to obtain the consecutive counts of the different OPR_DUR values of each USER_ID; the largest consecutive count of each USER_ID is taken as the feature value. For ease of understanding, reference may be made to the table, in which the user "abc123" has a maximum number of consecutive identical intervals of 20, that is, 20 operations correspond to the same sequence difference 3 at the same operation interval of 15 s;
the number of consecutive identical intervals is the cumulative sum of the number of operation logs whose consecutive count exceeds 3;
the ratio of the number of consecutive identical intervals is the number of consecutive identical intervals / the total number of operations;
the number of consecutive short intervals is obtained as follows: first judge whether the operation time interval OPR_DUR is less than 30 seconds; if so, mark it as 1, otherwise mark it as 0, generating a field short_flag; then group by USER_ID and short_flag and sort by operation time OPR_DATE in ascending order to generate a third sequence rd3; subtract rd3 from rd1 to generate a second sequence difference rd_minus1; then group by USER_ID, short_flag and the second sequence difference rd_minus1, count the corresponding operation logs and record the count as the field short_nums; finally, sum the short_nums values of the groups whose short_flag is 1 and whose short_nums is greater than 3;
the ratio of the number of consecutive short intervals is the number of consecutive short intervals / the total number of operations;
the short-interval count ratio is the cumulative sum of the number of operation logs of the user whose short-interval flag short_flag is 1.
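A few of the interval features listed above can be sketched as follows. This is an illustration only: the function and key names are hypothetical, operation times are given as seconds, the standard deviation is assumed to be the population standard deviation, and `itertools.groupby` over the time-ordered intervals is used here in place of the rd1 minus rd2 grouping, since both isolate runs of identical consecutive values.

```python
from itertools import groupby
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def session_features(ops: List[Tuple[float, int]]) -> Dict[str, float]:
    """ops: (operation time in seconds, session_id) pairs of one user, sorted by time."""
    by_session: Dict[int, List[float]] = {}
    for t, sid in ops:
        by_session.setdefault(sid, []).append(t)
    durations = [ts[-1] - ts[0] for ts in by_session.values()]  # last minus first time
    counts = [len(ts) for ts in by_session.values()]            # operations per session
    total_duration = sum(durations)
    mean_ops = mean(counts)
    std_ops = pstdev(counts)  # assumption: population standard deviation
    return {
        "total_ops": len(ops),
        "n_sessions": len(by_session),
        "max_duration": max(durations),
        "max_duration_ratio": max(durations) / total_duration if total_duration else 0.0,
        "max_ops": max(counts),
        "mean_ops": mean_ops,
        "std_ops": std_ops,
        "cv_ops": std_ops / mean_ops,
    }


def max_same_interval_run(intervals: List[float]) -> int:
    """Length of the longest run of identical consecutive OPR_DUR values."""
    return max((len(list(group)) for _, group in groupby(intervals)), default=0)


# Hypothetical user: four operations 15 s apart in session 1, two operations in session 2.
ops = [(0, 1), (15, 1), (30, 1), (45, 1), (3000, 2), (3060, 2)]
feats = session_features(ops)
intervals = [b[0] - a[0] for a, b in zip(ops, ops[1:])]
longest = max_same_interval_run(intervals)
```

A script that fires at a fixed cadence shows up as a long run of identical intervals, which is the kind of stable, machine-like behavior these features are meant to capture.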
S103: carrying out normalization processing on the interval features, and combining the normalized interval features into a feature vector aiming at the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the abnormal probability value corresponding to each analysis object.
Illustratively, the features constructed in step S102 are normalized and then concatenated into a vector, which serves as the feature vector of the user.
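The normalization step can be sketched as follows. The text does not state which normalization is used, so min-max scaling to [0, 1] is an assumption made here for illustration; the function name `min_max_normalize` and the sample values are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(table: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """table maps each analysis object to its raw interval features (same order for all)."""
    names = list(table)
    columns = list(zip(*(table[name] for name in names)))  # one tuple per feature
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant feature carries no information; map it to 0.0 (assumption).
        scaled_columns.append([(v - lo) / span if span else 0.0 for v in col])
    # Reassemble one feature vector per analysis object.
    return {name: [col[i] for col in scaled_columns] for i, name in enumerate(names)}


raw = {"a0001": [10.0, 200.0], "a0002": [20.0, 200.0], "a0003": [30.0, 600.0]}
vectors = min_max_normalize(raw)
```

Scaling each feature separately keeps a feature with a large numeric range, such as a duration in seconds, from dominating the distances computed later.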
The feature vectors are taken as the input of the SOS (Stochastic Outlier Selection) algorithm, which outputs the anomaly probability value of each record.
The SOS (Stochastic Outlier Selection) algorithm is an unsupervised anomaly detection algorithm that treats a point as an anomaly when its degree of association (affinity) with all other points is small. It is a non-rigid anomaly detection algorithm that judges whether a point is abnormal by calculating the degree of association between the point and the other points. Its calculation process is roughly as follows:
First, the distance between each point and every other point is calculated, and an n x n dissimilarity matrix D is generated.
For each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} \left(x_{jk} - x_{ik}\right)^{2}}$$
the distance from the point to each other point is calculated to obtain the n x n dissimilarity matrix D, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; $\sum$ is the summation symbol; $x_{jk}$ is the value of the feature of the j-th point in the k-th dimension; $x_{ik}$ is the value of the feature of the i-th point in the k-th dimension; m represents the number of features, and n represents the number of operation logs.
In practical application, other methods for measuring the distance, such as manhattan distance, mahalanobis distance and the like, can be selected according to specific situations.
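The construction of the dissimilarity matrix D can be sketched as follows, with the Euclidean distance as the default and the Manhattan distance as one alternative. The function name `dissimilarity_matrix` and the sample points are hypothetical; the Mahalanobis distance is omitted to keep the sketch short.

```python
import math
from typing import List


def dissimilarity_matrix(points: List[List[float]], metric: str = "euclidean") -> List[List[float]]:
    """Return the n x n matrix D with D[i][j] = distance between point i and point j."""
    def dist(a: List[float], b: List[float]) -> float:
        if metric == "euclidean":
            return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))
        if metric == "manhattan":
            return sum(abs(u - v) for u, v in zip(a, b))
        raise ValueError(f"unsupported metric: {metric}")

    return [[dist(p, q) for q in points] for p in points]


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
D = dissimilarity_matrix(points)
D_manhattan = dissimilarity_matrix(points, metric="manhattan")
```

The matrix is symmetric with a zero diagonal, so an implementation for many points could compute only the upper triangle.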
Then, each point in the SOS algorithm is assigned a variance value $\sigma_i^2$. Fig. 3 is a schematic diagram of the association degree matrix in the SOS algorithm according to an embodiment of the present invention. As shown in the figure, this variance depends on the density of the points around the point: points in denser regions correspond to a lower variance, and points in sparser regions correspond to a higher variance. Point X5 has the greatest density and the smallest variance, and point X6 has the smallest density and the greatest variance. The variance corresponding to each point can be obtained by setting the same perplexity for every point, that is, by requiring each point to have the same effective number of neighbor points. Then, according to the dissimilarity matrix, the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right), \quad i \neq j$$
is used to calculate the association degree between the point and the other points, and an association degree matrix A of dimension n × n is constructed from the association degrees of all points, wherein
$a_{ij}$ is the association degree (affinity) of the i-th point with the j-th point; exp() is the exponential function with the natural constant as its base; $\sigma_i^2$ is the variance of the i-th point relative to the other points.
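One way to obtain a per-point variance and the resulting association degrees is sketched below. It assumes, as in the published SOS algorithm, that the variance of each point is tuned until the perplexity of its normalized association degrees reaches a common target value; the bisection bounds, the target of 2.5 and the sample points are illustrative assumptions only.

```python
import math

def affinity_row(d_row, i, variance):
    """a_ij = exp(-d_ij^2 / (2 * variance)) for j != i; the diagonal entry is 0."""
    return [0.0 if j == i else math.exp(-(d * d) / (2.0 * variance))
            for j, d in enumerate(d_row)]

def perplexity(a_row):
    """Perplexity 2**H of the row once it is normalized to sum to 1."""
    total = sum(a_row)
    if total == 0.0:  # every affinity underflowed: treat as the minimum, 1
        return 1.0
    probs = [a / total for a in a_row]
    entropy = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2.0 ** entropy

def find_variance(d_row, i, target, lo=1e-6, hi=1e6, steps=100):
    """Bisection in log scale; perplexity does not decrease as variance grows."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if perplexity(affinity_row(d_row, i, mid)) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

def affinity_matrix(D, target):
    """Build A row by row, each row using its own variance."""
    variances = [find_variance(row, i, target) for i, row in enumerate(D)]
    A = [affinity_row(row, i, v) for i, (row, v) in enumerate(zip(D, variances))]
    return A, variances

# Hypothetical one-dimensional points; D[i][j] is their absolute difference.
xs = [0.0, 1.0, 2.0, 10.0]
D = [[abs(a - b) for b in xs] for a in xs]
A, variances = affinity_matrix(D, target=2.5)
```

The target perplexity must lie between 1 and n - 1 and must be reachable for every row; the sketch does not validate this.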
Then, according to the association degree matrix, the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
is used to calculate the association probability of the point relative to the other points, and an association probability matrix is constructed from the association probabilities, thereby converting the association degree matrix A into the association probability matrix B, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space.
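The conversion from association degrees to association probabilities amounts to normalizing each row of A so that it sums to 1. A minimal sketch, with a made-up 3 × 3 matrix as input:

```python
def binding_probabilities(A):
    """Row-normalize the association degree matrix: b_ij = a_ij / sum_k a_ik."""
    B = []
    for row in A:
        total = sum(row)
        B.append([a / total for a in row])
    return B

# Hypothetical association degree matrix (zero diagonal, rows not yet normalized).
A = [[0.0, 0.8, 0.2],
     [0.6, 0.0, 0.6],
     [0.1, 0.3, 0.0]]
B = binding_probabilities(A)
```

The sketch assumes every row has at least one non-zero entry; a row of zeros would need separate handling.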
Then, based on the association probability matrix, the formula
$$p(x_i) = \prod_{j \neq i} (1 - b_{ji})$$
is used to calculate an anomaly probability value for each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; Π is the product symbol.
After the anomaly probability value of each point is calculated, it is finally judged whether the value exceeds a preset threshold; if so, the point is recorded as abnormal (1); otherwise, it is recorded as normal (0).
In general, the larger the anomaly probability value, the more abnormal the record; if the anomaly probability value is greater than the preset threshold (0.8), the record is determined to be abnormal.
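The final scoring and thresholding step can be illustrated as follows. The sketch assumes the product form used in the published SOS algorithm, in which the anomaly probability of point i is the product of (1 - b_ji) over all other points j; the matrix values are invented for the example, and 0.8 is the threshold mentioned above.

```python
def outlier_probabilities(B):
    """p(x_i) = product over j != i of (1 - b_ji), read down column i of B."""
    n = len(B)
    probs = []
    for i in range(n):
        p = 1.0
        for j in range(n):
            if j != i:
                p *= 1.0 - B[j][i]
        probs.append(p)
    return probs

def label_outliers(probs, threshold=0.8):
    """Record 1 (abnormal) when the probability exceeds the threshold, else 0."""
    return [1 if p > threshold else 0 for p in probs]

# Hypothetical association probability matrix; each row sums to 1.
B = [[0.0, 0.9, 0.1],
     [0.95, 0.0, 0.05],
     [0.5, 0.5, 0.0]]
probs = outlier_probabilities(B)
labels = label_outliers(probs)
```

In this example the third point receives little association probability from the other two, so it is the only one labeled 1.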
It should be emphasized that other operation objects are also processed according to the above steps S101 to S103, and the embodiment of the present invention is not described herein again.
The invention analyzes, based on the operation log data of the enterprise business system, the service characteristics that abnormal user behavior is likely to have. An abnormal user has a comparatively definite purpose. For example, a data-stealing user aims to steal data, and all of that user's behaviors serve this aim; there may be some interfering behaviors intended to avoid discovery, but the behaviors as a whole still serve the purpose of "stealing" data. The behaviors of abnormal users are therefore more targeted, relatively concentrated, relatively long in duration and relatively stable, because a machine or script may be needed to achieve the purpose. By establishing a dynamic session behavior sequence, constructing user behavior features for the purpose of anomaly detection, and then identifying abnormal user behavior, the invention can reduce false alarms.
In addition, the embodiment of the invention has high accuracy and strong adaptability, and requires neither manually summarized rules nor labeled data. It can also be adjusted dynamically according to all user behaviors and is not affected by service fluctuation, the system environment and the like.
Example 2
Corresponding to embodiment 1 of the present invention, an analysis object behavior abnormality detection apparatus based on dynamic session is also provided in the embodiments of the present invention.
Fig. 4 is a schematic structural diagram of an analysis object behavior abnormality detection apparatus based on dynamic session according to an embodiment of the present invention, and as shown in fig. 4, the apparatus includes:
an acquisition module 401, configured to acquire an operation log of an analysis object in a business system, where the business system includes: one or a combination of a CRM system, a decision-making system and a resource system; the analysis object includes: one or a combination of a user name, an IP address, an MAC address and an IP section;
an obtaining module 402, configured to, for each analysis object, obtain, based on the operation log, an interval marker corresponding to each time interval according to whether an operation time interval corresponding to the analysis object is greater than a preset duration, and obtain an interval feature corresponding to the analysis object;
an anomaly identification module 403, configured to perform normalization processing on the interval features, and combine the normalized interval features into a feature vector for the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the abnormal probability value corresponding to each analysis object.
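The normalization and feature-vector step of the anomaly identification module might look like the sketch below. The text does not name a normalization method, so min-max scaling is an assumption here, as are the feature choices, the object names and the values.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1]; a constant column maps to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Hypothetical raw interval features per analysis object:
# (longest session duration in seconds, max operations in one session, CV)
raw = {
    "user_a": [120.0, 15.0, 0.9],
    "user_b": [3600.0, 400.0, 0.1],
    "user_c": [600.0, 40.0, 0.5],
}
names = list(raw)
vectors = min_max_normalize([raw[name] for name in names])
```

Each row of `vectors` is then the feature vector of one analysis object and can be passed to the SOS steps as a point.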
In a specific implementation manner of the embodiment of the present invention, the acquisition module 401 is configured to:
when the analysis object has different analysis object names in different service systems, performing association processing on the operation log of the analysis object by using keywords;
for each type of analysis object, summarizing the operation logs of the analysis objects of that type in the different service systems into one table to obtain a T_opr table;
and taking the T_opr table as the operation log.
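The association and summarizing steps of the acquisition module can be pictured with the following sketch. The alias table, system names, field names and log records are all hypothetical; the text only states that logs are associated by keywords and collected into the T_opr table.

```python
from datetime import datetime

# Hypothetical mapping from (system, local account name) to one shared key.
ALIASES = {
    ("crm", "zhang.san"): "u001",
    ("resource", "zs_admin"): "u001",
    ("decision", "li.si"): "u002",
}

def build_t_opr(logs_by_system):
    """Merge per-system logs into one table keyed by the shared object key."""
    rows = []
    for system, records in logs_by_system.items():
        for name, timestamp, operation in records:
            rows.append({
                "object": ALIASES.get((system, name), name),
                "system": system,
                "time": datetime.fromisoformat(timestamp),
                "operation": operation,
            })
    # Order by analysis object, then by operation time.
    rows.sort(key=lambda r: (r["object"], r["time"]))
    return rows

logs = {
    "crm": [("zhang.san", "2019-12-31 09:00:00", "query")],
    "resource": [("zs_admin", "2019-12-31 08:30:00", "export")],
    "decision": [("li.si", "2019-12-31 10:00:00", "view")],
}
t_opr = build_t_opr(logs)
```

Sorting by object and time leaves each analysis object's operations in the order needed to compute operation time intervals.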
In a specific implementation manner of the embodiment of the present invention, the obtaining module 402 is configured to:
for each analysis object, acquiring an analysis object operation time interval corresponding to the analysis object based on an operation log corresponding to the analysis object;
acquiring an interval mark corresponding to each time interval according to the relative size between the operation time interval of the analysis object and a preset time length;
acquiring interval features corresponding to the analysis object according to the interval marks by using a statistical method, wherein the interval features comprise one or a combination of: the longest duration of the same session_id, the ratio of the longest duration of the same session_id, the maximum number of operations of the same session_id, the maximum number of short-interval operations of the same session_id, the mean value Mean of the number of operations of the session_id, the standard deviation Std of the number of operations of the session_id, the coefficient of variation CV of the number of operations of the session_id, the maximum continuous number of times of the same session_id, the continuous number of times of the same interval, the ratio of the continuous number of times of the same interval, the continuous number of times of the short interval, the ratio of the continuous number of times of the short interval, and the ratio of the number of times of the short interval.
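The interval marking and a few of the listed features can be sketched as below. The sketch assumes that a mark of 1 means the gap between two consecutive operations exceeds the preset duration and that such a gap opens a new session_id; the 1800-second preset, the use of the population standard deviation and the sample timestamps are likewise assumptions.

```python
import statistics

def assign_sessions(timestamps, preset=1800):
    """Mark each gap (1 if longer than `preset` seconds) and derive session ids."""
    marks, session_ids = [], []
    current = 0
    for k, t in enumerate(timestamps):
        if k > 0:
            mark = 1 if t - timestamps[k - 1] > preset else 0
            marks.append(mark)
            current += mark  # a long gap opens a new session
        session_ids.append(current)
    return marks, session_ids

def interval_features(timestamps, session_ids):
    """Compute a subset of the per-object features from the session grouping."""
    sessions = {}
    for t, sid in zip(timestamps, session_ids):
        sessions.setdefault(sid, []).append(t)
    durations = [ts[-1] - ts[0] for ts in sessions.values()]
    counts = [len(ts) for ts in sessions.values()]
    mean = statistics.mean(counts)
    std = statistics.pstdev(counts)
    return {
        "longest_duration": max(durations),
        "max_operations": max(counts),
        "mean_operations": mean,
        "std_operations": std,
        "cv_operations": std / mean,
    }

# Hypothetical operation times of one analysis object, in seconds, sorted.
times = [0, 60, 120, 4000, 4030, 9000]
marks, sids = assign_sessions(times)
features = interval_features(times, sids)
```

The remaining features in the list, such as the short-interval counts and ratios, would be derived from the same marks and session grouping.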
In a specific implementation manner of the embodiment of the present invention, the abnormality identification module 403 is configured to:
mapping the feature vectors of the analysis objects into a high-dimensional space as points;
for each point, calculating the distance from the point to the other points by using a distance algorithm, and constructing a dissimilarity matrix D of dimension n × n according to the distances, wherein the distance algorithm comprises one of Manhattan distance, Mahalanobis distance and Euclidean distance;
according to the dissimilarity matrix, using the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right), \quad i \neq j$$
to calculate the association degree between the point and the other points, and constructing an association degree matrix according to the association degree of each point, wherein
$a_{ij}$ is the association degree of the i-th point with the j-th point; exp() is the exponential function with the natural constant as its base; $\sigma_i^2$ is the variance of the i-th point relative to the other points;
according to the association degree matrix, using the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
to calculate the association probability of the point relative to the other points, and constructing an association probability matrix according to the association probabilities, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space;
based on the association probability matrix, using the formula
$$p(x_i) = \prod_{j \neq i} (1 - b_{ji})$$
to calculate an anomaly probability value for each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; Π is the product symbol.
In a specific implementation manner of the embodiment of the present invention, the abnormality identification module 403 is configured to:
for each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{jk} - x_{ik})^2}$$
to calculate the distance from the point to the other points, so as to obtain a dissimilarity matrix D of dimension n × n, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; Σ is the summation symbol; $x_{jk}$ is the value of the k-th feature of point j; $x_{ik}$ is the value of the k-th feature of point i; m represents the number of features, and n represents the number of operation logs.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (8)

1. A dynamic session-based analysis object behavior anomaly detection method is characterized by comprising the following steps:
collecting an operation log of an analysis object in a business system, wherein the business system comprises: one or a combination of a CRM system, a decision-making system and a resource system; the analysis object includes: one or a combination of a user name, an IP address, an MAC address and an IP section;
based on the operation log, aiming at each analysis object, acquiring an interval mark corresponding to each time interval according to whether the operation time interval corresponding to the analysis object is greater than a preset time length or not, and acquiring interval characteristics corresponding to the analysis object; the method comprises the following steps:
for each analysis object, acquiring an analysis object operation time interval corresponding to the analysis object based on an operation log corresponding to the analysis object;
acquiring an interval mark corresponding to each time interval according to the relative size between the operation time interval of the analysis object and a preset time length;
acquiring interval features corresponding to the analysis object according to the interval marks by using a statistical method, wherein the interval features comprise: one or a combination of the longest duration of the same session_id, the ratio of the longest duration of the same session_id, the maximum number of operations of the same session_id, the maximum number of short-interval operations of the same session_id, the mean value Mean of the number of operations of the session_id, the standard deviation Std of the number of operations of the session_id, the coefficient of variation CV of the number of operations of the session_id, the maximum continuous number of times of the same session_id, the continuous number of times of the same interval, the ratio of the continuous number of times of the same interval, the continuous number of times of the short interval, the ratio of the continuous number of times of the short interval and the ratio of the number of times of the short interval;
carrying out normalization processing on the interval features, and combining the normalized interval features into a feature vector aiming at the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the abnormal probability value corresponding to each analysis object.
2. The method for detecting behavioral anomaly of analysis object based on dynamic session according to claim 1, wherein the collecting operation log of analysis object in service system includes:
when the analysis object has different analysis object names in different service systems, performing association processing on the operation log of the analysis object by using keywords;
for each type of analysis object, summarizing the operation logs of the analysis objects of that type in the different service systems into one table to obtain a T_opr table;
and taking the T_opr table as the operation log.
3. The method as claimed in claim 1, wherein the step of obtaining the abnormal probability value corresponding to each analysis object by using the feature vector of each analysis object as an input of an SOS algorithm includes:
mapping the feature vectors of the analysis objects into a high-dimensional space as points;
for each point, calculating the distance from the point to the other points by using a distance algorithm, and constructing a dissimilarity matrix D of dimension n × n according to the distances, wherein the distance algorithm comprises one of Manhattan distance, Mahalanobis distance and Euclidean distance;
according to the dissimilarity matrix, using the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right), \quad i \neq j$$
to calculate the association degree between the point and the other points, and constructing an association degree matrix according to the association degree of each point, wherein
$a_{ij}$ is the association degree of the i-th point with the j-th point; exp() is the exponential function with the natural constant as its base; $\sigma_i^2$ is the variance of the i-th point relative to the other points;
according to the association degree matrix, using the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
to calculate the association probability of the point relative to the other points, and constructing an association probability matrix according to the association probabilities, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space;
based on the association probability matrix, using the formula
$$p(x_i) = \prod_{j \neq i} (1 - b_{ji})$$
to calculate an anomaly probability value for each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; Π is the product symbol.
4. The method as claimed in claim 3, wherein the calculating the distance from the point to each of the other points by using a distance algorithm comprises:
for each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{jk} - x_{ik})^2}$$
to calculate the distance from the point to the other points, so as to obtain a dissimilarity matrix D of dimension n × n, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; Σ is the summation symbol; $x_{jk}$ is the value of the k-th feature of point j; $x_{ik}$ is the value of the k-th feature of point i; m represents the number of features, and n represents the number of operation logs.
5. An analysis object behavior abnormality detection device based on dynamic session, the device comprising:
the collection module is used for collecting an operation log of an analysis object in a business system, wherein the business system comprises: one or a combination of a CRM system, a decision-making system and a resource system; the analysis object includes: one or a combination of a user name, an IP address, an MAC address and an IP section;
the acquisition module is used for acquiring an interval mark corresponding to each time interval according to whether the operation time interval corresponding to the analysis object is greater than the preset time length or not and acquiring interval characteristics corresponding to the analysis object aiming at each analysis object; the obtaining module is configured to:
for each analysis object, acquiring an analysis object operation time interval corresponding to the analysis object based on an operation log corresponding to the analysis object;
acquiring an interval mark corresponding to each time interval according to the relative size between the operation time interval of the analysis object and a preset time length;
acquiring interval features corresponding to the analysis object according to the interval marks by using a statistical method, wherein the interval features comprise: one or a combination of the longest duration of the same session_id, the ratio of the longest duration of the same session_id, the maximum number of operations of the same session_id, the maximum number of short-interval operations of the same session_id, the mean value Mean of the number of operations of the session_id, the standard deviation Std of the number of operations of the session_id, the coefficient of variation CV of the number of operations of the session_id, the maximum continuous number of times of the same session_id, the continuous number of times of the same interval, the ratio of the continuous number of times of the same interval, the continuous number of times of the short interval, the ratio of the continuous number of times of the short interval and the ratio of the number of times of the short interval;
the anomaly identification module is used for carrying out normalization processing on the interval features and combining the normalized interval features into a feature vector aiming at the analysis object; and taking the feature vector of each analysis object as the input of the SOS algorithm to obtain the abnormal probability value corresponding to each analysis object.
6. The apparatus of claim 5, wherein the acquisition module is configured to:
when the analysis object has different analysis object names in different service systems, performing association processing on the operation log of the analysis object by using keywords;
for each type of analysis object, summarizing the operation logs of the analysis objects of that type in the different service systems into one table to obtain a T_opr table;
and taking the T_opr table as the operation log.
7. The apparatus of claim 5, wherein the anomaly identification module is configured to:
mapping the feature vectors of the analysis objects into a high-dimensional space as points;
for each point, calculating the distance from the point to the other points by using a distance algorithm, and constructing a dissimilarity matrix D of dimension n × n according to the distances, wherein the distance algorithm comprises one of Manhattan distance, Mahalanobis distance and Euclidean distance;
according to the dissimilarity matrix, using the formula
$$a_{ij} = \exp\left(-\frac{d_{ij}^2}{2\sigma_i^2}\right), \quad i \neq j$$
to calculate the association degree between the point and the other points, and constructing an association degree matrix according to the association degree of each point, wherein
$a_{ij}$ is the association degree of the i-th point with the j-th point; exp() is the exponential function with the natural constant as its base; $\sigma_i^2$ is the variance of the i-th point relative to the other points;
according to the association degree matrix, using the formula
$$b_{ij} = \frac{a_{ij}}{\sum_{k=1}^{n} a_{ik}}$$
to calculate the association probability of the point relative to the other points, and constructing an association probability matrix according to the association probabilities, wherein
$b_{ij}$ is the association probability of the i-th point relative to the j-th point; n is the number of points in the high-dimensional space;
based on the association probability matrix, using the formula
$$p(x_i) = \prod_{j \neq i} (1 - b_{ji})$$
to calculate an anomaly probability value for each point, wherein
$p(x_i)$ is the anomaly probability value of point $x_i$; Π is the product symbol.
8. The apparatus of claim 7, wherein the anomaly identification module is configured to:
for each point, using the formula
$$d_{ij} = \sqrt{\sum_{k=1}^{m} (x_{jk} - x_{ik})^2}$$
to calculate the distance from the point to the other points, so as to obtain a dissimilarity matrix D of dimension n × n, wherein
$d_{ij}$ is the Euclidean distance from point i to point j; Σ is the summation symbol; $x_{jk}$ is the value of the k-th feature of point j; $x_{ik}$ is the value of the k-th feature of point i; m represents the number of features, and n represents the number of operation logs.
CN201911401991.3A 2019-12-31 2019-12-31 Analysis object behavior abnormity detection method and device based on dynamic session Active CN110933115B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911401991.3A CN110933115B (en) 2019-12-31 2019-12-31 Analysis object behavior abnormity detection method and device based on dynamic session

Publications (2)

Publication Number Publication Date
CN110933115A CN110933115A (en) 2020-03-27
CN110933115B CN110933115B (en) 2022-04-29

Family

ID=69861488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911401991.3A Active CN110933115B (en) 2019-12-31 2019-12-31 Analysis object behavior abnormity detection method and device based on dynamic session

Country Status (1)

Country Link
CN (1) CN110933115B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538642B (en) * 2020-07-02 2020-10-02 杭州海康威视数字技术股份有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN111913864B (en) * 2020-08-14 2023-10-13 上海观安信息技术股份有限公司 Method and device for discovering abnormal operation behavior based on business operation combination
CN113344133B (en) * 2021-06-30 2023-04-18 上海观安信息技术股份有限公司 Method and system for detecting abnormal fluctuation of time sequence behaviors
CN113360899B (en) * 2021-07-06 2023-11-21 上海观安信息技术股份有限公司 Machine behavior recognition method and system
CN113722199B (en) * 2021-09-07 2024-01-30 上海观安信息技术股份有限公司 Abnormal behavior detection method, device, computer equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113519A (en) * 2013-04-16 2014-10-22 阿里巴巴集团控股有限公司 Network attack detection method and device thereof
CN107888616A (en) * 2017-12-06 2018-04-06 北京知道创宇信息技术有限公司 The detection method of construction method and Webshell the attack website of disaggregated model based on URI
CN109561045A (en) * 2017-09-25 2019-04-02 北京京东尚科信息技术有限公司 Data interception method and device, storage medium and electronic equipment
CN109587350A (en) * 2018-11-16 2019-04-05 国家计算机网络与信息安全管理中心 A kind of sequence variation detection method of the telecommunication fraud phone based on sliding time window polymerization
CN109831462A (en) * 2019-03-29 2019-05-31 新华三信息安全技术有限公司 A kind of method for detecting virus and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104486298B (en) * 2014-11-27 2018-03-09 小米科技有限责任公司 Identify the method and device of user behavior

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
pache-Flink-Docs-ZH-translation_sos.md;tuhaihe;《GitHub》;20170505;第1-4页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant