CN114116853A - Data security analysis method and device based on time sequence correlation analysis - Google Patents

Publication number
CN114116853A
Authority
CN
China
Prior art keywords
sub
graph
network
edges
network graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111490801.7A
Other languages
Chinese (zh)
Inventor
张黎
穆新宇
程树华
叶柳鹤
陈广辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Flash It Co ltd
Original Assignee
Flash It Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Flash It Co ltd
Priority to CN202111490801.7A
Publication of CN114116853A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks
    • G06F18/295 Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a data security analysis method and device based on time sequence correlation analysis. The method comprises: collecting a database operation behavior log data set and extracting nodes, edges and weights from it; constructing empirical network graphs and a network graph to be detected from the nodes, edges and weights, and selecting N empirical network graphs to form a time-series network graph; generating subgraphs of the time-series network graph, and calculating the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs; constructing a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculating the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method; and defining an operation path, calculating a risk value for the path from the edge generation probabilities, and detecting high-risk operation paths according to the risk value. With this method, database operation behavior can be monitored through time sequence correlation analysis, so that data security is protected effectively and in a timely manner.

Description

Data security analysis method and device based on time sequence correlation analysis
Technical Field
The invention relates to the technical field of data security, in particular to a data security analysis method and device based on time sequence correlation analysis.
Background
The rapid development of big data has accelerated the flow of information and resources in society and improved the efficiency of social operation, but it also conceals substantial data security risks. Collecting large amounts of seemingly harmless data can cause serious harm, and those harmed include not only individuals but also systems, and even countries and society as a whole. Data security is therefore of great importance.
At present, large amounts of data are stored in database systems, and monitoring database operation behavior is regarded as an effective means of data security protection. In database operations, subjects and objects are closely correlated, and taking the time sequence into account makes it possible to discover behavior patterns from a temporal perspective. Time sequence correlation analysis therefore has a place in the field of data security.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a data security analysis method and device based on time sequence correlation analysis.
The embodiment of the invention provides a data security analysis method based on time sequence correlation analysis, which comprises the following steps:
collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set;
constructing empirical network graphs and a network graph to be detected from the nodes, edges and weights using T+1 task scheduling, and selecting the latest N empirical network graphs to form a time-series network graph;
generating subgraphs of the time-series network graph with the Louvain algorithm, and calculating the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs;
constructing a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculating the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method;
defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
In one embodiment, the method further comprises:
generating a detection period by self-learning from the data volume, and dividing the nodes, edges and weights along the time dimension according to the detection period into two parts, used respectively to construct the empirical network graph and the network graph to be detected.
In one embodiment, the method further comprises:
the sequence length N of the time-series network graph is adjustable as a system parameter.
In one embodiment, the method further comprises:
[Formula images for the subgraph occupancy ratio and the connection-matrix element are not reproduced in this text]
wherein the first formula relates the occupancy ratio of a subgraph in the network graph to which it belongs, the sum of the weights of all edges in the subgraph, and the sum of the weights of all edges in the network graph;
the second formula gives the element in a given row and column of the connection matrix, which represents the likelihood of a connection between the two corresponding subgraphs, in terms of the sum of the weights of the edges connecting the two subgraphs and the sum of the weights of all edges in the first of the two subgraphs.
In one embodiment, the method further comprises:
[Formula images for the Markov prediction and the per-graph edge probability are not reproduced in this text]
wherein the first formula relates the vector of subgraph probabilities predicted by the Markov model from an empirical network graph, the occupancy vector of the subgraphs in that network graph, and the connection matrix among the subgraphs;
the second formula gives the probability of an edge in the network graph to be detected, as predicted from that empirical network graph, in terms of the two subgraphs of that network graph to which the two ends of the edge correspond.
In one embodiment, the method further comprises:
[Formula image for the attenuation factor weighting is not reproduced in this text]
wherein the formula gives the final probability of an edge in the network graph to be detected and uses an attenuation factor.
In one embodiment, the method further comprises:
[Formula image for the path risk value is not reproduced in this text]
wherein the formula gives the risk value of an operation path in terms of the average probability of the edges in the path and the number of edges in the path.
The embodiment of the invention provides a data security analysis device based on time sequence correlation analysis, which comprises:
the collection module is used for collecting a database operation behavior log data set and extracting nodes, edges and weights according to the data set;
the construction module is used for constructing empirical network graphs and a network graph to be detected from the nodes, edges and weights, and selecting the latest N empirical network graphs to form a time-series network graph;
the first calculation module is used for generating subgraphs of the time-series network graph with the Louvain algorithm, and calculating the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs;
the second calculation module is used for constructing a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculating the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method;
and the output module is used for defining the operation path, calculating the risk value of the path according to the probability generated by the edge, and detecting the high-risk operation path according to the risk value.
The embodiment of the invention provides electronic equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the data security analysis method based on the time sequence correlation analysis.
An embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data security analysis method based on time sequence correlation analysis.
The data security analysis method and device based on time sequence correlation analysis provided by the embodiments of the invention collect a database operation behavior log data set and extract nodes, edges and weights from it; construct empirical network graphs and a network graph to be detected from the nodes, edges and weights using T+1 task scheduling, and select the latest N empirical network graphs to form a time-series network graph; generate subgraphs of the time-series network graph with the Louvain algorithm, and calculate the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs; construct a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculate the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method; and define an operation path, calculate a risk value for the path from the edge generation probabilities, and detect high-risk operation paths according to the risk value. Database operation behavior can thus be monitored through time sequence correlation analysis, so that data security is protected effectively and in a timely manner.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a data security analysis method based on time sequence correlation analysis according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data security analysis device based on time sequence correlation analysis according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
FIG. 1 is a schematic flow chart of a data security analysis method based on time sequence correlation analysis according to an embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S101, collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set.
Specifically, a log data set recording database operation behavior is collected, and data such as nodes, edges and weights are extracted from it. The collected log data set mainly consists of records of events in actual services and generally includes the event occurrence time, the event subject and the event object. Event subjects include but are not limited to the client IP, client MAC and account; event objects include but are not limited to the server IP, server MAC and database ID. In addition, both nodes and edges have life cycles, that is, nodes and edges appear or disappear over time, and different types of nodes and edges generally have different life cycles.
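As an illustration of step S101, the following minimal Python sketch extracts nodes, edges and weights from log records. The record layout, the sample values and the choice of "number of events between a subject and an object" as the edge weight are assumptions made for this example; the text does not specify how the weight is computed.

```python
from collections import Counter
from datetime import datetime

# Hypothetical log records: (event time, event subject, event object).
# Field layout and values are illustrative only.
LOGS = [
    ("2021-12-01T09:00:00", "client_ip:10.0.0.5", "db_id:orders"),
    ("2021-12-01T09:05:00", "client_ip:10.0.0.5", "db_id:orders"),
    ("2021-12-01T09:07:00", "account:alice", "server_ip:10.0.1.9"),
]


def extract_graph(logs):
    """Return (nodes, weighted_edges) from (time, subject, object) records.

    Assumption: the weight of an edge is the number of events observed
    between its subject and its object.
    """
    weights = Counter()
    nodes = set()
    for timestamp, subject, obj in logs:
        datetime.fromisoformat(timestamp)  # fail early on malformed times
        nodes.update((subject, obj))
        weights[(subject, obj)] += 1
    return nodes, dict(weights)


nodes, edges = extract_graph(LOGS)
```

Because every edge joins a subject to an object, the resulting graph is bipartite by construction.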
Step S102, constructing empirical network graphs and a network graph to be detected from the nodes, edges and weights using T+1 task scheduling, and selecting the latest N empirical network graphs to form a time-series network graph.
Specifically, after data such as the nodes, edges and weights have been determined, the system generates a detection period t by self-learning from the data volume, and divides the nodes, edges and weights along the time dimension according to the detection period into two parts, used respectively to construct the empirical network graph and the network graph to be detected, both of which are bipartite graphs. Under T+1 task scheduling, the construction task runs once a day, and the empirical network graphs of the latest N days are accumulated to form the time-series network graph, where the sequence length N of the time-series network graph is adjustable as a system parameter.
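The split into empirical graphs and a graph to be detected, and the retention of only the latest N daily graphs, can be sketched as follows. This is a simplified reading: the detection period is assumed to be one day, the function names `build_graph` and `build_series` are hypothetical, and edge weights are again taken to be event counts.

```python
from collections import Counter, deque
from datetime import date


def build_graph(events):
    """Weighted edge map for a set of events; weight = event count (assumption)."""
    graph = Counter()
    for _, subject, obj in events:
        graph[(subject, obj)] += 1
    return dict(graph)


def build_series(events, cutoff, n):
    """Return (latest n daily empirical graphs, graph to be detected).

    Events dated before `cutoff` feed the empirical graphs, one per day;
    events on or after `cutoff` feed the graph to be detected.
    """
    series = deque(maxlen=n)  # older graphs fall off automatically
    for day in sorted({d for d, _, _ in events if d < cutoff}):
        series.append(build_graph([e for e in events if e[0] == day]))
    pending = build_graph([e for e in events if e[0] >= cutoff])
    return list(series), pending


EVENTS = [
    (date(2021, 12, 1), "alice", "orders"),
    (date(2021, 12, 2), "alice", "orders"),
    (date(2021, 12, 2), "bob", "billing"),
    (date(2021, 12, 3), "alice", "billing"),
    (date(2021, 12, 4), "bob", "orders"),
]
series, pending = build_series(EVENTS, cutoff=date(2021, 12, 4), n=2)
```

With N set to 2, the graph for 1 December is dropped and only the graphs for 2 and 3 December remain in the series.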
Step S103, generating subgraphs of the time-series network graph with the Louvain algorithm, and calculating the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs.
Specifically, subgraphs are generated with the Louvain algorithm, which produces accurate subgraphs and runs quickly. The occupancy ratio of a subgraph in the network graph is as follows:
[Formula image for the subgraph occupancy ratio is not reproduced in this text]
wherein the formula relates the occupancy ratio of a subgraph in the network graph to which it belongs, the sum of the weights of all edges in the subgraph, and the sum of the weights of all edges in the network graph.
In addition, the connection matrix is a square matrix that represents the probability of a connection existing between subgraphs in an empirical network graph. Its dimension is the number of subgraphs, and its elements are as follows:
[Formula image for the connection-matrix element is not reproduced in this text]
wherein the formula gives the element in a given row and column of the connection matrix, which represents the likelihood of a connection between the two corresponding subgraphs, in terms of the sum of the weights of the edges connecting the two subgraphs and the sum of the weights of all edges in the first of the two subgraphs.
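One possible reading of the occupancy ratio and the connection matrix described above is sketched here in Python. Several points are assumptions made for the example rather than statements of the original formulas: the node-to-subgraph assignment is taken as given (for instance from a Louvain implementation), an edge counts as being "in" a subgraph only when both of its endpoints belong to it, each quantity is a simple ratio of the weight sums named in the text, and the diagonal of the matrix uses a subgraph's internal edges as its connection to itself.

```python
def occupancy_and_connection(edges, community):
    """edges: {(u, v): weight}; community: {node: subgraph index}.

    Returns (occupancy list, connection matrix as nested lists).
    Assumptions: an edge is "in" a subgraph when both endpoints belong
    to it, and edges are treated as undirected for the pairwise sums.
    """
    k = max(community.values()) + 1
    pair_weight = [[0.0] * k for _ in range(k)]
    total = 0.0
    for (u, v), w in edges.items():
        i, j = community[u], community[v]
        total += w
        pair_weight[i][j] += w
        if i != j:
            pair_weight[j][i] += w
    internal = [pair_weight[i][i] for i in range(k)]
    occupancy = [internal[i] / total for i in range(k)]
    connection = [
        [pair_weight[i][j] / internal[i] if internal[i] else 0.0
         for j in range(k)]
        for i in range(k)
    ]
    return occupancy, connection


# Hypothetical weighted edges and a hypothetical two-subgraph partition.
EDGES = {("a", "x"): 3, ("b", "x"): 1, ("c", "y"): 4, ("b", "y"): 2}
COMMUNITY = {"a": 0, "b": 0, "x": 0, "c": 1, "y": 1}
occupancy, connection = occupancy_and_connection(EDGES, COMMUNITY)
```

Under these assumptions the rows of the matrix do not sum to 1; a variant that counts every edge touching a subgraph in the denominator would yield a row-stochastic matrix instead.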
Step S104, constructing a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculating the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method.
Specifically, a Markov model predicts the probability of future states, and its prediction depends only on the current state and the state transition process. The probability of each subgraph appearing in the stage to be detected is predicted with the Markov model as follows:
[Formula image for the Markov prediction is not reproduced in this text]
wherein the formula relates the vector of subgraph probabilities predicted by the Markov model from an empirical network graph, the occupancy vector of the subgraphs in that network graph (the current state), and the connection matrix among the subgraphs (the state transition matrix).
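A one-step Markov prediction of this kind can be sketched as a vector-matrix product. The sketch assumes the occupancy vector is a row vector multiplied on the left of the connection matrix and that no renormalization is applied; the function name and the sample numbers are illustrative.

```python
def predict_subgraph_probabilities(occupancy, connection):
    """One Markov step: current state (row vector) times transition matrix."""
    k = len(occupancy)
    return [
        sum(occupancy[i] * connection[i][j] for i in range(k))
        for j in range(k)
    ]


# Hypothetical current state and state transition matrix for two subgraphs.
predicted = predict_subgraph_probabilities(
    [0.5, 0.25],
    [[1.0, 0.5],
     [0.5, 1.0]],
)
```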
Every node in the network graph to be detected has a corresponding subgraph in the empirical network graph; if a node does not exist in the empirical network graph, its subgraph is determined by voting among its neighbor nodes. The edge probability predicted from one empirical network graph in the sequence is the probability that the subgraphs corresponding to the nodes at the two ends of the edge appear at the same time:
[Formula image for the per-graph edge probability is not reproduced in this text]
wherein the formula gives the probability of an edge in the network graph to be detected, as predicted from an empirical network graph, in terms of the two subgraphs of that network graph to which the two ends of the edge correspond.
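The neighbor voting and the per-graph edge probability can be sketched as follows. The sketch assumes a simple majority vote among neighbors that already have a subgraph, and it models "both subgraphs appear at the same time" as the product of their predicted probabilities, which presumes independence; neither detail is confirmed by the text, and the function names are hypothetical.

```python
from collections import Counter


def assign_subgraph(node, community, neighbors):
    """Return the subgraph of `node`, voting among known neighbors if unseen."""
    if node in community:
        return community[node]
    votes = Counter(
        community[n] for n in neighbors.get(node, ()) if n in community
    )
    if not votes:
        return None  # no neighbor with a known subgraph, so no vote is possible
    return votes.most_common(1)[0][0]


def edge_probability(edge, community, neighbors, subgraph_prob):
    """Assumed form: product of the two end subgraphs' predicted probabilities."""
    u, v = edge
    cu = assign_subgraph(u, community, neighbors)
    cv = assign_subgraph(v, community, neighbors)
    if cu is None or cv is None:
        return 0.0
    return subgraph_prob[cu] * subgraph_prob[cv]


# Hypothetical data: node "d" is new and votes through its three neighbors.
COMMUNITY = {"a": 0, "b": 0, "x": 0, "c": 1, "y": 1}
NEIGHBORS = {"d": ["a", "b", "y"]}
voted = assign_subgraph("d", COMMUNITY, NEIGHBORS)
prob = edge_probability(("d", "y"), COMMUNITY, NEIGHBORS, [0.625, 0.5])
```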
The final probability of an edge in the network graph to be detected is determined jointly by the past N empirical network graphs, and the older a graph is, the smaller its influence. The per-graph probabilities are therefore fused using the attenuation factor weighting method:
[Formula image for the attenuation factor weighting is not reproduced in this text]
wherein the formula gives the final probability of an edge in the network graph to be detected and uses an attenuation factor whose value lies in the range (0, 1).
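A common way to realize attenuation factor weighting is a normalized, exponentially decaying average, sketched below. The exact weighting in the original formula is not recoverable from this text, so the weight `alpha ** age` and the normalization are assumptions; only the requirement that older graphs count less and that the factor lies in (0, 1) comes from the description.

```python
def fuse_edge_probability(per_graph_probs, alpha):
    """Fuse per-graph edge probabilities, ordered oldest to newest.

    Assumed form: normalized weighted average with weight alpha ** age,
    where age 0 is the most recent empirical graph.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("attenuation factor must lie in (0, 1)")
    if not per_graph_probs:
        raise ValueError("at least one empirical graph is required")
    newest_first = list(reversed(per_graph_probs))
    weights = [alpha ** age for age in range(len(newest_first))]
    weighted = sum(w * p for w, p in zip(weights, newest_first))
    return weighted / sum(weights)


# Three hypothetical per-graph probabilities for the same edge, oldest first.
final_prob = fuse_edge_probability([0.2, 0.4, 0.8], alpha=0.5)
```

In this example the weights are 1, 0.5 and 0.25 from newest to oldest, so the fused value is pulled towards the most recent graph's 0.8.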
Step S105, defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
Specifically, an operation path is defined as one that starts with an event subject and ends with an event object, contains at least 3 edges, and in which each edge is generated later than the preceding edge. The risk value of the path is:
[Formula image for the path risk value is not reproduced in this text]
wherein the formula gives the risk value of an operation path in terms of the average probability of the edges in the path and the number of edges in the path. Within an operation path, the higher the probability of any edge, the higher the probability that the path is generated; the longer the path, the lower that probability. The generation probability and the risk value of a path move in opposite directions: as one falls, the other rises. A risk threshold is set, and an operation path whose risk value exceeds the threshold is identified as a high-risk operation path.
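The path definition and the thresholding step can be sketched as follows. The validity check follows the stated rules (subject start, object end, at least 3 edges, increasing generation times) and additionally assumes that consecutive edges share a node. The risk function is an assumed form, 1 minus the average edge probability raised to the number of edges; it was chosen only because it falls as edge probabilities rise and grows with path length, and it is not the original formula.

```python
def is_operation_path(path, subjects, objects):
    """path: list of (start, end, time) edges in traversal order."""
    if len(path) < 3:
        return False
    if path[0][0] not in subjects or path[-1][1] not in objects:
        return False
    for (_, end, t1), (start, _, t2) in zip(path, path[1:]):
        if end != start or t2 <= t1:  # must be connected and later in time
            return False
    return True


def path_risk(edge_probs):
    """Assumed form: 1 - (mean edge probability) ** (number of edges)."""
    n = len(edge_probs)
    mean = sum(edge_probs) / n
    return 1.0 - mean ** n


def high_risk_paths(paths_with_probs, threshold):
    """Return the paths whose risk value exceeds the threshold."""
    return [p for p, probs in paths_with_probs if path_risk(probs) > threshold]


# Hypothetical subjects, objects and a three-edge path with timestamps 1..3.
SUBJECTS = {"alice", "bob"}
OBJECTS = {"orders", "billing"}
PATH = [("alice", "orders", 1), ("orders", "bob", 2), ("bob", "billing", 3)]
valid = is_operation_path(PATH, SUBJECTS, OBJECTS)
risk = path_risk([0.5, 0.5, 0.5])
flagged = high_risk_paths([("p1", [0.5, 0.5, 0.5]), ("p2", [1.0, 1.0, 1.0])], 0.8)
```

The threshold of 0.8 is arbitrary here; in practice it would be set according to how much risk the actual service can accept.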
The data security analysis method based on time sequence correlation analysis provided by the embodiment of the invention collects a database operation behavior log data set and extracts nodes, edges and weights from it; constructs empirical network graphs and a network graph to be detected from the nodes, edges and weights using T+1 task scheduling, and selects the latest N empirical network graphs to form a time-series network graph; generates subgraphs of the time-series network graph with the Louvain algorithm, and calculates the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs; constructs a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculates the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method; and defines an operation path, calculates a risk value for the path from the edge generation probabilities, and detects high-risk operation paths according to the risk value. Database operation behavior can thus be monitored through time sequence correlation analysis, so that data security is protected effectively and in a timely manner.
FIG. 2 shows a data security analysis device based on time sequence correlation analysis according to an embodiment of the present invention, including a collection module S201, a construction module S202, a first calculation module S203, a second calculation module S204, and an output module S205, wherein:
The collection module S201 is configured to collect a database operation behavior log data set and extract nodes, edges and weights from the data set.
The construction module S202 is configured to construct empirical network graphs and a network graph to be detected from the nodes, edges and weights, and to select the latest N empirical network graphs to form a time-series network graph.
The first calculation module S203 is configured to generate subgraphs of the time-series network graph with the Louvain algorithm, and to calculate the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs.
The second calculation module S204 is configured to construct a Markov model and, combining the subgraph occupancy ratios and the connection matrix, to calculate the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method.
The output module S205 is configured to define an operation path, calculate a risk value for the path from the edge generation probabilities, and detect high-risk operation paths according to the risk value.
In one embodiment, the apparatus may further comprise:
a self-learning module, configured to generate a detection period by self-learning from the data volume, and to divide the nodes, edges and weights along the time dimension according to the detection period into two parts, used respectively to construct the empirical network graph and the network graph to be detected.
In one embodiment, the apparatus may further comprise:
a generation module, configured to generate subgraphs of each graph in the time-series network graph with the Louvain algorithm, such that no node belongs to more than one subgraph;
an acquisition module, configured to obtain the occupancy ratio of each subgraph in its network graph and the connection matrix among the subgraphs from the weights of the edges in the subgraphs.
In one embodiment, the apparatus may further comprise:
a second construction module, configured to construct a Markov model in which the subgraph occupancy ratios and the connection matrix serve as the current state and the state transition matrix respectively, and to predict the likelihood of each future state, that is, the probability of each subgraph appearing in the future;
a third calculation module, configured to calculate the probability of an edge from the probabilities of the subgraphs in which its two end nodes are located, and to calculate a risk value from the probabilities of the edges contained in an operation path, wherein the risk value is negatively correlated with the edge probabilities and positively correlated with the path length.
In one embodiment, the apparatus may further comprise:
a setting module, configured to set a risk threshold for operation paths, which can be set flexibly according to the level of risk the actual service can accept;
a detection module, configured to detect high-risk operation paths according to the risk threshold, wherein operation paths exceeding the risk threshold need to be handled promptly.
For specific limitations of the data security analysis device based on time sequence correlation analysis, reference may be made to the above limitations of the data security analysis method based on time sequence correlation analysis, and details are not repeated here. The modules in the data security analysis device may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to the modules.
FIG. 3 illustrates the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 301, a memory 302, a communication interface 303 and a communication bus 304, wherein the processor 301, the memory 302 and the communication interface 303 communicate with one another through the communication bus 304. The processor 301 may call logic instructions in the memory 302 to perform the following method: collecting a database operation behavior log data set and extracting nodes, edges and weights from it; constructing empirical network graphs and a network graph to be detected from the nodes, edges and weights using T+1 task scheduling, and selecting the latest N empirical network graphs to form a time-series network graph; generating subgraphs of the time-series network graph with the Louvain algorithm, and calculating the occupancy ratio of each subgraph in the network graph to which it belongs and the connection matrix among the subgraphs; constructing a Markov model and, combining the subgraph occupancy ratios and the connection matrix, calculating the generation probability of any edge in the network graph to be detected using an attenuation factor weighting method; and defining an operation path, calculating a risk value for the path from the edge generation probabilities, and detecting high-risk operation paths according to the risk value.
Furthermore, the logic instructions in the memory 302 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the method provided in the foregoing embodiments, the method including, for example: collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set; constructing, according to the nodes, the edges and the weights, an experience network graph and a network graph to be detected in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph; generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy ratio of each sub-graph in the network graph to which it belongs and a connection matrix among the sub-graphs; constructing a Markov model, and calculating the generation probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy ratios and the connection matrix; and defining an operation path, calculating a risk value of the path according to the generation probabilities of its edges, and detecting high-risk operation paths according to the risk value.
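The sub-graph statistics and the Markov prediction step named above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the partition into sub-graphs is taken as given (in practice it would come from a Louvain implementation), edges are treated as undirected when counting connections, the diagonal of the connection matrix is left at zero, and the Markov step is assumed to be a single product of the occupancy vector with the connection matrix. The function names `subgraph_stats` and `markov_step` are hypothetical.

```python
def subgraph_stats(edges, community):
    """Compute occupancy ratios and the connection matrix for one network graph.

    edges: dict mapping (u, v) -> weight.
    community: dict mapping node -> sub-graph index in 0..k-1.
    occupancy[i]     = weight of edges inside sub-graph i / weight of all edges.
    connection[i][j] = weight of edges joining i and j / weight of edges inside i.
    """
    k = max(community.values()) + 1
    internal = [0.0] * k                     # edge weight inside each sub-graph
    between = [[0.0] * k for _ in range(k)]  # edge weight joining two sub-graphs
    total = 0.0
    for (u, v), w in edges.items():
        i, j = community[u], community[v]
        total += w
        if i == j:
            internal[i] += w
        else:
            between[i][j] += w
            between[j][i] += w
    occupancy = [internal[i] / total if total else 0.0 for i in range(k)]
    connection = [
        [between[i][j] / internal[i] if internal[i] else 0.0 for j in range(k)]
        for i in range(k)
    ]
    return occupancy, connection

def markov_step(occupancy, connection):
    """One assumed Markov step: row vector of occupancies times the connection matrix."""
    k = len(occupancy)
    return [sum(occupancy[i] * connection[i][j] for i in range(k)) for j in range(k)]

# Hypothetical graph: two sub-graphs joined by a single edge of weight 1.
edges = {("a", "b"): 3, ("c", "d"): 1, ("b", "c"): 1}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
occupancy, connection = subgraph_stats(edges, community)
predicted = markov_step(occupancy, connection)
```

Because the occupancy ratio divides by the weight of all edges in the graph, including those that cross sub-graph boundaries, the ratios need not sum to one under this reading.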
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A data security analysis method based on time sequence correlation analysis is characterized by comprising the following steps:
collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set;
constructing, according to the nodes, the edges and the weights, an experience network graph and a network graph to be detected in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph;
generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy ratio of each sub-graph in the network graph to which it belongs and a connection matrix among the sub-graphs;
constructing a Markov model, and calculating the generation probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy ratios and the connection matrix;
defining an operation path, calculating a risk value of the path according to the generation probabilities of its edges, and detecting high-risk operation paths according to the risk value.
2. The data security analysis method based on time sequence correlation analysis according to claim 1, wherein constructing the experience network graph and the network graph to be detected according to the nodes, the edges and the weights comprises:
generating a detection period by self-learning from the data volume, and dividing the nodes, the edges and the weights into two parts along the time dimension corresponding to the detection period, the two parts being used to construct the experience network graph and the network graph to be detected, respectively.
3. The data security analysis method based on time sequence correlation analysis according to claim 1, wherein generating sub-graphs of the time sequence network graph and calculating the occupancy ratio of each sub-graph in the network graph to which it belongs and the connection matrix between the sub-graphs comprises:
$$p_{c} = \frac{W_{c}}{W_{G}}$$
$$a_{ij} = \frac{W_{ij}}{W_{i}}$$
wherein $p_{c}$ is the occupancy ratio of sub-graph $c$ in the network graph $G$ to which it belongs, $W_{c}$ is the sum of the weights of all edges in $c$, and $W_{G}$ is the sum of the weights of all edges in $G$; $a_{ij}$ is the element in row $i$ and column $j$ of the connection matrix and represents the possibility of a connection between sub-graphs $c_{i}$ and $c_{j}$, $W_{ij}$ is the sum of the weights of the edges connecting $c_{i}$ and $c_{j}$, and $W_{i}$ is the sum of the weights of all edges in $c_{i}$.
4. The data security analysis method based on time sequence correlation analysis according to claim 1, wherein constructing the Markov model and calculating the generation probability of any edge in the network graph to be detected by the attenuation factor weighting method in combination with the sub-graph occupancy ratios and the connection matrix comprises:
[Three formulas, given as images in the original publication]
wherein the first formula gives the vector formed by the probabilities of the sub-graphs of a network graph, as predicted by the Markov model, in terms of the occupancy vector of the sub-graphs in that network graph and the connection matrix between the sub-graphs;
the second formula gives the probability of an edge in the network graph to be detected, as predicted from that network graph, in terms of the sub-graphs of that network graph to which the two ends of the edge correspond;
and the third formula gives the final probability of the edge in the network graph to be detected, obtained by weighting with the attenuation factor.
5. The data security analysis method based on time sequence correlation analysis according to claim 1, wherein calculating the risk value of the path according to the generation probabilities of its edges comprises:
[Formula, given as an image in the original publication]
wherein the formula gives the risk value of an operation path in terms of the average probability of the edges in the path and the number of edges in the path.
6. The data security analysis method based on time sequence correlation analysis according to claim 1, wherein detecting high-risk operation paths according to the risk value comprises:
comparing the risk value of each operation path with a set risk threshold to detect the high-risk operation paths.
7. A data security analysis apparatus based on time series correlation analysis, the apparatus comprising:
the collection module is used for collecting a database operation behavior log data set and extracting nodes, edges and weights according to the data set;
the construction module is used for constructing an experience network graph and a network graph to be detected according to the nodes, the edges and the weights, and selecting the latest N experience network graphs to form a time sequence network graph;
the first calculation module is used for generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy ratio of each sub-graph in the network graph to which it belongs and a connection matrix among the sub-graphs;
the second calculation module is used for constructing a Markov model, and calculating the generation probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy ratios and the connection matrix;
and the output module is used for defining the operation path, calculating the risk value of the path according to the probability generated by the edge, and detecting the high-risk operation path according to the risk value.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the data security analysis method based on time series correlation analysis according to any one of claims 1 to 6.
9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the data security analysis method based on time series correlation analysis according to any one of claims 1 to 6.
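The attenuation factor weighting and the threshold-based path detection recited in the claims can be illustrated with the sketch below. The exact formulas are not reproduced in this text, so both combination rules are assumptions chosen only to make the example concrete: per-graph edge probabilities are combined by a normalised weighted average in which older graphs are discounted by powers of the attenuation factor, and the risk value of a path is taken as one minus the mean edge probability raised to the number of edges. The names `decayed_probability`, `path_risk` and `high_risk_paths` are hypothetical.

```python
def decayed_probability(per_graph_probs, decay):
    """Combine per-graph probabilities of one edge, listed oldest first.

    Assumed rule: the newest graph gets weight 1, the one before it `decay`,
    then decay**2, and so on; weights are normalised so the result stays in [0, 1].
    """
    n = len(per_graph_probs)
    if n == 0:
        raise ValueError("at least one per-graph probability is required")
    weights = [decay ** (n - 1 - t) for t in range(n)]
    return sum(w * p for w, p in zip(weights, per_graph_probs)) / sum(weights)

def path_risk(edge_probs):
    """Assumed risk score: 1 - (mean edge probability) ** (number of edges)."""
    n = len(edge_probs)
    if n == 0:
        raise ValueError("a path must contain at least one edge")
    mean = sum(edge_probs) / n
    return 1.0 - mean ** n

def high_risk_paths(paths, threshold):
    """Return the names of paths whose risk value exceeds the set threshold."""
    return [name for name, probs in paths.items() if path_risk(probs) > threshold]

# Hypothetical operation paths, each given as the final probabilities of its edges.
paths = {
    "routine": [0.9, 0.9],
    "unusual": [0.1, 0.2, 0.3],
}
flagged = high_risk_paths(paths, threshold=0.5)
```

Under these assumed rules a path made of frequently observed edges scores low, while a path of rarely observed edges scores close to one and is flagged; the threshold value 0.5 is arbitrary.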
CN202111490801.7A 2021-12-08 2021-12-08 Data security analysis method and device based on time sequence correlation analysis Pending CN114116853A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111490801.7A CN114116853A (en) 2021-12-08 2021-12-08 Data security analysis method and device based on time sequence correlation analysis

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111490801.7A CN114116853A (en) 2021-12-08 2021-12-08 Data security analysis method and device based on time sequence correlation analysis

Publications (1)

Publication Number Publication Date
CN114116853A true CN114116853A (en) 2022-03-01

Family

ID=80367524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111490801.7A Pending CN114116853A (en) 2021-12-08 2021-12-08 Data security analysis method and device based on time sequence correlation analysis

Country Status (1)

Country Link
CN (1) CN114116853A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117473509A (en) * 2023-12-26 2024-01-30 信联科技(南京)有限公司 Data security risk assessment method and system for data processing activities
CN117473509B (en) * 2023-12-26 2024-04-02 信联科技(南京)有限公司 Data security risk assessment method and system for data processing activities

Similar Documents

Publication Publication Date Title
CN110147387B (en) Root cause analysis method, root cause analysis device, root cause analysis equipment and storage medium
CN111355697B (en) Detection method, device, equipment and storage medium for botnet domain name family
US20160055044A1 (en) Fault analysis method, fault analysis system, and storage medium
CN110166344B (en) Identity identification method, device and related equipment
CN110995482A (en) Alarm analysis method and device, computer equipment and computer readable storage medium
US11777824B2 (en) Anomaly detection method and apparatus
CN111935140B (en) Abnormal message identification method and device
CN110263869B (en) Method and device for predicting duration of Spark task
CN111368887B (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN113992340B (en) User abnormal behavior identification method, device, equipment and storage medium
CN108696486B (en) Abnormal operation behavior detection processing method and device
CN113822355A (en) Composite attack prediction method and device based on improved hidden Markov model
CN111276247B (en) Flight parameter data health assessment method and equipment based on big data processing
CN114116853A (en) Data security analysis method and device based on time sequence correlation analysis
CN108076032B (en) Abnormal behavior user identification method and device
EP3009942A1 (en) Social contact message monitoring method and device
CN114202206A (en) System abnormal root cause analysis method and device
CN117035374B (en) Force cooperative scheduling method, system and medium for coping with emergency
CN110769003B (en) Network security early warning method, system, equipment and readable storage medium
CN111552842A (en) Data processing method, device and storage medium
CN115567305A (en) Sequential network attack prediction analysis method based on deep learning
CN115987594A (en) Abnormity detection method, device and equipment for network security log
CN114760190A (en) Service-oriented converged network performance anomaly detection method
CN112422669A (en) Multi-association equipment data real-time extraction method and related device
CN110489568B (en) Method and device for generating event graph, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination