CN114116853A - Data security analysis method and device based on time sequence correlation analysis - Google Patents
Data security analysis method and device based on time sequence correlation analysis
- Publication number
- CN114116853A CN202111490801.7A
- Authority
- CN
- China
- Prior art keywords
- sub
- graph
- network
- edges
- network graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2474—Sequence data queries, e.g. querying versioned data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/29—Graphical models, e.g. Bayesian networks
- G06F18/295—Markov models or related models, e.g. semi-Markov models; Markov random fields; Networks embedding Markov models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Artificial Intelligence (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Bioethics (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a data security analysis method and device based on time sequence correlation analysis. The method comprises: collecting a database operation behavior log data set and extracting nodes, edges and weights from it; constructing an experience network graph and a network graph to be detected from the nodes, the edges and the weights, and selecting N experience network graphs to form a time sequence network graph; generating sub-graphs of the time sequence network graph, and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs; constructing a Markov model and, in combination with the sub-graph occupancy and the connection matrix, calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method; and defining an operation path, calculating a risk value of the path from the edge probabilities, and detecting high-risk operation paths according to the risk value. With this method, database operation behavior can be monitored through time sequence correlation analysis, so that data security is protected promptly and effectively.
Description
Technical Field
The invention relates to the technical field of data security, in particular to a data security analysis method and device based on time sequence correlation analysis.
Background
At present, the rapid development of big data accelerates the flow of information and resources in society and improves the efficiency of social operation, but it also conceals serious data security risks. Collecting large amounts of seemingly harmless data can cause great harm, and those harmed may be not only individuals but also systems, or even a country and society as a whole. Data security is therefore of great importance.
At present, large amounts of data are stored in database systems, and analyzing database operation behavior is regarded as an effective means of data security protection. In database operations there is a close correlation between subject and object, and taking time sequence factors into account makes it possible to discover behavior patterns from the time perspective. Time sequence correlation analysis therefore has a place in the field of data security.
Disclosure of Invention
Aiming at the problems in the prior art, the embodiment of the invention provides a data security analysis method and device based on time sequence correlation analysis.
The embodiment of the invention provides a data security analysis method based on time sequence correlation analysis, which comprises the following steps:
collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set;
constructing an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph;
generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs;
constructing a Markov model, and calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix;
defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
In one embodiment, the method further comprises:
generating a detection period by self-learning according to the data amount, and dividing the nodes, the edges and the weights along the time dimension, according to the detection period, into two parts, which are used to construct the experience network graph and the network graph to be detected respectively.
In one embodiment, the method further comprises:
the sequence length N of the time series network diagram can be adjusted as a system parameter.
In one embodiment, the method further comprises:
the occupancy of a sub-graph is calculated as
$$ r(S) = \frac{W(S)}{W(G)} $$
where $r(S)$ is the occupancy of sub-graph $S$ in the network graph $G$ to which it belongs, $W(S)$ is the sum of the weights of all edges in $S$, and $W(G)$ is the sum of the weights of all edges in $G$; and the element in row $i$ and column $j$ of the connection matrix represents the likelihood of a connection between sub-graphs $S_i$ and $S_j$, and is computed from the sum of the weights of the edges connecting $S_i$ and $S_j$ and the sum of the weights of all edges in the sub-graph.
In one embodiment, the method further comprises:
the probabilities of the sub-graphs are predicted as
$$ P = R\,C $$
where $P$ is the vector of sub-graph probabilities predicted by the Markov model, $R$ is the occupancy vector of the sub-graphs in the network graph, and $C$ is the connection matrix among the sub-graphs; and the probability of an edge in the network graph to be detected, as predicted from one network graph of the sequence, is defined in terms of the two sub-graphs of that network graph to which the two ends of the edge correspond.
In one embodiment, the method further comprises:
the final probability of an edge in the network graph to be detected is obtained by weighting the probabilities predicted from the individual network graphs with an attenuation factor.
In one embodiment, the method further comprises:
the risk value of an operation path is calculated from the average probability of the edges in the path and the number of edges in the path.
The embodiment of the invention provides a data security analysis device based on time sequence correlation analysis, which comprises:
the collection module is used for collecting a database operation behavior log data set and extracting nodes, edges and weights according to the data set;
the construction module is used for constructing an experience network graph and a network graph to be detected according to the nodes, the edges and the weights, and selecting the latest N experience network graphs to form a time sequence network graph;
the first calculation module is used for generating sub-graphs of the time sequence network graph through the Louvain algorithm and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs;
the second calculation module is used for constructing a Markov model, and calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix;
and the output module is used for defining the operation path, calculating the risk value of the path according to the probability generated by the edge, and detecting the high-risk operation path according to the risk value.
The embodiment of the invention provides electronic equipment, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the data security analysis method based on the time sequence correlation analysis.
An embodiment of the present invention provides a non-transitory computer readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data security analysis method based on time sequence correlation analysis.
The data security analysis method and device based on time sequence correlation analysis provided by the embodiment of the invention collect a database operation behavior log data set and extract nodes, edges and weights from it; construct an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and select the latest N experience network graphs to form a time sequence network graph; generate sub-graphs of the time sequence network graph through the Louvain algorithm, and calculate the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs; construct a Markov model and, in combination with the sub-graph occupancy and the connection matrix, calculate the probability of any edge in the network graph to be detected by an attenuation factor weighting method; and define an operation path, calculate a risk value of the path from the edge probabilities, and detect high-risk operation paths according to the risk value. Database operation behavior can thus be monitored through time sequence correlation analysis, and data security is protected promptly and effectively.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart of a data security analysis method based on time-series correlation analysis according to an embodiment of the present invention;
FIG. 2 is a block diagram of a data security analysis apparatus based on time sequence correlation analysis according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a data security analysis method based on time sequence correlation analysis according to an embodiment of the present invention, and as shown in fig. 1, an embodiment of the present invention provides a data security analysis method based on time sequence correlation analysis, including:
Step S101, collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set.
Specifically, a log data set recording database operation behaviors is collected, and nodes, edges, weights and similar data are extracted from it. The collected log data set mainly consists of records of events in actual services and generally includes information such as the event occurrence time, the event subject and the event object. Event subjects include, but are not limited to, the client IP, client MAC and account; event objects include, but are not limited to, the server IP, server MAC and database ID. In addition, both nodes and edges have life cycles, that is, nodes and edges appear or disappear over time, and in general different types of nodes and edges have different life cycles.
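The extraction step can be illustrated with a minimal Python sketch. The `LogRecord` schema and the choice of the event count as the edge weight are assumptions made for illustration; the text does not specify how weights are derived from the log.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class LogRecord:
    """One database operation event (hypothetical schema)."""
    time: datetime
    subject: str  # e.g. client IP, client MAC or account
    obj: str      # e.g. server IP, server MAC or database ID


def extract_graph(records):
    """Return (nodes, weights); weights maps (subject, object) to an event count.

    Using the event count as the weight is an assumption of this sketch.
    """
    weights = Counter((r.subject, r.obj) for r in records)
    nodes = {node for edge in weights for node in edge}
    return nodes, dict(weights)


records = [
    LogRecord(datetime(2021, 12, 1, 9, 0), "acct:alice", "db:orders"),
    LogRecord(datetime(2021, 12, 1, 9, 5), "acct:alice", "db:orders"),
    LogRecord(datetime(2021, 12, 1, 9, 7), "ip:10.0.0.8", "db:users"),
]
nodes, weights = extract_graph(records)
```

Because every edge joins a subject to an object, the resulting graph is bipartite by construction.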
Step S102, constructing an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph.
Specifically, once the nodes, edges, weights and similar data have been determined, the system generates a detection period t by self-learning according to the data amount and, according to the detection period, divides the nodes, edges and weights along the time dimension into two parts, which are used to construct the experience network graph and the network graph to be detected respectively; both graphs are bipartite graphs. In the T+1 task scheduling mode the construction task runs once a day, and the experience network graphs of the latest N days are accumulated to form the time sequence network graph, where the sequence length N can be adjusted as a system parameter.
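As a rough sketch of this step, the split by detection period and the rolling window of N daily graphs might look as follows. The function and class names are hypothetical, and treating the most recent period as the part to be detected is an assumption; the text only says that the data is divided in two along the time dimension.

```python
from collections import deque
from datetime import datetime, timedelta


def split_by_period(events, now, period):
    """Split (time, subject, object) events into experience and to-be-detected parts.

    Assumption: events within the last detection period are the ones to be detected.
    """
    cutoff = now - period
    experience = [e for e in events if e[0] < cutoff]
    to_detect = [e for e in events if cutoff <= e[0] <= now]
    return experience, to_detect


class TimeSeriesGraphs:
    """Keeps only the latest N daily experience graphs (N is a system parameter)."""

    def __init__(self, n):
        self.graphs = deque(maxlen=n)

    def add_daily_graph(self, graph):
        # Called once per day by the T+1 task; the oldest graph drops out automatically.
        self.graphs.append(graph)


now = datetime(2021, 12, 8)
events = [
    (datetime(2021, 12, 6, 10), "acct:alice", "db:orders"),
    (datetime(2021, 12, 7, 12), "acct:bob", "db:users"),
]
experience, to_detect = split_by_period(events, now, timedelta(days=1))

window = TimeSeriesGraphs(n=2)
for day in ("day1", "day2", "day3"):
    window.add_daily_graph(day)
```

A bounded `deque` is used so that changing N only requires changing one constructor argument.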
Step S103, generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs.
Specifically, the Louvain algorithm is used to generate the sub-graphs because it produces accurate sub-graphs and runs quickly. The occupancy of a sub-graph in its network graph is as follows:
$$ r(S) = \frac{W(S)}{W(G)} $$
where $r(S)$ is the occupancy of sub-graph $S$ in the network graph $G$ to which it belongs, $W(S)$ is the sum of the weights of all edges in $S$, and $W(G)$ is the sum of the weights of all edges in $G$.
In addition, the connection matrix is a square matrix that represents the likelihood of connections between sub-graphs in an experience network graph, and its dimension equals the number of sub-graphs. The element in row $i$ and column $j$ represents the likelihood of a connection between sub-graphs $S_i$ and $S_j$, and is computed from the sum of the weights of the edges connecting $S_i$ and $S_j$ and the sum of the weights of all edges in the sub-graph.
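The two quantities can be sketched in Python once a partition into sub-graphs is available (for example from NetworkX's `louvain_communities`, which is not called here so that the sketch stays dependency-free). The function name is hypothetical, and normalising each row of the connection matrix so that it sums to 1 is an assumption chosen so the matrix can serve as a transition matrix; the exact normalisation used in the source formula is not recoverable from the text.

```python
def occupancy_and_connection(edge_weights, community_of):
    """edge_weights: {(u, v): weight}; community_of: {node: sub-graph index}."""
    k = max(community_of.values()) + 1
    total = sum(edge_weights.values())
    inside = [0.0] * k                        # weight of edges inside each sub-graph
    between = [[0.0] * k for _ in range(k)]   # weight of edges linking sub-graphs i and j
    for (u, v), w in edge_weights.items():
        i, j = community_of[u], community_of[v]
        between[i][j] += w
        if i == j:
            inside[i] += w
        else:
            between[j][i] += w                # keep the matrix symmetric before normalising
    occupancy = [x / total for x in inside]
    # Assumption: row-normalise so that each row is a probability distribution.
    connection = []
    for row in between:
        s = sum(row)
        connection.append([x / s if s else 0.0 for x in row])
    return occupancy, connection


edge_weights = {("a", "x"): 3, ("b", "x"): 1, ("c", "y"): 2, ("b", "y"): 2}
community_of = {"a": 0, "b": 0, "x": 0, "c": 1, "y": 1}
occupancy, connection = occupancy_and_connection(edge_weights, community_of)
```

In this toy graph, sub-graph 0 holds half of the total edge weight and sub-graph 1 a quarter; the remaining quarter lies on the edge that crosses between them.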
Step S104, constructing a Markov model, and calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix.
Specifically, a Markov model predicts the probability of future states, and the prediction depends only on the current state and the state transition process. The probability of each sub-graph appearing in the stage to be detected is predicted with the Markov model as follows:
$$ P = R\,C $$
where $P$ is the vector of sub-graph probabilities predicted by the Markov model, $R$ is the occupancy vector of the sub-graphs in the experience network graph, i.e. the current state, and $C$ is the connection matrix among the sub-graphs, i.e. the state transition matrix.
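The Markov prediction described above amounts to one step of a Markov chain: the current-state row vector multiplied by the transition matrix. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
def predict_subgraph_probabilities(occupancy, connection):
    """One Markov step: occupancy row vector times the connection matrix."""
    k = len(occupancy)
    return [
        sum(occupancy[i] * connection[i][j] for i in range(k))
        for j in range(k)
    ]


occupancy = [0.5, 0.25]                    # current state
connection = [[0.75, 0.25], [0.5, 0.5]]    # state transition matrix
predicted = predict_subgraph_probabilities(occupancy, connection)
```

One such vector is produced for each of the N experience network graphs in the sequence.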
Any node in the network graph to be detected has a corresponding sub-graph in the experience network graph; if a node does not exist in the experience network graph, its corresponding sub-graph is determined by voting among its neighbor nodes. The probability of an edge predicted from one experience network graph in the sequence is the probability that the sub-graphs corresponding to the nodes at the two ends of the edge appear at the same time. Here the edge belongs to the network graph to be detected, and the two sub-graphs are those in that experience network graph to which the two ends of the edge correspond.
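A possible reading of this step is sketched below. Two points are assumptions rather than statements from the text: the joint probability of the two sub-graphs is taken as the product of their predicted probabilities, and an edge whose end cannot be mapped to any sub-graph is given probability 0. All names are hypothetical.

```python
from collections import Counter


def subgraph_of(node, community_of, neighbours):
    """Return the sub-graph index of a node, voting among its neighbours if unseen."""
    if node in community_of:
        return community_of[node]
    votes = Counter(
        community_of[n] for n in neighbours.get(node, ()) if n in community_of
    )
    if not votes:
        return None  # no known neighbour to vote
    return votes.most_common(1)[0][0]


def edge_probability(u, v, community_of, neighbours, predicted):
    """Assumed form: product of the predicted probabilities of the two sub-graphs."""
    su = subgraph_of(u, community_of, neighbours)
    sv = subgraph_of(v, community_of, neighbours)
    if su is None or sv is None:
        return 0.0  # assumption: an unmappable edge is treated as maximally unlikely
    return predicted[su] * predicted[sv]


community_of = {"a": 0, "b": 0, "x": 0, "c": 1, "y": 1}
neighbours = {"new": ["a", "b", "y"]}   # neighbours of a node unseen in the experience graph
predicted = [0.5, 0.25]
voted = subgraph_of("new", community_of, neighbours)
p_edge = edge_probability("new", "y", community_of, neighbours, predicted)
```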
The final probability of an edge in the network graph to be detected is determined by the past N experience network graphs, and the further back in time a graph lies, the smaller its influence. The per-graph probabilities are therefore fused by the attenuation factor weighting method to give the final probability of the edge in the network graph to be detected, where the attenuation factor lies in the range (0, 1).
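One way to realise such a weighting is an exponentially decayed average, sketched below. The exact form (weights equal to powers of the attenuation factor, normalised to sum to 1) is an assumption; the text only states that older graphs carry less influence and that the factor lies in (0, 1).

```python
def fuse_edge_probabilities(per_graph_probs, alpha):
    """Fuse per-graph edge probabilities, ordered oldest to newest.

    Assumed form: weight alpha ** age, normalised so the weights sum to 1.
    """
    if not per_graph_probs:
        raise ValueError("at least one probability is required")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    n = len(per_graph_probs)
    weights = [alpha ** (n - 1 - k) for k in range(n)]  # newest graph gets weight 1
    return sum(w * p for w, p in zip(weights, per_graph_probs)) / sum(weights)


final_p = fuse_edge_probabilities([0.2, 0.4, 0.8], alpha=0.5)
```

With `alpha=0.5` the newest graph counts four times as much as the graph from two days earlier, so the fused value sits closer to 0.8 than a plain average would.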
Step S105, defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
Specifically, an operation path is defined as starting with an event subject and ending with an event object; the path contains at least 3 edges, and each edge is generated later than the edge before it. The risk value of an operation path is calculated from the average probability of the edges in the path and the number of edges in the path. Within an operation path, the greater the probability of each edge, the greater the probability of the path occurring; the longer the path, the smaller the probability of the path occurring; and the probability of a path and its risk value move in opposite directions. A risk threshold is set, and an operation path whose risk value exceeds the threshold is determined to be a high-risk operation path.
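The path definition and thresholding can be sketched as follows. The risk formula used here, one minus the mean edge probability raised to the number of edges, is an assumption chosen only because it matches the stated behaviour (risk falls as edge probabilities rise and grows with path length); it is not claimed to be the formula of the source. All names are hypothetical.

```python
def is_operation_path(edges, subjects, objects):
    """edges: list of (from_node, to_node, time) in traversal order."""
    if len(edges) < 3:
        return False
    if edges[0][0] not in subjects or edges[-1][1] not in objects:
        return False
    chained = all(a[1] == b[0] for a, b in zip(edges, edges[1:]))
    increasing = all(a[2] < b[2] for a, b in zip(edges, edges[1:]))
    return chained and increasing


def path_risk(edge_probs):
    """Assumed form: 1 - (mean edge probability) ** (number of edges)."""
    if not edge_probs:
        raise ValueError("a path needs at least one edge")
    n = len(edge_probs)
    mean = sum(edge_probs) / n
    return 1.0 - mean ** n


def high_risk_paths(paths, threshold):
    """Return the names of paths whose risk value exceeds the threshold."""
    return [name for name, probs in paths.items() if path_risk(probs) > threshold]


edges = [
    ("acct:alice", "db:orders", 1),
    ("db:orders", "ip:10.0.0.8", 2),
    ("ip:10.0.0.8", "db:users", 3),
]
valid = is_operation_path(
    edges,
    subjects={"acct:alice", "ip:10.0.0.8"},
    objects={"db:orders", "db:users"},
)
flagged = high_risk_paths({"p1": [0.5, 0.5, 0.5], "p2": [0.9, 0.9, 0.9]}, threshold=0.8)
```

The threshold is left as a parameter because the text says it should be set according to what the actual service can tolerate.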
The data security analysis method based on time sequence correlation analysis provided by the embodiment of the invention collects a database operation behavior log data set and extracts nodes, edges and weights from it; constructs an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and selects the latest N experience network graphs to form a time sequence network graph; generates sub-graphs of the time sequence network graph through the Louvain algorithm, and calculates the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs; constructs a Markov model and, in combination with the sub-graph occupancy and the connection matrix, calculates the probability of any edge in the network graph to be detected by an attenuation factor weighting method; and defines an operation path, calculates a risk value of the path from the edge probabilities, and detects high-risk operation paths according to the risk value. Database operation behavior can thus be monitored through time sequence correlation analysis, and data security is protected promptly and effectively.
Fig. 2 is a data security analysis apparatus based on time sequence correlation analysis according to an embodiment of the present invention, including: a collection module S201, a construction module S202, a first calculation module S203, a second calculation module S204, and an output module S205, wherein:
the collection module S201 is configured to collect a database operation behavior log data set, and extract nodes, edges, and weights according to the data set.
And the constructing module S202 is used for constructing the experience network graph and the network graph to be detected according to the nodes, the edges and the weights, and selecting the latest N experience network graphs to form a time sequence network graph.
The first calculating module S203 is configured to generate sub-graphs of the time sequence network graph through the Louvain algorithm, and calculate the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs.
The second calculating module S204 is configured to construct a Markov model, and calculate the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix.
And the output module S205 is used for defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
In one embodiment, the apparatus may further comprise:
and the self-learning module is used for self-learning to generate a detection period according to the data amount, dividing the nodes, the edges and the weights into two parts according to the time dimension corresponding to the detection period, and respectively constructing an experience network graph and a network graph to be detected.
In one embodiment, the apparatus may further comprise:
and the generation module is used for generating sub-graphs of each graph in the time sequence network graph according to the Louvain algorithm, where no node belongs to more than one sub-graph.
And the acquisition module is used for acquiring the occupancy of each sub-graph in the network graph and the connection matrix among the sub-graphs through the weight of the edges in the sub-graphs.
In one embodiment, the apparatus may further comprise:
and the second construction module is used for constructing a Markov model, mapping the sub-graph occupancy and the connection matrix to the current state and the state transition matrix respectively, and predicting the probability of each future state, that is, the probability of each sub-graph appearing in the future.
And the third calculation module is used for calculating the probability of the edge through the probability of the subgraph in which the nodes at the two ends are positioned, and calculating a risk value through the edge probability contained in the operation path, wherein the risk value is in negative correlation with the probability of the edge and is in positive correlation with the path length.
In one embodiment, the apparatus may further comprise:
and the setting module is used for setting a risk threshold of the operation path, and the risk threshold can be flexibly set according to the acceptance degree of the actual service.
And the detection module is used for detecting a high-risk operation path according to the risk threshold, and the operation path exceeding the risk threshold needs to be processed in time.
For specific limitations of the data security analysis device based on the time-series correlation analysis, reference may be made to the above limitations of the data security analysis method based on the time-series correlation analysis, and details are not repeated here. The modules in the data security analysis device based on the time sequence correlation analysis can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
FIG. 3 illustrates the physical structure of an electronic device. As shown in FIG. 3, the electronic device may include a processor 301, a memory 302, a communication interface 303 and a communication bus 304, where the processor 301, the memory 302 and the communication interface 303 communicate with one another through the communication bus 304. The processor 301 may call logic instructions in the memory 302 to perform the following method: collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set; constructing an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph; generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs; constructing a Markov model, and calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix; and defining an operation path, calculating a risk value of the path from the edge probabilities, and detecting high-risk operation paths according to the risk value.
Furthermore, the logic instructions in the memory 302 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program performs the method provided in the foregoing embodiments, which includes, for example: collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set; constructing an experience network graph and a network graph to be detected from the nodes, the edges and the weights in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph; generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupancy of each sub-graph in the network graph to which it belongs and the connection matrix among the sub-graphs; constructing a Markov model, and calculating the probability of any edge in the network graph to be detected by an attenuation factor weighting method in combination with the sub-graph occupancy and the connection matrix; and defining an operation path, calculating a risk value of the path from the edge probabilities, and detecting high-risk operation paths according to the risk value.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A data security analysis method based on time sequence correlation analysis is characterized by comprising the following steps:
collecting a database operation behavior log data set, and extracting nodes, edges and weights from the data set;
constructing, according to the nodes, the edges and the weights, an experience network graph and a network graph to be detected in a T+1 task scheduling mode, and selecting the latest N experience network graphs to form a time sequence network graph;
generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupation ratio of each sub-graph in the network graph to which it belongs and a connection matrix among the sub-graphs;
constructing a Markov model, and calculating the probability of generating any edge in the network graph to be detected by using an attenuation factor weighting method in combination with the sub-graph occupation ratios and the connection matrix;
defining an operation path, calculating a risk value of the path according to the probability generated by the edge, and detecting a high-risk operation path according to the risk value.
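The extraction step of claim 1 can be illustrated with a minimal Python sketch. The claim does not specify the log format or how weights are derived, so the record layout (an operator paired with the database object it touched) and the use of occurrence counts as weights are assumptions made for illustration; `extract_graph` is a hypothetical name.

```python
from collections import Counter


def extract_graph(records):
    """Build a weighted graph from (source, target) operation records.

    Assumption: each record pairs an operator with a database object,
    and an edge weight is the number of times the pair occurs.
    """
    edges = Counter()
    for source, target in records:
        edges[(source, target)] += 1
    # Every endpoint of an edge is a node of the graph.
    nodes = {endpoint for edge in edges for endpoint in edge}
    return nodes, dict(edges)


# Hypothetical log records, for illustration only.
log = [("alice", "orders"), ("alice", "orders"),
       ("bob", "users"), ("alice", "users")]
nodes, edges = extract_graph(log)
```

Counting occurrences is only one possible weighting; a deployment could equally weight edges by data volume or operation type.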
2. The data security analysis method based on the time series correlation analysis according to claim 1, wherein the constructing an experience network graph and a network graph to be detected according to the nodes, the edges and the weights comprises:
generating a detection period by self-learning from the data volume, and dividing the nodes, the edges and the weights into two parts along the time dimension according to the detection period, the two parts being used to construct the experience network graph and the network graph to be detected, respectively.
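The time-based division in claim 2 might look like the following sketch. The self-learning of the detection period is not modelled here: the period is simply passed in as a fixed `timedelta`, and the record layout `(timestamp, source, target)` and the name `split_by_period` are assumptions.

```python
from datetime import datetime, timedelta


def split_by_period(records, now, period):
    """Split records into (experience, to_detect) around now - period.

    Assumption: records are (timestamp, source, target) tuples; records
    inside the most recent detection period form the graph to be detected,
    and older records form the experience data.
    """
    cutoff = now - period
    experience = [r for r in records if r[0] < cutoff]
    to_detect = [r for r in records if cutoff <= r[0] <= now]
    return experience, to_detect


# Hypothetical usage with a one-day detection period.
now = datetime(2021, 12, 8)
records = [
    (datetime(2021, 12, 1, 9, 0), "alice", "orders"),
    (datetime(2021, 12, 7, 12, 0), "bob", "users"),
]
experience, to_detect = split_by_period(records, now, timedelta(days=1))
```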
3. The data security analysis method based on the time-series correlation analysis of claim 1, wherein the sub-graph generation is performed on the time-series network graph, and the occupation ratio of each sub-graph in the network graph to which the sub-graph belongs and the connection matrix between the sub-graphs are calculated, and the method comprises the following steps:
wherein the defined quantities are: the occupation ratio of a sub-graph in the network graph to which it belongs; the sum of the weights of all edges in that sub-graph; the sum of the weights of all edges in that network graph; the element in a given row and column of the connection matrix, which represents the possibility of a connection between the two corresponding sub-graphs; the sum of the weights of the edges connecting those two sub-graphs; and the sum of the weights of all edges in a sub-graph.
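One possible reading of the quantities in claim 3 is sketched below. It assumes that "edges in a sub-graph" means edges with both endpoints in that sub-graph, that the occupation ratio divides this internal weight by the total edge weight of the graph, and that a connection-matrix entry divides the weight between two sub-graphs by the internal weight of the row sub-graph. These choices, the symmetric treatment of connections, and the name `subgraph_stats` are assumptions, not the patent's stated formulas. The sub-graph assignment `community` is taken as given (for example, from a Louvain implementation).

```python
def subgraph_stats(edges, community, k):
    """Return (occupancy, connection) for k sub-graphs.

    edges:     {(u, v): weight}
    community: {node: sub-graph index in range(k)}
    """
    total = sum(edges.values()) or 1.0  # avoid division by zero on empty graphs
    between = [[0.0] * k for _ in range(k)]
    for (u, v), w in edges.items():
        i, j = community[u], community[v]
        between[i][j] += w
        if i != j:
            between[j][i] += w  # assumption: connections are symmetric
    internal = [between[i][i] for i in range(k)]
    occupancy = [internal[i] / total for i in range(k)]
    connection = [
        [between[i][j] / internal[i] if internal[i] else 0.0 for j in range(k)]
        for i in range(k)
    ]
    return occupancy, connection


# Hypothetical example: two sub-graphs joined by one edge.
edges = {("a", "b"): 3.0, ("c", "d"): 1.0, ("b", "c"): 1.0}
community = {"a": 0, "b": 0, "c": 1, "d": 1}
occupancy, connection = subgraph_stats(edges, community, 2)
```

Under this reading the rows of the connection matrix are not normalized to sum to one; a different choice of denominator would be needed if a row-stochastic transition matrix is wanted.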
4. The data security analysis method based on the time-series correlation analysis of claim 1, wherein the constructing of the Markov model, and the calculating of the probability of generating any edge in the network graph to be detected by using the attenuation factor weighting method in combination with the sub-graph occupation ratios and the connection matrix, comprises:
wherein the defined quantities are: the vector formed by the sub-graph probabilities predicted according to the Markov model; the occupation-ratio vector of the sub-graphs in a network graph; the connection matrix between the sub-graphs; the probability of an edge in the network graph to be detected, as predicted from a given network graph, together with the sub-graphs in that network graph to which the two endpoints of the edge correspond; the final probability of the edge in the network graph to be detected; and the attenuation factor.
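The following sketch shows one way the quantities in claim 4 could fit together. It assumes a one-step Markov prediction (occupation-ratio vector multiplied by the connection matrix), an edge probability formed from the predicted probability of the first endpoint's sub-graph and the connection entry between the two endpoint sub-graphs, and a final probability that is a normalized, exponentially attenuated average over the N experience graphs. All three formulas and all function names are assumptions for illustration; the example also assumes a row-stochastic connection matrix so that the results stay within [0, 1].

```python
def predict(occupancy, connection):
    """One Markov step: predicted[j] = sum_i occupancy[i] * connection[i][j]."""
    k = len(occupancy)
    return [sum(occupancy[i] * connection[i][j] for i in range(k))
            for j in range(k)]


def edge_probability(predicted, connection, s_u, s_v):
    """Assumed edge probability for endpoints lying in sub-graphs s_u and s_v."""
    return predicted[s_u] * connection[s_u][s_v]


def decayed_probability(per_graph_probs, decay):
    """Attenuation-weighted average; probabilities are ordered oldest to newest."""
    if not per_graph_probs:
        raise ValueError("at least one experience graph is required")
    n = len(per_graph_probs)
    weights = [decay ** (n - 1 - t) for t in range(n)]  # newest graph has weight 1
    weighted = sum(w * p for w, p in zip(weights, per_graph_probs))
    return weighted / sum(weights)


# Hypothetical example with two sub-graphs and two experience graphs.
connection = [[0.8, 0.2], [0.4, 0.6]]
predicted = predict([0.5, 0.5], connection)
p_edge = edge_probability(predicted, connection, 0, 1)
p_final = decayed_probability([0.2, 0.4], decay=0.5)
```

With a decay below 1, older experience graphs contribute less to the final probability than recent ones, which matches the intent of an attenuation factor.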
5. The data security analysis method based on the time-series correlation analysis of claim 1, wherein the calculating the risk value of the path through the probability of the edge generation comprises:
6. The data security analysis method based on the time-series correlation analysis as claimed in claim 1, wherein detecting the high-risk operation path by the risk threshold comprises:
comparing the risk value of the operation path against a set risk threshold to detect the high-risk operation path.
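The threshold comparison of claim 6 can be sketched as follows. The risk formula of claim 5 is not reproduced in this text, so the sketch substitutes an assumed one: the risk of a path is one minus the product of the generation probabilities of its edges, with unseen edges given probability zero. The function names and the example threshold are likewise hypothetical.

```python
import math


def path_risk(path, edge_prob):
    """Assumed risk: 1 - product of the probabilities of the path's edges."""
    probs = [edge_prob.get((u, v), 0.0) for u, v in zip(path, path[1:])]
    return 1.0 - math.prod(probs)


def high_risk_paths(paths, edge_prob, threshold):
    """Return the paths whose risk value reaches the set threshold."""
    return [p for p in paths if path_risk(p, edge_prob) >= threshold]


# Hypothetical edge probabilities and operation paths.
edge_prob = {("a", "b"): 0.5, ("b", "c"): 0.5}
paths = [["a", "b"], ["a", "b", "c"]]
flagged = high_risk_paths(paths, edge_prob, threshold=0.7)
```

Under this assumed formula, longer paths and paths through rarely seen edges score higher, so an operation sequence that strays from historical behavior is more likely to be flagged.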
7. A data security analysis apparatus based on time series correlation analysis, the apparatus comprising:
the collection module is used for collecting a database operation behavior log data set and extracting nodes, edges and weights according to the data set;
the construction module is used for constructing an experience network graph and a network graph to be detected according to the nodes, the edges and the weights, and selecting the latest N experience network graphs to form a time sequence network graph;
the first calculation module is used for generating sub-graphs of the time sequence network graph through the Louvain algorithm, and calculating the occupation ratio of each sub-graph in the network graph to which it belongs and a connection matrix among the sub-graphs;
the second calculation module is used for constructing a Markov model, and calculating the probability of generating any edge in the network graph to be detected by using an attenuation factor weighting method in combination with the sub-graph occupation ratios and the connection matrix;
and the output module is used for defining the operation path, calculating the risk value of the path according to the probability generated by the edge, and detecting the high-risk operation path according to the risk value.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the program to implement the steps of the data security analysis method based on time series correlation analysis according to any one of claims 1 to 6.
9. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the data security analysis method based on time series correlation analysis according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111490801.7A CN114116853A (en) | 2021-12-08 | 2021-12-08 | Data security analysis method and device based on time sequence correlation analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114116853A (en) | 2022-03-01 |
Family
ID=80367524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111490801.7A Pending CN114116853A (en) | 2021-12-08 | 2021-12-08 | Data security analysis method and device based on time sequence correlation analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114116853A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117473509A (en) * | 2023-12-26 | 2024-01-30 | 信联科技(南京)有限公司 | Data security risk assessment method and system for data processing activities |
CN117473509B (en) * | 2023-12-26 | 2024-04-02 | 信联科技(南京)有限公司 | Data security risk assessment method and system for data processing activities |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||