CN110781520A

CN110781520A - Sensitive table group discovery method and system

Info

Publication number: CN110781520A
Application number: CN201911057182.5A
Authority: CN
Inventors: 陶景龙; 梁淑云; 刘胜; 马影; 王启凡; 魏国富; 徐�明; 殷钱安; 余贤喆; 周晓勇
Original assignee: Information and Data Security Solutions Co Ltd
Current assignee: Information and Data Security Solutions Co Ltd
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2020-02-11

Abstract

The invention provides a sensitive table group discovery method and system. The method comprises the following steps: S101, determining the marked sensitive data in a database; S102, acquiring a database operation log, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage (blood relationship) table; S103, constructing a chain-type creation relationship graph among all information tables in the database; and S104, based on the chain-type creation relationship graph established in S103 and the sensitive data marked in S101, searching for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data. The method highlights the group relationships among sensitive tables while maintaining high accuracy, and greatly improves the efficiency with which enterprises, organizations or individuals store and manage sensitive data in databases.

Description

Sensitive table group discovery method and system

Technical Field

The invention relates to the technical field of computer data security, in particular to a sensitive table group discovery method and system.

Background

Sensitive data generally refers to highly confidential information belonging to enterprises, organizations or individuals. The categories of sensitive data differ between industries, but such data is almost always managed with a database management system. Certain user operations can spread large amounts of the sensitive data stored in a database: sensitive data is copied from one information table into other tables, producing many new information tables whose confidentiality level has not been marked. Other users may then query, copy or export these new tables, and the sensitive data leaks. Controlling the information table groups derived from sensitive information tables is therefore very important. Existing sensitive table group discovery mainly relies on two approaches: manual review and tool scanning.

Manual review relies on the staff's understanding of the system's business to identify the information tables associated with sensitive-data business, and then on communication with the database administrator to determine the storage locations and table names of those tables, for example tables of personal communication records, wide tables of personal information, or premium (vanity) phone numbers. Tool scanning first analyzes the marked sensitive data to derive its content features, then uses a content analysis tool to search a specified scanning area for data exhibiting those features, and finally identifies the storage locations and table names of the scanned information tables whose content resembles the sensitive data.

Manual review can only judge tables by the reviewer's understanding of the sensitive-data business or of data storage locations and table names. It is inefficient, consumes human resources, cannot reflect table group relationships, and covers only a limited scope.

Tool scanning generally uses a database scanning tool to scan a designated location and decides whether sensitive data is present from the content features of the marked sensitive data, such as the ratio of Chinese to English text length, the ratio of character types, keywords, letter case and regular expressions. This approach is accurate, but it consumes a large share of the scanned database's performance resources. Databases in production environments may not be scanned frequently, and some cannot support a full-data scan at all, so omissions are common and real-time performance is poor. Moreover, scanning cannot establish relationships between tables or reflect sensitive table group relationships.

In summary, prior-art sensitive table group discovery methods can neither find sensitive data accurately and efficiently nor effectively represent the relationships between tables and within sensitive table groups. To supervise the sensitive data in a database effectively and securely, a scheme is needed that can discover sensitive tables and present the group relationships among sensitive data tables completely and systematically.

Disclosure of Invention

The technical problems addressed by the invention are that, in the prior art, manual marking is inefficient and covers a small range, while tool scanning has poor real-time performance and occupies substantial resources. The invention provides a method and a system that can find sensitive data tables accurately and efficiently and that reflect the relationships between tables and within sensitive table groups.

The invention solves the technical problems through the following technical means:

A sensitive table group discovery method, comprising the following steps:

S101, determining the marked sensitive data in a database;

S102, acquiring a database operation log, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage (blood relationship) table;

S103, constructing a chain-type creation relationship graph among all information tables in the database; and

S104, based on the chain-type creation relationship graph established in S103 and the sensitive data marked in S101, searching for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data.

Starting from the database users' operation logs and the related marked sensitive data, an SQL parsing tool analyzes the operation logs to generate a data lineage table, a chain-type creation relationship graph is constructed in graph form, and the sensitive data is associated with the graph, so that unmarked sensitive table groups in the database are discovered.

Preferably, step S101 specifically comprises: the marked sensitive data is determined by system maintenance personnel or relevant business personnel, and is marked in the form table name-column name.
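
As a small illustration of the marking form, the sketch below parses table name-column name marks into (table, column) pairs. The helper names `ColumnRef` and `parse_marks` are hypothetical, and the sketch assumes table names contain no hyphen, so the first `-` separates the table from the column.

```python
from typing import NamedTuple


class ColumnRef(NamedTuple):
    """A column identified by table and column name (hypothetical helper type)."""
    table: str
    column: str


def parse_marks(marks: list[str]) -> set[ColumnRef]:
    """Parse marks such as 'T1-c1' into ColumnRef('T1', 'c1').

    Assumes table names contain no '-', so the first '-' is the separator.
    """
    refs: set[ColumnRef] = set()
    for mark in marks:
        table, sep, column = mark.strip().partition("-")
        if not sep or not table or not column:
            raise ValueError(f"expected 'table-column', got {mark!r}")
        refs.add(ColumnRef(table, column))
    return refs


# The example marks from the detailed description.
sensitive = parse_marks(["T1-c1", "T1-c2", "T1-c3"])
print(sorted(sensitive))
```

Using a named tuple keeps each mark hashable, so the marked set can be checked directly against lineage edges later.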

Preferably, establishing the data lineage table in step S102 comprises:

S1021: extracting target table names according to the create or insert keyword and collecting them into a target table name set Td;

S1022: locating original table names according to the from keyword and collecting them into an original table name set Ts;

S1023: sorting out, according to the select keyword, the column dependencies between the target table name set Td and the original table name set Ts in the operation log, and recording column names and column aliases, where a column name is the column name in the original table, a column alias is the target table's column name, and if there is no column alias the target column name is the same as the original column name; this yields the relationship between Ts and Td, i.e. the mapping between the original table and the target table: original table name-original column name -> target table name-target column name;

S1024: storing all relationships between original tables and target tables in an analysis table, thereby establishing the data lineage table of the data in the database.
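
The alias rule of S1023 can be sketched as follows. This is a minimal illustration, not the patent's implementation: `column_mapping` and `lineage_rows` are hypothetical names, and the select list is assumed to contain only plain column names with optional `as` aliases (no expressions, functions or qualified names).

```python
import re

# One select item: a column name, optionally followed by "as <alias>".
# The apostrophe is allowed because the document's examples use names like c1'.
_ITEM = re.compile(r"^\s*([\w']+)(?:\s+as\s+([\w']+))?\s*$", re.IGNORECASE)


def column_mapping(select_list: str) -> list[tuple[str, str]]:
    """Return (original column, target column) pairs for one select list.

    S1023 rule: the alias is the target column name; without an alias the
    target column name equals the original column name.
    """
    pairs = []
    for item in select_list.split(","):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError(f"unsupported select item: {item!r}")
        original, alias = match.group(1), match.group(2)
        pairs.append((original, alias or original))
    return pairs


def lineage_rows(ts: str, td: str, select_list: str) -> list[tuple[str, str, str, str]]:
    """Combine a Ts table, a Td table and the column mapping into lineage rows."""
    return [(ts, orig, td, tgt) for orig, tgt in column_mapping(select_list)]


print(lineage_rows("T1", "T2", "c1 as c1', c2"))
```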

Preferably, constructing the chain-type creation relationship graph in step S103 specifically comprises: according to the data lineage table from step S1024, constructing a chain-type creation relationship graph of the information tables in the database with a graph database, and storing the graph relationships in the graph database.
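
The patent does not name a graph database. Assuming a Cypher-speaking store such as Neo4j, each lineage row could be turned into one parameterized `MERGE` statement, so that repeated log entries do not duplicate nodes or edges. The node label `Table`, the relationship type `DERIVES` and the property names below are illustrative assumptions.

```python
# Hypothetical Cypher template: one Table node per table, one DERIVES edge per
# (source column -> target column) mapping. MERGE makes re-imports idempotent.
MERGE_EDGE = (
    "MERGE (s:Table {name: $src_table}) "
    "MERGE (t:Table {name: $dst_table}) "
    "MERGE (s)-[:DERIVES {src_column: $src_column, dst_column: $dst_column}]->(t)"
)


def to_graph_statements(rows):
    """Turn lineage rows into (query, parameters) pairs for a graph database driver."""
    statements = []
    for src_table, src_column, dst_table, dst_column in rows:
        params = {
            "src_table": src_table,
            "src_column": src_column,
            "dst_table": dst_table,
            "dst_column": dst_column,
        }
        statements.append((MERGE_EDGE, params))
    return statements


for query, params in to_graph_statements([("T1", "c1", "T2", "c1")]):
    print(params)
```

Each `(query, params)` pair could then be handed to a driver, for example the Neo4j Python driver's `session.run(query, params)`; that step is omitted so the sketch runs without a database.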

Preferably, searching in step S104 for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data specifically comprises: starting from the sensitive data of step S101, traversing all tables in the chain-type creation relationship graph with a traversal algorithm, and screening out the sensitive table group that has a direct or indirect relationship with the sensitive data, together with the creation dependencies among its tables.
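
The text leaves the traversal algorithm open. One straightforward choice, shown here as an assumption, is a breadth-first search over column-level lineage edges starting from the marked columns; every column it reaches is directly or indirectly derived from sensitive data. `find_sensitive_columns` is a hypothetical name.

```python
from collections import deque


def find_sensitive_columns(rows, marked):
    """Breadth-first search from the marked columns along lineage edges.

    rows:   iterable of (src_table, src_column, dst_table, dst_column)
    marked: set of (table, column) pairs marked sensitive in S101
    Returns the columns reachable from a marked column, excluding the marks.
    """
    graph = {}
    for src_t, src_c, dst_t, dst_c in rows:
        graph.setdefault((src_t, src_c), []).append((dst_t, dst_c))

    seen = set(marked)
    queue = deque(marked)
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in seen:  # the seen set also makes cycles harmless
                seen.add(nxt)
                queue.append(nxt)
    return seen - set(marked)


rows = [("A", "x", "B", "x"), ("B", "x", "C", "y"), ("A", "z", "D", "z")]
print(sorted(find_sensitive_columns(rows, {("A", "x")})))
```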

The invention also provides a sensitive table group discovery system, comprising:

a sensitive data determining module, used for determining the marked sensitive data in the database;

a data lineage table establishing module, used for acquiring the database operation log, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage table;

a chain-type creation relationship graph construction module, used for constructing a chain-type creation relationship graph among all information tables in the database; and

a sensitive table group searching module, used for searching, based on the constructed chain-type creation relationship graph and the marked sensitive data, for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data.

Preferably, the marked sensitive data is determined by system maintenance personnel or relevant business personnel and is marked in the form table name-column name.

Preferably, establishing the data lineage table specifically comprises:

extracting target table names according to the create or insert keyword and collecting them into a target table name set Td;

locating original table names according to the from keyword and collecting them into an original table name set Ts;

sorting out, according to the select keyword, the column dependencies between the target table name set Td and the original table name set Ts in the operation log, and recording column names and column aliases, where a column name is the column name in the original table, a column alias is the target table's column name, and if there is no column alias the target column name is the same as the original column name; this yields the relationship between Ts and Td, i.e. the mapping between the original table and the target table: original table name-original column name -> target table name-target column name;

storing all relationships between original tables and target tables in an analysis table, thereby establishing the data lineage table of the data in the database.

Preferably, constructing the chain-type creation relationship graph specifically comprises: according to the data lineage table, constructing a chain-type creation relationship graph of the information tables in the database with a graph database, and storing the relationship graph in the graph database.

Preferably, according to the sensitive data, all tables in the chain-type creation relationship graph are traversed with a traversal algorithm, and the sensitive table group having a direct or indirect relationship with the sensitive data, together with the creation dependencies among its tables, is screened out.

The advantages of the invention are as follows. The invention provides a new method for finding sensitive table groups quickly and efficiently: based on the database users' operation logs and the related marked sensitive data, an SQL parsing tool analyzes the operation logs to generate a data lineage table, a chain-type creation relationship graph is constructed in graph form, and the sensitive data is associated with it, so that unmarked sensitive table groups in the database are discovered. The group relationships among sensitive tables are highlighted while accuracy remains high, which greatly improves the efficiency with which enterprises, organizations or individuals store and manage sensitive data in databases.

Drawings

FIG. 1 is a flow block diagram of a sensitive table group discovery method according to an embodiment of the present invention;

FIG. 2 is a chain-type creation relationship graph constructed with a graph database from Table 1 in an embodiment of the present invention;

FIG. 3 is a relationship graph between the sensitive table group and the sensitive data, obtained by searching, based on the marked sensitive data shown in FIG. 2, for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data;

FIG. 4 is a block diagram of a system for sensitive table group discovery according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

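As shown in FIG. 1, a sensitive table group discovery method comprises the following specific steps:

S101, determining the marked sensitive data;

S102, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage table;

S103, constructing a chain-type creation relationship graph among all information tables in the database;

S104, based on the chain-type creation relationship graph and the marked sensitive data, searching for the sensitive table group that has a direct or indirect lineage relationship with the sensitive data.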

The method is based on the database users' operation logs and the related marked sensitive data: an SQL parsing tool analyzes the operation logs to generate a data lineage table, a chain-type creation relationship graph is constructed in graph form, and the sensitive data is associated with it, so that unmarked sensitive table groups in the database are discovered.

The contents of each step are specifically described as follows:

The method in S101 comprises the following steps:

The marked sensitive data is determined by maintenance personnel and relevant business personnel. Sensitive data should be marked in the form table name-column name, for example T1-c1, T1-c2 and T1-c3.

The method in S102 comprises the following steps:

The Structured Query Language (SQL) statements in the database operation log are parsed, and the statements containing the two operations that copy data, create table and insert into, are extracted. A Python SQL parsing tool is then used to analyze these statements; the tool can split SQL text into single statements, format SQL statements and parse SQL, and its parsing function can identify keywords.
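
The document does not name the Python SQL tool; the splitting, formatting and parsing functions it lists resemble those of the `sqlparse` library, but that is a guess. The stdlib-only sketch below covers just the filtering step: it splits a log on semicolons and keeps the create table and insert into statements. It assumes no semicolon occurs inside string literals or comments, which a real SQL parser would handle; `extract_copy_statements` is a hypothetical name.

```python
import re

# The two data-copying operations named in the text.
_COPY_STMT = re.compile(r"^\s*(create\s+table|insert\s+into)\b", re.IGNORECASE)


def extract_copy_statements(log_text: str) -> list[str]:
    """Split a log into statements and keep only CREATE TABLE / INSERT INTO ones.

    Also accepts the full-width semicolon '；' that appears in copied examples.
    Simplification: no ';' may occur inside literals or comments.
    """
    statements = [s.strip() for s in log_text.replace("；", ";").split(";")]
    return [s for s in statements if s and _COPY_STMT.match(s)]


log = ("select * from T1; create table T2 as select c1 as c1' from T1; "
       "INSERT INTO T3(c2') select c2 from T1；")
for stmt in extract_copy_statements(log):
    print(stmt)
```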

S1021: extracting target table names according to the create or insert keyword and collecting them into a <Td> set;

S1022: locating original table names according to the from keyword and collecting them into a <Ts> set;

S1023: sorting out, according to the select keyword, the column dependencies between the target table name set Td and the original table name set Ts in the operation log, and recording the column names and column aliases (a column name is the column name in the original table, a column alias is the target table's column name, and if there is no column alias the target column name is the same as the original column name), so as to obtain the lineage relationship between Ts and Td, i.e. the mapping between the original table and the target table: original table name, original column name -> target table name, target column name;

Example 1: SQL tool parsing example:

Input: "create table T2 as select c1 as c1' from T1"

Output: T1, c1, T2, c1'. The target table name T2 is identified from create, the original table name T1 from from, and the original column name c1 and the target column name c1' from select.

Example 2: SQL tool parsing example:

Input: "create table T3 as select c1 as c2' from T1;

insert into T3(c2') select c2 from T1;"

Output: T1, c2, T3, c2'. The target table name T3 is identified from insert, the original table name T1 from from, and the original column name c2 and the target column name c2' from select.
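
A regex-based stand-in for the SQL tool, limited to the two statement shapes used in Examples 1 and 2, might look like this. It is a hypothetical simplification (`parse_statement` is not from the patent) and does not handle joins, subqueries, expressions or schema-qualified names. For insert into, target columns come from the insert column list and are paired positionally with the select list.

```python
import re

CREATE_AS = re.compile(
    r"^\s*create\s+table\s+(\w+)\s+as\s+select\s+(.+?)\s+from\s+(\w+)\s*$",
    re.IGNORECASE | re.DOTALL)
INSERT_SELECT = re.compile(
    r"^\s*insert\s+into\s+(\w+)\s*\(([^)]*)\)\s*select\s+(.+?)\s+from\s+(\w+)\s*$",
    re.IGNORECASE | re.DOTALL)
AS_SPLIT = re.compile(r"\s+as\s+", re.IGNORECASE)


def _split(items: str) -> list[str]:
    return [part.strip() for part in items.split(",")]


def parse_statement(sql: str) -> list[tuple[str, str, str, str]]:
    """Return (Ts table, original column, Td table, target column) rows."""
    m = CREATE_AS.match(sql)
    if m:
        td, select_list, ts = m.groups()
        rows = []
        for item in _split(select_list):
            parts = AS_SPLIT.split(item)
            original = parts[0]
            target = parts[1] if len(parts) > 1 else original  # no alias: same name
            rows.append((ts, original, td, target))
        return rows
    m = INSERT_SELECT.match(sql)
    if m:
        td, target_list, select_list, ts = m.groups()
        originals = [AS_SPLIT.split(item)[0] for item in _split(select_list)]
        targets = _split(target_list)
        if len(originals) != len(targets):
            raise ValueError("insert column list and select list differ in length")
        return [(ts, o, td, t) for o, t in zip(originals, targets)]
    raise ValueError(f"unsupported statement: {sql!r}")


print(parse_statement("create table T2 as select c1 as c1' from T1"))  # Example 1
print(parse_statement("insert into T3(c2') select c2 from T1"))        # Example 2
```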

S1024: storing all relationships between original tables and target tables in an analysis table, thereby establishing the data lineage table of the data in the database, denoted Blood-relationship, whose content is shown in Table 1: data lineage table.
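
The text only says the relationships are stored in an analysis table. As a sketch, assuming a relational store, the table could be an SQLite table whose primary key spans all four columns, so that the same relationship parsed from repeated log entries is stored once. The schema and the helper names `open_store`, `store_lineage` and `targets_of` are hypothetical.

```python
import sqlite3

# Hypothetical schema for the analysis table.
SCHEMA = """
CREATE TABLE blood_relationship (
    src_table  TEXT NOT NULL,
    src_column TEXT NOT NULL,
    dst_table  TEXT NOT NULL,
    dst_column TEXT NOT NULL,
    PRIMARY KEY (src_table, src_column, dst_table, dst_column)
)
"""


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def store_lineage(conn: sqlite3.Connection, rows) -> None:
    """Insert lineage rows; INSERT OR IGNORE drops exact duplicates."""
    conn.executemany("INSERT OR IGNORE INTO blood_relationship VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def targets_of(conn: sqlite3.Connection, table: str, column: str):
    """Columns copied directly from (table, column)."""
    cur = conn.execute(
        "SELECT dst_table, dst_column FROM blood_relationship "
        "WHERE src_table = ? AND src_column = ? ORDER BY dst_table, dst_column",
        (table, column))
    return cur.fetchall()


conn = open_store()
# Two rows from Table 1, with the first one repeated as if logged twice.
store_lineage(conn, [("T1", "c5", "T3", "c4"), ("T1", "c5", "T3", "c5"), ("T1", "c5", "T3", "c4")])
print(targets_of(conn, "T1", "c5"))
```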

Table 1 Data lineage table (Blood-relationship)

Original table name	Original column name	Dependency	Target table name	Target column name
T1	c1	->	T2	c1
T1	c2	->	T3	c3
T1	c4	->	T3	c2
T1	c5	->	T3	c4
T1	c5	->	T3	c5
T2	c1	->	T3	c1
T3	c1	->	T4	c1
T3	c2	->	T4	c2
T3	c3	->	T4	c3
T3	c3	->	T4	c4
T3	c4	->	T4	c5
T4	c1	->	T5	c1
T4	c2	->	T6	c1
T4	c3	->	T7	c3
T4	c4	->	T6	c4
T4	c5	->	T6	c5
T5	c1	->	T7	c2
T6	c1	->	T7	c1

The method in S103 comprises the following steps:

According to the data lineage table obtained in S1024, a chain-type creation relationship graph of the information tables in the database is constructed with a graph database, denoted Chain-shaped, and stored in the graph database as graph relationships. The graph database used is a non-relational database that can effectively display information across multiple tables and analyze the relationships between tables along multiple dimensions. The content of Chain-shaped is shown in FIG. 2 (chain-type creation relationship graph): it consists of the original table T1 and the generated tables T2, T3, T4, T5, T6 and T7, and the direction of the connecting lines between tables represents their dependency relationships.
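
For the table-level view in FIG. 2, the column-level rows of Table 1 can be collapsed into table-to-table edges. The sketch below, with hypothetical helper names and an in-memory dictionary standing in for the graph database, finds the original tables (no incoming edge) and a creation order via Kahn's topological sort; it raises an error if the lineage contains a cycle between tables.

```python
# Rows transcribed from Table 1: (source table, source column, target table, target column).
TABLE_1 = [
    ("T1", "c1", "T2", "c1"), ("T1", "c2", "T3", "c3"), ("T1", "c4", "T3", "c2"),
    ("T1", "c5", "T3", "c4"), ("T1", "c5", "T3", "c5"), ("T2", "c1", "T3", "c1"),
    ("T3", "c1", "T4", "c1"), ("T3", "c2", "T4", "c2"), ("T3", "c3", "T4", "c3"),
    ("T3", "c3", "T4", "c4"), ("T3", "c4", "T4", "c5"), ("T4", "c1", "T5", "c1"),
    ("T4", "c2", "T6", "c1"), ("T4", "c3", "T7", "c3"), ("T4", "c4", "T6", "c4"),
    ("T4", "c5", "T6", "c5"), ("T5", "c1", "T7", "c2"), ("T6", "c1", "T7", "c1"),
]


def table_graph(rows):
    """Collapse column-level lineage rows into table-level creation edges."""
    edges = {}
    for src_t, _src_c, dst_t, _dst_c in rows:
        if src_t != dst_t:  # ignore self-copies such as INSERT INTO T SELECT ... FROM T
            edges.setdefault(src_t, set()).add(dst_t)
    return edges


def _indegrees(edges):
    tables = set(edges) | {t for dsts in edges.values() for t in dsts}
    indegree = dict.fromkeys(tables, 0)
    for dsts in edges.values():
        for t in dsts:
            indegree[t] += 1
    return indegree


def original_tables(edges):
    """Tables with no incoming edge, i.e. not derived from any other table."""
    return sorted(t for t, d in _indegrees(edges).items() if d == 0)


def creation_order(edges):
    """Kahn's algorithm: every table appears after all tables it was created from."""
    indegree = _indegrees(edges)
    ready = sorted(t for t, d in indegree.items() if d == 0)
    order = []
    while ready:
        table = ready.pop(0)
        order.append(table)
        for nxt in edges.get(table, ()):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
        ready.sort()  # deterministic tie-breaking by table name
    if len(order) != len(indegree):
        raise ValueError("lineage graph contains a cycle between tables")
    return order


edges = table_graph(TABLE_1)
print(original_tables(edges), creation_order(edges))
```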

The method in S104 comprises the following steps:

Starting from the marked sensitive data of S101, all tables in the chain-type creation relationship graph Chain-shaped are traversed with a traversal algorithm, and the sensitive table group that has a direct or indirect relationship with the sensitive data, together with the creation dependencies among its tables, is screened out and denoted Sensitive-Chain-shaped. Its content is shown in FIG. 3 (sensitive table group creation relationship graph), in which columns c1, c2 and c3 of the original table T1 are marked sensitive data and c4 and c5 are marked non-sensitive data; dashed connecting lines and their directions represent the transmission of sensitive data, and solid lines represent the transmission of non-sensitive data.
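
Given the full set of sensitive columns, the Sensitive-Chain-shaped subgraph can be read off as the table-level edges whose source column is sensitive (the dashed lines in FIG. 3). The sketch takes the rows of Table 1 and, as its sensitive set, the three marked T1 columns plus the ten columns the text reports as discovered; `sensitive_chain` is a hypothetical name.

```python
# Rows transcribed from Table 1: (source table, source column, target table, target column).
TABLE_1 = [
    ("T1", "c1", "T2", "c1"), ("T1", "c2", "T3", "c3"), ("T1", "c4", "T3", "c2"),
    ("T1", "c5", "T3", "c4"), ("T1", "c5", "T3", "c5"), ("T2", "c1", "T3", "c1"),
    ("T3", "c1", "T4", "c1"), ("T3", "c2", "T4", "c2"), ("T3", "c3", "T4", "c3"),
    ("T3", "c3", "T4", "c4"), ("T3", "c4", "T4", "c5"), ("T4", "c1", "T5", "c1"),
    ("T4", "c2", "T6", "c1"), ("T4", "c3", "T7", "c3"), ("T4", "c4", "T6", "c4"),
    ("T4", "c5", "T6", "c5"), ("T5", "c1", "T7", "c2"), ("T6", "c1", "T7", "c1"),
]

# Marked T1 columns plus the ten discovered columns stated in the text.
SENSITIVE = {
    ("T1", "c1"), ("T1", "c2"), ("T1", "c3"),
    ("T2", "c1"), ("T3", "c1"), ("T3", "c3"), ("T4", "c1"), ("T4", "c3"),
    ("T4", "c4"), ("T5", "c1"), ("T6", "c4"), ("T7", "c2"), ("T7", "c3"),
}


def sensitive_chain(rows, sensitive_columns):
    """Return (tables, table-level edges) along which sensitive data flows."""
    edges = set()
    for src_t, src_c, dst_t, _dst_c in rows:
        if (src_t, src_c) in sensitive_columns:  # dashed edge: sensitive source column
            edges.add((src_t, dst_t))
    tables = {table for edge in edges for table in edge}
    return tables, edges


tables, edges = sensitive_chain(TABLE_1, SENSITIVE)
print(sorted(tables))
print(sorted(edges))
```

With this data the T6 -> T7 edge is excluded, because it only carries T6-c1, which derives from the non-sensitive column T1-c4.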

As shown in FIG. 3, the column name of T6-c4 is the same as that of the marked non-sensitive data T1-c4, but the source of T6-c4 is T1-c2, which is marked sensitive, so T6-c4 is also sensitive data. Conversely, the column name of T7-c1 is the same as that of the marked sensitive data T1-c1, but its source is T1-c4, which is marked non-sensitive, so T7-c1 is non-sensitive data. Renaming tables and columns during transmission therefore does not affect whether data is identified as sensitive. The sensitive data discovered by the method of the invention is: T2-c1, T3-c1, T3-c3, T4-c1, T4-c3, T4-c4, T5-c1, T6-c4, T7-c2 and T7-c3.
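
The reasoning above traces each column back to its source rather than trusting its name. A sketch of that upstream check, with hypothetical helper names, walks the edges of Table 1 backwards: a column is sensitive if it is marked or any of its ancestors is marked. Applied to every target column of Table 1 it reproduces the ten columns listed above.

```python
# Rows transcribed from Table 1: (source table, source column, target table, target column).
TABLE_1 = [
    ("T1", "c1", "T2", "c1"), ("T1", "c2", "T3", "c3"), ("T1", "c4", "T3", "c2"),
    ("T1", "c5", "T3", "c4"), ("T1", "c5", "T3", "c5"), ("T2", "c1", "T3", "c1"),
    ("T3", "c1", "T4", "c1"), ("T3", "c2", "T4", "c2"), ("T3", "c3", "T4", "c3"),
    ("T3", "c3", "T4", "c4"), ("T3", "c4", "T4", "c5"), ("T4", "c1", "T5", "c1"),
    ("T4", "c2", "T6", "c1"), ("T4", "c3", "T7", "c3"), ("T4", "c4", "T6", "c4"),
    ("T4", "c5", "T6", "c5"), ("T5", "c1", "T7", "c2"), ("T6", "c1", "T7", "c1"),
]
MARKED = {("T1", "c1"), ("T1", "c2"), ("T1", "c3")}


def build_parents(rows):
    """Map each target column to the source columns it was copied from."""
    parents = {}
    for src_t, src_c, dst_t, dst_c in rows:
        parents.setdefault((dst_t, dst_c), []).append((src_t, src_c))
    return parents


def ancestors(parents, column):
    """All columns upstream of `column` (depth-first; the found set prevents loops)."""
    found, stack = set(), [column]
    while stack:
        for up in parents.get(stack.pop(), ()):
            if up not in found:
                found.add(up)
                stack.append(up)
    return found


def root_sources(parents, column):
    """Upstream columns with no source of their own (the column itself if it has none)."""
    return {c for c in ancestors(parents, column) if c not in parents} or {column}


def is_sensitive(parents, column, marked):
    return column in marked or not ancestors(parents, column).isdisjoint(marked)


parents = build_parents(TABLE_1)
discovered = {(t, c) for _, _, t, c in TABLE_1 if is_sensitive(parents, (t, c), MARKED)}
print(root_sources(parents, ("T6", "c4")), root_sources(parents, ("T7", "c1")))
print(sorted(discovered))
```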

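As shown in FIG. 4, this embodiment further provides a sensitive table group discovery system, which comprises a sensitive data determining module, a data lineage table establishing module, a chain-type creation relationship graph construction module and a sensitive table group searching module.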

The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

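1. A sensitive table group discovery method, comprising the following steps:

S101, determining marked sensitive data in a database;

S102, acquiring a database operation log, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage table;

S103, constructing a chain-type creation relationship graph among all information tables in the database; and

S104, based on the chain-type creation relationship graph established in S103 and the sensitive data marked in S101, searching for the sensitive table group having a direct or indirect lineage relationship with the sensitive data.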

2. The method of claim 1, wherein step S101 specifically comprises: determining the marked sensitive data by system maintenance personnel or relevant business personnel, the marking form being table name-column name.

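3. The method of claim 1, wherein establishing the data lineage table in step S102 comprises:

S1021: extracting target table names according to the create or insert keyword and collecting them into a target table name set Td;

S1022: locating original table names according to the from keyword and collecting them into an original table name set Ts;

S1023: sorting out, according to the select keyword, the column dependencies between Td and Ts in the operation log, and recording column names and column aliases, where a column name is the column name in the original table, a column alias is the target column name, and if there is no column alias the target column name is the same as the original column name, thereby obtaining the mapping: original table name-original column name -> target table name-target column name;

S1024: storing all relationships between original tables and target tables in an analysis table, thereby establishing the data lineage table of the data in the database.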

4. The method of claim 3, wherein constructing the chain-type creation relationship graph in step S103 specifically comprises: according to the data lineage table of step S1024, constructing a chain-type creation relationship graph of the information tables in the database with a graph database, and storing the graph relationships in the graph database.

5. The method of claim 4, wherein searching in step S104 for the sensitive table group having a direct or indirect lineage relationship with the sensitive data specifically comprises: according to the sensitive data of step S101, traversing all tables in the chain-type creation relationship graph with a traversal algorithm, and screening out the sensitive table group having a direct or indirect relationship with the sensitive data, together with the creation dependencies among its tables.

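6. A sensitive table group discovery system, comprising:

a sensitive data determining module, used for determining marked sensitive data in a database;

a data lineage table establishing module, used for acquiring a database operation log, parsing the SQL statements in the operation log to obtain table names and column names, and establishing a data lineage table;

a chain-type creation relationship graph construction module, used for constructing a chain-type creation relationship graph among all information tables in the database; and

a sensitive table group searching module, used for searching, based on the constructed chain-type creation relationship graph and the marked sensitive data, for the sensitive table group having a direct or indirect lineage relationship with the sensitive data.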

7. The system of claim 6, wherein the marked sensitive data is determined by system maintenance personnel or relevant business personnel and is marked in the form table name-column name.

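8. The system of claim 6, wherein establishing the data lineage table specifically comprises:

extracting target table names according to the create or insert keyword and collecting them into a target table name set Td;

locating original table names according to the from keyword and collecting them into an original table name set Ts;

sorting out, according to the select keyword, the column dependencies between Td and Ts in the operation log, and recording column names and column aliases, where a column name is the column name in the original table, a column alias is the target column name, and if there is no column alias the target column name is the same as the original column name, thereby obtaining the mapping: original table name-original column name -> target table name-target column name; and

storing all relationships between original tables and target tables in an analysis table, thereby establishing the data lineage table of the data in the database.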

9. The system of claim 8, wherein constructing the chain-type creation relationship graph specifically comprises: according to the data lineage table, constructing a chain-type creation relationship graph of the information tables in the database with a graph database, and storing the relationship graph in the graph database.

10. The system of claim 9, wherein, according to the sensitive data, all tables in the chain-type creation relationship graph are traversed with a traversal algorithm, and the sensitive table group having a direct or indirect relationship with the sensitive data, together with the creation dependencies among its tables, is screened out.