CN109766713B - Method for realizing dynamic rapid desensitization of data based on proxy

Info

Publication number: CN109766713B
Application number: CN201811536698.3A
Authority: CN
Inventors: 杨国玉; 白西让
Original assignee: China Datang Corp Science and Technology Research Institute Co Ltd
Current assignee: China Datang Corp Science and Technology Research Institute Co Ltd
Priority date: 2018-12-15
Filing date: 2018-12-15
Publication date: 2021-01-12
Anticipated expiration: 2038-12-15
Also published as: CN109766713A

Abstract

The invention relates to a proxy-based method for dynamic and rapid data desensitization, comprising the following steps: step 1, separately splitting out the uniformly formatted data within the data to obtain a split dictionary set, where the uniformly formatted data includes several of: 11-digit numeric data, 2-character Chinese data, 3-character Chinese data and text data of more than 10 characters; step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data, where the sensitive information includes several of: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers; step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization. The invention can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trustworthy data use environment.

Description

Method for realizing dynamic rapid desensitization of data based on proxy

Technical Field

The invention belongs to the technical field of information security, and in particular relates to a proxy-based method for dynamic and rapid data desensitization.

Background

Data desensitization refers to transforming certain sensitive information according to desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, the real data is modified, without violating system rules, before being provided for test use; personal information such as identity card numbers, mobile phone numbers, card numbers and customer numbers must all be desensitized. Data desensitization is one of the database security technologies, which mainly include: database vulnerability scanning systems, database encryption systems, database firewall systems, data desensitization systems and database security audit systems. Database security risks include database dumping ("dragging the database"), "brushing the database" and credential stuffing ("bumping the database").

Big data environments are increasingly being adopted by large enterprises. Without clear definition and management of the ownership and usage rights of an enterprise's sensitive data, users' private information and internal enterprise data can leak, directly causing both reputational and financial loss to the enterprise. Viewed from outside, in terms of the value of the data, the complex, sensitive and comprehensive data in a big data platform will inevitably attract more potential attackers. At the same time, because a large amount of data is gathered in one place, a single successful attack yields more data, which greatly lowers the attacker's cost. Big data is therefore likely to become a prominent target of cyber attacks. The serious lack of security capability in big data platforms, together with ubiquitous risks, leaves these platforms fragile, exposes enterprise data security to great risk, and makes them a risk point that enterprises cannot ignore.

In a big data environment, data is mostly stored in NoSQL form, and the various types of data are stored without having been desensitized. Detecting sensitive content in the data at the time it is accessed, and desensitizing it at the same time, is therefore an important guarantee of secure data access in a big data environment.

Disclosure of Invention

The invention aims to provide a proxy-based method for dynamic and rapid data desensitization, for use in the field of data security and desensitization, that achieves rapid dynamic desensitization at the time the data is accessed.

The invention provides a proxy-based method for dynamic and rapid data desensitization, comprising the following steps:

step 1, separately splitting out the uniformly formatted data within the data to obtain a split dictionary set; the uniformly formatted data includes several of: 11-digit numeric data, 2-character Chinese data, 3-character Chinese data and text data of more than 10 characters;

step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data; the sensitive information includes several of: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers;

step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization.

Further, the step 1 comprises:

segmenting the data as a whole, distinguishing Chinese characters, digits and English letters;

counting the length of each segment from the segmentation result, combining the length with the segment type, and using the combined result as a key of the split dictionary;

storing each piece of data under the key corresponding to its format to obtain the split dictionary set.

Further, the step 3 comprises:

counting the number of sensitive fields, recorded as M; counting the amount of data under each sensitive field and summing the results, recorded as N;

putting the data corresponding to each sensitive field into a to-be-processed library;

initializing M/2 asynchronous threads and configuring each thread as follows: when processing sensitive data, a thread handles only N/M items at a time and, if the current type has fewer items than that, does not take items of other types to make up the difference; each thread is initially set to the idle state;

whenever a thread is idle, it takes a sensitive field from the to-be-processed library and desensitizes it; once all the data under that sensitive field has been processed, the field and its data are removed from the to-be-processed library.
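As a worked illustration of the quantities M and N, the following minimal sketch computes the thread count and per-thread batch size for a given set of sensitive fields. The function name `plan`, the field names and the sizes are hypothetical, and the rounding (integer division, with a floor of one thread and one item) is an assumption, since the text does not say how M/2 and N/M are rounded.

```python
def plan(field_sizes):
    """Derive M, N, thread count and batch size from {field name: item count}."""
    m = len(field_sizes)                 # M: number of sensitive fields
    n = sum(field_sizes.values())        # N: total items over all sensitive fields
    if m == 0:
        return {"M": 0, "N": 0, "threads": 0, "batch": 0}
    return {
        "M": m,
        "N": n,
        "threads": max(1, m // 2),       # M/2 threads (assumed: floor, at least 1)
        "batch": max(1, n // m),         # N/M items per batch (assumed: floor, at least 1)
    }


# Hypothetical workload: four sensitive fields holding 1000 items in total.
sizes = {"id_number": 500, "mobile_phone": 300, "name": 150, "bank_card": 50}
print(plan(sizes))
```

With these illustrative numbers, M = 4 and N = 1000, so two threads are started and each handles at most 250 items of a single field at a time.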

With this scheme, the proxy-based method for dynamic and rapid data desensitization can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trustworthy data use environment.

The foregoing is only an overview of the technical solution of the invention. So that the technical solution can be understood more clearly and implemented in accordance with the description, preferred embodiments of the invention are described in detail below with reference to the accompanying drawings.

Drawings

FIG. 1 is an overall flow chart of the proxy-based method for dynamic and rapid data desensitization according to the invention;

FIG. 2 is a flow chart of the data splitting algorithm of the method;

FIG. 3 is a flow chart of the data classification algorithm of the method;

FIG. 4 is a flow chart of the data desensitization algorithm of the method.

Detailed Description

The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.

This embodiment provides a proxy-based method for dynamic and rapid data desensitization, comprising the following steps:

Step 1, separately splitting out the uniformly formatted data within the data to obtain a split dictionary set; the uniformly formatted data includes several of: 11-digit numeric data, 2-character Chinese data, 3-character Chinese data and text data of more than 10 characters.

Step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data; the sensitive information includes several of: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers.

The proxy-based method for dynamic and rapid data desensitization can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trustworthy data use environment.

The present invention is described in further detail below.

As shown in the overall flow chart of FIG. 1, the method comprises dynamic splitting, classification and desensitization of the data.

Referring to FIG. 2, the dynamic data splitting algorithm splits the data so that it can be classified quickly and desensitized in a targeted way. That is, the uniformly formatted data within the data is separated out individually, including 11-digit numeric data, 2-character Chinese data, 3-character Chinese data, text data of more than 10 characters, and the like, in full preparation for the subsequent targeted desensitization. The specific steps are as follows:

(1) the data is segmented as a whole, distinguishing three types: Chinese characters, digits and English letters;

(2) the length of each segment is counted from the segmentation result and combined with the segment type, for example 3 Chinese characters, 11 digits, or fewer than 10 English letters, and the combined result is used as a key of the split dictionary;

(3) each piece of data is stored under the key corresponding to its format to obtain the split dictionary set.
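The three splitting steps above can be sketched as follows. This is a minimal illustration and not the patented implementation: the key spelling (`hanzi3`, `digit11`, joined with `+`), the Unicode range used for Chinese characters, and the function names `format_key` and `split_by_format` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# One alternative per character type named in the text (assumed character ranges).
_SEGMENT = re.compile(r"([\u4e00-\u9fff]+)|([0-9]+)|([A-Za-z]+)")
_KINDS = ("hanzi", "digit", "latin")


def format_key(value: str) -> str:
    """Steps (1)-(2): segment by character type, then combine type and length."""
    parts = []
    for match in _SEGMENT.finditer(value):
        kind = _KINDS[match.lastindex - 1]      # which alternative matched
        parts.append(f"{kind}{len(match.group())}")
    # Characters outside the three types are skipped; values with none of them
    # fall into a catch-all key (an assumption, not stated in the source).
    return "+".join(parts) or "other"


def split_by_format(values):
    """Step (3): store every value under the key that describes its format."""
    split_dict = defaultdict(list)
    for value in values:
        split_dict[format_key(value)].append(value)
    return dict(split_dict)


sample = ["13800138000", "张三", "李小明", "abc123"]
print(split_by_format(sample))
```

Grouping by a format key means the later classification step only has to inspect each bucket, rather than testing every value against every rule.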

Referring to FIG. 3, the data classification algorithm classifies and identifies the result of data splitting, that is, the split dictionary set, recognizing and labeling common sensitive information such as identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers.
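The text names the sensitive categories but does not state the recognition rules, so the following is only a sketch under assumed rules: an 18-character identity card number, an 11-digit mobile number starting with 1, a 16- or 19-digit bank card number, and a 2- or 3-character Chinese name. The names `RULES` and `classify_buckets` are hypothetical, and social security numbers are left out because no format for them is given.

```python
import re

# Hypothetical recognition rules, checked in order; none of them come from the source.
RULES = [
    ("id_number", re.compile(r"[0-9]{17}[0-9Xx]")),
    ("mobile_phone", re.compile(r"1[0-9]{10}")),
    ("bank_card", re.compile(r"[0-9]{16}|[0-9]{19}")),
    ("name", re.compile(r"[\u4e00-\u9fff]{2,3}")),
]


def classify_buckets(split_dict):
    """Label each bucket of the split dictionary whose values all match one rule."""
    sensitive = {}
    for values in split_dict.values():
        for label, pattern in RULES:
            if values and all(pattern.fullmatch(v) for v in values):
                sensitive.setdefault(label, []).extend(values)
                break                       # first matching rule wins
    return sensitive                        # unmatched buckets are treated as non-sensitive


buckets = {"digit11": ["13800138000"], "hanzi2": ["张三"], "latin3": ["abc"]}
print(classify_buckets(buckets))
```

A real classifier would likely need stronger checks, such as checksum validation for card and identity numbers, since format alone cannot distinguish a name from any other two-character word.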

Referring to FIG. 4, the data desensitization algorithm dynamically desensitizes the classified sensitive data in a targeted manner, using a desensitization algorithm suited to each category. Load is balanced effectively across the sensitive data categories and the amount of data under each category, so as to maximize the efficiency of dynamic desensitization. The specific steps are as follows:

(1) the number of sensitive fields is counted and recorded as M; the amount of data under each sensitive field is counted and the results are summed, recorded as N;

(2) the data corresponding to each sensitive field is put into a to-be-processed library;

(3) M/2 asynchronous threads are initialized and each thread is configured as follows: when processing sensitive data, a thread handles only N/M items at a time and, if the current type has fewer items than that, does not take items of other types to make up the difference; each thread is initially set to the idle state;

(4) whenever a thread is idle, it takes a sensitive field from the to-be-processed library and desensitizes it; once all the data under that sensitive field has been processed, the field and its data are removed from the to-be-processed library.
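One possible reading of steps (1) to (4) is sketched below. It is an interpretation rather than the patented implementation: the to-be-processed library is modeled as a `queue.Queue` of single-field batches of at most N/M items, M/2 and N/M are rounded down with a floor of one, and `mask_keep_edges` is a hypothetical masking rule standing in for the unspecified desensitization algorithm.

```python
import queue
import threading


def mask_keep_edges(value, head=3, tail=4):
    """Hypothetical masking rule: keep the first and last characters, star the rest."""
    if len(value) <= head + tail:
        return "*" * len(value)
    return value[:head] + "*" * (len(value) - head - tail) + value[-tail:]


def desensitize_all(sensitive, mask=mask_keep_edges):
    """Mask {field: [values]} using M/2 worker threads and batches of N/M items."""
    m = len(sensitive)                                 # M: number of sensitive fields
    n = sum(len(v) for v in sensitive.values())        # N: total number of items
    if m == 0:
        return {}
    workers = max(1, m // 2)                           # assumed rounding of M/2
    batch = max(1, n // m)                             # assumed rounding of N/M

    # The "to-be-processed library": each batch holds items of one field only,
    # so a short batch is never topped up with another field's data.
    pending = queue.Queue()
    results = {field: [None] * len(values) for field, values in sensitive.items()}
    for field, values in sensitive.items():
        for start in range(0, len(values), batch):
            pending.put((field, start, values[start:start + batch]))

    def worker():
        while True:
            try:
                field, start, chunk = pending.get_nowait()   # idle thread takes work
            except queue.Empty:
                return                                       # nothing left to process
            for offset, value in enumerate(chunk):
                results[field][start + offset] = mask(value)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


data = {"mobile_phone": ["13800138000", "13912345678"], "name": ["张三"]}
print(desensitize_all(data))
```

Because every batch is queued before the workers start, an empty queue reliably signals that all data has been handed out, and each worker writes only to its own slots in `results`.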

The above is only a preferred embodiment of the invention and is not intended to limit it. It should be noted that those skilled in the art can make many modifications and variations without departing from the technical principle of the invention, and such modifications and variations shall also be regarded as falling within the scope of protection of the invention.

Claims

1. A proxy-based method for dynamic and rapid data desensitization, characterized by comprising the following steps:

step 1, separately splitting out the uniformly formatted data within the data to obtain a split dictionary set; the uniformly formatted data includes several of: 11-digit numeric data, 2-character Chinese data, 3-character Chinese data and text data of more than 10 characters, and the step comprises:

storing the data under a key corresponding to the format of the data to obtain a splitting dictionary set;

step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization, wherein the step comprises: