CN109766713B - Method for realizing dynamic rapid desensitization of data based on proxy - Google Patents

Info

Publication number
CN109766713B
Authority
CN
China
Prior art keywords
data
sensitive
desensitization
dynamic
numbers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811536698.3A
Other languages
Chinese (zh)
Other versions
CN109766713A (en)
Inventor
杨国玉
白西让
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Datang Corp Science and Technology Research Institute Co Ltd
Original Assignee
China Datang Corp Science and Technology Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Datang Corp Science and Technology Research Institute Co Ltd
Priority to CN201811536698.3A
Publication of CN109766713A
Application granted
Publication of CN109766713B
Legal status: Active
Anticipated expiration

Landscapes

  • Storage Device Security (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a proxy-based method for dynamic, rapid desensitization of data, comprising the following steps: step 1, splitting out the uniformly formatted items in the data to obtain a split dictionary set, where the uniformly formatted data includes several of the following: 11-digit numeric data, 2-character Chinese text, 3-character Chinese text, and text of more than 10 characters; step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data, where the sensitive information includes several of the following: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers; step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization. The invention can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trusted data-use environment.

Description

Method for realizing dynamic rapid desensitization of data based on proxy
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a proxy-based method for dynamic, rapid desensitization of data.
Background
Data desensitization refers to transforming sensitive information according to desensitization rules so that sensitive private data is reliably protected. Where customer security data or commercially sensitive data is involved, the real data is modified, without violating system rules, before being provided for test use; personal information such as identity card numbers, mobile phone numbers, card numbers and customer numbers must be desensitized. Data desensitization is one of the database security technologies, which mainly include: database vulnerability scanning systems, database encryption systems, database firewall systems, data desensitization systems and database security audit systems. Database security risks include: database "dragging" (bulk theft of database contents), database "brushing" and database "bumping" (credential stuffing).
Big data environments are being progressively adopted by large enterprises. Without clear definition and management of the ownership and usage rights of an enterprise's sensitive data, user privacy information and internal enterprise data may leak, directly causing the enterprise both reputational and economic loss. From an external perspective, that is, in terms of the value of the data, the complex, sensitive and comprehensive data held in a big data platform will certainly attract more potential attackers. Meanwhile, because a large amount of data is aggregated in one place, a hacker can obtain more data from a single successful attack, which greatly reduces the cost of attacking. Big data is therefore likely to become a significant target for cyber attacks. The serious lack of security capability in big data platforms, together with ubiquitous risks, leaves these platforms fragile, exposes enterprise data security to great risk, and makes the platform a risk point that enterprises can hardly ignore.
In a big data environment, data is mostly stored in NoSQL form, and the various types of data are stored without having been desensitized. Detecting sensitive content in the data being accessed and desensitizing it at the time of access is therefore an important guarantee of secure data access in a big data environment.
Disclosure of Invention
The invention aims to provide a proxy-based method for dynamic, rapid desensitization of data, for use in the field of data security and desensitization, which achieves rapid dynamic desensitization at the time the data is accessed.
The invention provides a proxy-based method for dynamic, rapid desensitization of data, comprising the following steps:
step 1, splitting out the uniformly formatted items in the data to obtain a split dictionary set; the uniformly formatted data includes several of the following: 11-digit numeric data, 2-character Chinese text, 3-character Chinese text, and text of more than 10 characters;
step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data; the sensitive information includes several of the following: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers;
step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization.
Further, step 1 comprises:
dividing the data as a whole into segments, distinguishing Chinese characters, digits and English letters;
counting the length of each segment in the division result, combining the length with the segment type, and using the combination as a key of the split dictionary;
storing each piece of data under the key corresponding to its format to obtain the split dictionary set.
Further, step 3 comprises:
counting the number of sensitive fields, recorded as M; counting the amount of data under each sensitive field and summing the results, recorded as N;
putting the data corresponding to each sensitive field into a pending library;
initializing M/2 asynchronous threads and configuring each thread as follows: when processing sensitive data, the thread processes only N/M pieces of data, and when the data of that sensitive type runs short it does not take data of other types; each thread is then set to the idle state;
whenever a thread is idle, it takes one sensitive field from the pending library and desensitizes it; once all the data under that sensitive field has been processed, the field and its data are removed from the pending library.
With this scheme, the proxy-based method for dynamic, rapid desensitization of data can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trusted data-use environment.
The foregoing description is only an overview of the technical solutions of the present invention, and in order to make the technical solutions of the present invention more clearly understood and to implement them in accordance with the contents of the description, the following detailed description is given with reference to the preferred embodiments of the present invention and the accompanying drawings.
Drawings
FIG. 1 is an overall flow chart of the proxy-based method for dynamic, rapid desensitization of data according to the present invention;
FIG. 2 is a flow chart of the data splitting algorithm of the proxy-based method for dynamic, rapid desensitization of data according to the present invention;
FIG. 3 is a flow chart of the data classification algorithm of the proxy-based method for dynamic, rapid desensitization of data according to the present invention;
FIG. 4 is a flow chart of the data desensitization algorithm of the proxy-based method for dynamic, rapid desensitization of data according to the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention.
This embodiment provides a proxy-based method for dynamic, rapid desensitization of data, comprising the following steps:
Step 1, splitting out the uniformly formatted items in the data to obtain a split dictionary set; the uniformly formatted data includes several of the following: 11-digit numeric data, 2-character Chinese text, 3-character Chinese text, and text of more than 10 characters.
Step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data; the sensitive information includes several of the following: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers.
Step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization.
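The text does not say which masking rule the desensitization algorithm in step 3 applies to each category. As an illustration only, the following Python sketch assumes a simple "keep the head and tail, mask the middle" rule; the category labels, the number of characters kept, and the function names `mask_middle` and `desensitize` are all hypothetical and do not come from the patent.

```python
# Hypothetical masking rules: (characters kept at the start, characters kept at the end).
# These numbers are assumptions for illustration, not values taken from the patent.
MASK_RULES = {
    "mobile_number": (3, 4),
    "id_number": (6, 4),
    "bank_card_number": (6, 4),
    "name": (1, 0),
}


def mask_middle(value, keep_head, keep_tail, fill="*"):
    """Keep the first keep_head and last keep_tail characters and mask the rest."""
    hidden = len(value) - keep_head - keep_tail
    if hidden <= 0:
        # Value is too short to keep anything safely: mask all of it.
        return fill * len(value)
    return value[:keep_head] + fill * hidden + value[len(value) - keep_tail:]


def desensitize(label, value):
    """Apply the assumed masking rule for one sensitive category."""
    keep_head, keep_tail = MASK_RULES[label]
    return mask_middle(value, keep_head, keep_tail)


print(desensitize("mobile_number", "13812345678"))  # 138****5678
```

Masking only the middle keeps values recognizable for testing while hiding the identifying part; other rules (hashing, substitution, truncation) would slot into `desensitize` in the same way.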
The proxy-based method for dynamic, rapid desensitization of data can be used to desensitize sensitive data, achieves rapid dynamic desensitization at the time the data is accessed, and lays a solid foundation for building a secure and trusted data-use environment.
The present invention is described in further detail below.
As shown in the overall flow chart of FIG. 1, the method comprises dynamic splitting, classification and desensitization of data.
Referring to FIG. 2, the dynamic data splitting algorithm splits the data so that it can be classified quickly and desensitized in a targeted manner. That is, the uniformly formatted items in the data are split out separately, including: 11-digit numeric data, 2-character Chinese text, 3-character Chinese text, text of more than 10 characters, and the like. This prepares the data for the subsequent targeted desensitization. The specific steps are as follows:
(1) dividing the data as a whole into segments, distinguishing three types: Chinese characters, digits and English letters;
(2) counting the length of each segment in the division result and combining the length with the segment type, for example 3-character Chinese text, 11-digit numbers, or English letters of fewer than 10 characters, and using the combination as a key of the split dictionary;
(3) storing each piece of data under the key corresponding to its format to obtain the split dictionary set.
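The three splitting steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the segment type names (`hanzi`, `digit`, `alpha`), the Unicode range used for Chinese characters, the use of the exact segment length in the key, and the function name `split_to_dictionary` are all hypothetical; the patent also mentions range-style keys (for example "fewer than 10 letters"), which this sketch does not model.

```python
import re
from collections import defaultdict

# Assumed segment types: CJK ideographs, ASCII digits, ASCII letters.
_SEGMENT = re.compile(
    r"(?P<hanzi>[\u4e00-\u9fff]+)|(?P<digit>[0-9]+)|(?P<alpha>[A-Za-z]+)"
)


def split_to_dictionary(values):
    """Split each value into typed segments and group them by (type, length)."""
    dictionary = defaultdict(list)
    for value in values:
        for match in _SEGMENT.finditer(value):
            kind = match.lastgroup      # step (1): which segment type matched
            segment = match.group()
            key = (kind, len(segment))  # step (2): type combined with length
            dictionary[key].append(segment)  # step (3): store under that key
    return dict(dictionary)


records = ["张三丰13812345678", "abc"]
print(split_to_dictionary(records))
```

For the two sample records the result has three keys: `("hanzi", 3)`, `("digit", 11)` and `("alpha", 3)`, each holding the matching segment.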
Referring to FIG. 3, the data classification algorithm classifies and identifies the result of data splitting, i.e. the dictionary set, detects common sensitive information such as identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers, and marks it.
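The patent does not give the recognition rules used for classification. The Python sketch below assumes simple length and pattern rules applied to a dictionary keyed by (segment type, length); the labels, the rules themselves, and the names `classify` and `_label` are hypothetical. Social security numbers are left out because their format is not described in the text.

```python
import re

# Assumed pattern for a mainland China mobile number: 11 digits starting with 1.
MOBILE = re.compile(r"1[3-9][0-9]{9}")


def _label(kind, length, item):
    """Return an assumed sensitive-data label for one segment, or None."""
    if kind == "digit" and length == 11 and MOBILE.fullmatch(item):
        return "mobile_number"
    if kind == "digit" and length == 18:
        return "id_number"
    if kind == "digit" and length in (16, 19):
        return "bank_card_number"
    if kind == "hanzi" and length in (2, 3):
        return "name"
    return None


def classify(dictionary):
    """Mark the sensitive segments in a {(kind, length): [segments]} dictionary."""
    labelled = {}
    for (kind, length), items in dictionary.items():
        for item in items:
            label = _label(kind, length, item)
            if label is not None:
                labelled.setdefault(label, []).append(item)
    return labelled


sample = {
    ("digit", 11): ["13812345678", "02112345678"],
    ("hanzi", 3): ["张三丰"],
    ("alpha", 3): ["abc"],
}
print(classify(sample))
```

Because the dictionary key already encodes type and length, most segments can be accepted or rejected from the key alone, and the pattern check only runs on the few keys that can hold sensitive values.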
Referring to FIG. 4, the data desensitization algorithm dynamically desensitizes the classified sensitive data in a targeted manner. It load balances across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization. The specific steps are as follows:
(1) counting the number of sensitive fields, recorded as M; counting the amount of data under each sensitive field and summing the results, recorded as N;
(2) putting the data corresponding to each sensitive field into a pending library;
(3) initializing M/2 asynchronous threads and configuring each thread as follows: when processing sensitive data, the thread processes only N/M pieces of data, and when the data of that sensitive type runs short it does not take data of other types; each thread is then set to the idle state;
(4) whenever a thread is idle, it takes one sensitive field from the pending library and desensitizes it; once all the data under that sensitive field has been processed, the field and its data are removed from the pending library.
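One possible reading of steps (1) to (4) is sketched below in Python. It is an interpretation, not the patented implementation: the pending library is modelled as a `queue.Queue`, N/M is treated as a per-pass batch size, integer division with a floor of 1 is assumed for both M/2 and N/M, and the names `desensitize_all` and `mask` are hypothetical.

```python
import queue
import threading


def desensitize_all(pending, mask):
    """pending: {field: [values]}; mask: callable(field, value) -> masked value."""
    m = len(pending)                                  # step (1): number of fields, M
    n = sum(len(values) for values in pending.values())  # step (1): total records, N
    if m == 0:
        return {}
    workers = max(1, m // 2)                          # step (3): M/2 threads (assumed floor, min 1)
    batch = max(1, n // m)                            # step (3): N/M records per pass (assumed)

    todo = queue.Queue()                              # step (2): the pending library
    for field, values in pending.items():
        todo.put((field, list(values)))

    results = {field: [] for field in pending}
    lock = threading.Lock()                           # guards the shared results dict

    def worker():
        while True:
            try:
                # step (4): an idle thread takes one field out of the library
                field, values = todo.get_nowait()
            except queue.Empty:
                return
            # The thread stays on this field, batch by batch, until it is done.
            for start in range(0, len(values), batch):
                chunk = [mask(field, v) for v in values[start:start + batch]]
                with lock:
                    results[field].extend(chunk)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


demo = {"mobile_number": ["13812345678", "13900001111"], "name": ["张三丰"]}
print(desensitize_all(demo, lambda field, value: value[0] + "*" * (len(value) - 1)))
```

Since each field is handled by a single thread, the order of values within a field is preserved, and a thread never mixes data from different sensitive types, which matches the constraint in step (3).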
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. It should be noted that those skilled in the art can make many modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (1)

1. A proxy-based method for dynamic, rapid desensitization of data, characterized by comprising the following steps:
step 1, splitting out the uniformly formatted items in the data to obtain a split dictionary set; the uniformly formatted data includes several of the following: 11-digit numeric data, 2-character Chinese text, 3-character Chinese text, and text of more than 10 characters; step 1 comprises:
dividing the data as a whole into segments, distinguishing Chinese characters, digits and English letters;
counting the length of each segment in the division result, combining the length with the segment type, and using the combination as a key of the split dictionary;
storing each piece of data under the key corresponding to its format to obtain the split dictionary set;
step 2, classifying and identifying the sensitive information in the split dictionary set to obtain sensitive data; the sensitive information includes several of the following: identity card numbers, mobile phone numbers, bank card numbers, names and social security numbers;
step 3, dynamically desensitizing the sensitive data with a desensitization algorithm and, during dynamic desensitization, load balancing across the sensitive data categories and the amount of data under each category so as to maximize the efficiency of dynamic desensitization; step 3 comprises:
counting the number of sensitive fields, recorded as M; counting the amount of data under each sensitive field and summing the results, recorded as N;
putting the data corresponding to each sensitive field into a pending library;
initializing M/2 asynchronous threads and configuring each thread as follows: when processing sensitive data, the thread processes only N/M pieces of data, and when the data of that sensitive type runs short it does not take data of other types; each thread is then set to the idle state;
whenever a thread is idle, it takes one sensitive field from the pending library and desensitizes it; once all the data under that sensitive field has been processed, the field and its data are removed from the pending library.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811536698.3A CN109766713B (en) 2018-12-15 2018-12-15 Method for realizing dynamic rapid desensitization of data based on proxy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811536698.3A CN109766713B (en) 2018-12-15 2018-12-15 Method for realizing dynamic rapid desensitization of data based on proxy

Publications (2)

Publication Number Publication Date
CN109766713A (en) 2019-05-17
CN109766713B (en) 2021-01-12

Family

ID=66451897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811536698.3A Active CN109766713B (en) 2018-12-15 2018-12-15 Method for realizing dynamic rapid desensitization of data based on proxy

Country Status (1)

Country Link
CN (1) CN109766713B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006510741A (en) * 2002-11-05 2006-03-30 ウェルスタット バイオロジックス コーポレイション Treatment of carcinoid neoplasms with therapeutic viruses
CN102638578B (en) * 2012-03-29 2016-05-04 深圳市高正软件有限公司 A kind of method of data synchronization and system based on mobile device
CN102724035B (en) * 2012-06-15 2015-04-01 中国电力科学研究院 Encryption and decryption method for encrypt card
CN104038314B (en) * 2014-05-09 2018-02-02 中煤电气有限公司 A kind of new safety supervision networking dynamic data RTTS and method
CN104731976B (en) * 2015-04-14 2018-03-30 海量云图(北京)数据技术有限公司 The discovery of private data and sorting technique in tables of data
CN107247741A (en) * 2017-05-14 2017-10-13 四川盛世天成信息技术有限公司 A kind of concentrating type textual magnanimity sensitive data processing method and system
CN107609418B (en) * 2017-08-31 2019-12-10 深圳市牛鼎丰科技有限公司 Desensitization method and device of text data, storage device and computer device

Also Published As

Publication number Publication date
CN109766713A (en) 2019-05-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant