CN114722137A - Security policy configuration method and device based on sensitive data identification and electronic equipment

Info

Publication number: CN114722137A
Application number: CN202110005330.XA
Authority: CN
Inventors: 张秀蕾; 粟栗; 刘芳; 徐世权; 米婧; 杨亭亭
Original assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Current assignee: China Mobile Communications Group Co Ltd; China Mobile Communications Ltd Research Institute
Priority date: 2021-01-05
Filing date: 2021-01-05
Publication date: 2022-07-08

Abstract

The embodiments of the application provide a security policy configuration method and device based on sensitive data identification, and an electronic device, and belong to the technical field of data security. The method comprises the following steps: generating a current sensitive identification rule by using a machine learning model based on a historical sensitive identification rule and historical sensitive data, wherein the sensitive identification rule is used for identifying sensitive data; identifying first sensitive data and second sensitive data from source data based on the current sensitive identification rule, wherein the first sensitive data and the second sensitive data have the same sensitive label; acquiring a first security policy of the first sensitive data from a security policy library; and determining a second security policy of the second sensitive data according to the first security policy. The method and the device can address the problems of low efficiency and low accuracy of sensitive data identification and of untimely security policy updating.

Description

Security policy configuration method and device based on sensitive data identification and electronic equipment

Technical Field

The embodiment of the application relates to the technical field of data security, in particular to a security policy configuration method and device based on sensitive data identification and electronic equipment.

Background

As more and more services are built on emerging technologies such as big data, data has become a new factor of production and a driving force of the digital economy. As the value of data becomes more prominent, sensitive data attracts a large number of attackers, and protecting data security is both a national requirement and a corporate responsibility. To ensure data security, multiple security policies such as data desensitization, data access control and data auditing are generally required to protect sensitive data. The basis of data protection capability is a full understanding of the sensitivity level, storage location and other properties of the data, that is, the ability to identify sensitive data efficiently and accurately. Unclear sensitive data assets and untimely updating of data security management policies are common problems in data security management.

The above information disclosed in this background section is only for enhancement of understanding of the background of the application and therefore it may contain information that does not form the prior art that is already known to a person of ordinary skill in the art.

Disclosure of Invention

The embodiment of the application aims to provide a security policy configuration method and device based on sensitive data identification and electronic equipment, and can solve the problems that in the prior art, sensitive data identification efficiency is low, accuracy is low, and security policy updating is not timely.

In order to solve the technical problem, the present application is implemented as follows:

in a first aspect, an embodiment of the present application provides a security policy configuration method based on sensitive data identification, including:

generating a current sensitive identification rule by utilizing a machine learning model based on the historical sensitive identification rule and the historical sensitive data; the sensitive identification rule is used for identifying sensitive data;

identifying first sensitive data and second sensitive data from source data based on a current sensitive identification rule; wherein the first sensitive data and the second sensitive data have the same sensitive tag;

acquiring a first security policy of the first sensitive data from a security policy library; wherein the first security policy comprises: a first access policy, a first desensitization policy and a first audit policy;

determining a second security policy of the second sensitive data according to the first security policy; wherein the second security policy comprises: a second access policy, a second desensitization policy, a second audit policy.

In an exemplary embodiment of the present application, the determining a second security policy of the second sensitive data according to the first security policy includes:

if the second security policy of the second sensitive data does not exist in the security policy library, configuring the second security policy of the second sensitive data as the first security policy of the first sensitive data; or,

and if the second security policy of the second sensitive data exists in the security policy library, optimizing the second security policy of the second sensitive data according to the first security policy of the first sensitive data.

In an exemplary embodiment of the present application, said optimizing a second security policy of said second sensitive data according to a first security policy of said first sensitive data comprises:

modifying a second access policy of the second sensitive data according to a first access policy of the first sensitive data; or,

adding the first desensitization policy of the first sensitive data to the second desensitization policy of the second sensitive data; or,

adding the first audit policy of the first sensitive data to the second audit policy of the second sensitive data.

In an exemplary embodiment of the present application, the modifying the second access policy of the second sensitive data according to the first access policy of the first sensitive data includes:

if the second sensitive data has a first user who is also a user of the first sensitive data, setting the access permission of the first user for the second sensitive data to be consistent with the access permission of the first user for the first sensitive data; and/or,

if the second sensitive data has a second user who is not a user of the first sensitive data, keeping the access permission of the second user for the second sensitive data unchanged.
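The configure-or-optimize logic described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `SecurityPolicy` structure, the dictionary-based policy library, and the choice to apply all three optimizations together (the text lists them as alternatives) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityPolicy:
    # Hypothetical representation: user -> permission, plus named rule sets.
    access: dict[str, str] = field(default_factory=dict)
    desensitization: set[str] = field(default_factory=set)
    audit: set[str] = field(default_factory=set)


def determine_second_policy(library, first_id, second_id):
    """Derive the policy of the second sensitive data from the first one."""
    first = library[first_id]
    second = library.get(second_id)
    if second is None:
        # No policy exists yet: configure it as a copy of the first policy.
        second = SecurityPolicy(
            dict(first.access), set(first.desensitization), set(first.audit)
        )
        library[second_id] = second
        return second
    # A policy exists: optimize it according to the first policy.
    for user, permission in first.access.items():
        if user in second.access:
            # Shared user: align the permission with the first sensitive data.
            second.access[user] = permission
    # Users that only appear in the second policy are left unchanged.
    second.desensitization |= first.desensitization
    second.audit |= first.audit
    return second
```

With this sketch, a user who appears only in the first policy is not added to the second policy, which follows the two cases described in the text (shared users are synchronized, other users are kept as they are).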

In an exemplary embodiment of the application, the generating a current sensitive identification rule by using a machine learning model based on the sensitive identification rule of the history and the sensitive data of the history comprises:

training the historical sensitive data by using a machine learning model to generate a new sensitive identification rule;

and adding the new sensitive identification rule in the historical sensitive identification rule to generate the current sensitive identification rule.

In an exemplary embodiment of the application, the training the historical sensitive data by using a machine learning model to generate a new sensitive recognition rule includes:

acquiring the historical sensitive data;

extracting keywords from the historical sensitive data by using a machine learning model;

converting the extracted keywords into word vectors and constructing a semantic space; wherein, the word vectors in the semantic space have semantic relation;

clustering word vectors with similar semantic relations in the word vectors to generate a sub-semantic space;

acquiring a frequent item set based on the sub-semantic space;

acquiring a potential association rule according to the frequent item set;

and taking the analyzed potential association rule as a new sensitive identification rule.
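The last three steps (frequent item sets, potential association rules, new rules) can be illustrated with a small level-wise search. The application does not name a specific mining algorithm, so the brute-force enumeration below, the representation of each record as a set of words, and the `min_support` and `min_confidence` parameters are all assumptions for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                result[candidate] = support
                found = True
        if not found:
            # No frequent itemset of this size, so none of a larger size.
            break
    return result


def association_rules(itemsets, min_confidence):
    """Derive (antecedent, consequent, confidence) rules from frequent itemsets."""
    rules = []
    for itemset, support in itemsets.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                antecedent = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent.
                confidence = support / itemsets[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

For realistic data volumes an optimized miner (for example Apriori with candidate pruning, or FP-Growth) would replace the exhaustive enumeration; the sketch only shows how support and confidence lead from item sets to candidate rules.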

In an exemplary embodiment of the application, the extracting keywords from the history sensitive data by using a machine learning model includes:

presetting a sensitive word;

extracting associated words of the preset sensitive words from the historical sensitive data by using a machine learning model;

and if the associated word meets a preset condition, determining that the associated word is the keyword.

In an exemplary embodiment of the application, if the associated word meets a preset condition, the determining that the associated word is the keyword includes:

if the number of the associated words is larger than or equal to a preset number, determining that the associated words are the keywords; or,

and if the proportion of the associated words is larger than or equal to a preset proportion, determining the associated words as the keywords.
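One possible reading of the preset condition is sketched below. It assumes that "number" means how often an associated word occurs and that "proportion" means its share of all associated-word occurrences; the function name and parameters are hypothetical.

```python
from collections import Counter


def select_keywords(associated_words, min_count=None, min_ratio=None):
    """Keep associated words that meet a count or a proportion threshold."""
    counts = Counter(associated_words)
    total = sum(counts.values())
    keywords = set()
    for word, count in counts.items():
        if min_count is not None and count >= min_count:
            keywords.add(word)
        elif min_ratio is not None and total and count / total >= min_ratio:
            keywords.add(word)
    return keywords
```

For example, `select_keywords(words, min_count=2)` keeps only words seen at least twice, while `select_keywords(words, min_ratio=0.5)` keeps words that account for at least half of the occurrences.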

In an exemplary embodiment of the application, the identifying the first sensitive data and the second sensitive data from the source data based on the current sensitive identification rule includes:

acquiring source data;

creating a data index for the source data;

querying a data index matched with the current sensitive identification rule in the data indexes;

generating sensitive data based on the successfully matched data indexes; wherein the sensitive data comprises the first sensitive data and second sensitive data.

In an exemplary embodiment of the present application, the method further comprises:

acquiring the label attribute of the first sensitive data; wherein the label attribute comprises a label category attribute and a label level attribute;

generating a sensitive label of the first sensitive data according to the label attribute of the first sensitive data;

acquiring the label attribute of the second sensitive data;

and generating a sensitive label of the second sensitive data according to the label attribute of the second sensitive data.
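As an illustration of how a label built from a category attribute and a level attribute lets data items be paired, the sketch below groups items by label. The tuple layout `(data_id, category, level)` and the names used are assumptions, not part of the application.

```python
from collections import defaultdict
from typing import NamedTuple


class SensitiveLabel(NamedTuple):
    category: str  # label category attribute
    level: int     # label level attribute


def group_by_label(items):
    """Group (data_id, category, level) tuples by their sensitive label."""
    groups = defaultdict(list)
    for data_id, category, level in items:
        groups[SensitiveLabel(category, level)].append(data_id)
    return dict(groups)
```

Items that end up in the same group share a sensitive label, so any one of them can play the role of the first sensitive data and the others the role of the second sensitive data.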

In an exemplary embodiment of the present application, the generating sensitive data based on the data index with successful matching includes:

presetting a matching degree threshold;

screening successfully matched data indexes based on a matching degree threshold;

and generating sensitive data according to the screened data indexes.

In a second aspect, an embodiment of the present application provides a security policy configuration apparatus based on sensitive data identification, including:

the first generation module is used for generating a current sensitive identification rule based on a historical sensitive identification rule and historical sensitive data by using a machine learning model; the sensitive identification rule is used for identifying sensitive data;

the identification module is used for identifying first sensitive data and second sensitive data from source data based on a current sensitive identification rule; wherein the first sensitive data and the second sensitive data have the same sensitive tag;

the first acquisition module is used for acquiring a first security policy of the first sensitive data from a security policy library; wherein the first security policy comprises: a first access policy, a first desensitization policy and a first audit policy;

the determining module is used for determining a second security policy of the second sensitive data according to the first security policy; wherein the second security policy comprises: a second access policy, a second desensitization policy, a second audit policy.

In an exemplary embodiment of the present application, the determining module includes:

the configuration submodule is used for configuring a second security policy of the second sensitive data as a first security policy of the first sensitive data if the second security policy of the second sensitive data does not exist in the security policy library; or,

and the optimization submodule is used for optimizing a second security policy of the second sensitive data according to the first security policy of the first sensitive data if the second security policy of the second sensitive data exists in the security policy library.

In an exemplary embodiment of the present application, the optimization submodule includes:

the modifying unit is used for modifying a second access policy of the second sensitive data according to a first access policy of the first sensitive data; or,

the first adding unit is used for adding the first desensitization policy of the first sensitive data to the second desensitization policy of the second sensitive data; or,

the second adding unit is used for adding the first audit policy of the first sensitive data to the second audit policy of the second sensitive data.

In an exemplary embodiment of the present application, the modifying unit includes:

the synchronization subunit is configured to, if the second sensitive data has a first user who is also a user of the first sensitive data, set the access permission of the first user for the second sensitive data to be consistent with the access permission of the first user for the first sensitive data; and/or,

the keeping subunit is configured to, if the second sensitive data has a second user who is not a user of the first sensitive data, keep the access permission of the second user for the second sensitive data unchanged.

In an exemplary embodiment of the present application, the first generating module includes:

the training submodule is used for training the historical sensitive data by utilizing a machine learning model to generate a new sensitive identification rule;

and the first generation submodule is used for adding the new sensitive identification rule in the historical sensitive identification rule to generate the current sensitive identification rule.

In an exemplary embodiment of the present application, the training submodule includes:

the first acquisition unit is used for acquiring the historical sensitive data;

the extraction unit is used for extracting key words from the historical sensitive data by utilizing a machine learning model;

the construction unit is used for converting the extracted keywords into word vectors and constructing a semantic space; wherein, the word vectors in the semantic space have semantic relation;

the first generation unit is used for clustering the word vectors with similar semantic relations in the word vectors to generate a sub-semantic space;

a second obtaining unit, configured to obtain a frequent item set based on the sub-semantic space;

a third obtaining unit, configured to obtain a potential association rule according to the frequent item set;

and the second generation unit is used for taking the analyzed potential association rule as a new sensitive identification rule.

In an exemplary embodiment of the present application, the extraction unit includes:

the preset subunit is used for presetting the sensitive words;

the extraction subunit is used for extracting associated words of the preset sensitive words from the historical sensitive data by using a machine learning model;

and the generation subunit is used for determining that the associated word is the keyword if the associated word meets a preset condition.

In an exemplary embodiment of the present application, the extracting unit is further configured to:

In an exemplary embodiment of the present application, the identification module includes:

the acquisition submodule is used for acquiring source data;

the creating submodule is used for creating a data index for the source data;

the query submodule is used for querying the data index matched with the current sensitive identification rule in the data index;

the second generation submodule is used for generating sensitive data based on the data index successfully matched; wherein the sensitive data comprises the first sensitive data and second sensitive data.

In an exemplary embodiment of the present application, the apparatus further comprises:

the second acquisition module is used for acquiring the tag attribute of the first sensitive data; wherein the label attribute comprises a label category attribute and a label level attribute;

the second generation module is used for generating the sensitive label of the first sensitive data according to the label attribute of the first sensitive data;

the third acquisition module is used for acquiring the label attribute of the second sensitive data;

and the third generating module is used for generating the sensitive label of the second sensitive data according to the label attribute of the second sensitive data.

In an exemplary embodiment of the present application, the second generation submodule includes:

the setting unit is used for presetting a matching degree threshold;

the screening unit is used for screening the successfully matched data indexes based on the matching degree threshold value;

and the third generating unit is used for generating sensitive data according to the screened data index.

In a third aspect, an embodiment of the present application provides an electronic device, including one or more processors, and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the steps in the security policy configuration method based on sensitive data identification as described above.

In a fourth aspect, the present application provides a readable storage medium, on which a program or instructions are stored, where the program or instructions, when executed by a processor, implement the steps in the security policy configuration method based on sensitive data identification as described above.

The beneficial effects of the above technical scheme of this application are as follows:

According to the embodiments of the application, omissions and deficiencies in the security policies of sensitive data are remedied through automatic sensitive data identification and sensitive rule expansion, and the security policies of sensitive data are synchronized. The sensitive identification rule base is expanded through machine learning, which improves the accuracy of sensitive data identification and creates a virtuous cycle of sensitive data identification.

Drawings

The above and other objects, features and advantages of the present application will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are only some embodiments of the present application, and other drawings may be derived from those drawings by those skilled in the art without inventive effort.

FIG. 1 is a flow diagram illustrating a security policy configuration method based on sensitive data identification in accordance with an exemplary embodiment.

Fig. 2 is a schematic diagram illustrating a generation of a current sensitive identification rule of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 3 is a diagram illustrating sensitive data identification of a security policy configuration method based on sensitive data identification according to an example embodiment.

Fig. 4 is a schematic structural diagram illustrating a sensitive identification rule base of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 5 is a schematic structural diagram illustrating a sensitive database of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 6 is a schematic diagram illustrating a security policy repository structure of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 7 is a security policy optimization flow diagram illustrating a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 8 is a diagram illustrating security policy optimization results of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Fig. 9 is a block diagram illustrating a security policy configuration apparatus based on sensitive data identification according to an example embodiment.

Fig. 10 is a block diagram illustrating another security policy configuration apparatus based on sensitive data identification according to an example embodiment.

FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms first, second and the like in the description and in the claims of the present application are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that embodiments of the application may be practiced in sequences other than those illustrated or described herein, and that the terms "first," "second," and the like are generally used herein in a generic sense and do not limit the number of terms, e.g., the first term can be one or more than one. In addition, "and/or" in the specification and claims means at least one of connected objects, a character "/" generally means that a preceding and succeeding related objects are in an "or" relationship.

The inventors of the application have found that the prior art has two problems in sensitive data identification and in optimizing the security policies of sensitive data: (1) existing data protection mainly discovers sensitive data automatically and then protects and monitors it according to predefined security measures and classification rules, without optimizing or synchronizing incomplete security policies of the sensitive data; (2) existing methods discover sensitive data automatically based on natural language analysis, but do not account for practical conditions in sensitive data identification, such as huge data volumes, data sourced from critical production environments, and incomplete sensitive identification rules, so they can cause high system overhead and low identification efficiency and accuracy.

In view of these difficulties in the prior art, the present scheme provides a security policy configuration method based on sensitive data identification, which remedies omissions and deficiencies in the security policies of sensitive data and synchronizes those policies through automatic sensitive data identification and sensitive rule expansion. The sensitive identification rule base is expanded through machine learning, which improves the accuracy of sensitive data identification and creates a virtuous cycle of sensitive data identification.

FIG. 1 is a flow diagram illustrating a security policy configuration method based on sensitive data identification in accordance with an exemplary embodiment; the security policy configuration method based on sensitive data identification at least comprises the following steps:

S102: generating the current sensitive identification rule by using a machine learning model based on the historical sensitive identification rule and the historical sensitive data.

Wherein the sensitive identification rule is used for identifying sensitive data.

Optionally, step S102 includes at least:

Wherein, training the historical sensitive data by using a machine learning model to generate a new sensitive recognition rule, further comprising:

acquiring the historical sensitive data;

acquiring a frequent item set based on the sub-semantic space;

acquiring a potential association rule according to the frequent item set;

Wherein, the extracting keywords from the history sensitive data by using the machine learning model further comprises:

presetting a sensitive word;

Wherein, if the associated word satisfies a preset condition, determining that the associated word is the keyword, further may include:

S104: first sensitive data and second sensitive data are identified from the source data based on the current sensitive identification rule.

Wherein the first sensitive data and the second sensitive data have the same sensitivity label.

Optionally, step S104 includes at least:

acquiring source data;

creating a data index for the source data;

Wherein the method may further comprise:

acquiring the label attribute of the second sensitive data;

Wherein the generating sensitive data based on the successfully matched data index further comprises:

presetting a matching degree threshold;

and generating sensitive data according to the screened data indexes.

S106: a first security policy for the first sensitive data is obtained from a security policy repository.

Wherein the first security policy comprises: a first access policy, a first desensitization policy, a first audit policy.

S108: determining a second security policy of the second sensitive data according to the first security policy.

Wherein the second security policy comprises: a second access policy, a second desensitization policy, a second audit policy.

Wherein the determining a second security policy for the second sensitive data according to the first security policy may further include:

Wherein optimizing a second security policy for the second sensitive data based on the first security policy for the first sensitive data may further comprise:

Wherein the modifying the second access policy of the second sensitive data according to the first access policy of the first sensitive data may further include:

if the second sensitive data has a first user who is also a user of the first sensitive data, setting the access permission of the first user for the second sensitive data to be consistent with the access permission of the first user for the first sensitive data; and/or,

As an example, the security policy configuration method based on sensitive data identification of the present application is described in further detail.

Because the sensitive data in the sensitive database contains sensitive words that have not yet been recognized, and no association has been established between those words and their associated words, no sensitive identification rule is available to identify them.

Therefore, the sensitive data in the sensitive database needs to be used for training so that more target sensitive words can be recognized; new sensitive identification rules are then established from the recognized target sensitive words and added to the historical sensitive identification rules to expand them.

Referring to fig. 2, the flow of the sensitive rule expansion is as follows:

The sensitive rule expansion module uses machine learning to train on the text of the sensitive data in the sensitive database, mining additional short texts (namely, sensitive words) that can identify more targets, and improves the identification accuracy of sensitive data through association rules. The expanded rules can be used again for sensitive data identification, so that sensitive identification forms a virtuous cycle and covers sensitive data more comprehensively. The sensitive rule expansion module comprises three steps: keyword extraction, sensitive lexicon expansion, and association rule mining.

First, the acquired sensitive data text is preprocessed and keywords are extracted.

Text preprocessing mainly normalizes interference items in the sensitive data text, including numbers, special symbols, and traditional versus simplified Chinese characters. To extract keywords, the text is first segmented with a word segmentation tool, and useless words and stop words are deleted. In addition, numbers of particular types, such as telephone numbers and bank card numbers, can be recognized with regular expressions and replaced with a unified placeholder word. TF-IDF can then be used to calculate the weight of each word, and either the number of keywords or a weight threshold can be customized to obtain the specified keywords.
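The preprocessing and TF-IDF weighting described above can be sketched in a few lines. This is an illustrative simplification: a regular expression stands in for the word segmentation tool (which Chinese text would require), the `<NUM>` placeholder, the digit-length range and the stop-word list are assumptions, and the TF-IDF variant shown (relative term frequency times the logarithm of inverse document frequency) is only one of several common definitions.

```python
import math
import re
from collections import Counter

NUMBER_RE = re.compile(r"\d{7,19}")  # assumed shape of phone / bank card numbers
STOP_WORDS = {"the", "a", "of", "is", "to"}  # hypothetical stop-word list


def preprocess(text):
    """Normalize long numbers to a placeholder, tokenize, drop stop words."""
    text = NUMBER_RE.sub(" <NUM> ", text.lower())
    tokens = re.findall(r"<NUM>|[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_keywords(documents, top_k=3):
    """Return the top_k highest-weighted tokens of each document."""
    tokenized = [preprocess(d) for d in documents]
    n = len(tokenized)
    doc_freq = Counter(t for tokens in tokenized for t in set(tokens))
    results = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        scores = {
            t: (c / len(tokens)) * math.log(n / doc_freq[t])
            for t, c in term_freq.items()
        }
        # Sort by descending weight; break ties alphabetically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:top_k])
    return results
```

Replacing the fixed `top_k` with a minimum weight would give the threshold-based variant that the text also mentions.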

Next, a sub-semantic space is generated by clustering the semantic space, and the sensitive lexicon is expanded.

The extracted keywords are converted into word vectors using the Word2Vec (word to vector) approach to construct a semantic space, and, based on the sensitive label attributes and rule information, the semantic space is partitioned to expand the sensitive lexicon for each sensitive label rule. The semantic space is partitioned with a clustering algorithm: semantically similar word vectors are clustered to generate a sub-semantic space, and words in the sub-semantic space whose similarity is not less than a threshold (such as 0.97) are joined in an 'OR' relationship (such as A1|A2|A3…|An), that is, a synonym dictionary is generated automatically. Table 1 shows part of the vocabulary of a sub-semantic space produced by sensitive lexicon expansion.
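The grouping of near-synonyms into 'OR' expressions can be illustrated as follows. The application does not specify the clustering algorithm or the similarity measure, so this sketch assumes cosine similarity and a simple single-link grouping (words are merged whenever any pair reaches the threshold); the word vectors are hand-written stand-ins for Word2Vec output.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def synonym_groups(vectors, threshold=0.97):
    """Merge words whose similarity reaches the threshold into 'A|B|C' groups."""
    words = sorted(vectors)
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    for i, a in enumerate(words):
        for b in words[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(b)] = find(a)

    groups = {}
    for w in words:
        groups.setdefault(find(w), []).append(w)
    # Only groups with at least two words form an OR expression.
    return ["|".join(g) for g in groups.values() if len(g) > 1]
```

Each returned string can be used directly as one parenthesized group of a sensitive identification rule.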

TABLE 1

Finally, the sensitive identification rules are expanded using an association rule algorithm.

A sensitive identification rule may consist of N sensitive words and the associations between them, where an association may be one or a combination of "and (&)", "or (|)" and "not (!)", in the form:

(A1|A2)&(B1|B2|B3)&…!(F1|F2|F3)

The generated rules can be tested and evaluated against the training corpus of the sensitive database, so that rules with too many or too few sensitive words in the training corpus can be deleted; the rules are further screened for inclusion, duplication and overlap among them, so that the expanded set of sensitive identification rules is more concise.
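To make the rule form concrete, the sketch below evaluates a rule written with `&`, `|`, `!` and parentheses against the set of words found in a piece of text. The parser, its operator precedence (`!` binds tightest, then `&`, then `|`) and the sample rule are assumptions for illustration; the application only gives the general form of a rule.

```python
import re

TOKEN_RE = re.compile(r"\s*([()&|!]|[^\s()&|!]+)")


def matches(rule, words):
    """Evaluate a boolean sensitive-word rule against a set of words."""
    tokens = TOKEN_RE.findall(rule)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "|":
            take()
            rhs = parse_and()  # always parse the operand before combining
            value = value or rhs
        return value

    def parse_and():
        value = parse_unary()
        while peek() == "&":
            take()
            rhs = parse_unary()
            value = value and rhs
        return value

    def parse_unary():
        token = take()
        if token == "!":
            return not parse_unary()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing closing parenthesis")
            return value
        return token in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result
```

A rule such as `(phone|mobile)&(number|no)&!(test|sample)` then matches text containing "mobile" and "number" unless it also contains "test" or "sample".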

After the sensitive identification rule base is expanded, sensitive data can be identified by using the sensitive identification rule in the sensitive identification rule base.

Fig. 3 is a diagram illustrating sensitive data identification of a security policy configuration method based on sensitive data identification according to an example embodiment. Fig. 4 is a schematic structural diagram illustrating a sensitive identification rule base of a security policy configuration method based on sensitive data identification according to an exemplary embodiment. Fig. 5 is a schematic structural diagram illustrating a sensitive database of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Referring to fig. 3, 4 and 5, the flow of sensitive data identification is described in detail.

The sensitive identification rule base contains the sensitive label attributes and the sensitive identification rules. The sensitive identification rules are organized mainly by data type, with different detection rules set for different kinds of data: for example, a regular expression is set for e-mail address detection, sensitive words are set for Chinese text detection, and column name rules are configured based on a standard data dictionary. The sensitive label attributes are configured with reference to the relevant data classification and grading specifications and the sensitive data identification requirements; they define the category and level of each sensitive label, that is, the category and level to which data matching the label belongs, and serve as the basis for marking sensitive data.

The sensitive data identification module identifies sensitive data through indexing technology, for example the full-text search engine library Lucene, and adds attributes to the index to improve the efficiency and accuracy of sensitive data identification. It mainly comprises index creation, index query and sensitive data marking.

Index creation: based on the Lucene framework, a data index is created for the source data to be identified. This includes creating a document object (Document) for each piece of data and adding attributes, including but not limited to location attributes (such as the database, table and column where the data is located), content attributes (that is, the current data value) and quantity attributes (such as the amount of data contained in a database column); analyzing the document and splitting it into individual terms (tokens); and indexing the terms obtained from all documents, so that the corresponding Document and its attributes can be found simply by searching the indexed terms.
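The idea of indexing terms so that a search returns the document and its attributes can be shown with a toy inverted index. This is a plain-Python stand-in and does not use the Lucene API; the record layout (`location`, `content`, `column_size`) and the regular-expression tokenizer are assumptions made for the example.

```python
import re
from collections import defaultdict

def build_index(records):
    """Map each lower-cased term to the ids of the records containing it."""
    index = defaultdict(set)
    for doc_id, record in enumerate(records):
        for token in re.findall(r"\w+", record["content"].lower()):
            index[token].add(doc_id)
    return index

# Each record carries location, content and quantity attributes.
records = [
    {"location": ("crm", "users", "email"), "content": "alice@example.com", "column_size": 2},
    {"location": ("crm", "users", "note"), "content": "Call Alice tomorrow", "column_size": 2},
]

index = build_index(records)
for doc_id in sorted(index["alice"]):
    # The hit leads back to the record and its attributes.
    print(doc_id, records[doc_id]["location"])
```

Because queries run against the index rather than the data source, repeated identification passes do not need to read the source database again.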

Index query: sensitive identification rules are matched against the index, that is, the index is queried to determine whether its content is hit by the specific rules in the sensitive identification rule base. Based on the Lucene framework, query objects may be defined using the Query abstract class, which supports combining conditions with AND, OR and NOT. The Lucene search results can be traversed through TopDocs, and a matching degree threshold can be set for the identification result: when the ratio of the number of matched data items to the number of data items recorded in the current quantity attribute exceeds the threshold, the match succeeds and the data can be marked as sensitive information.
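The matching degree threshold can be sketched as a simple ratio test over the values of one column. The e-mail pattern, the function name and the threshold values below are illustrative assumptions; the sketch works directly on a list of values rather than on Lucene search results.

```python
import re

# A deliberately simple e-mail pattern, used only for illustration.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def is_sensitive_column(values, pattern, threshold):
    """Return True when matched / total exceeds the matching degree threshold."""
    if not values:
        return False
    matched = sum(1 for value in values if pattern.fullmatch(value))
    return matched / len(values) > threshold

column = ["a@x.com", "b@y.org", "n/a", "c@z.net", "d@w.io"]

print(is_sensitive_column(column, EMAIL, threshold=0.75))  # True  (4 of 5 match)
print(is_sensitive_column(column, EMAIL, threshold=0.9))   # False
```

A ratio test of this kind tolerates a few malformed or placeholder values in a column while still rejecting columns that match only incidentally.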

Sensitive data marking: the relevant attributes of the data are obtained from the index entries that successfully matched the identification rules, the level and category of the sensitive data are marked according to the sensitive label attributes, and the marking information is stored in the sensitive database. If the same data is matched successfully multiple times, the levels of the sensitive label attributes are compared, and the marking information is updated when the level of the later matched label is higher than the label level of the stored result.
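The update rule for repeated matches can be sketched as follows. The sketch assumes the sensitive database is a dictionary keyed by data location and that a larger level number means a higher level; both are assumptions for illustration, as are the function name and the sample values.

```python
def mark_sensitive(sensitive_db, location, content, label, level, category):
    """Store the marking, or replace it if the new label has a higher level."""
    stored = sensitive_db.get(location)
    if stored is None or level > stored["level"]:
        sensitive_db[location] = {
            "content": content,
            "label": label,
            "level": level,
            "category": category,
        }
    return sensitive_db[location]

db = {}
loc = "db1.table_b.col_m"  # hypothetical location used as unique identifier
mark_sensitive(db, loc, "13800000000", "S1", 3, "user service information")
mark_sensitive(db, loc, "13800000000", "S", 4, "user identity information")
mark_sensitive(db, loc, "13800000000", "S2", 2, "other")  # lower level: ignored

print(db[loc]["label"], db[loc]["level"])  # S 4
```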

The sensitive database contains the sensitive data content, the sensitive data location (unique identification), and the tag, data level, and category of the sensitive data hit.

The sensitive data identification module identifies the sensitive data in the source data and obtains groups 1, 2, …, i of sensitive data together with their content and location. Sensitive labels, for example S and S1, are set for the sensitive data according to its data category and data level.

After the sensitive data is obtained, the security policies of the sensitive data are analyzed and synchronized.

Policy analysis and synchronization mainly implement the analysis, optimization and synchronization of data security policies based on the sensitive database and the configured security policy library.

The security policy repository contains the unique identification (location) of the data, the class level of the data, and the security policy associated with the data. Fig. 6 is a schematic diagram illustrating a security policy repository structure of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.

Using the data location as the unique identifier of the data, the sensitive database and the security policy library are compared to analyze whether data hitting the same sensitive label has a security policy configured and whether that security policy is complete. A specific exemplary flow of policy analysis and optimization is shown in fig. 7, which is a schematic diagram illustrating a security policy optimization flow of a security policy configuration method based on sensitive data identification according to an exemplary embodiment.
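The comparison by data location can be sketched as a lookup of each sensitive database entry in the security policy library. The sketch assumes both stores are dictionaries keyed by location and treats a policy as complete when access, desensitization and audit policies are all non-empty; this completeness criterion, the key names and the sample entries are assumptions for illustration.

```python
REQUIRED_KINDS = ("access", "desensitization", "audit")

def find_policy_gaps(sensitive_db, policy_library):
    """Return {location: "missing" | "incomplete"} for data lacking full policies."""
    gaps = {}
    for location in sensitive_db:
        policy = policy_library.get(location)
        if policy is None:
            gaps[location] = "missing"
        elif any(not policy.get(kind) for kind in REQUIRED_KINDS):
            gaps[location] = "incomplete"
    return gaps

sensitive_db = {
    "db1.table_a.col_n": {"label": "S", "level": 4},
    "db1.table_b.col_m": {"label": "S", "level": 3},
    "db1.table_c.col_k": {"label": "S", "level": 3},
}
policy_library = {
    "db1.table_a.col_n": {"access": {"user1": "read"}, "desensitization": ["mask"], "audit": ["log_read"]},
    "db1.table_b.col_m": {"access": {"user1": "read"}, "desensitization": [], "audit": ["log_read"]},
}

gaps = find_policy_gaps(sensitive_db, policy_library)
print(gaps)
```

Locations reported as missing or incomplete are the candidates for policy configuration or optimization in the subsequent steps.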

As shown in fig. 7, when the data B already has a security policy, different types of security policy optimization may refer to the following criteria:

Access policy: the access policy of data A and the existing access policy of data B are acquired from the security policy library; for users common to both, the access rights in the access policy of data A are recommended as the optimized access rights for data B, and the access rights of the other existing users of data B are retained in the policy;

Desensitization policy: the union of the desensitization policies of data A and data B in the security policy library is acquired and recommended as the optimized desensitization policy of data B;

Audit policy: the union of the audit policies of data A and data B in the security policy library is acquired and recommended as the optimized audit policy of data B.
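The three criteria can be sketched as one merge function. The sketch assumes a policy is a dictionary with an `access` mapping from user to access right and with `desensitization` and `audit` lists; the key names, user names and policy values are hypothetical and do not reproduce the contents of fig. 8.

```python
def optimize_policy(policy_a, policy_b):
    """Recommend an optimized policy for data B based on data A's policy."""
    # Access: keep all existing users of B, but align shared users with A.
    access = dict(policy_b["access"])
    for user, right in policy_a["access"].items():
        if user in access:
            access[user] = right
    return {
        "access": access,
        # Desensitization and audit: union of both policies.
        "desensitization": sorted(set(policy_a["desensitization"]) | set(policy_b["desensitization"])),
        "audit": sorted(set(policy_a["audit"]) | set(policy_b["audit"])),
    }

policy_a = {
    "access": {"user1": "read", "user2": "none"},
    "desensitization": ["mask"],
    "audit": ["log_read", "log_export"],
}
policy_b = {
    "access": {"user1": "read-write", "user2": "read", "user3": "read"},
    "desensitization": ["hash"],
    "audit": ["log_read"],
}

recommended = optimize_policy(policy_a, policy_b)
print(recommended["access"])           # user1 and user2 follow A, user3 is kept
print(recommended["desensitization"])  # ['hash', 'mask']
print(recommended["audit"])            # ['log_export', 'log_read']
```

The function returns a recommendation only and leaves both input policies unchanged, which fits a flow in which an administrator reviews the result before it is written to the security policy library.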

For example, data A in the current security policy library represents the Nth column in database table A, with sensitivity level 4, category user identity information, and sensitive label S; data B represents the Mth column in database table B, with sensitivity level 3, category user service information, and sensitive label S1. The relevant security policy information for data A and data B is shown in fig. 8. If, when sensitive data is identified after the sensitive word bank has been expanded, data B is hit by sensitive label S, the policy result set of data B is optimized according to the above policy analysis and optimization method as follows: the access rights of users 1 and 2 are modified according to the access rights of data A, the audit policy of data A is added to the audit policy, and the desensitization policy of data A is added to the desensitization policy.

Policy configuration is an important link related to data security, so in the policy synchronization module, an administrator is required to perform adjustment and verification to ensure that the appropriate policy is configured to the security policy repository.

Policy recommendation pushes the optimized policy result set obtained by policy analysis and optimization to an administrator. The recommended content includes, but is not limited to: the data storage location, the data level and category, and the security policies that are not yet configured or need to be updated for the data.

Policy configuration provides a policy configuration function, so that the administrator can combine the recommended policy with the level and category of the data to synchronize a more comprehensive policy to the data at a specific location, improving the effect of the policy synchronization module.

The security policy configuration method based on sensitive data identification has the following beneficial effects:

According to the security policy configuration method based on sensitive data identification, omissions and deficiencies in sensitive data security policies are remedied through automatic sensitive data identification and sensitive rule expansion, and sensitive data security policies are synchronized.

According to the security policy configuration method based on sensitive data identification, sensitive data identification is carried out based on the index file, the problem that identification efficiency is reduced due to large data volume in the sensitive data identification process is solved, and the sensitive data identification efficiency is improved by utilizing the index searching characteristic; direct interaction with the data source is not required to be generated every time of identification, high load on the production environment of the data source in the identification process is relieved, and further serious adverse effects are avoided.

According to the security policy configuration method based on sensitive data identification, the sensitive identification rule base is expanded through machine learning, the accuracy rate of sensitive data identification is improved, and virtuous circle of sensitive data identification is realized.

It should be clearly understood that this application describes how to make and use particular examples, but the principles of this application are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.

It should be noted that, in the security policy configuration method based on sensitive data identification provided in the embodiment of the present application, the execution subject may be a security policy configuration device based on sensitive data identification, or a control module in the security policy configuration device based on sensitive data identification, configured to execute the security policy configuration method based on sensitive data identification. In the embodiment of the present application, a security policy configuration device based on sensitive data identification executes a security policy configuration method based on sensitive data identification as an example, and the security policy configuration device based on sensitive data identification provided in the embodiment of the present application is described.

Fig. 9 is a block diagram illustrating a security policy configuration apparatus based on sensitive data identification in accordance with an example embodiment.

The apparatus 900 comprises:

a first generating module 910, configured to generate a current sensitive identification rule based on a historical sensitive identification rule and historical sensitive data by using a machine learning model; the sensitive identification rule is used for identifying sensitive data;

an identifying module 920, configured to identify first sensitive data and second sensitive data from source data based on a current sensitive identification rule; wherein the first sensitive data and the second sensitive data have the same sensitivity label;

a first obtaining module 930, configured to obtain a first security policy of the first sensitive data from a security policy library; wherein the first security policy comprises: a first access strategy, a first desensitization strategy and a first auditing strategy;

a determining module 940, configured to determine a second security policy of the second sensitive data according to the first security policy; wherein the second security policy comprises: a second access policy, a second desensitization policy, a second audit policy.

Optionally, the determining module 940 includes:

Optionally, the optimization submodule includes:

Optionally, the modifying unit includes:

the synchronization subunit is configured to, if a first user of the first sensitive data also exists in the second sensitive data, set the access right of the first user in the second sensitive data to be consistent with the access right of the first user in the first sensitive data; and/or,

the keeping subunit is configured to keep the access right of a second user in the second sensitive data unchanged if that second user is different from the users in the first sensitive data.

Optionally, the first generating module 910 includes:

Optionally, the training submodule comprises:

the first acquisition unit is used for acquiring the historical sensitive data;

Optionally, the extraction unit includes:

the preset subunit is used for presetting the sensitive words;

Optionally, the extracting unit is further configured to:

and if the ratio of the associated words is greater than or equal to a preset ratio, determining that the associated words are the keywords.

Optionally, the identifying module 920 includes:

the acquisition submodule is used for acquiring source data;

the creating submodule is used for creating a data index for the source data;

Optionally, the apparatus 900 further comprises:

Optionally, the second generation submodule includes:

the setting unit is used for presetting a matching degree threshold;

The terminal 1000 according to the embodiment of the present invention can implement each process of the terminal-side processing method based on the terminal capability, and can achieve the same technical effect, and for avoiding repetition, details are not described here.

As shown in fig. 10, the security policy configuration apparatus based on sensitive data identification mainly includes the following functional modules: a sensitive identification rule base, a sensitive data identification module, a sensitive database, a sensitive rule expansion module, a policy analysis module, an optimized policy result set, a policy synchronization module and a security policy library.

The sensitive data identification module creates an index file and identifies sensitive data according to the sensitive identification rule base, so as to locate the sensitive data.

The sensitive rule expansion module expands the sensitive rules through machine learning based on the identified sensitive data. The expanded rules can then be configured for sensitive data identification again, forming a virtuous cycle of sensitive data identification and rule expansion.

The policy analysis module analyzes and optimizes security policies based on the existing sensitive data security policy library and the identified sensitive database.

The policy synchronization module recommends the optimized policy result set produced by policy analysis to an administrator, who adjusts the result and synchronizes it to the security policy library.

An electronic device 1100 according to this embodiment of the present application is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.

As shown in fig. 11, electronic device 1100 is embodied in the form of a general purpose computing device. The components of the electronic device 1100 may include, but are not limited to: at least one processing unit 1110, at least one memory unit 1120, a bus 1130 connecting the various system components including the memory unit 1120 and the processing unit 1110, a display unit 1140, and the like.

Wherein the storage unit stores program code that can be executed by the processing unit 1110, such that the processing unit 1110 performs the steps according to various exemplary embodiments of the present application described in the present specification. For example, the processing unit 1110 may perform the steps as shown in fig. 1.

The memory unit 1120 may include a readable medium in the form of a volatile memory unit, such as a random access memory unit (RAM) 11201 and/or a cache memory unit 11202, and may further include a read only memory unit (ROM) 11203.

The storage unit 1120 may also include a program/utility 11204 having a set (at least one) of program modules 11205, such program modules 11205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

Bus 1130 may be representative of one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.

The electronic device 1100 can also communicate with one or more external devices 1100' (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 1150. Also, the electronic device 1100 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1160. The network adapter 1160 may communicate with other modules of the electronic device 1100 via the bus 1130. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.

Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, or a network device, etc.) to execute the above method according to the embodiments of the present application.

The software product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform the following functions:

based on historical sensitive identification rules and historical sensitive data, generating current sensitive identification rules by using a machine learning model; wherein the sensitive identification rule is used for identifying sensitive data;

identifying first sensitive data and second sensitive data from source data based on a current sensitive identification rule; wherein the first sensitive data and the second sensitive data have the same sensitivity label;

The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the foregoing method embodiments, and can achieve the same technical effect, and in order to avoid repetition, the details are not repeated here.

It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.

Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.

While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A security policy configuration method based on sensitive data identification is characterized by comprising the following steps:

identifying first sensitive data and second sensitive data from source data based on the current sensitive identification rule; wherein the first sensitive data and the second sensitive data have the same sensitive tag;

acquiring a first security policy of the first sensitive data from a security policy library; wherein the first security policy comprises: a first access strategy, a first desensitization strategy and a first auditing strategy;

2. The method for configuring security policies based on sensitive data identification according to claim 1, wherein the determining the second security policy of the second sensitive data according to the first security policy comprises:

if the second security policy of the second sensitive data does not exist in the security policy library, configuring the second security policy of the second sensitive data as the first security policy of the first sensitive data; or

3. The method for configuring security policy based on sensitive data identification according to claim 2, wherein optimizing the second security policy of the second sensitive data according to the first security policy of the first sensitive data comprises:

4. The method for configuring security policies based on sensitive data identification according to claim 3, wherein modifying the second access policy of the second sensitive data according to the first access policy of the first sensitive data comprises:

5. The security policy configuration method based on sensitive data identification according to claim 1, wherein the generating the current sensitive identification rule by using the machine learning model based on the historical sensitive identification rule and the historical sensitive data comprises:

6. The security policy configuration method based on sensitive data identification according to claim 5, wherein training the historical sensitive data by using a machine learning model to generate a new sensitive identification rule comprises:

acquiring the historical sensitive data;

acquiring a frequent item set based on the sub-semantic space;

acquiring a potential association rule according to the frequent item set;

7. The method for security policy configuration based on sensitive data identification according to claim 6, wherein the extracting keywords from the historical sensitive data by using a machine learning model comprises:

presetting a sensitive word;

8. The security policy configuration method based on sensitive data identification according to claim 7, wherein the determining that the associated word is the keyword if the associated word satisfies a preset condition includes:

if the number of the associated words is larger than or equal to the preset number, determining that the associated words are the keywords; or

9. The method for security policy configuration based on sensitive data identification according to claim 1, wherein the identifying the first sensitive data and the second sensitive data from the source data based on the current sensitive identification rule comprises:

acquiring source data;

creating a data index for the source data;

10. The security policy configuration method based on sensitive data identification of claim 9, further comprising:

acquiring the label attribute of the second sensitive data;

11. The security policy configuration method based on sensitive data identification according to claim 9, wherein generating sensitive data based on the data index matching successfully comprises:

presetting a matching degree threshold;

and generating sensitive data according to the screened data indexes.

12. A security policy configuration apparatus based on sensitive data identification, comprising:

13. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the steps in the security policy configuration method based on sensitive data identification of any one of claims 1 to 11.

14. A readable storage medium, on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps in the security policy configuration method based on sensitive data identification according to any one of claims 1 to 11.