CN113515771A - Data sensitivity determination method, electronic device, and computer-readable storage medium - Google Patents

Publication number
CN113515771A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110295033.3A
Other languages
Chinese (zh)
Inventor
周建宁
胡铁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aspire Digital Technologies Shenzhen Co Ltd
Original Assignee
Aspire Digital Technologies Shenzhen Co Ltd
Application filed by Aspire Digital Technologies Shenzhen Co Ltd
Priority to CN202110295033.3A
Publication of CN113515771A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a data sensitivity determination method, an electronic device, and a computer-readable storage medium, and relates to the field of data detection. The method comprises: acquiring initial data to be judged; identifying the initial data to obtain a plurality of target fields of the initial data and the field weights corresponding to the target fields; acquiring preset sensitive categories according to the target fields, and generating an initial matrix according to the target fields and the field weights; calculating a scoring matrix of the initial data according to the acquired preset sensitive categories, a preset weight database, and the initial matrix; and judging the sensitivity of the initial data according to the scoring matrix. The sensitive information can thus be analyzed comprehensively, the accuracy of sensitive-information identification is improved, and information leakage is effectively avoided.

Description

Data sensitivity determination method, electronic device, and computer-readable storage medium
Technical Field
The present disclosure relates to the field of data detection, and in particular, to a data sensitivity determination method, an electronic device, and a computer-readable storage medium.
Background
As digitalization reaches ever deeper into daily life, personal information, confidential business information, and state secrets are now stored mainly in digital form, so the harm caused by information leakage incidents keeps growing.
Data leakage has many causes. In the internet industry, which accounts for the largest share of incidents, the causes divide into technical and non-technical factors; the non-technical factors mainly include disclosure by internal staff, poor internal information management, and human-caused information disclosure.
To guard against data leakage caused by non-technical factors, the usual approach is to apply measures such as sensitive data classification, information access control, data desensitization, encryption, and watermarking. Intelligent identification and classification of sensitive data, for example, analyzes only individual items of sensitive information and cannot analyze them in combination, so the accuracy of sensitive-information identification is low and information leakage cannot be effectively prevented.
Disclosure of Invention
The present application aims to solve at least one of the problems in the prior art. To this end, it provides a data sensitivity determination method that can analyze sensitive information comprehensively, improve the accuracy of sensitive-information identification, and effectively avoid information leakage.
The application also provides an electronic device that carries out the data sensitivity determination method.
The application also provides a computer-readable storage medium storing instructions for the data sensitivity determination method.
The data sensitivity determination method according to the embodiment of the first aspect of the application comprises the following steps:
acquiring initial data to be judged;
identifying the initial data to obtain a plurality of target fields of the initial data and field weights corresponding to the target fields;
acquiring preset sensitive categories according to the target fields, and generating an initial matrix according to the target fields and the field weights;
calculating to obtain a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix;
and judging the sensitivity of the initial data according to the scoring matrix.
The data sensitivity determination method according to the embodiment of the application has at least the following beneficial effects: initial data to be judged is acquired and identified to obtain a plurality of target fields and their corresponding field weights; preset sensitive categories are acquired according to the target fields, and an initial matrix is generated from the target fields and the field weights; a scoring matrix of the initial data is calculated from the preset sensitive categories, the preset weight database, and the initial matrix; and the sensitivity of the initial data is judged from the scoring matrix. The sensitive information can thus be analyzed comprehensively, the accuracy of sensitive-information identification is improved, and information leakage is effectively avoided.
According to some embodiments of the application, the initial data comprises simple data;
the identifying the initial data to obtain a target field of the initial data and a field weight corresponding to the target field includes:
and identifying the simple data based on the field characteristics of the target fields and the regular expression to obtain the target fields of the simple data and the field weights corresponding to the target fields.
According to some embodiments of the application, the initial data comprises complex data;
the identifying the initial data to obtain a target field of the initial data and a field weight corresponding to the target field includes:
and identifying the complex data based on a machine learning classification algorithm to obtain a plurality of target fields of the complex data and field weights corresponding to the target fields.
According to some embodiments of the present application, the weight database comprises a plurality of verification weight matrices;
correspondingly, the generating a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix includes:
acquiring a plurality of verification weight matrixes according to the preset sensitive category and the preset weight database;
and calculating to obtain a scoring matrix of the initial data according to the verification weight matrixes and the initial matrix.
According to some embodiments of the present application, the weight database comprises a plurality of verification weight matrices and a preset weight matrix;
correspondingly, the generating a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix includes:
acquiring a plurality of verification weight matrixes and preset weight matrixes according to the preset sensitive category and the preset weight database;
and calculating to obtain a scoring matrix of the initial data according to the verification weight matrixes, the preset weight matrix and the initial matrix.
According to some embodiments of the application, the method further comprises: adjusting the weights of the preset weight matrix according to the scoring matrix.
According to some embodiments of the present application, the adjusting the weights of the preset weight matrix according to the scoring matrix further includes:
setting an expectation matrix according to the initial data;
obtaining an error matrix based on the expectation matrix and the scoring matrix;
calculating an incremental vector matrix according to the error matrix and the preset weight matrix;
and calculating an adjusted preset weight matrix according to the incremental vector matrix and the scoring matrix.
According to some embodiments of the present application, said determining the sensitivity of the initial data from the scoring matrix comprises:
determining the optimal sensitive category corresponding to the initial data according to the scoring matrix;
and determining the sensitivity of the initial data according to the optimal sensitivity category and a preset sensitivity category database.
An electronic device according to a second aspect embodiment of the present application includes: at least one processor, and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions for execution by the at least one processor to cause the at least one processor to carry out the method of determining data sensitivity of the first aspect when executing the instructions.
The electronic device according to the present application has at least the following beneficial effects: by executing the data sensitivity determination method of the first aspect, the sensitive information can be comprehensively analyzed, the identification accuracy of the sensitive information is improved, and the problem of information leakage is effectively avoided.
According to a third aspect of the present application, there is provided a computer-readable storage medium storing computer-executable instructions for causing a computer to execute the data sensitivity determination method according to the first aspect.
The computer-readable storage medium according to the present application has at least the following advantageous effects: by executing the data sensitivity judgment method in the first aspect, the sensitive information can be comprehensively analyzed, the identification accuracy of the sensitive information is improved, and the problem of information leakage is effectively avoided.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
FIG. 1 is a flowchart of a data sensitivity determination method according to an embodiment of the present application;
FIG. 2 is a diagram of an exemplary set of preset sensitive categories according to an embodiment of the present application;
FIG. 3 is a flowchart of step S400 of the data sensitivity determination method according to an embodiment of the present application;
FIG. 4 is another flowchart of step S400 of the data sensitivity determination method according to an embodiment of the present application;
FIG. 5 is a flowchart of step S500 of the data sensitivity determination method according to an embodiment of the present application;
FIG. 6 is a flowchart of another embodiment of the data sensitivity determination method according to the present application;
FIG. 7 is a flowchart of step S600 of the data sensitivity determination method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a specific application of the data sensitivity determination method according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
It should be noted that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that in the flowcharts. The use of any and all examples, or exemplary language ("such as" and the like), provided herein is intended merely to better illuminate embodiments of the application and does not limit the scope of the application unless otherwise claimed. Terms such as "greater than", "less than", and "exceeding" are understood to exclude the stated number, while terms such as "above", "below", and "within" are understood to include it. Where "first" and "second" are used, they serve only to distinguish technical features and are not to be understood as indicating or implying relative importance, the number of the technical features indicated, or the order of the technical features indicated.
It is noted that, as used in the examples, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art. The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any combination of one or more of the associated listed items.
To guard against data leakage caused by non-technical factors, the usual approach is to apply measures such as sensitive data classification, information access control, data desensitization, encryption, and watermarking. Intelligent identification and classification of sensitive data, for example, analyzes only individual items of sensitive information and cannot analyze them in combination, so the accuracy of sensitive-information identification is low and information leakage cannot be effectively prevented.
Specifically, sensitive information in databases is currently identified by analyzing single fields, yet database access control technology can rarely enforce access control at field granularity. Some database products do support field-level authorization, but the maintenance and management workload is so large that almost no organization can put it into practice. Identifying and classifying sensitivity levels field by field therefore yields low identification accuracy and cannot effectively prevent information leakage.
Based on this, the embodiment of the application provides a data sensitivity determination method, an electronic device and a computer-readable storage medium, which can comprehensively analyze sensitive information, improve the identification accuracy of the sensitive information, and effectively avoid the problem of information leakage.
In a first aspect, an embodiment of the present application provides a data sensitivity determination method.
In some embodiments, referring to fig. 1, a flow chart of a data sensitivity determination method in an embodiment of the present application is shown. The method specifically comprises the following steps:
S100, acquiring initial data to be judged;
S200, identifying the initial data to obtain a plurality of target fields of the initial data and field weights corresponding to the target fields;
S300, acquiring preset sensitive categories according to the target fields, and generating an initial matrix according to the target fields and the field weights;
S400, calculating a scoring matrix of the initial data according to the preset sensitive categories, a preset weight database, and the initial matrix;
S500, judging the sensitivity of the initial data according to the scoring matrix.
In step S100, the initial data to be judged is acquired. The initial data is data that may contain sensitive information and is collected by any of several collection methods, for example records from data tables in a database or form data from a website interface. In practice, a number of records may be sampled at random from each data table in the database as the initial data to be judged; alternatively, the interfaces of a website may be accessed at random and the form data in them collected as the initial data by reading the DOM (Document Object Model) structure in the HTML (HyperText Markup Language) content.
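As a minimal sketch of the form-data collection described for step S100, the following uses the Python standard-library HTML parser to pull name/value pairs out of `<input>` elements. The class name `FormFieldCollector`, the function `collect_form_data`, and the sample HTML are hypothetical and chosen for illustration only; the application does not specify how the DOM data is read.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs from <input> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (attribute, value) tuples supplied by HTMLParser
        if tag == "input":
            attr = dict(attrs)
            name = attr.get("name")
            if name:
                self.fields[name] = attr.get("value") or ""


def collect_form_data(html):
    """Return the form fields found in an HTML string as a dict."""
    parser = FormFieldCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical page content standing in for a randomly accessed interface
sample_html = (
    '<form><input name="username" value="Zhang San">'
    '<input name="mobile" value="13800138000"></form>'
)
form_data = collect_form_data(sample_html)
```

The resulting dictionary can then be treated as one record of initial data for the identification step.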
In step S200, the initial data is identified to obtain a plurality of target fields and the field weight corresponding to each. Identifying the initial data means determining whether it contains sensitive information; a target field is a field in the initial data that may be sensitive information; and the field weight is the probability that the identified target field is sensitive information.
In some embodiments, when the initial data includes simple data, the simple data is identified based on field features of the target fields and regular expressions, yielding the target fields of the simple data and their field weights. Simple data is data in the initial data whose sensitive-information characteristics are obvious, such as a mobile phone number or an identity card number, or data that is easy to identify, such as the field names of a data table. Field features are preset strings, derived from the target fields to be identified, that resemble those target fields: for field names in a data table relating to names, passwords, and mobile phone numbers, the field features may be strings such as usernames, passports, and mobile phones. The regular expressions are built from the field features in order to identify the target fields. The field weight of a target field identified by regular expressions and field features defaults to 1, meaning that the target field is taken to be sensitive information with certainty. Note that the number of field features is adjusted according to the number of target fields to be identified and is not specifically limited in the present application.
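The regular-expression identification of simple data might look like the sketch below. The pattern tables `NAME_PATTERNS` and `VALUE_PATTERNS`, the target-field labels, and the specific expressions (an 11-digit mobile number and an 18-character identity card number) are assumptions made for illustration; the application gives no concrete expressions. The only detail taken from the text is that a match yields a field weight of 1.

```python
import re

# Assumed field features matched against column names (hypothetical)
NAME_PATTERNS = {
    "name": re.compile(r"user_?name|real_?name", re.I),
    "mobile": re.compile(r"mobile|phone", re.I),
}

# Assumed value patterns for data with obvious characteristics (hypothetical)
VALUE_PATTERNS = {
    "mobile": re.compile(r"1[3-9]\d{9}"),
    "id_number": re.compile(r"\d{17}[\dXx]"),
}


def identify_simple_fields(record):
    """Return {target_field: weight}; a regex match defaults to weight 1.0."""
    weights = {}
    for column, value in record.items():
        for target, pattern in NAME_PATTERNS.items():
            if pattern.search(column):
                weights[target] = 1.0
        for target, pattern in VALUE_PATTERNS.items():
            if pattern.fullmatch(str(value)):
                weights[target] = 1.0
    return weights


record = {"username": "Zhang San", "contact": "13800138000", "note": "hello"}
weights = identify_simple_fields(record)
```

Here the `username` column is recognized by its name and the `contact` column by its value, so both receive the default weight of 1.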
In some embodiments, when the initial data includes complex data, the complex data is identified with a machine learning classification algorithm to obtain a plurality of target fields of the complex data and their field weights. Complex data is data in the initial data whose sensitive-information characteristics are difficult to identify. A machine learning classification algorithm is a supervised learning method for modeling or predicting discrete random variables: its goal is to learn a classification function or classification model, commonly called a classifier, from a given set of manually labeled training samples. When new data arrives, the classifier predicts which of the given classes the new item belongs to. The present application takes the BERT algorithm as an example of such a machine learning classification algorithm.
In practice, a recognition model may be trained with a machine learning classification algorithm. For example, the Chinese pre-training library from amazonaws and the BERT machine learning classification algorithm may be used, with roughly 100,000 initial training samples and roughly 25,000 validation samples, to train the corresponding model. For complex data, the model identifies the target fields and the corresponding field weights. The model training process is not described in detail in the embodiments of the present application.
In some embodiments, the acquired initial data may first undergo field processing that adds identification marks helpful to the machine learning classification algorithm, so that the complex data in the initial data can be determined.
In steps S300 and S400, the preset sensitive categories are determined from the identified target fields, and an initial matrix is generated from the target fields and their field weights; the scoring matrix of the initial data is then calculated from the preset sensitive categories, the preset weight database, and the initial matrix. The preset weight database holds the preset weights of the target fields under each preset sensitive category; these weights may be set according to actual requirements or accumulated from weights previously obtained for the target fields under the preset sensitive categories.
A preset sensitive category is a category that summarizes a plurality of target fields: when several target fields are identified in the initial data, their combination determines the corresponding preset sensitive category. For example, the preset sensitive category "user identity information" includes, without limitation, any one or more of target fields such as user name, identification number, passport, mobile phone number, address, and driver's license. If the target fields identified in the initial data are name, identification number, mobile phone number, address, and company name, their combination determines the preset sensitive category to be user identity information. Note that a combination of target fields may determine several different preset sensitive categories. Different preset sensitive categories have different sensitivities, that is, data sensitivity levels. As shown in FIG. 2, for example, the sensitivity levels include extremely sensitive, highly sensitive, moderately sensitive, low sensitivity, and non-sensitive; different preset sensitive categories fall under different sensitivity levels, and different preset sensitive categories involve different target fields.
Note that the preset sensitive categories in FIG. 2 and their corresponding sensitivities can be adjusted for different service scenarios; this embodiment does not describe or limit them in detail.
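One way to picture how a combination of target fields selects candidate preset sensitive categories is a simple membership lookup, sketched below. The table `PRESET_CATEGORIES` is hypothetical: only the membership of "user identity information" follows the text, while the members of the other two categories are assumptions, since FIG. 2 is not reproduced here.

```python
# Hypothetical category table; only the first entry follows the description.
PRESET_CATEGORIES = {
    "user identity information": {
        "name", "id_number", "passport", "mobile", "address", "driver_license",
    },
    "user private data": {"name", "mobile", "address"},   # assumed members
    "corporate identity": {"company_name", "address"},     # assumed members
}


def candidate_categories(target_fields):
    """Return every preset category that shares at least one target field."""
    found = set(target_fields)
    return [
        category
        for category, members in PRESET_CATEGORIES.items()
        if members & found
    ]


categories = candidate_categories(
    ["name", "id_number", "mobile", "address", "company_name"]
)
```

With the five example fields, all three categories are returned as candidates, which matches the case in which one combination of target fields determines several preset sensitive categories.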
For example, when the target fields to be identified are name, identification number, mobile phone number, address, and company name, and the field weights obtained by identification are 0.8, 0.0, 1.0, 0.87, and 0.0 respectively, the corresponding initial matrix is:
[0.8 0.0 1.0 0.87 0.0]
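The construction of this initial matrix can be sketched as follows, assuming a fixed field order and a weight of 0.0 for any target field that was not identified. The names `FIELD_ORDER` and `build_initial_matrix` are hypothetical.

```python
# Assumed fixed order of target fields, matching the example above
FIELD_ORDER = ["name", "id_number", "mobile", "address", "company_name"]


def build_initial_matrix(field_weights, field_order=FIELD_ORDER):
    """Arrange field weights into a row vector; missing fields get 0.0."""
    return [float(field_weights.get(field, 0.0)) for field in field_order]


initial_matrix = build_initial_matrix(
    {"name": 0.8, "mobile": 1.0, "address": 0.87}
)
```

Keeping the field order fixed means each column of the initial matrix lines up with the same column of the weight matrices used later.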
The preset weight database is described next. The weight database contains a plurality of verification weight matrices; a verification weight matrix holds the scores, that is, the weight values, of the target fields under the preset sensitive categories. The corresponding verification weight matrices can be determined from the acquired preset sensitive categories and the target fields.
Correspondingly, in some embodiments, referring to FIG. 3, step S400 specifically includes:
S411, acquiring a plurality of verification weight matrices according to the preset sensitive categories and the preset weight database;
S412, calculating the scoring matrix of the initial data according to the plurality of verification weight matrices and the initial matrix.
In step S411, the preset sensitive categories corresponding to the identified target fields are determined with reference to FIG. 2, and the verification weight matrices corresponding to those categories are then obtained from the preset weight database. For example, when the identified target fields are name, identification number, mobile phone number, address, and company name, the corresponding preset sensitive categories are user identity information, user private data, and corporate identity, and the verification weight matrices stored for these three categories are retrieved from the weight database.
In step S412, the scoring matrix of the initial data is calculated from the acquired verification weight matrices and the initial matrix of the target fields; the calculation may use the nearest-neighbor distance algorithm WAKNN (weight-adjusted k-nearest neighbor). The scoring matrix gives a score for each preset sensitive category, that is, a weight value indicating how strongly the set of target fields belongs to that category.
For example, from the specific weight values of the initial matrix and the verification weight matrices, a score can be calculated for each preset sensitive category, and different preset sensitive categories receive different scores.
In some embodiments, referring to FIG. 4, step S400 specifically includes:
S421, acquiring a plurality of verification weight matrices and a preset weight matrix according to the preset sensitive categories and the preset weight database;
S422, calculating the scoring matrix of the initial data according to the verification weight matrices, the preset weight matrix, and the initial matrix.
In step S421, the preset sensitive categories corresponding to the identified target fields are determined with reference to FIG. 2, and the verification weight matrices and the preset weight matrix corresponding to those categories are obtained from the preset weight database. The preset weight matrix holds weight values, set according to actual conditions and requirements, of the target fields under the different preset sensitive categories; it is used to refine the result computed from the verification weight matrices and the initial matrix so as to obtain a more accurate scoring matrix.
Take as an example the case where the identified target fields are name, identification number, mobile phone number, address, and company name, and the corresponding preset sensitive categories determined from FIG. 2 are user identity information, user private data, and corporate identity. One verification weight matrix obtained for the three preset sensitive categories is:
[0.8 0.8 0.3 0.3 0.2]
[0.8 0.4 0.8 0.8 0.2]
[0.2 0.6 0.6 0.7 0.8]
The three rows of the verification weight matrix are read as follows: the weight values of name, identification number, mobile phone number, address, and company name under user identity information are 0.8, 0.8, 0.3, 0.3, and 0.2 respectively; under user private data they are 0.8, 0.4, 0.8, 0.8, and 0.2; and under corporate identity they are 0.2, 0.6, 0.6, 0.7, and 0.8.
Correspondingly, the initial matrix of the target fields is:
[0.8 0.0 1.0 0.87 0.0]
Correspondingly, the preset weight matrix is as follows:
[0.99 0.99 0.40 0.30 0.02]
[0.90 0.18 0.85 0.82 0.02]
[0.02 0.06 0.32 0.75 0.85]
The preset weight matrix is read as follows: the weight values of the name, the identification number, the mobile phone number, the address and the company name in the user identity information are 0.99, 0.99, 0.40, 0.30 and 0.02, respectively; the weight values of the same fields in the user private data are 0.90, 0.18, 0.85, 0.82 and 0.02, respectively; and the weight values of the same fields in the corporate identity are 0.02, 0.06, 0.32, 0.75 and 0.85, respectively. The specific values are obtained through analysis and accumulated experience, and can be adjusted.
It should be noted that, in practical applications, there are a plurality of verification weight matrices, possibly thousands or tens of thousands. The number of preset weight matrices does not need to be large; a small number is sufficient to refine the calculated scoring matrix. The embodiments of the present application are described using one preset weight matrix as an example.
Based on the obtained verification weight matrices, the initial matrix and the preset weight matrix, a scoring matrix is calculated using the proximity distance algorithm WAKNN as follows:
User identity information: [0.75]
User private data: [0.98]
Corporate identity: [0.12]
The scoring matrix is read as follows: given the identified target fields, the weight value (that is, the similarity value) of the initial data is 0.75 for user identity information, 0.98 for user private data, and 0.12 for corporate identity. Since user private data has the largest weight value, the initial data is classified as user private data.
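The text does not spell out how WAKNN combines the three inputs, so the following is only a rough sketch under a stated assumption: each category's score is taken to be a weighted cosine similarity between the initial matrix and that category's verification row, with the preset weight row supplying the per-field weights. The function names `weighted_cosine` and `score` are hypothetical, and this stand-in measure is not the WAKNN computation of the application; its scores are not expected to reproduce the values listed above.

```python
import math

CATEGORIES = ["user identity information", "user private data", "corporate identity"]

# Values copied from the example in the text (columns: name, identification
# number, mobile phone number, address, company name).
initial = [0.8, 0.0, 1.0, 0.87, 0.0]
verification = [
    [0.8, 0.8, 0.3, 0.3, 0.2],
    [0.8, 0.4, 0.8, 0.8, 0.2],
    [0.2, 0.6, 0.6, 0.7, 0.8],
]
preset = [
    [0.99, 0.99, 0.40, 0.30, 0.02],
    [0.90, 0.18, 0.85, 0.82, 0.02],
    [0.02, 0.06, 0.32, 0.75, 0.85],
]


def weighted_cosine(x, y, w):
    """Cosine similarity of x and y with per-field weights w (assumed measure)."""
    dot = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    norm_x = math.sqrt(sum(wi * xi * xi for wi, xi in zip(w, x)))
    norm_y = math.sqrt(sum(wi * yi * yi for wi, yi in zip(w, y)))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # no overlap can be measured against an all-zero row
    return dot / (norm_x * norm_y)


def score(initial_row, verification_rows, preset_rows):
    """Return one similarity value per preset sensitive category."""
    return [
        weighted_cosine(initial_row, v, w)
        for v, w in zip(verification_rows, preset_rows)
    ]


scores = score(initial, verification, preset)
for category, value in zip(CATEGORIES, scores):
    print(f"{category}: [{value:.2f}]")
```

With many verification weight matrices per category, as the text anticipates, a nearest-neighbor method would compare the initial matrix against all of them rather than against a single row per category.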
In step S500, the weight values of the initial data in the different preset sensitive categories are determined from the obtained scoring matrix, the preset sensitive category to which the initial data belongs is determined from those weight values, and the corresponding sensitivity, i.e., the sensitivity level, is determined from that preset sensitive category.
In some embodiments, referring to fig. 5, step S500 further includes:
S510, determining the optimal sensitive category corresponding to the initial data according to the scoring matrix;
S520, determining the sensitivity of the initial data according to the optimal sensitive category and a preset sensitive category database.
In steps S510 and S520, the weight values of the initial data for the different preset sensitive categories are read from the calculated scoring matrix, and the preset sensitive category with the largest weight value is selected as the optimal sensitive category. The sensitivity of the initial data is then determined from the sensitivity assigned to the optimal sensitive category in the preset sensitive category database.
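Steps S510 and S520 amount to taking the category with the largest weight value and looking up its sensitivity. A minimal sketch follows, using the scoring matrix of the example. The dictionary `SENSITIVITY_DB` is a hypothetical stand-in for the preset sensitive category database, and its level names are assumptions made purely for illustration.

```python
# Scoring matrix from the example: category -> weight (similarity) value.
scoring_matrix = {
    "user identity information": 0.75,
    "user private data": 0.98,
    "corporate identity": 0.12,
}

# Hypothetical stand-in for the preset sensitive category database.
# The level names below are illustrative assumptions, not taken from the text.
SENSITIVITY_DB = {
    "user identity information": "level A",
    "user private data": "level B",
    "corporate identity": "level C",
}


def determine_sensitivity(scores, sensitivity_db):
    """S510: pick the category with the largest weight value.
    S520: look up the sensitivity assigned to that category."""
    best_category = max(scores, key=scores.get)
    return best_category, sensitivity_db[best_category]


category, level = determine_sensitivity(scoring_matrix, SENSITIVITY_DB)
print(category, level)
```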
In some embodiments, referring to fig. 6, the method in the embodiments of the present application further includes the following step:
S600, carrying out weight adjustment on the preset weight matrix according to the scoring matrix.
In step S600, the initially set preset weight matrix is adjusted according to the calculated scoring matrix to obtain an adjusted preset weight matrix, so that a more accurate scoring matrix can be calculated with the adjusted preset weight matrix. Weight adjustment means adjusting the numerical value of each weight value in the preset weight matrix so that it more closely approximates the weight value needed in practical application. In practical applications, the adjustment can be made numerically according to the scoring matrix combined with expert experience scoring, and the adjusted weight matrix can be calculated by a nonlinear least squares method.
In some embodiments, referring to fig. 7, step S600 further includes:
S610, setting an expected matrix according to the initial data;
S620, acquiring an error matrix based on the expected matrix and the scoring matrix;
S630, calculating an incremental vector matrix according to the error matrix and the preset weight matrix;
and S640, calculating an adjusted preset weight matrix according to the incremental vector matrix and the preset weight matrix.
In step S610, an expected matrix, that is, the expected scoring matrix of the initial data for the different preset sensitive categories, is set by manual judgment or expert experience analysis, according to the target fields identified in the initial data and the corresponding preset sensitive categories. For example, suppose the target fields identified in the initial data are name, identification number, mobile phone number, address and company name, with corresponding field weights of 0.8, 0.0, 1.0, 0.87 and 0.0. The initial data then contains no identity-related information such as an identification number or a driving license number, so it does not meet the expected standard for user identity information, whereas the mobile phone number and the address meet the standard for user private data. On this basis, the expected matrix is given by expert experience evaluation. For example, the expected matrix is as follows:
User identity information: [0.25]
User private data: [0.98]
Corporate identity: [0.12]
In step S620, an error matrix is obtained from the expected matrix and the scoring matrix calculated in step S400. The error matrix holds the error values between the scoring matrix and the expected matrix. For example, with the scoring matrix and the expected matrix of the above example, the calculated error matrix is as follows:
User identity information: [0.5]
User private data: [0.0]
Corporate identity: [0.0]
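The error matrix of this example can be checked with a short sketch. It assumes that the error is the scoring value minus the expected value for each category, which is consistent with the figures above; the function name `error_matrix` is hypothetical.

```python
CATEGORIES = ["user identity information", "user private data", "corporate identity"]

# Scoring matrix calculated in step S400 and expected matrix set in step S610.
scoring = [0.75, 0.98, 0.12]
expected = [0.25, 0.98, 0.12]


def error_matrix(score, expect):
    """Per-category error, assumed here to be score minus expectation."""
    if len(score) != len(expect):
        raise ValueError("scoring and expected matrices must have the same size")
    # Rounding only suppresses floating-point noise in the printed values.
    return [round(s - e, 6) for s, e in zip(score, expect)]


errors = error_matrix(scoring, expected)
for category, value in zip(CATEGORIES, errors):
    print(f"{category}: [{value}]")
```

Only the user identity information category has a non-zero error, so it is the row of the preset weight matrix for that category that needs noticeable adjustment.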
In step S630, an incremental vector matrix is calculated from the error matrix and the preset weight matrix. The calculation may use a nonlinear least squares method, with the incremental vector matrix obtained through multiple iterations. For example, after calculation with the error matrix and the preset weight matrix of the above example, the corresponding incremental vector matrix is obtained as follows:
[-0.0002 0.001 -0.3 -0.03 0.001]
[0.001 0.001 0.005 0.0 0.002]
[0.001 -0.001 0.003 0.001 0.001]
In step S640, the calculated incremental vector matrix and the preset weight matrix are added to obtain an adjusted preset weight matrix. For example, with the incremental vector matrix and the preset weight matrix of the above example, the adjusted preset weight matrix is as follows:
[0.9898 0.991 0.1 0.27 0.021]
[0.901 0.181 0.855 0.82 0.022]
[0.021 0.059 0.323 0.751 0.851]
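Step S640 is described as an addition of two matrices of the same shape. The sketch below applies plain elementwise addition to the preset weight matrix and the incremental vector matrix of the example; the function name `add_matrices` and the rounding to four decimal places are assumptions made for illustration.

```python
# Preset weight matrix and incremental vector matrix from the example.
preset = [
    [0.99, 0.99, 0.40, 0.30, 0.02],
    [0.90, 0.18, 0.85, 0.82, 0.02],
    [0.02, 0.06, 0.32, 0.75, 0.85],
]
increment = [
    [-0.0002, 0.001, -0.3, -0.03, 0.001],
    [0.001, 0.001, 0.005, 0.0, 0.002],
    [0.001, -0.001, 0.003, 0.001, 0.001],
]


def add_matrices(a, b):
    """Elementwise sum of two equally shaped matrices given as lists of rows."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    # Rounding to 4 decimals only removes floating-point noise.
    return [[round(x + y, 4) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


adjusted = add_matrices(preset, increment)
for row in adjusted:
    print(row)
```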
In practical applications, step S400 may be performed again with the adjusted preset weight matrix to obtain a new scoring matrix for the initial data. For example, with the adjusted preset weight matrix of the above example, the new scoring matrix is as follows:
User identity information: [0.281]
User private data: [0.985]
Corporate identity: [0.123]
The new scoring matrix is more accurate than the originally calculated scoring matrix, which helps ensure that the sensitivity of the initial data is identified accurately.
In one possible application example, referring to fig. 8, initial data to be judged is obtained from a data table or a page form and identified to obtain a plurality of target fields and their field weights. The preset sensitive categories and the initial matrix are determined from the target fields and field weights in combination with the sensitive category database. The corresponding preset weight matrix and verification weight matrices are obtained from the weight database, matrix calculation is performed to obtain the scoring matrix of the initial data, and the sensitivity of the initial data is determined from the scoring matrix in combination with the sensitive category database. The preset weight matrix in the weight database is then adjusted according to the obtained scoring matrix and the corresponding sensitivity, matrix calculation is performed again with the adjusted preset weight matrix to obtain a new scoring matrix, and the sensitivity of the initial data is re-determined.
In the embodiment of the application, the initial data to be judged is obtained and identified to obtain a plurality of target fields and corresponding field weights of the initial data, a preset sensitive category is obtained according to the target fields, and an initial matrix is generated according to the target fields and the field weights; according to the acquired preset sensitivity category, the preset weight database and the initial matrix, the scoring matrix of the initial data is obtained through calculation, the sensitivity of the initial data is judged according to the scoring matrix, the sensitive information can be comprehensively analyzed, the identification accuracy of the sensitive information is improved, and the problem of information leakage is effectively avoided.
In a second aspect, an embodiment of the present application further provides an electronic device, including: at least one processor, and a memory communicatively coupled to the at least one processor;
wherein the processor is configured to execute the data sensitivity determination method in the first aspect by calling a computer program stored in the memory.
The memory, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs and non-transitory computer-executable programs, such as a program for the data sensitivity determination method in the embodiment of the first aspect of the present application. The processor implements the data sensitivity determination method in the embodiment of the first aspect by executing the non-transitory software programs and instructions stored in the memory.
The memory may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data for performing the data sensitivity determination method in the embodiment of the first aspect described above. Further, the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the data sensitivity determination method in the first aspect embodiment described above are stored in a memory and, when executed by one or more processors, perform the data sensitivity determination method in the first aspect embodiment described above.
In a third aspect, embodiments of the present application further provide a computer-readable storage medium storing computer-executable instructions for performing the data sensitivity determination method in the first aspect embodiment.
In some embodiments, the computer-executable instructions stored in the computer-readable storage medium are executed by one or more control processors, for example by one of the processors in the electronic device of the second aspect, and cause the one or more processors to execute the data sensitivity determination method of the first aspect.
The above described embodiments of the device are merely illustrative, wherein the units illustrated as separate components may or may not be physically separate, i.e. may be located in one place, or may also be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
One of ordinary skill in the art will appreciate that all or some of the steps, systems, and methods disclosed above may be implemented as software, firmware, hardware, and suitable combinations thereof. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application specific integrated circuit. Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). The term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, as is well known to those of ordinary skill in the art. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
In the description herein, references to the description of the terms "some embodiments," "examples," "specific examples," or "some examples," etc., mean that a particular feature or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example.

Claims (10)

1. A data sensitivity determination method is characterized by comprising the following steps:
acquiring initial data to be judged;
identifying the initial data to obtain a plurality of target fields of the initial data and field weights corresponding to the target fields;
acquiring preset sensitive categories according to the target fields, and generating an initial matrix according to the target fields and the field weights;
calculating to obtain a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix;
and judging the sensitivity of the initial data according to the scoring matrix.
2. The data sensitivity determination method according to claim 1, wherein the initial data includes simple data;
the identifying the initial data to obtain a target field of the initial data and a field weight corresponding to the target field includes:
and identifying the simple data based on the field characteristics of the target fields and the regular expression to obtain the target fields of the simple data and the field weights corresponding to the target fields.
3. The data sensitivity determination method according to claim 2, wherein the initial data includes complex data;
the identifying the initial data to obtain a target field of the initial data and a field weight corresponding to the target field includes:
and identifying the complex data based on a machine learning classification algorithm to obtain a plurality of target fields of the complex data and field weights corresponding to the target fields.
4. The data sensitivity determination method according to claim 2 or 3, wherein the weight database includes a plurality of verification weight matrices;
correspondingly, the generating a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix includes:
acquiring a plurality of verification weight matrixes according to the preset sensitive category and the preset weight database;
and calculating to obtain a scoring matrix of the initial data according to the verification weight matrixes and the initial matrix.
5. The data sensitivity determination method according to claim 2 or 3, wherein the weight database includes a plurality of verification weight matrices and preset weight matrices;
correspondingly, the generating a scoring matrix of the initial data according to the preset sensitive category, a preset weight database and the initial matrix includes:
acquiring a plurality of verification weight matrixes and preset weight matrixes according to the preset sensitive category and the preset weight database;
and calculating to obtain a scoring matrix of the initial data according to the verification weight matrixes, the preset weight matrix and the initial matrix.
6. The data sensitivity determination method according to claim 5, further comprising:
and carrying out weight adjustment on the preset weight matrix according to the scoring matrix.
7. The data sensitivity determination method according to claim 6, wherein the weight adjustment of the preset weight matrix is performed according to the scoring matrix, and further comprising:
setting an expected matrix according to the initial data;
obtaining an error matrix based on the expectation matrix and the scoring matrix;
calculating to obtain an incremental vector matrix according to the error matrix and the preset weight matrix;
and calculating according to the incremental vector matrix and the preset weight matrix to obtain an adjusted preset weight matrix.
8. The data sensitivity determination method according to claim 1 or 7, wherein the determining the sensitivity of the initial data according to the scoring matrix includes:
determining the optimal sensitive category corresponding to the initial data according to the scoring matrix;
and determining the sensitivity of the initial data according to the optimal sensitive category and a preset sensitive category database.
9. An electronic device, comprising:
at least one processor, and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions for execution by the at least one processor to cause the at least one processor, when executing the instructions, to implement the data sensitivity determination method of any one of claims 1 to 8.
10. A computer-readable storage medium storing computer-executable instructions for causing a computer to perform the data sensitivity determination method according to any one of claims 1 to 8.
CN202110295033.3A 2021-03-19 2021-03-19 Data sensitivity determination method, electronic device, and computer-readable storage medium Pending CN113515771A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110295033.3A CN113515771A (en) 2021-03-19 2021-03-19 Data sensitivity determination method, electronic device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110295033.3A CN113515771A (en) 2021-03-19 2021-03-19 Data sensitivity determination method, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113515771A true CN113515771A (en) 2021-10-19

Family

ID=78061968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110295033.3A Pending CN113515771A (en) 2021-03-19 2021-03-19 Data sensitivity determination method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113515771A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115168345A (en) * 2022-06-27 2022-10-11 天翼爱音乐文化科技有限公司 Database classification method, system, device and storage medium
CN115168345B (en) * 2022-06-27 2023-04-18 天翼爱音乐文化科技有限公司 Database classification method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination