CN113742781B - K anonymous clustering privacy protection method, system, computer equipment and terminal - Google Patents

Publication number
CN113742781B
Authority
CN
China
Prior art keywords
data
identifier
attribute
quasi
clustering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111123601.8A
Other languages
Chinese (zh)
Other versions
CN113742781A (en)
Inventor
吴珺
朱嘉辉
王春枝
董佳明
周显敬
刘虎
李天意
朱天亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Zhuoer Information Technology Co ltd
Hubei University of Technology
Original Assignee
Wuhan Zhuoer Information Technology Co ltd
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Zhuoer Information Technology Co ltd and Hubei University of Technology
Priority to CN202111123601.8A
Publication of CN113742781A
Application granted
Publication of CN113742781B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of information security and discloses a K anonymous clustering privacy protection method, a system, computer equipment and a terminal. The K anonymous clustering privacy protection method comprises the following steps: completing dimension reduction of the data by principal component analysis and determining the sensitive attribute, the quasi-identifier attributes and the identifier attributes; calculating the association degree between the sensitive attribute and the quasi-identifier attributes by grey relational analysis on the dimension-reduced data; determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; determining the number of clusters suitable for the data set by the elbow method; judging, according to the threshold a, whether the data are clustered directly or merged with other data; clustering the data set; and performing K anonymization on the clustered data according to the generalization hierarchy of the quasi-identifier attributes. The invention can reduce the dimension of medical data, avoid falling into a local optimum during clustering, reduce the information loss rate during K anonymization, and protect the security of private data.

Description

K anonymous clustering privacy protection method, system, computer equipment and terminal
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a K anonymous clustering privacy protection method, system, computer equipment and terminal.
Background
At present, with the development of medical technology, medical data sharing is becoming more and more common, and the problem of medical data leakage is becoming more serious. Privacy protection is an important direction in the field of information security, and guaranteeing the security of information is the key to protecting personal privacy.
Early data privacy protection mainly set different permissions in the database and protected individual privacy according to those permissions. However, some individuals with high permissions sell personal information to others for profit, so that personal information is leaked. As the concept of privacy protection has taken shape, people pay more attention to it, and privacy protection technology is required to strengthen the protection of private information.
The K-anonymity privacy protection model protects information during data release. Unlike privacy protection based on access control and similar mechanisms, it preprocesses the original data and then releases an anonymized data set, thereby protecting personal private data. It is applicable to fields such as medical treatment and job hunting, where obvious personal information needs to be hidden, so that an attacker cannot deduce the private data of a specific person from the released data through a link attack, and the private data is effectively protected during release. The traditional K-anonymity model, however, mostly strengthens privacy protection at the expense of information loss. Therefore, a new K anonymous clustering privacy protection method and system are needed to remedy the problems existing in the prior art.
Through the above analysis, the problems and defects of the prior art are as follows: the traditional K-anonymity model mostly strengthens privacy protection at the expense of information loss; the data dimension in K anonymization is too large, which increases the time cost of processing the data; and K-anonymizing the data over all dimensions causes more data loss.
The difficulty of solving the above problems and defects lies in effectively reducing the dimension of the data set and effectively reducing the information loss of the data during K anonymization.
The significance of solving the above problems and defects is as follows: data dimension reduction lowers the time cost of processing the data, the information loss during K anonymization is reduced, more of the original character of the data is retained, and support is provided for subsequent data analysis.
Disclosure of Invention
In view of the problems existing in the prior art, the invention provides a K anonymous clustering privacy protection method, system, computer equipment and terminal, and particularly relates to a K anonymous clustering privacy protection method and system based on medical data utility.
The invention is realized as follows. The K anonymous clustering privacy protection method comprises the following steps:
completing dimension reduction of the data by principal component analysis and determining the sensitive attribute, the quasi-identifier attributes and the identifier attributes; calculating the association degree between the sensitive attribute and the quasi-identifier attributes by grey relational analysis on the dimension-reduced data; determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; determining the number of clusters suitable for the data set by the elbow method; judging, according to the threshold a, whether the data are clustered directly or merged with other data; clustering the data set; and performing K anonymization on the clustered data according to the generalization hierarchy of the quasi-identifier attributes.
Further, the K anonymous cluster privacy protection method comprises the following steps:
step one, reducing the dimension of the medical data set T by principal component analysis;
step two, determining the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis;
step three, determining the generalization hierarchy of the quasi-identifier attributes according to the association degree between the quasi-identifiers and the sensitive attribute;
step four, determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute;
step five, clustering the data set with L as the number of clusters, where L is the optimal number of clusters;
step six, performing K anonymization with the given value a as a threshold, listing the records in the data set that conform to K anonymity in a K-anonymity table, and counting the number of records in T_m.
Further, in step one, reducing the dimension of the medical data set T by principal component analysis includes:
(1) The principal components that may be present are expressed as:
$$Z_i = c_{i1} x_1 + c_{i2} x_2 + \cdots + c_{ip} x_p, \qquad i = 1, 2, \ldots, q$$
wherein p is the dimension of the attributes in each record, c_ij is the weight (load value) of the j-th attribute in the i-th principal component, Z denotes a principal component, q is the number of principal components that may be present, and the principal components are mutually independent; Z_1, Z_2, …, Z_q are composed of the different quasi-identifiers x_1, x_2, …, x_p.
(2) According to the load values c_ij, select the principal component with the smallest attribute dimension from the principal component set, select suitable quasi-identifier (QI) attributes from that principal component, and determine the identifier, the quasi-identifiers and the sensitive attribute.
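Step one can be illustrated with a minimal sketch. The patent does not state how the principal components are computed, so the sketch below substitutes power iteration on the sample covariance matrix, which yields only the first principal component and its load values. The function name `first_principal_component` and the sample rows are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (load values, explained variance ratio) of the first principal component."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # provided the start vector is not orthogonal to it.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("data has no variance")
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p))
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance

# Illustrative records (age, weight, height); not taken from the patent.
sample = [[25, 60, 165], [32, 72, 170], [47, 80, 172], [51, 85, 168], [62, 78, 160]]
loads, ratio = first_principal_component(sample)
print("load values:", loads, "explained variance ratio:", ratio)
```

Attributes with large absolute load values dominate the component, which is one way to shortlist candidate quasi-identifiers; the final choice of identifier, quasi-identifier and sensitive attribute remains a modelling decision.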
Further, in step two, determining the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis includes:
(1) The sensitive attribute is taken as the reference sequence and expressed as:
Y = {Y(k) | k = 1, 2, ..., n};
wherein Y is a specific sensitive attribute.
(2) The quasi-identifier attributes whose association degree with the sensitive attribute is to be determined are taken as the comparison sequences, expressed as:
X_i = {X_i(k) | k = 1, 2, ..., n}, i = 1, 2, ..., m;
wherein X_i(k) represents the k-th value in the i-th comparison sequence, and m represents the number of QI attributes.
(3) Since the measurement units of different data differ, the data are normalized by the following formula:
(4) After normalization, the grey relational coefficient between each quasi-identifier attribute and the sensitive attribute is calculated by the following formula:
$$\xi_i(k) = \frac{\min_i \min_k |y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}{|y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}$$
wherein ξ_i(k) is the grey relational coefficient, |y(k) - x_i(k)| is the distance between the k-th value of the reference sequence and the corresponding k-th value of the i-th comparison sequence, max denotes the maximum distance, and min denotes the minimum distance; ρ is called the resolution coefficient and takes values in the interval (0, 1); the resolution is higher when ρ ≤ 0.5463, and ρ = 0.5 is taken.
(5) The association degree is determined from the association coefficients at each point by the following formula:
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k)$$
wherein r_i is the association degree; the closer r_i is to 1, the stronger the association between the quasi-identifier attribute and the sensitive attribute.
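The computation of the association coefficient and the association degree r_i in step two can be sketched as follows. The sketch assumes the commonly used form of the grey relational coefficient (minimum and maximum distances taken over all comparison sequences) and assumes the inputs have already been normalized. The function name `grey_relational_grades` and the sample sequences are hypothetical.

```python
def grey_relational_grades(reference, comparisons, rho=0.5):
    """Return one association degree r_i per comparison sequence.

    reference   -- normalized reference sequence y(k) (the sensitive attribute)
    comparisons -- list of normalized comparison sequences x_i(k) (the QI attributes)
    rho         -- resolution coefficient in (0, 1); 0.5 as in the text
    """
    # Distances |y(k) - x_i(k)| for every sequence i and position k.
    deltas = [[abs(y - x) for y, x in zip(reference, seq)] for seq in comparisons]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0:  # every comparison sequence equals the reference
        return [1.0] * len(comparisons)
    grades = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))  # r_i = mean of the coefficients
    return grades

# Illustrative normalized sequences; not taken from the patent.
y = [1.0, 0.5, 0.0]
qi = [[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]]
print(grey_relational_grades(y, qi))
```

A sequence identical to the reference receives an association degree of 1, and the degree falls as the sequences diverge, which matches the interpretation of r_i given above.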
Further, in step three, the higher the association degree, the stronger the association of the data and the finer the generalization hierarchy of the quasi-identifier should be; for a quasi-identifier with a low association degree, the generalization hierarchy is relatively coarse. The generalization hierarchy of each quasi-identifier is determined accordingly.
Further, in step four, determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute, includes:
(1) Given a cluster number range m for the data set T, partition the data set for each cluster number in the range, starting from a cluster number of 2, and calculate the Euclidean distance from each cluster centroid to every data point in the cluster:
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein x_i and y_i are the values of the two data points in the i-th dimension; the data points are assigned to clusters by Euclidean distance according to the nearest-centroid principle.
(2) According to the division into clusters, the sum of squared errors (SSE) of each cluster is calculated, and the current number of clusters and the total sum of squared errors are taken as coordinates and plotted on the coordinate axes; the SSE is calculated by the following formula:
$$SSE = \sum_{i} \sum_{p \in C_i} |p - m_i|^2$$
wherein C_i represents the i-th cluster, p represents a sample point in C_i, and m_i represents the mean of all samples in C_i; the optimal cluster number L is determined from the elbow plot of the medical data set T.
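The SSE curve used by the elbow method in step four can be sketched as below. The patent does not specify the initialisation of the mean clustering, so the sketch assumes a deterministic farthest-point initialisation followed by standard Lloyd iterations. The names `kmeans`, `sse` and `elbow_curve` and the sample points are hypothetical.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with a deterministic farthest-point initialisation (an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(euclidean(p, c) for c in centroids)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:  # nearest-centroid assignment
            idx = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[idx].append(p)
        new_centroids = [
            [sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids

def sse(clusters, centroids):
    """Total sum of squared errors over all clusters."""
    return sum(euclidean(p, m) ** 2 for c, m in zip(clusters, centroids) for p in c)

def elbow_curve(points, max_k):
    """Map each cluster number from 2 to max_k to its total SSE."""
    return {k: sse(*kmeans(points, k)) for k in range(2, max_k + 1)}

# Illustrative 2-D points forming three groups; not taken from the patent.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
for k, value in elbow_curve(pts, 5).items():
    print(k, round(value, 3))
```

Plotting the returned values against the cluster number gives the elbow plot; the cluster number at which the curve stops dropping sharply is read off as L.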
Further, in step five, clustering the data set with L as the number of clusters, where L is the optimal number of clusters, includes:
(1) Put all data into a queue {d_1} as one cluster, perform mean clustering with cluster number m=2 on that cluster, calculate the SSE of each cluster, and place the divided clusters into the queue {d_1, d_2, d_3}.
(2) Select the minimum SSE from the queue and perform m=2 mean clustering, then place the divided clusters into the queue, and repeat step (1) until the number of clusters is greater than L.
(3) According to the above clustering steps, the medical data set T is divided into m data sets (T_1, T_2, …, T_m).
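The repeated two-way splitting of step five resembles bisecting mean clustering, sketched below. Two points are assumptions of this sketch rather than statements of the patent: "selecting the minimum SSE" is read as choosing, among all candidate splits, the one that leaves the smallest total SSE, and the loop stops as soon as L clusters exist. The function names and sample points are hypothetical.

```python
import math

def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))

def sse(points):
    m = mean(points)
    return sum(sum((x - y) ** 2 for x, y in zip(p, m)) for p in points)

def two_means(points, iterations=50):
    """Split one cluster in two (m=2 mean clustering) with deterministic seeds."""
    a = points[0]
    b = max(points, key=lambda p: math.dist(p, a))  # farthest point from a
    for _ in range(iterations):
        left = [p for p in points if math.dist(p, a) <= math.dist(p, b)]
        right = [p for p in points if math.dist(p, a) > math.dist(p, b)]
        if not right:  # all points coincide; no split possible
            break
        new_a, new_b = mean(left), mean(right)
        if (new_a, new_b) == (a, b):
            break
        a, b = new_a, new_b
    return left, right

def bisecting_kmeans(points, L):
    """Split clusters two ways until L clusters exist (or nothing can be split)."""
    queue = [list(points)]
    while len(queue) < L:
        best = None
        for i, cluster in enumerate(queue):
            if len(cluster) < 2:
                continue
            left, right = two_means(cluster)
            if not right:
                continue
            # Total SSE of the whole partition if this cluster were split.
            total = sse(left) + sse(right) + sum(
                sse(c) for j, c in enumerate(queue) if j != i)
            if best is None or total < best[0]:
                best = (total, i, left, right)
        if best is None:
            break
        _, i, left, right = best
        queue[i:i + 1] = [left, right]
    return queue

# Illustrative 2-D points; not taken from the patent.
pts = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (1, 8), (2, 9), (1, 9)]
for cluster in bisecting_kmeans(pts, 3):
    print(cluster)
```

Because every split is evaluated against the total SSE of the whole partition, the procedure is less sensitive to a poor initial seed than a single run of mean clustering with L centroids.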
Further, in step six, performing K anonymization with the given value a as a threshold, listing the records in the data set that conform to K anonymity in a K-anonymity table, and counting the number of records in T_m includes:
(1) Find the quasi-identifier attribute A in T_m that has the largest number of distinct values and the highest association degree, and raise the generalization level of attribute A by one layer according to its generalization hierarchy.
(2) Count the records in the current T_m that conform to the K anonymity rule and the records that do not.
(3) List the records in T_m that conform to the K anonymity rule in the K-anonymity table, and repeat step (1) for the records that do not conform, until the number of records in T_m is less than K.
(4) After the data sets have been K-anonymized, merge the remaining records whose number is less than the threshold a into a new data table T_s, and perform K anonymization on T_s according to step (1).
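The counting in sub-steps (2) and (3) amounts to grouping records by their quasi-identifier values and checking each group size against K. A minimal sketch follows; the function name `split_by_k_anonymity` and the sample table are hypothetical, and records are assumed to be dictionaries keyed by attribute name.

```python
from collections import defaultdict

def split_by_k_anonymity(records, quasi_identifiers, k):
    """Split records into those whose equivalence class has at least k members and the rest."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)  # equivalence-class key
        groups[key].append(rec)
    conforming, remaining = [], []
    for members in groups.values():
        (conforming if len(members) >= k else remaining).extend(members)
    return conforming, remaining

# Illustrative, already generalized records; not taken from the patent.
table = [
    {"age": "20-29", "zip": "430**", "disease": "flu"},
    {"age": "20-29", "zip": "430**", "disease": "asthma"},
    {"age": "30-39", "zip": "430**", "disease": "flu"},
]
ok, rest = split_by_k_anonymity(table, ["age", "zip"], k=2)
print(len(ok), "records conform;", len(rest), "need further generalization")
```

Records returned in the second list are the ones that go back to sub-step (1) for another round of generalization.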
Another object of the present invention is to provide a K anonymous clustering privacy protection system applying the K anonymous clustering privacy protection method, the K anonymous clustering privacy protection system comprising:
a data dimension reduction module, used for reducing the dimension of the medical data set T by principal component analysis;
an association degree determining module, used for determining the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis;
a generalization hierarchy determining module, used for determining the generalization hierarchy of the quasi-identifier attributes according to the association degree between the quasi-identifiers and the sensitive attribute;
an optimal cluster number determining module, used for determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute;
a data set clustering module, used for clustering the data set with L as the number of clusters, where L is the optimal number of clusters;
a K anonymization module, used for performing K anonymization with the given value a as a threshold, listing the records in the data set that conform to K anonymity in a K-anonymity table, and counting the number of records in T_m.
It is a further object of the present invention to provide computer equipment comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
completing dimension reduction of the data by principal component analysis and determining the sensitive attribute, the quasi-identifier attributes and the identifier attributes; calculating the association degree between the sensitive attribute and the quasi-identifier attributes by grey relational analysis on the dimension-reduced data; determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; determining the number of clusters suitable for the data set by the elbow method; judging, according to the threshold a, whether the data are clustered directly or merged with other data; clustering the data set; and performing K anonymization on the clustered data according to the generalization hierarchy of the quasi-identifier attributes.
A further object of the invention is to provide an information data processing terminal used for implementing the K anonymous clustering privacy protection system.
Combining all the above technical schemes, the advantages and positive effects of the invention are as follows: the K anonymous clustering privacy protection method provided by the invention can reduce the dimension of medical data, avoid falling into a local optimum during clustering, and reduce the information loss rate during K anonymization. The invention can also effectively reduce the risk of data leakage, reduce homogeneity attacks and protect private data.
According to the invention, dimension reduction of the medical data is completed through principal component analysis. Falling into a local optimum during clustering is avoided because, in the two-way mean clustering process, the clustering algorithm continuously selects the data set with the smallest sum of squared errors for clustering, so that globally optimal processing of the data is achieved. The invention also reduces the information loss rate during K anonymization: grey relational analysis controls the generalization hierarchy of the quasi-identifiers through the association degree, and a data set that does not meet the K anonymity threshold is merged with other data sets that do not meet the threshold before K anonymization is performed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments of the present invention will be briefly described below, and it is obvious that the drawings described below are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a K anonymous cluster privacy protection method provided by an embodiment of the invention.
Fig. 2 is a schematic diagram of a K anonymous clustering privacy protection method provided by an embodiment of the invention.
Fig. 3 is a block diagram of a K anonymous clustering privacy protection system provided by an embodiment of the invention;
in the figure: 1. data dimension reduction module; 2. association degree determining module; 3. generalization hierarchy determining module; 4. optimal cluster number determining module; 5. data set clustering module; 6. K anonymization module.
Fig. 4 is a flow chart of principal component analysis provided by an embodiment of the present invention.
Fig. 5 is a flowchart of grey relational analysis provided by an embodiment of the present invention.
Fig. 6 is a generalization hierarchy diagram provided by an embodiment of the present invention.
Fig. 7 is a flowchart of the elbow method provided by an embodiment of the present invention.
Fig. 8 is a flowchart of a clustering method provided by an embodiment of the present invention.
Fig. 9 is a flowchart of K anonymization provided by an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Aiming at the problems existing in the prior art, the invention provides a K anonymous clustering privacy protection method and a K anonymous clustering privacy protection system, and the K anonymous clustering privacy protection method and the K anonymous clustering privacy protection system are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, the K anonymous clustering privacy protection method provided by the embodiment of the invention includes the following steps:
S101, reducing the dimension of the medical data set T by principal component analysis;
S102, determining the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis;
S103, determining the generalization hierarchy of the quasi-identifier attributes according to the association degree between the quasi-identifiers and the sensitive attribute;
S104, determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute;
S105, clustering the data set with L as the number of clusters, where L is the optimal number of clusters;
S106, performing K anonymization with the given value a as a threshold, listing the records in the data set that conform to K anonymity in a K-anonymity table, and counting the number of records in T_m.
The schematic diagram of the K anonymous clustering privacy protection method provided by the embodiment of the invention is shown in figure 2.
As shown in fig. 3, the K anonymous cluster privacy protection system provided by the embodiment of the invention includes:
a data dimension reduction module 1, used for reducing the dimension of the medical data set T by principal component analysis;
an association degree determining module 2, used for determining the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis;
a generalization hierarchy determining module 3, used for determining the generalization hierarchy of the quasi-identifier attributes according to the association degree between the quasi-identifiers and the sensitive attribute;
an optimal cluster number determining module 4, used for determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute;
a data set clustering module 5, used for clustering the data set with L as the number of clusters, where L is the optimal number of clusters;
a K anonymization module 6, used for performing K anonymization with the given value a as a threshold, listing the records in the data set that conform to K anonymity in a K-anonymity table, and counting the number of records in T_m.
The technical scheme of the invention is further described below with reference to specific embodiments.
Example 1
The K anonymous clustering algorithm based on medical data provided by the embodiment of the invention comprises the following steps: (1) completing dimension reduction of the data by principal component analysis and determining the sensitive attribute, the quasi-identifier attributes and the identifier attributes; (2) calculating the association degree between the sensitive attribute and the quasi-identifier attributes by grey relational analysis on the dimension-reduced data; (3) determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; (4) determining the number of clusters suitable for the data set by the elbow method; (5) judging, according to the threshold a, whether the data are clustered directly or merged with other data; (6) clustering the data set; (7) performing K anonymization on the clustered data according to the generalization hierarchy of the quasi-identifier attributes. After this processing, the risk of data leakage is effectively reduced, homogeneity attacks are reduced, and private data is protected.
Specifically, the K anonymous clustering privacy protection method based on medical data utility provided by the embodiment of the invention comprises the following steps:
Step 1: Reduce the dimension of the medical data set T by principal component analysis.
Step 1.1: The principal components that may be present are expressed as:
$$Z_i = c_{i1} x_1 + c_{i2} x_2 + \cdots + c_{ip} x_p, \qquad i = 1, 2, \ldots, q$$
wherein p is the dimension of the attributes in each record, c_ij is the weight (load value) of the j-th attribute in the i-th principal component, Z denotes a principal component, q is the number of principal components that may be present, and the principal components are mutually independent. Z_1, Z_2, …, Z_q are composed of the different quasi-identifiers x_1, x_2, …, x_p.
Step 1.2: According to the load values c_ij, select the principal component with the smallest attribute dimension from the principal component set, select suitable quasi-identifier (QI) attributes from that principal component, and determine the identifier, the quasi-identifiers and the sensitive attribute.
Step 2: Determine the association degree between the quasi-identifiers and the sensitive attribute by grey relational analysis.
Step 2.1: The sensitive attribute is taken as the reference sequence, Y = {Y(k) | k = 1, 2, ..., n}, wherein Y is a specific sensitive attribute.
Step 2.2: The quasi-identifier attributes whose association degree with the sensitive attribute is to be determined are taken as the comparison sequences, expressed as X_i = {X_i(k) | k = 1, 2, ..., n}, i = 1, 2, ..., m, wherein X_i(k) represents the k-th value in the i-th comparison sequence and m represents the number of QI attributes.
Step 2.3: Since the measurement units of different data differ, the data are normalized by the following formula:
Step 2.4: After normalization, the grey relational coefficient between each quasi-identifier attribute and the sensitive attribute is calculated by the following formula:
$$\xi_i(k) = \frac{\min_i \min_k |y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}{|y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}$$
wherein ξ_i(k) is the grey relational coefficient, |y(k) - x_i(k)| is the distance between the k-th value of the reference sequence and the corresponding k-th value of the i-th comparison sequence, max denotes the maximum distance, and min denotes the minimum distance. ρ is called the resolution coefficient and generally takes values in the interval (0, 1); the resolution is high when ρ ≤ 0.5463, and ρ = 0.5 is usually taken.
Step 2.5: The association degree is determined from the association coefficients at each point by the following formula:
$$r_i = \frac{1}{n} \sum_{k=1}^{n} \xi_i(k)$$
wherein r_i is the association degree; the closer r_i is to 1, the stronger the association between the quasi-identifier attribute and the sensitive attribute.
Step 3: Determine the generalization hierarchy of the quasi-identifier attributes according to the association degree between the quasi-identifiers and the sensitive attribute. The higher the association degree, the stronger the association of the data and the finer the generalization hierarchy of the quasi-identifier should be; for a quasi-identifier with a low association degree, the generalization hierarchy is relatively coarse. The generalization hierarchy of each quasi-identifier is determined accordingly.
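One way to make the rule of Step 3 concrete is to let the association degree decide how many generalization levels a quasi-identifier receives. The patent gives no formula for this, so the linear mapping, the level bounds and the age hierarchy below are all assumptions introduced for illustration, as are the names `hierarchy_depth` and `generalize_age`.

```python
def hierarchy_depth(association, min_levels=2, max_levels=5):
    """Map an association degree in [0, 1] to a number of generalization levels.

    Hypothetical linear rule: a higher association degree gives more (finer) levels.
    """
    if not 0.0 <= association <= 1.0:
        raise ValueError("association degree must lie in [0, 1]")
    return min_levels + round(association * (max_levels - min_levels))

def generalize_age(age, level, base_width=5):
    """Generalize a numeric age; level 0 keeps the value, each higher level doubles the interval."""
    if level == 0:
        return str(age)
    width = base_width * 2 ** (level - 1)
    low = (age // width) * width
    return f"[{low}, {low + width})"

# A strongly associated attribute gets a deeper hierarchy than a weakly associated one.
print(hierarchy_depth(0.9), hierarchy_depth(0.2))
print([generalize_age(37, lvl) for lvl in range(4)])
```

With a deeper hierarchy, each generalization step widens the interval by a smaller relative amount, so strongly associated attributes lose information more gradually.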
Step 4: Determine the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifiers and sensitive attribute.
Step 4.1: Given a cluster number range m for the data set T, partition the data set for each cluster number in the range, starting from a cluster number of 2, and calculate the Euclidean distance from each cluster centroid to every data point in the cluster by the following formula:
$$d(x, y) = \sqrt{\sum_{i} (x_i - y_i)^2}$$
wherein x_i and y_i are the values of the two data points in the i-th dimension. The data points are assigned to clusters by Euclidean distance according to the nearest-centroid principle.
Step 4.2: According to the division into clusters, the SSE (Sum of Squared Error) of each cluster is calculated, and the current number of clusters and the total sum of squared errors are taken as coordinates and plotted on the coordinate axes. The SSE is calculated by the following formula:
$$SSE = \sum_{i} \sum_{p \in C_i} |p - m_i|^2$$
wherein C_i represents the i-th cluster, p represents a sample point in C_i, and m_i represents the mean of all samples in C_i. The optimal cluster number L is determined from the elbow plot of the medical data set T, as shown in fig. 2.
Step 5: Cluster the data set, using the optimal cluster number L as the number of clusters.
Step 5.1: Put all data into the queue {d_1} as one cluster, perform mean clustering with cluster number m = 2 on that cluster, calculate the SSE of each resulting cluster, and put the divided clusters into the queue {d_1, d_2, d_3}.
Step 5.2: Select the cluster with the minimum SSE from the queue and perform m = 2 mean clustering on it, then put the divided clusters into the queue; repeat until the number of clusters is greater than L.
Step 5.3: According to the above clustering steps, the medical data set T is divided into m data sets (T_1, T_2, …, T_m).
Step 6: Perform K-anonymization using the given value a as a threshold. Records in the data set that already satisfy K-anonymity are added to the K-anonymity table, and the number of records in each T_m is counted.
Step 6.1: In T_m, find the quasi-identifier attribute A with the largest number of distinct values and the highest association degree, and raise the generalization level of A by one layer according to its generalization hierarchy.
Step 6.2: Count the records in the current T_m that satisfy the K-anonymity rule and the records that do not.
Step 6.3: Add the records in T_m that satisfy the K-anonymity rule to the K-anonymity table, and repeat Step 6.1 on the records that do not, until the number of remaining records in T_m is less than K.
Step 6.4: Merge the records left over from each data set after K-anonymization, whose number is below the threshold a, into a new data table T_s, and K-anonymize T_s according to Step 6.1.
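Steps 6.2 and 6.3 rely on testing whether records satisfy the K-anonymity rule, that is, whether every combination of quasi-identifier values is shared by at least K records. A minimal sketch of such a check is shown below; the function name, the record layout (a list of dicts) and the sample table are illustrative assumptions.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


# Made-up, already generalized records.
table = [
    {"age": "30-39", "sex": "F", "disease": "yes"},
    {"age": "30-39", "sex": "F", "disease": "no"},
    {"age": "40-49", "sex": "M", "disease": "yes"},
]
first_two_ok = is_k_anonymous(table[:2], ["age", "sex"], 2)
whole_table_ok = is_k_anonymous(table, ["age", "sex"], 2)
```

Here the first two records form a group of size 2, while the third record is alone in its group, so the full table fails the check for K = 2.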
Example 2
The K anonymous clustering privacy protection method based on medical data utility provided by the embodiment of the invention comprises the following steps:
As shown in FIG. 2, the implementation includes principal component analysis, grey relational analysis, generalization, elbow method, clustering and K-anonymization modules. The steps are as follows:
Step 1: Perform principal component analysis on the medical data, as shown in fig. 4. First, mean-center the medical data and calculate the covariance matrix; then calculate the eigenvalues and eigenvectors of the covariance matrix, sort the eigenvalues from largest to smallest, and retain the eigenvectors of the K largest eigenvalues. The data are projected into the new space spanned by these K eigenvectors, which completes the dimension reduction, and the identifier, quasi-identifier and sensitive attributes are selected according to the loadings.
Identifier attribute: an attribute that directly identifies an individual, such as name, phone number or identification number. Identifier attributes are deleted from the data table before the data are released.
Quasi-identifier: the minimal set of attributes that appear both in the released data table and in an external table and can therefore be linked to an individual, such as zip code, birthday and gender. By combining these attributes with an external data table, an attacker can identify a specific individual through a linking attack.
Sensitive attribute: an attribute that the user does not want others to learn when the data are released, such as disease information, purchasing preferences or salary; this information must be protected before release.
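The principal component analysis in Step 1 (mean-centering, covariance, eigen-decomposition, keeping the leading eigenvectors, projection) can be sketched as below. This is a minimal standard-library sketch, not the patent's implementation: the eigenvectors are approximated with power iteration and deflation, which is a choice made here to avoid external dependencies, and the function names and sample data are assumptions. In practice a linear-algebra routine such as `numpy.linalg.eigh` would normally be used instead.

```python
import math


def center(data):
    """Subtract each column's mean (the mean-centering step)."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in data]


def covariance(centered):
    """Sample covariance matrix of mean-centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def matvec(m, v):
    return [sum(m[a][b] * v[b] for b in range(len(v))) for a in range(len(m))]


def top_eigenpairs(matrix, n_components, iters=500):
    """Leading eigenpairs of a symmetric matrix via power iteration with deflation."""
    p = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _ in range(n_components):
        v = [float(j + 1) for j in range(p)]          # arbitrary start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(vi * wi for vi, wi in zip(v, matvec(m, v)))
        pairs.append((lam, v))
        # Deflate so the next round finds the next-largest eigenvalue.
        m = [[m[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs


def pca(data, n_components):
    """Return (eigenpairs, projected data) for the leading components."""
    centered = center(data)
    pairs = top_eigenpairs(covariance(centered), n_components)
    projected = [[sum(x * vj for x, vj in zip(row, v)) for _, v in pairs]
                 for row in centered]
    return pairs, projected


# Made-up data lying exactly on the line y = 2x.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pairs, projected = pca(data, 1)
eigenvalue, loading = pairs[0]
```

The components of `loading` play the role of the loadings mentioned in Step 1: attributes with large absolute loadings contribute most to the retained principal component.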
Step 2: Perform grey relational analysis on the data set T obtained in Step 1, as shown in fig. 5.
Step 2.1: Determine the reference sequence Y = {Y(k) | k = 1, 2, …, n}, which corresponds to the sensitive attribute, and the comparison sequences X_i = {X_i(k) | k = 1, 2, …, n}, i = 1, 2, …, m, which correspond to the attributes whose association with the sensitive attribute is to be determined. X_i(k) is the k-th value in the i-th comparison sequence, and m is the number of QI attributes.
Step 2.2: Because different attributes are measured in different units, normalize the data by the following formula:
Step 2.4: After normalization, calculate the grey relational coefficient between each quasi-identifier attribute and the sensitive attribute by the following formula:
$$\xi_i(k) = \frac{\min_i \min_k |y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}{|y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}$$
where |y(k) - x_i(k)| is the distance between the k-th value of the reference sequence and the k-th value of the i-th comparison sequence, max denotes the maximum such distance, and min denotes the minimum such distance. ρ is called the resolution coefficient and is generally taken in the interval (0, 1); the resolution is best when ρ ≤ 0.5463, and ρ = 0.5 is usually used.
Step 2.5: Determine the association degree from the association coefficients at each point k, using the following formula:
$$r_i = \frac{1}{n}\sum_{k=1}^{n}\xi_i(k)$$
where ξ_i(k) is the association coefficient of the i-th comparison sequence at point k (from Step 2.4) and r_i is the association degree. The closer r_i is to 1, the more strongly the quasi-identifier attribute is associated with the sensitive attribute.
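Steps 2.1 to 2.5 can be sketched as follows. The normalization formula of Step 2.2 is not reproduced in the text, so min-max scaling is used here purely as an assumption; the function names and the sample sequences are also illustrative. The coefficient is the usual grey relational coefficient with resolution coefficient `rho`, and the association degree is the mean of the coefficients over k.

```python
def min_max(seq):
    """Scale a sequence to [0, 1]. Min-max scaling is an assumption made here."""
    lo, hi = min(seq), max(seq)
    if hi == lo:
        return [0.0 for _ in seq]
    return [(v - lo) / (hi - lo) for v in seq]


def grey_relational_degrees(reference, comparisons, rho=0.5):
    """Association degree r_i of each comparison sequence with the reference."""
    y = min_max(reference)
    xs = [min_max(x) for x in comparisons]
    # |y(k) - x_i(k)| for every comparison sequence i and every index k.
    deltas = [[abs(yk - xk) for yk, xk in zip(y, x)] for x in xs]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    if d_max == 0.0:
        return [1.0 for _ in xs]  # every sequence coincides with the reference
    degrees = []
    for row in deltas:
        coeffs = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        degrees.append(sum(coeffs) / len(coeffs))
    return degrees


# Made-up sensitive attribute and two made-up quasi-identifier columns.
sensitive = [0, 1, 0, 1]
quasi = [[0, 1, 0, 1], [1, 0, 1, 0]]
degrees = grey_relational_degrees(sensitive, quasi)
```

In this toy case the first column matches the sensitive attribute exactly and gets degree 1, while the second column is its mirror image and gets a much lower degree.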
Step 3: Determine the generalization hierarchy of each quasi-identifier attribute according to its association degree with the sensitive attribute. The higher the association degree, the more strongly the data are associated and the finer the generalization hierarchy of the quasi-identifier should be; for a quasi-identifier with a low association degree, the generalization hierarchy is comparatively coarse.
As shown in FIG. 6, when the association degree between an attribute and the sensitive attribute is lower, the generalization hierarchy has fewer levels, as in the hierarchy on the left side of FIG. 6; when the association degree is higher, the generalization hierarchy is finer, as in the hierarchy on the right side of FIG. 6. The finer the generalization hierarchy, the lower the information loss during anonymization and the better the original character of the data is preserved.
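The difference between a fine and a coarse generalization hierarchy can be illustrated with a numeric attribute such as age. The interval widths below are hypothetical and are not taken from FIG. 6; the sketch only shows that a finer hierarchy offers more intermediate levels before a value is fully suppressed.

```python
FINE_WIDTHS = [5, 10, 20]   # hypothetical: three interval levels, then '*'
COARSE_WIDTHS = [20]        # hypothetical: one interval level, then '*'


def generalize_age(age, level, widths):
    """Generalize an integer age.

    Level 0 keeps the raw value, levels 1..len(widths) map the age to an
    interval of the given width, and any higher level suppresses it to '*'.
    """
    if level == 0:
        return str(age)
    if level > len(widths):
        return "*"
    width = widths[level - 1]
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


fine_path = [generalize_age(37, lvl, FINE_WIDTHS) for lvl in range(5)]
coarse_path = [generalize_age(37, lvl, COARSE_WIDTHS) for lvl in range(3)]
```

Raising the level by one layer in the fine hierarchy loses less information per step than in the coarse one, which is why the text assigns finer hierarchies to strongly associated attributes.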
Step 4: the optimal cluster number is determined for the medical dataset T using an elbow method, as shown in fig. 7.
Step 4.1: Given a range m of cluster numbers for the data set T, partition the data set for each cluster number in the range, starting from 2, and calculate the Euclidean distance from each cluster centroid to each data point according to the following formula:
$$d(x, y) = \sqrt{\sum_{i}(x_i - y_i)^2}$$
where x_i and y_i are the values of the two data points in the i-th dimension. Each data point is then assigned to the cluster whose centroid is nearest in Euclidean distance.
Step 4.2: According to the cluster division, calculate the SSE (sum of squared errors) of each cluster, and plot the total SSE against the current number of clusters. The SSE is calculated as follows:
$$SSE = \sum_{i}\sum_{p \in C_i}|p - m_i|^2$$
where C_i is the i-th cluster, p is a sample point in C_i, and m_i is the mean of all samples in C_i. The optimal cluster number L is determined from the elbow plot of the medical data set T, as shown in fig. 2.
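The elbow method of Step 4 can be sketched by running a clustering for each candidate cluster number and recording the total SSE. The sketch below uses a plain Lloyd-style K-means with a deterministic initialization (evenly spaced points in sorted order); that initialization, the function names and the sample points are assumptions made for illustration only.

```python
import math


def total_sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def kmeans(points, k, iters=100):
    """Plain Lloyd iteration; initial centroids are evenly spaced sorted points."""
    ordered = sorted(points)
    centroids = [ordered[int(i * len(ordered) / k)] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        new = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[j]
               for j, g in enumerate(groups)]
        if new == centroids:
            break
        centroids = new
    return centroids


def elbow_curve(points, max_k):
    """Total SSE for every cluster number from 2 to max_k."""
    return {k: total_sse(points, kmeans(points, k)) for k in range(2, max_k + 1)}


# Made-up data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
curve = elbow_curve(points, 4)
```

For this toy data the SSE drops sharply from 2 to 3 clusters and only slightly from 3 to 4, so the elbow lies at 3; plotting `curve` gives the kind of elbow plot the step refers to.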
Step 5: Cluster the data set, using the optimal cluster number L as the number of clusters, as shown in fig. 8.
Step 5.1: Put all data into the queue D = {d_1} as one cluster, perform mean clustering with cluster number m = 2 on that cluster, calculate the SSE of each resulting cluster, and put the divided clusters into the queue D = {d_1, d_2, d_3}.
Step 5.2: Select the cluster with the minimum SSE from the queue and perform m = 2 mean clustering on it, then put the divided clusters into the queue; the optimal cluster number serves as the clustering threshold, and the process repeats until the threshold is met.
Step 5.3: According to the above clustering steps, the medical data set T is divided into m data sets (T_1, T_2, …, T_m).
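The queue-based splitting of Steps 5.1 to 5.3 resembles bisecting K-means and can be sketched as below. Two points are assumptions rather than the patent's wording: the sketch stops as soon as the queue holds L clusters, and by default it splits the cluster with the largest SSE, which is the usual bisecting K-means choice, whereas the text says to select the minimum SSE (pass `select=min` to follow the text literally). The function names and sample points are illustrative.

```python
import math


def mean(points):
    return tuple(sum(col) / len(points) for col in zip(*points))


def sse(points):
    m = mean(points)
    return sum(math.dist(p, m) ** 2 for p in points)


def two_means(points, iters=50):
    """Split one cluster into two (m = 2), starting from its two extreme points."""
    ordered = sorted(points)
    c = [ordered[0], ordered[-1]]
    split = None
    for _ in range(iters):
        parts = ([], [])
        for p in points:
            parts[0 if math.dist(p, c[0]) <= math.dist(p, c[1]) else 1].append(p)
        if not parts[0] or not parts[1]:
            break
        split = parts
        new_c = [mean(parts[0]), mean(parts[1])]
        if new_c == c:
            break
        c = new_c
    return [list(part) for part in split] if split else [list(points)]


def bisecting_clustering(points, n_clusters, select=max):
    """Repeatedly split one queued cluster until n_clusters clusters exist."""
    queue = [list(points)]
    while len(queue) < n_clusters:
        # Only clusters with at least two distinct points can be split.
        candidates = [cl for cl in queue if len(set(cl)) > 1]
        if not candidates:
            break
        target = select(candidates, key=sse)
        queue.remove(target)
        queue.extend(two_means(target))
    return queue


# Made-up data with three well-separated groups.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
clusters = bisecting_clustering(points, 3)
```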
Step 6: the data is K anonymized as shown in FIG. 9.
Step 6.1: Perform K-anonymization using the given value a as a threshold. Add each data subset that already satisfies K-anonymity to the K-anonymity table, and count the number of records in each T_m.
Step 6.2: In each T_m, find the quasi-identifier attribute A with the largest number of distinct values and the highest association degree, and raise the generalization level of A by one layer, starting from the bottom of its generalization hierarchy.
Step 6.3: Count the records in the current T_m that satisfy the K-anonymity rule and the records that do not.
Step 6.4: Add the records in T_m that satisfy the K-anonymity rule to the K-anonymity table, and repeat Step 6.2 on the records that do not, until the number of remaining records in T_m is less than K.
Step 6.5: Merge the records left over from each data set after K-anonymization, whose number is below the threshold a, into a new data table T_s, and K-anonymize T_s according to Step 6.2.
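The generalize-and-publish loop of Steps 6.2 to 6.4 can be sketched for a single cluster T_m as below. This is one reading of the steps under stated assumptions: hierarchies are given as lists of functions, one per generalization level; the attribute to raise is the one with the most distinct current values, with the association degree used only as a tie-breaker; and the loop also stops when every hierarchy is exhausted. All names and sample values are illustrative.

```python
from collections import Counter


def k_anonymize(records, hierarchies, k, degrees=None):
    """Publish records whose quasi-identifier group has at least k members and
    raise one attribute by one level per round for the remaining records.

    hierarchies: {attribute: [level-1 function, level-2 function, ...]}
    Returns (published, residual), where residual holds the raw leftover records.
    """
    degrees = degrees or {}
    levels = {a: 0 for a in hierarchies}      # level 0 means the raw value
    pending = [dict(r) for r in records]
    published = []

    def view(record):
        out = dict(record)
        for a, lvl in levels.items():
            if lvl > 0:
                out[a] = hierarchies[a][lvl - 1](record[a])
        return out

    while True:
        views = [view(r) for r in pending]
        keys = [tuple(v[a] for a in hierarchies) for v in views]
        counts = Counter(keys)
        published += [v for v, key in zip(views, keys) if counts[key] >= k]
        pending = [r for r, key in zip(pending, keys) if counts[key] < k]
        raisable = [a for a in hierarchies if levels[a] < len(hierarchies[a])]
        if len(pending) < k or not raisable:
            return published, pending
        # Most distinct values first; association degree breaks ties.
        target = max(raisable, key=lambda a: (len({view(r)[a] for r in pending}),
                                              degrees.get(a, 0.0)))
        levels[target] += 1


# Hypothetical hierarchies and made-up records.
hierarchies = {
    "age": [lambda v: f"{v // 10 * 10}-{v // 10 * 10 + 9}", lambda v: "*"],
    "sex": [lambda v: "*"],
}
records = [
    {"age": 31, "sex": "F", "disease": "yes"},
    {"age": 31, "sex": "F", "disease": "no"},
    {"age": 45, "sex": "M", "disease": "yes"},
    {"age": 47, "sex": "M", "disease": "no"},
    {"age": 62, "sex": "F", "disease": "no"},
]
published, residual = k_anonymize(records, hierarchies, k=2)
```

In this toy run the two identical records are published unchanged, the next two are published after age is generalized to a decade, and the single leftover record is returned as residual; under Step 6.5 such residuals from all clusters would be merged into T_s and processed again.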
The technical scheme of the invention is further described below in connection with specific experimental data.
The raw data set has 14 attributes: age, sex, chest pain type, resting blood pressure, plasma steroid content, fasting blood glucose, resting electrocardiographic result, highest heart rate, exercise-induced angina, exercise-induced ST decline value, slope of the electrocardiographic ST segment at maximum exercise, number of main vessels measured using fluorescence, THAL (thalassemia), and whether heart disease is present, as listed in Table 1.
TABLE 1
After principal component analysis is performed on the 13 non-sensitive attributes of the original data set, the principal component with the smallest dimension is selected according to the size of the correlation coefficients. Five attributes (sex, plasma steroid content, resting electrocardiographic result, highest heart rate and exercise-induced angina) are used as quasi-identifiers, and one attribute (whether heart disease is present) is used as the sensitive attribute, so the original 13 quasi-identifier attributes are reduced to 5, as shown in Table 2.
TABLE 2
After these 5 quasi-identifier attributes are determined, grey relational analysis is used to determine the association degree of each quasi-identifier with the sensitive attribute. As shown in fig. 6, the generalization hierarchy of a quasi-identifier attribute with a higher association degree is divided more finely, and that of a quasi-identifier attribute with a low association degree is coarser.
In the description of the present invention, unless otherwise indicated, the meaning of "a plurality" is two or more; the terms "upper," "lower," "left," "right," "inner," "outer," "front," "rear," "head," "tail," and the like are used as an orientation or positional relationship based on that shown in the drawings, merely to facilitate description of the invention and to simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and therefore should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in whole or in part in software, the implementation takes the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded or executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)).
The foregoing is merely illustrative of specific embodiments of the present invention, and the scope of protection of the invention is not limited thereto. Any modification, equivalent replacement or improvement made by a person skilled in the art within the spirit and principles of the present invention and within the technical scope disclosed herein shall fall within the scope of protection of the present invention.

Claims (8)

1. A K-anonymous clustering privacy protection method, characterized by comprising:
using principal component analysis to reduce the dimension of the data and to determine the sensitive attribute, the quasi-identifier attributes and the identifier attributes; calculating, on the dimension-reduced data, the association degree between the sensitive attribute and each quasi-identifier attribute using grey relational analysis; determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; determining the number of clusters suitable for the data set using the elbow method; judging, according to the threshold a, whether the data are clustered directly or merged with other data; clustering the data set; and K-anonymizing the clustered data according to the generalization hierarchies of the quasi-identifier attributes;
the K-anonymous clustering privacy protection method comprises the following steps:
step one, reducing the dimension of the medical data set T by principal component analysis;
step two, determining the association degree between each quasi-identifier and the sensitive attribute using grey relational analysis;
step three, determining the generalization hierarchy of each quasi-identifier attribute according to the association degree between the quasi-identifier and the sensitive attribute;
step four, determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifier and sensitive attributes;
step five, clustering the data set with the optimal cluster number L as the number of clusters;
step six, performing K-anonymization with the given value a as a threshold, adding the records in the data set that already satisfy K-anonymity to the K-anonymity table, and counting the number of records in the K-anonymity table;
in step six, performing K-anonymization with the given value a as a threshold, adding the records in the data set that already satisfy K-anonymity to the K-anonymity table, and counting the number of records in the K-anonymity table comprises:
(1) finding, in the K-anonymity table, the quasi-identifier attribute A with the largest number of distinct values and the highest association degree, and raising the generalization level of A by one layer according to its generalization hierarchy;
(2) counting the records of the current K-anonymity table that satisfy the K-anonymity rule and the records that do not;
(3) adding the records in the K-anonymity table that satisfy the K-anonymity rule to the K-anonymity table, and repeating step (1) on the records that do not, until the number of records in the K-anonymity table is less than K;
(4) merging the records left over from each data set after K-anonymization, whose number is below the threshold a, into a new data table T_s, and K-anonymizing T_s according to step (1).
2. The K-anonymous clustering privacy protection method as set forth in claim 1, wherein in step one, reducing the dimension of the medical data set T by principal component analysis comprises:
(1) the principal components that may exist are expressed as:
$$Z_i = c_{i1}x_1 + c_{i2}x_2 + \cdots + c_{ip}x_p, \quad i = 1, 2, \ldots, q$$
where p denotes the dimension of the attributes in each record, c denotes the weight of an attribute in each record, Z denotes a principal component, q denotes the number of principal components that may exist, and the principal components are mutually independent; Z_1, Z_2, …, Z_q are composed of the different quasi-identifiers x_1, x_2, …, x_p;
(2) according to the loading values c_ij, selecting the principal component with the smallest attribute dimension from the set of principal components, selecting suitable QI attributes from that principal component, and determining the identifier, quasi-identifier and sensitive attributes.
3. The K-anonymous clustering privacy protection method as set forth in claim 1, wherein in step two, determining the association degree between each quasi-identifier and the sensitive attribute using grey relational analysis comprises:
(1) the sensitive attribute is taken as the reference sequence, expressed as:
Y = {Y(k) | k = 1, 2, …, n}
where Y is a specific sensitive attribute;
(2) the attributes whose association degree with the sensitive attribute is to be determined are taken as the comparison sequences, expressed as:
X_i = {X_i(k) | k = 1, 2, …, n}, i = 1, 2, …, m;
where X_i(k) is the k-th value in the i-th comparison sequence, and m is the number of QI attributes;
(3) because different data are measured in different units, the data are normalized by the following formula:
(4) after normalization, the grey relational coefficient between each quasi-identifier attribute and the sensitive attribute is calculated by the following formula:
$$\xi_i(k) = \frac{\min_i \min_k |y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}{|y(k) - x_i(k)| + \rho \max_i \max_k |y(k) - x_i(k)|}$$
where |y(k) - x_i(k)| is the distance between the k-th value of the reference sequence and the k-th value of the i-th comparison sequence, max denotes the maximum such distance, and min denotes the minimum such distance; ρ is called the resolution coefficient and takes values in the interval (0, 1); the resolution is best when ρ ≤ 0.5463, and ρ = 0.5 is taken;
(5) the association degree is determined from the association coefficients at each point k by the following formula:
$$r_i = \frac{1}{n}\sum_{k=1}^{n}\xi_i(k)$$
where ξ_i(k) is the association coefficient from (4) and r_i is the association degree; the closer r_i is to 1, the more strongly the quasi-identifier attribute is associated with the sensitive attribute;
in step three, the higher the association degree, the more strongly the data are associated and the finer the generalization hierarchy of the quasi-identifier; for a quasi-identifier with a low association degree, the generalization hierarchy is comparatively coarse, whereby the generalization hierarchy of each quasi-identifier is determined.
4. The K-anonymous clustering privacy protection method as set forth in claim 1, wherein in step four, determining the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifier and sensitive attributes, comprises:
(1) given a range m of cluster numbers for the data set T, partitioning the data set for each cluster number in the range, starting from 2, and calculating the Euclidean distance from each cluster centroid to each data point:
$$d(x, y) = \sqrt{\sum_{i}(x_i - y_i)^2}$$
where x_i and y_i are the values of the two data points in the i-th dimension; each data point is assigned to the cluster whose centroid is nearest in Euclidean distance;
(2) according to the cluster division, calculating the SSE of each cluster and plotting the total SSE against the current number of clusters, the SSE being calculated as follows:
$$SSE = \sum_{i}\sum_{p \in C_i}|p - m_i|^2$$
where C_i is the i-th cluster, p is a sample point in C_i, and m_i is the mean of all samples in C_i; the optimal cluster number L is determined from the elbow plot of the medical data set T.
5. The K-anonymous clustering privacy protection method as set forth in claim 1, wherein in step five, clustering the data set with the optimal cluster number L as the number of clusters comprises:
(1) putting all data into the queue {d_1} as one cluster, performing mean clustering with cluster number m = 2 on that cluster, calculating the SSE of each resulting cluster, and putting the divided clusters into the queue {d_1, d_2, d_3};
(2) selecting the cluster with the minimum SSE from the queue and performing m = 2 mean clustering on it, then putting the divided clusters into the queue, and repeating step (1) until the number of clusters is greater than L;
(3) according to the above clustering steps, dividing the medical data set T into m data sets (T_1, T_2, …, T_m).
6. A K-anonymous cluster privacy protection system applying the K-anonymous cluster privacy protection method as defined in any one of claims 1 to 5, characterized in that the K-anonymous cluster privacy protection system comprises:
a data dimension reduction module, configured to reduce the dimension of the medical data set T by principal component analysis;
an association degree determining module, configured to determine the association degree between each quasi-identifier and the sensitive attribute using grey relational analysis;
a generalization level determining module, configured to determine the generalization hierarchy of each quasi-identifier attribute according to the association degree between the quasi-identifier and the sensitive attribute;
an optimal cluster number determining module, configured to determine the optimal number of clusters for the data by the elbow method, based on the selected identifier, quasi-identifier and sensitive attributes;
a data aggregation module, configured to cluster the data set with the optimal cluster number L as the number of clusters;
a K-anonymization module, configured to perform K-anonymization with the given value a as a threshold, add the records in the data set that already satisfy K-anonymity to the K-anonymity table, and count the number of records in each T_m.
7. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the K-anonymous cluster privacy protection method of any of claims 1-5, comprising the steps of:
using principal component analysis to reduce the dimension of the data and to determine the sensitive attribute, the quasi-identifier attributes and the identifier attributes; calculating, on the dimension-reduced data, the association degree between the sensitive attribute and each quasi-identifier attribute using grey relational analysis; determining the generalization hierarchy of each quasi-identifier according to its association degree with the sensitive attribute; determining the number of clusters suitable for the data set using the elbow method; judging, according to the threshold a, whether the data are clustered directly or merged with other data; clustering the data set; and K-anonymizing the clustered data according to the generalization hierarchies of the quasi-identifier attributes.
8. An information data processing terminal, wherein the information data processing terminal is configured to implement the K anonymous cluster privacy protection system as set forth in claim 6.
CN202111123601.8A 2021-09-24 2021-09-24 K anonymous clustering privacy protection method, system, computer equipment and terminal Active CN113742781B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111123601.8A CN113742781B (en) 2021-09-24 2021-09-24 K anonymous clustering privacy protection method, system, computer equipment and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111123601.8A CN113742781B (en) 2021-09-24 2021-09-24 K anonymous clustering privacy protection method, system, computer equipment and terminal

Publications (2)

Publication Number Publication Date
CN113742781A CN113742781A (en) 2021-12-03
CN113742781B true CN113742781B (en) 2024-04-05

Family

ID=78740824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111123601.8A Active CN113742781B (en) 2021-09-24 2021-09-24 K anonymous clustering privacy protection method, system, computer equipment and terminal

Country Status (1)

Country Link
CN (1) CN113742781B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117196996B (en) * 2023-10-17 2024-06-04 山东鸿业信息科技有限公司 Interface-free interaction management method and system for data resources

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964034A (en) * 2010-09-30 2011-02-02 浙江大学 Privacy protection method for mode information loss minimized sequence data
WO2013072930A2 (en) * 2011-09-28 2013-05-23 Tata Consultancy Services Limited System and method for database privacy protection
CN105512566A (en) * 2015-11-27 2016-04-20 电子科技大学 Health data privacy protection method based on K-anonymity
CN106021541A (en) * 2016-05-26 2016-10-12 徐州医科大学 Secondary k-anonymity privacy protection algorithm for differentiating quasi-identifier attributes
CN108363928A (en) * 2018-02-08 2018-08-03 广西师范大学 The adaptive differential method for secret protection being associated in medical data
CN110555316A (en) * 2019-08-15 2019-12-10 石家庄铁道大学 privacy protection table data sharing algorithm based on cluster anonymity
CN110598447A (en) * 2019-09-17 2019-12-20 西北大学 T-close privacy protection method meeting epsilon-difference privacy
CN110968889A (en) * 2018-09-30 2020-04-07 中兴通讯股份有限公司 Data protection method, equipment, device and computer storage medium
CN111079179A (en) * 2019-12-16 2020-04-28 北京天融信网络安全技术有限公司 Data processing method and device, electronic equipment and readable storage medium
CN111859441A (en) * 2019-04-30 2020-10-30 郑州大学 Anonymous method and storage medium for missing data
CN112131606A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 Dynamic data difference privacy histogram publishing method based on K-means + + combined elbow method autonomous clustering technology
CN113051619A (en) * 2021-04-30 2021-06-29 河南科技大学 K-anonymity-based traditional Chinese medicine prescription data privacy protection method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7475085B2 (en) * 2006-04-04 2009-01-06 International Business Machines Corporation Method and apparatus for privacy preserving data mining by restricting attribute choice
US9135320B2 (en) * 2012-06-13 2015-09-15 Opera Solutions, Llc System and method for data anonymization using hierarchical data clustering and perturbation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Protecting privacy and security of genomic data in i2b2with homomorphic encryption and differential privacy;RAISARO J L et al;《IEEE/ ACM Transactions on Computational Biology and Bioinformatics》;第15卷(第05期);全文 *
面向大数据的多维粒矩阵关联分析及应用;吴珺等;《计算机科学》;第44卷(第S2期);全文 *

Also Published As

Publication number Publication date
CN113742781A (en) 2021-12-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant