CN107871087A - Personalized differential privacy protection method for high-dimensional data release in a distributed environment - Google Patents

Personalized differential privacy protection method for high-dimensional data release in a distributed environment

Info

Publication number
CN107871087A
Authority
CN
China
Prior art keywords
attribute
data
bayesian network
sensitive
mutual information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711092850.9A
Other languages
Chinese (zh)
Other versions
CN107871087B (en)
Inventor
李先贤
赵华兴
王利娥
刘鹏
于东然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University filed Critical Guangxi Normal University
Priority to CN201711092850.9A
Publication of CN107871087A
Application granted
Publication of CN107871087B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Bioethics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Storage Device Security (AREA)

Abstract

The present invention discloses a personalized differential privacy protection method for high-dimensional data release in a distributed environment. The correlation between attributes is quantified by mutual information, and the mutual information of each attribute pair is computed with the mutual information formula. An approximate k-degree Bayesian network is built from the mutual information; the Bayesian network captures the dependence between attributes well. The privacy budget is allocated in a personalized way according to the number of sensitive attributes and the number of non-sensitive attributes that meet the condition. Each participant adds noise to its data according to the allocated privacy budget, using a randomized response mechanism. The noised data are sent to the manager, who aggregates them into an integrated data set and then publishes it. The present invention guarantees the privacy requirement when publishing data while greatly reducing the amount of data that must be perturbed, so the change to the data is smaller, data utility improves, and data analysts can carry out correlation analysis.

Description

Personalized differential privacy protection method for high-dimensional data release in a distributed environment
Technical field
The present invention relates to the technical field of network data security, and in particular to a personalized differential privacy protection method for high-dimensional data release in a distributed environment.
Background technology
In recent years, with the rapid development of the Internet and information technology, privacy-preserving data publication has attracted wide attention, and the idea of protecting individual privacy while enabling data sharing is increasingly well understood. Many data owners need to publish the raw data they hold (for example, a hospital's medical data or a social networking site's user data) so that other organizations can study and analyze it or use it for other purposes. The raw data to be published may contain a large amount of sensitive personal information (such as salary, disease status, or personal savings); publishing it directly may leak this information, so the data owner must apply privacy protection to the raw data before publication. Existing privacy protection techniques fall roughly into three classes: 1) Publication techniques based on restrictive conditions: according to the characteristics of the raw data, only selected local characteristics are published. Methods and models include k-anonymity, l-diversity, t-closeness, m-invariance, and so on. 2) Publication techniques based on data distortion: the private data are distorted while certain characteristics of the original data are preserved. Methods and models include randomization, blocking, swapping, condensation, and so on. 3) Publication techniques based on data encryption: encryption is used to hide sensitive data during data mining. Data encryption technology is now very mature; methods and models include secure multi-party computation (SMC), DES encryption, RSA encryption, and so on.
Differential privacy is a new privacy definition proposed by Dwork for the privacy leakage problem of statistical databases. It is a publication technique based on data distortion: noise is added so that the data are distorted while certain data or attribute characteristics remain unchanged. Differential privacy guarantees that adding or deleting a single record in the data set does not noticeably affect the query output, so even in the worst case, when the attacker knows all sensitive data except the target record, the target record still cannot be disclosed. Unlike traditional privacy protection techniques, it provides a strong theoretical guarantee for the privacy of published data, making no particular assumptions about the attacker's background knowledge. The technique not only provides a higher security guarantee for publishing private data but is also widely used in practice. However, existing differential privacy techniques cannot handle the problem of publishing high-dimensional data in a distributed environment well; in particular, when the published data contain a large number of attribute fields, existing techniques inject a large amount of noise into the data, which may cause the published data to lose their intended utility.
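For reference, the guarantee described above can be stated formally. The following is the standard textbook formulation of ε-differential privacy, given here as context rather than reproduced from the patent text:

```latex
% A randomized mechanism M satisfies \varepsilon-differential privacy if, for all
% neighboring data sets D and D' differing in a single record and every output set S:
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]
```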
Summary of the invention
The present invention addresses the deficiencies of existing privacy-preserving publication methods by providing a personalized differential privacy protection method for high-dimensional data release in a distributed environment.
To solve the above problems, the present invention is achieved through the following technical solution:
A personalized differential privacy protection method for high-dimensional data release in a distributed environment comprises the following steps:
Step 1. Each participant adds noise to its local data and then sends the result to the manager;
Step 2. The manager collects the data sent by each participant, integrates the attributes of all participants, removes duplicate attributes, and forms the attribute set X;
Step 3. The participants cooperate with the manager to mark the sensitivity of the attributes in the attribute set X, dividing the attributes of X into two classes: sensitive attributes and non-sensitive attributes;
Step 4. For each attribute in the attribute set X, compute the mutual information of the attribute pairs it forms with every other attribute;
Step 5. Initialize the Bayesian network N = ∅ and the set V = ∅;
Step 6. Arbitrarily choose an attribute x0 from the attribute set X, set its parent attribute to ∅, add the attribute pair (x0, ∅) to the Bayesian network N, and add attribute x0 to the set V;
Step 7. Choose one of the remaining attributes in the attribute set X as the current attribute, and judge whether the following two conditions hold simultaneously:
1. the current attribute belongs to the attribute set X but does not belong to the set V;
2. every candidate subset of the set V contains min(k, |V|) elements, where |V| denotes the number of attributes in the set V, k denotes the maximum in-degree allowed for any attribute in the Bayesian network to be constructed, and min(k, |V|) is the smaller of k and |V|;
If both conditions hold, go to step 8; otherwise, return the current Bayesian network N and go to step 10;
Step 8. From all attribute pairs involving the current attribute, select the pair with the maximum mutual information, and add the other attribute of that maximum pair to the Bayesian network N;
Step 9. Judge whether the in-degree of every attribute node of the current Bayesian network N is less than or equal to k: if so, return the current Bayesian network N and go to step 10; otherwise, return to step 7 until all attributes of the attribute set X have been added to the Bayesian network N;
Step 10. From all attribute pairs consisting of a sensitive attribute and a non-sensitive attribute in the returned current Bayesian network N, select the pairs whose mutual information exceeds the set threshold θ, and count the number n of sensitive attributes and the number m of non-sensitive attributes contained in the selected pairs;
Step 11. According to the counted number n of sensitive attributes and number m of non-sensitive attributes, compute the privacy budget ε′ of each participant as
ε′ = n′·(αε/n) + m′·((1 − α)ε/m)
where ε denotes the given total privacy budget, α denotes the given weight of the sensitive attributes, n′ denotes the number of sensitive attributes held by that participant, and m′ denotes the number of non-sensitive attributes held by that participant;
Step 12. When a user queries the local data of a participant, the manager first uses that participant's privacy budget ε′ to add differential privacy noise to the query result and then forwards the noised result to the user.
In the above step 4, the mutual information MI(xi, Πj) of an attribute pair (xi, Πj) is computed as
MI(xi, Πj) = Σ Pr[xi, Πj] · log( Pr[xi, Πj] / (Pr[xi] · Pr[Πj]) ),
where the sum runs over the values of xi and Πj, Pr[xi, Πj] denotes the joint probability distribution of the attribute pair (xi, Πj), Pr[xi] denotes the marginal probability distribution of attribute xi, and Pr[Πj] denotes the marginal probability distribution of attribute Πj;
In the above step 4, all attribute pairs also need to be sorted by the magnitude of their mutual information.
In the above step 4, all attribute pairs are sorted in descending order of mutual information, i.e. pairs with larger mutual information come first and pairs with smaller mutual information come last.
In the above step 5, the constructed k-degree Bayesian network is a directed acyclic graph.
Existing differential privacy methods use the same privacy budget for all participants; this uniform treatment inevitably maximizes the workload. Moreover, personalization is not considered when the privacy budget is allocated; allocating the privacy budget evenly inevitably leaves some local data sets under-protected and others over-protected. The present invention quantifies the correlation between attributes by mutual information and computes the mutual information of each attribute pair with the mutual information formula. An approximate k-degree Bayesian network is built from the mutual information; the Bayesian network captures the dependence between attributes well. The privacy budget is allocated in a personalized way according to the numbers of sensitive attributes and of qualifying non-sensitive attributes. Each participant adds noise to its data according to the personalized privacy budget allocated to it, using a randomized response mechanism. When a user queries the local data of a participant, the manager first uses that participant's privacy budget ε′ to add differential privacy noise to the query result and then forwards it to the user. The present invention guarantees the privacy requirement when publishing data while greatly reducing the amount of data that must be perturbed, so the change to the data is smaller, data utility improves, and data analysts can carry out correlation analysis.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the present invention.
Fig. 2 is the 1-degree Bayesian network N over 8 attribute nodes in a preferred embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to a specific example and the accompanying drawings.
The present invention considers that, in a distributed environment, data attributes may overlap across participants, and achieves personalized privacy budget allocation according to the different sensitivities of the attributes and the correlations between them, so as to solve the problems of the prior art. The present invention is based on the semi-trusted assumption: each local database adds noise according to its allocated privacy budget so that it satisfies εi-differential privacy (εi being the allocated privacy budget) and then sends the data to the data manager for aggregation; after aggregation the data satisfy the total ε-differential privacy (where ε is the sum of the εi over the k participants), while the utility of the original data is better preserved, which benefits the data analyst's purpose of data analysis.
The system model of the present invention includes three roles: k participants (i.e. local data owners), a semi-trusted data manager, and users (data analysts). Each participant Pk owns a local database Dk. The manager is semi-trusted and assists the k participants in publishing the integrated data set D; the published data serve users with different data-analysis purposes.
Assume the total attribute set X = {x1, x2, x3, x4, x5, x6, x7, x8}, where xi (i = 1, 2, ..., 8) denotes a distinct attribute. When an attribute xi is present in a participant's local data it is represented by "1"; when it is absent it is represented by "0". Pk (k = 1, 2, 3, 4) denotes a participant, so the 4 participants are: P1: {1,1,0,1,0,0,0,0}, P2: {1,1,0,0,0,1,0,0}, P3: {1,0,1,0,1,0,1,1}, P4: {0,0,1,0,0,1,1,0}.
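As a minimal sketch, the running example can be encoded as attribute-presence vectors; the variable and function names below are illustrative and not taken from the patent:

```python
# Attribute-presence vectors of the 4 participants from the example
# (1 = the participant holds the attribute locally, 0 = it does not).
ATTRIBUTES = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]

PARTICIPANTS = {
    "P1": [1, 1, 0, 1, 0, 0, 0, 0],
    "P2": [1, 1, 0, 0, 0, 1, 0, 0],
    "P3": [1, 0, 1, 0, 1, 0, 1, 1],
    "P4": [0, 0, 1, 0, 0, 1, 1, 0],
}

# Sensitive attributes as defined in step 3 of the example.
SENSITIVE = {"x4", "x7"}

def attrs_of(participant):
    """Return the set of attributes a participant actually holds."""
    return {a for a, bit in zip(ATTRIBUTES, PARTICIPANTS[participant]) if bit == 1}

if __name__ == "__main__":
    for p in PARTICIPANTS:
        print(p, sorted(attrs_of(p)))
```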
A personalized differential privacy protection method for high-dimensional data release in a distributed environment, as shown in Fig. 1, is described in further detail below with reference to the concrete example. It comprises the following steps:
Step 1: Each participant adds noise to its local data and then sends the result to the manager.
Because the manager is only semi-trusted, the real computation results cannot be sent to it directly, so the local data are noised before being sent to the manager. Here, the noising perturbs the marginal probability distributions of the attributes using generated random numbers (typically in the range 0 to 1). In the example, Pr[xi] denotes the marginal probability distribution of attribute xi after noising.
Step 2: The manager collects the data sent by each participant and integrates the attributes of all participants, removing duplicate attributes during the integration to form the attribute set X.
Step 3: The participants cooperate with the manager to mark the sensitivity of the integrated attribute set X; each attribute is labeled as a sensitive attribute or a non-sensitive attribute.
In the example the attribute set is X = {x1, x2, x3, x4, x5, x6, x7, x8}, and the attributes are divided into sensitive and non-sensitive attributes; the attributes {x4, x7} are defined as sensitive, and the remaining attributes are non-sensitive. Thus any attribute pair (xi, Πj) (here xi denotes any attribute in the attribute set X and Πj denotes any parent attribute of xi; any attribute other than xi itself may be its parent attribute) falls into one of three cases: sensitive with sensitive, sensitive with non-sensitive, or non-sensitive with non-sensitive.
Step 4: Compute the joint probability distribution of each attribute in the attribute set X with every other attribute, denoted Pr[xi, Πj]; then compute the mutual information MI of all integrated attribute pairs, store all attribute pairs in the array candidate, and sort them in descending order.
The mutual information of an attribute pair is obtained from the mutual information formula; the mutual information MI(xi, Πj) of an attribute pair (xi, Πj) is computed as
MI(xi, Πj) = Σ Pr[xi, Πj] · log( Pr[xi, Πj] / (Pr[xi] · Pr[Πj]) ),
where xi is any attribute in the attribute set X, Πj is any parent attribute in the parent attribute set Π of attribute xi, Pr[xi, Πj] is the joint probability distribution of the attribute pair (xi, Πj), Pr[xi] is the marginal probability distribution of attribute xi after noising, and Pr[Πj] is the marginal probability distribution of the parent attribute Πj after noising, with xi ∈ X, Πj ∈ X, i ≠ j.
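A minimal sketch of the step-4 computation, assuming the (noised) joint distribution of an attribute pair is available as a table mapping value pairs to probabilities; the function and variable names are illustrative, not from the patent:

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """MI(xi, Pij) = sum Pr[xi, Pij] * log( Pr[xi, Pij] / (Pr[xi] * Pr[Pij]) ).

    `joint` maps (value_of_xi, value_of_parent) -> probability, assumed to be
    derived from the noised data of step 1.
    """
    px = defaultdict(float)   # marginal Pr[xi]
    ppi = defaultdict(float)  # marginal Pr[Pij]
    for (a, b), p in joint.items():
        px[a] += p
        ppi[b] += p

    mi = 0.0
    for (a, b), p in joint.items():
        if p > 0:
            mi += p * math.log(p / (px[a] * ppi[b]))
    return mi

# Toy usage: a 2x2 joint distribution of two correlated binary attributes.
example_joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
print(mutual_information(example_joint))  # > 0: the two attributes are correlated

# Step 4 then sorts all pairs into the array `candidate` in descending MI order, e.g.
# candidate = sorted(pairs, key=lambda pair: mi_of[pair], reverse=True)
```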
Step 5: Initialize the Bayesian network N = ∅ and the set V = ∅.
The Bayesian network N is initialized to an empty list of AP pairs, where an AP pair is any attribute pair (xi, Πj), xi ∈ X, Πj ∈ Π, i ≠ j, with Π being the set of all parent attributes of attribute xi, initialized to Π = ∅ and constrained by |Π| ≤ k, i.e. the number of parent attributes of any attribute is at most k, which guarantees that the Bayesian network N is a k-degree Bayesian network. At the same time it must be ensured that the Bayesian network N is a directed acyclic graph (DAG).
The set V stores the attributes whose parent attributes are already known while N is being built; for example, if attribute x1's parent attribute is known to be attribute x2, then attribute x1 is stored in V;
Step 6: Randomly select an attribute x0 from the attribute set X (assumed to contain d attributes in total), set its parent attribute to ∅, add the attribute pair (x0, ∅) to the Bayesian network N, and add x0 to the set V.
The construction then considers each selectable attribute xi ∈ X\V together with each candidate parent set that is a subset of V with min(k, |V|) elements. Here X\V means that attribute xi belongs to the attribute set X but not to the set V; for example, once xa has been added to V, it is excluded when xi is subsequently chosen. A subset of V refers to any set formed by combining attributes of V; for example, if V = {x1, x2}, all subsets are {x1}, {x2} and {x1, x2}. The number of attributes in each candidate subset is min(k, |V|).
Step 7: For the remaining attributes in the attribute set X, perform d−1 iterations using a greedy algorithm until all attributes of the attribute set X have been added to the Bayesian network N;
In each iteration, judge whether attribute xi simultaneously satisfies the following two conditions:
(1) attribute xi belongs to the attribute set X and does not belong to the set V;
(2) every candidate subset of the set V contains min(k, |V|) elements, where |V| denotes the number of attributes in the set V and k denotes the maximum in-degree allowed for any attribute in the Bayesian network to be constructed;
If the conditions are satisfied, proceed to the next step; otherwise, return the current Bayesian network and go to step 10;
In fact, if the above two conditions are not satisfied, the returned Bayesian network is, strictly speaking, not a Bayesian network, but in order to let the algorithm proceed smoothly it is still output as a Bayesian network; in this case it may introduce a larger error when noise is added to the final query results.
Step 8: Use the array candidate built in step 4 as the candidate set of attribute pairs. Treat the attributes in V as parent attributes, and choose from X\V the attribute that has the larger mutual information value with these parent attributes as the next attribute to be added to the Bayesian network N. Over the d−1 iterations, each iteration adds to N, as far as possible, the attribute pair with the larger mutual information, and finally the Bayesian network is returned. For example, when a 1-degree Bayesian network is built, the attribute chosen is the one that has the maximum mutual information value with attribute xa, and it is the next attribute added to the Bayesian network N. If several attribute pairs have the same mutual information, the pair involving a sensitive attribute is chosen preferentially and added to the Bayesian network N.
Step 9: Judge whether the Bayesian network N built in step 8 satisfies the k-degree requirement; the criterion is that the in-degree of every attribute node of the current Bayesian network N is less than or equal to k. If so, the k-degree Bayesian network is returned; the Bayesian network returned in the example is shown in Fig. 2. Otherwise the iteration count is increased by 1 and the procedure returns to step 7.
Mutual information from information theory quantifies the correlation between attributes; the GreedyBayes algorithm from PrivBayes is used to build an approximate k-degree Bayesian network (the maximum in-degree of any attribute node in the built Bayesian network is k). In the Bayesian network, a node represents an attribute, and a directed edge or directed path represents a certain dependence between two attributes. For example, in Fig. 2 there is a directed edge between attribute x1 and attribute x2, indicating that attribute x2 depends directly on attribute x1, and there is a directed path between attribute x1 and attribute x4, indicating that attribute x4 depends indirectly on attribute x1. The Bayesian network captures the dependence between attributes well.
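The following is a simplified sketch of the greedy construction in steps 5 to 9, in the spirit of the GreedyBayes construction described above. It is an illustration under my own simplifying assumptions: in particular, it scores a candidate by the sum of its pairwise mutual information with the candidate parents rather than the full parent-set mutual information used in PrivBayes, and all names are illustrative, not the patent's exact algorithm:

```python
from itertools import combinations

def greedy_k_degree_network(attributes, mi, k, sensitive=frozenset()):
    """Greedily build an approximate k-degree Bayesian network.

    attributes: list of attribute names.
    mi:         dict mapping frozenset({a, b}) -> mutual information of the pair.
    k:          maximum in-degree allowed for any node.
    sensitive:  set of sensitive attributes, used only to break ties.
    Returns a list of (attribute, parent_tuple) pairs (the "AP pairs" N).
    """
    N = [(attributes[0], ())]   # start from an arbitrary first attribute with empty parents
    V = [attributes[0]]         # attributes whose parents are already fixed

    while len(V) < len(attributes):
        best = None  # ((score, tie_flag), attribute, parents)
        for x in (a for a in attributes if a not in V):
            # candidate parent sets: subsets of V with min(k, |V|) elements
            for parents in combinations(V, min(k, len(V))):
                score = sum(mi[frozenset({x, p})] for p in parents)
                # tie-break in favour of pairs that involve a sensitive attribute
                tie = x in sensitive or any(p in sensitive for p in parents)
                key = (score, tie)
                if best is None or key > best[0]:
                    best = (key, x, parents)
        _, x, parents = best
        N.append((x, parents))
        V.append(x)
    return N
```

With k = 1 and the 8 attributes of the running example, this sketch assigns exactly one parent to every node after the first, i.e. a 1-degree network of the kind shown in Fig. 2; which edges appear depends on the mutual information values.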
Step 10: Count the sensitive attributes and non-sensitive attributes in the returned Bayesian network that satisfy the condition.
First, all attribute pairs consisting of a sensitive attribute and a non-sensitive attribute are selected from the returned Bayesian network. In the example, the returned network is a 1-degree Bayesian network, the sensitive attributes are {x4, x7}, and the selected attribute pairs are {x1,x4}, {x2,x4}, {x3,x4}, {x5,x4}, {x6,x4}, {x8,x4}, {x1,x7}, {x2,x7}, {x3,x7}, {x5,x7}, {x6,x7}, {x8,x7}.
Then, the mutual information MI of these sensitive/non-sensitive attribute pairs is compared with the set threshold θ, and the pairs whose mutual information MI exceeds θ are selected. The value of θ depends on the circumstances, and an optimal value is found through repeated experiments. For example, in the example the threshold is set to θ = 0.2, and it is assumed that MI(x2, x4) > θ and MI(x8, x7) > θ, so there are 2 non-sensitive attributes that need privacy protection here, namely x2 and x8.
Finally, count the number n of sensitive attributes and the number m of non-sensitive attributes among the selected pairs. The comparison is with the set threshold θ: if MI exceeds θ, the associated sensitive and non-sensitive attributes are counted; otherwise those non-sensitive and sensitive attributes are not processed. In the example, n = 2 and m = 2.
Step 11: Allocate the privacy budget in a personalized way.
Existing differential privacy methods use the same privacy budget for all participants; this uniform treatment inevitably maximizes the workload. Moreover, personalization is not considered when the privacy budget is allocated; allocating the privacy budget evenly inevitably leaves some local data sets under-protected and others over-protected. A personalized privacy budget allocation scheme is therefore used: according to the number n of sensitive attributes and the number m of non-sensitive attributes counted in step 10, the privacy budget ε′ of each participant is computed as
ε′ = n′·(αε/n) + m′·((1 − α)ε/m)
where ε denotes the given total privacy budget, α denotes the given weight of the sensitive attributes, n′ denotes the number of sensitive attributes held by that participant, and m′ denotes the number of non-sensitive attributes held by that participant;
Note that the ratio of privacy budget allocated to sensitive versus non-sensitive attributes, i.e. α, depends on the circumstances, and an optimal value is found through repeated experiments; however, the total allocated privacy budget should not exceed ε, so that ε-differential privacy is satisfied.
In the example, the total privacy budget is set to ε = 0.1 and α = 0.6. Then ε1 = αε = 0.6ε and ε2 = (1 − α)ε = 0.4ε, so the privacy budgets allocated to the sensitive attributes are ε(x4) = ε(x7) = 0.6ε/2, and the privacy budgets allocated to the non-sensitive attributes are ε(x2) = ε(x8) = 0.4ε/2.
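A minimal sketch of the step-11 allocation, written to reproduce the worked example above. It assumes the reconstructed formula ε′ = n′·(αε/n) + m′·((1 − α)ε/m), which is consistent with the example values but is my reading of the missing equation rather than a verbatim copy of it; all names are illustrative:

```python
def participant_budget(total_eps, alpha, n, m, n_local, m_local):
    """Personalized privacy budget of one participant.

    total_eps: total privacy budget epsilon.
    alpha:     weight given to sensitive attributes.
    n, m:      numbers of sensitive / non-sensitive attributes selected in step 10.
    n_local:   how many of those sensitive attributes this participant holds (n').
    m_local:   how many of those non-sensitive attributes it holds (m').
    """
    per_sensitive = alpha * total_eps / n             # e.g. 0.6*eps/2 for x4, x7
    per_non_sensitive = (1 - alpha) * total_eps / m   # e.g. 0.4*eps/2 for x2, x8
    return n_local * per_sensitive + m_local * per_non_sensitive

# Worked example from the text: eps = 0.1, alpha = 0.6, n = m = 2.
# Participant P1 holds sensitive x4 and non-sensitive x2 among the selected attributes.
print(participant_budget(0.1, 0.6, n=2, m=2, n_local=1, m_local=1))  # ≈ 0.05
```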
Step 12: Add differential privacy noise to the query results. When a user queries the data, the manager noises the data according to the allocated privacy budget ε′, using a randomized response (RR) mechanism that satisfies ε′-differential privacy.
The common noising method in existing differential privacy work is the Laplace mechanism, which is mainly applied to protecting numerical query results; here a randomized response (RR) mechanism is used instead, which outperforms the Laplace mechanism in some scenarios. The randomized response mechanism consists of the following algorithms: S is a random algorithm used to generate a question q; Res is a random algorithm that takes q and the correct answer tA as input and outputs a noisy answer nA, written Res(q, tA), with q usually omitted so that it is written Res(tA); Eval(nA1, ..., nAn) takes the noisy answers nA1, ..., nAn as input and outputs an estimate. Then, given ε ≥ 0, the randomized response mechanism M = (S, Res, Eval) satisfies ε-differential privacy if, for any two correct answers tA0 ∈ R, tA1 ∈ R and any element s ∈ R, Pr[Res(tA0) = s] ≤ e^ε × Pr[Res(tA1) = s], where (S, Res, Eval) denote the three random algorithms and ε is the privacy protection budget. In some scenarios the mean squared error introduced by randomized response noising is smaller than that introduced by the Laplace mechanism, which has been demonstrated in the literature.
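A minimal sketch of a binary randomized response mechanism that satisfies the condition Pr[Res(tA0) = s] ≤ e^ε · Pr[Res(tA1) = s] quoted above. The specific choice shown (keep the true bit with probability e^ε/(e^ε + 1), flip it otherwise, then debias in Eval) is a standard instantiation and not necessarily the exact Res/Eval used by the patent; all names are illustrative:

```python
import math
import random

def res(true_bit, eps):
    """Randomized response Res(tA) for a single binary answer tA in {0, 1}."""
    p_keep = math.exp(eps) / (math.exp(eps) + 1)  # keep/flip ratio e^eps : 1 -> eps-DP
    return true_bit if random.random() < p_keep else 1 - true_bit

def eval_estimate(noisy_bits, eps):
    """Eval: unbiased estimate of the true proportion of 1s from the noisy answers."""
    p = math.exp(eps) / (math.exp(eps) + 1)
    observed = sum(noisy_bits) / len(noisy_bits)
    return (observed - (1 - p)) / (2 * p - 1)

# Usage: noise 10,000 answers whose true proportion of 1s is 0.3, then debias.
random.seed(0)
truth = [1 if random.random() < 0.3 else 0 for _ in range(10_000)]
noisy = [res(b, eps=0.5) for b in truth]
print(round(eval_estimate(noisy, eps=0.5), 3))  # close to the true proportion 0.3
```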
Step 13: Terminate.
The present invention quantifies the correlation between attributes by mutual information and computes the mutual information of each attribute pair with the mutual information formula. An approximate k-degree Bayesian network is built from the mutual information; the Bayesian network captures the dependence between attributes well. The privacy budget is allocated in a personalized way according to the numbers of sensitive attributes and of qualifying non-sensitive attributes. Each participant adds noise to its data according to the allocated privacy budget, using a randomized response mechanism. The noised data are sent to the manager, who aggregates them into an integrated data set and then publishes it. The present invention guarantees the privacy requirement when publishing data while greatly reducing the amount of data that must be perturbed, so the change to the data is smaller, data utility improves, and data analysts can carry out correlation analysis.
It should be noted that although the embodiments of the present invention are described above for illustration, they do not limit the present invention, and the present invention is therefore not restricted to the above embodiments. Any other embodiments obtained by those skilled in the art under the teaching of the present invention without departing from its principles shall fall within the scope of protection of the present invention.

Claims (5)

1. A personalized differential privacy protection method for high-dimensional data release in a distributed environment, characterized in that it comprises the following steps:
Step 1. Each participant adds noise to its local data and then sends the result to the manager;
Step 2. The manager collects the data sent by each participant, integrates the attributes of all participants, removes duplicate attributes, and forms the attribute set X;
Step 3. The participants cooperate with the manager to mark the sensitivity of the attributes in the attribute set X, dividing the attributes of X into two classes: sensitive attributes and non-sensitive attributes;
Step 4. For each attribute in the attribute set X, compute the mutual information of the attribute pairs it forms with every other attribute;
Step 5. Initialize the Bayesian network N = ∅ and the set V = ∅;
Step 6. Arbitrarily choose an attribute x0 from the attribute set X, set its parent attribute to ∅, add the attribute pair (x0, ∅) to the Bayesian network N, and add attribute x0 to the set V;
Step 7. Choose one of the remaining attributes in the attribute set X as the current attribute, and judge whether the following two conditions hold simultaneously:
1. the current attribute belongs to the attribute set X but does not belong to the set V;
2. every candidate subset of the set V contains min(k, |V|) elements, where |V| denotes the number of attributes in the set V and k denotes the maximum in-degree allowed for any attribute in the Bayesian network to be constructed;
If both conditions hold, go to step 8; otherwise, return the current Bayesian network N and go to step 10;
Step 8. From all attribute pairs involving the current attribute, select the pair with the maximum mutual information, and add the other attribute of that maximum pair to the Bayesian network N;
Step 9. Judge whether the in-degree of every attribute node of the current Bayesian network N is less than or equal to k: if so, return the current Bayesian network N and go to step 10; otherwise, return to step 7 until all attributes of the attribute set X have been added to the Bayesian network N;
Step 10. From all attribute pairs consisting of a sensitive attribute and a non-sensitive attribute in the returned current Bayesian network N, select the pairs whose mutual information exceeds the set threshold θ, and count the number n of sensitive attributes and the number m of non-sensitive attributes contained in the selected pairs;
Step 11. According to the counted number n of sensitive attributes and number m of non-sensitive attributes, compute the privacy budget ε′ of each participant as
ε′ = n′·(αε/n) + m′·((1 − α)ε/m)
where ε denotes the given total privacy budget, α denotes the given weight of the sensitive attributes, n′ denotes the number of sensitive attributes held by that participant, and m′ denotes the number of non-sensitive attributes held by that participant;
Step 12. When a user queries the local data of a participant, the manager first uses that participant's privacy budget ε′ to add differential privacy noise to the query result and then forwards the noised result to the user.
2. The personalized differential privacy protection method for high-dimensional data release in a distributed environment according to claim 1, characterized in that, in step 4, the mutual information MI(xi, Πj) of an attribute pair (xi, Πj) is computed as
MI(xi, Πj) = Σ Pr[xi, Πj] · log( Pr[xi, Πj] / (Pr[xi] · Pr[Πj]) ),
where Pr[xi, Πj] denotes the joint probability distribution of the attribute pair (xi, Πj), Pr[xi] denotes the marginal probability distribution of attribute xi, and Pr[Πj] denotes the marginal probability distribution of attribute Πj.
3. The personalized differential privacy protection method for high-dimensional data release in a distributed environment according to claim 1 or claim 2, characterized in that, in step 4, all attribute pairs also need to be sorted by the magnitude of their mutual information.
4. The personalized differential privacy protection method for high-dimensional data release in a distributed environment according to claim 3, characterized in that, in step 4, all attribute pairs are sorted in descending order of mutual information, i.e. pairs with larger mutual information come first and pairs with smaller mutual information come last.
5. The personalized differential privacy protection method for high-dimensional data release in a distributed environment according to claim 1, characterized in that, in step 5, the constructed k-degree Bayesian network is a directed acyclic graph.
CN201711092850.9A 2017-11-08 2017-11-08 Personalized differential privacy protection method for high-dimensional data release in distributed environment Active CN107871087B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711092850.9A CN107871087B (en) 2017-11-08 2017-11-08 Personalized differential privacy protection method for high-dimensional data release in distributed environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711092850.9A CN107871087B (en) 2017-11-08 2017-11-08 Personalized differential privacy protection method for high-dimensional data release in distributed environment

Publications (2)

Publication Number Publication Date
CN107871087A (en) 2018-04-03
CN107871087B (en) 2020-10-30

Family

ID=61752616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711092850.9A Active CN107871087B (en) 2017-11-08 2017-11-08 Personalized differential privacy protection method for high-dimensional data release in distributed environment

Country Status (1)

Country Link
CN (1) CN107871087B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763954A (en) * 2018-05-17 2018-11-06 西安电子科技大学 Linear regression model (LRM) multidimensional difference of Gaussian method for secret protection, information safety system
CN108776763A (en) * 2018-06-08 2018-11-09 哈尔滨工程大学 One kind being based on the relevant difference method for secret protection of attribute
CN108959956A (en) * 2018-06-07 2018-12-07 广西师范大学 Difference private data dissemination method based on Bayesian network
CN108959958A (en) * 2018-06-14 2018-12-07 中国人民解放军战略支援部队航天工程大学 A kind of method for secret protection and system being associated with big data
CN109241770A (en) * 2018-08-10 2019-01-18 深圳前海微众银行股份有限公司 Information value calculating method, equipment and readable storage medium storing program for executing based on homomorphic cryptography
CN109299436A (en) * 2018-09-17 2019-02-01 北京邮电大学 A kind of ordering of optimization preference method of data capture meeting local difference privacy
CN110334539A (en) * 2019-06-12 2019-10-15 北京邮电大学 A kind of personalized method for secret protection and device based on random response
CN110334546A (en) * 2019-07-08 2019-10-15 辽宁工业大学 Difference privacy high dimensional data based on principal component analysis optimization issues guard method
CN111242194A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111259442A (en) * 2020-01-15 2020-06-09 广西师范大学 Differential privacy protection method for decision tree under MapReduce framework
WO2020177484A1 (en) * 2019-03-01 2020-09-10 华南理工大学 Localized difference privacy urban sanitation data report and privacy calculation method
CN112131604A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 High-dimensional privacy data publishing method based on Bayesian network attribute cluster analysis technology
CN112395630A (en) * 2020-11-26 2021-02-23 平安普惠企业管理有限公司 Data encryption method and device based on information security, terminal equipment and medium
CN112395638A (en) * 2019-08-16 2021-02-23 国际商业机器公司 Collaborative AI with respect to privacy-assured transactional data
CN112528316A (en) * 2020-09-18 2021-03-19 江苏方天电力技术有限公司 Privacy protection lineage workflow publishing method based on Bayesian network
CN112822004A (en) * 2021-01-14 2021-05-18 山西财经大学 Belief network-based targeted privacy protection data publishing method
CN113379062A (en) * 2020-03-10 2021-09-10 百度在线网络技术(北京)有限公司 Method and apparatus for training a model
WO2022199473A1 (en) * 2021-03-25 2022-09-29 支付宝(杭州)信息技术有限公司 Service analysis method and apparatus based on differential privacy
CN115329898A (en) * 2022-10-10 2022-11-11 国网浙江省电力有限公司杭州供电公司 Distributed machine learning method and system based on differential privacy policy

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050267A (en) * 2014-06-23 2014-09-17 中国科学院软件研究所 Individuality recommendation method and system protecting user privacy on basis of association rules
CN104573560A (en) * 2015-01-27 2015-04-29 上海交通大学 Differential private data publishing method based on wavelet transformation
CN105608388A (en) * 2015-09-24 2016-05-25 武汉大学 Differential privacy data publishing method and system based on dependency removal
CN105608389A (en) * 2015-10-22 2016-05-25 广西师范大学 Differential privacy protection method of medical data dissemination
US20170024575A1 (en) * 2015-07-22 2017-01-26 International Business Machines Corporation Obfuscation and protection of data rights
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A kind of distributed difference method for secret protection based on k means

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050267A (en) * 2014-06-23 2014-09-17 中国科学院软件研究所 Individuality recommendation method and system protecting user privacy on basis of association rules
CN104573560A (en) * 2015-01-27 2015-04-29 上海交通大学 Differential private data publishing method based on wavelet transformation
US20170024575A1 (en) * 2015-07-22 2017-01-26 International Business Machines Corporation Obfuscation and protection of data rights
CN105608388A (en) * 2015-09-24 2016-05-25 武汉大学 Differential privacy data publishing method and system based on dependency removal
CN105608389A (en) * 2015-10-22 2016-05-25 广西师范大学 Differential privacy protection method of medical data dissemination
CN106778314A (en) * 2017-03-01 2017-05-31 全球能源互联网研究院 A kind of distributed difference method for secret protection based on k means

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIN ZHANG et al.: "Efficient privacy-preserving classification construction model with differential privacy technology", Journal of Systems Engineering and Electronics *
孙奎 et al.: "An enhanced differential privacy data publishing algorithm", Computer Engineering *
王玲玲 et al.: "A survey of metrics for privacy protection mechanisms in location-based services", Application Research of Computers *

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108763954A (en) * 2018-05-17 2018-11-06 西安电子科技大学 Linear regression model (LRM) multidimensional difference of Gaussian method for secret protection, information safety system
CN108763954B (en) * 2018-05-17 2022-03-01 西安电子科技大学 Linear regression model multidimensional Gaussian difference privacy protection method and information security system
CN108959956A (en) * 2018-06-07 2018-12-07 广西师范大学 Difference private data dissemination method based on Bayesian network
CN108959956B (en) * 2018-06-07 2021-06-22 广西师范大学 Differential privacy data publishing method based on Bayesian network
CN108776763A (en) * 2018-06-08 2018-11-09 哈尔滨工程大学 One kind being based on the relevant difference method for secret protection of attribute
CN108959958A (en) * 2018-06-14 2018-12-07 中国人民解放军战略支援部队航天工程大学 A kind of method for secret protection and system being associated with big data
CN109241770A (en) * 2018-08-10 2019-01-18 深圳前海微众银行股份有限公司 Information value calculating method, equipment and readable storage medium storing program for executing based on homomorphic cryptography
CN109299436A (en) * 2018-09-17 2019-02-01 北京邮电大学 A kind of ordering of optimization preference method of data capture meeting local difference privacy
CN109299436B (en) * 2018-09-17 2021-10-15 北京邮电大学 Preference sorting data collection method meeting local differential privacy
WO2020177484A1 (en) * 2019-03-01 2020-09-10 华南理工大学 Localized difference privacy urban sanitation data report and privacy calculation method
CN110334539A (en) * 2019-06-12 2019-10-15 北京邮电大学 A kind of personalized method for secret protection and device based on random response
CN110334539B (en) * 2019-06-12 2021-06-22 北京邮电大学 Personalized privacy protection method and device based on random response
CN110334546A (en) * 2019-07-08 2019-10-15 辽宁工业大学 Difference privacy high dimensional data based on principal component analysis optimization issues guard method
CN112395638B (en) * 2019-08-16 2024-04-26 国际商业机器公司 Collaborative AI with respect to transaction data with privacy guarantee
CN112395638A (en) * 2019-08-16 2021-02-23 国际商业机器公司 Collaborative AI with respect to privacy-assured transactional data
CN111242194B (en) * 2020-01-06 2022-03-08 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111242194A (en) * 2020-01-06 2020-06-05 广西师范大学 Differential privacy protection method for affinity propagation clustering
CN111259442A (en) * 2020-01-15 2020-06-09 广西师范大学 Differential privacy protection method for decision tree under MapReduce framework
CN113379062A (en) * 2020-03-10 2021-09-10 百度在线网络技术(北京)有限公司 Method and apparatus for training a model
CN113379062B (en) * 2020-03-10 2023-07-14 百度在线网络技术(北京)有限公司 Method and device for training model
CN112528316A (en) * 2020-09-18 2021-03-19 江苏方天电力技术有限公司 Privacy protection lineage workflow publishing method based on Bayesian network
CN112528316B (en) * 2020-09-18 2022-07-15 江苏方天电力技术有限公司 Privacy protection lineage workflow publishing method based on Bayesian network
CN112131604A (en) * 2020-09-24 2020-12-25 合肥城市云数据中心股份有限公司 High-dimensional privacy data publishing method based on Bayesian network attribute cluster analysis technology
CN112131604B (en) * 2020-09-24 2023-12-15 合肥城市云数据中心股份有限公司 High-dimensional privacy data release method based on Bayesian network attribute cluster analysis
CN112395630A (en) * 2020-11-26 2021-02-23 平安普惠企业管理有限公司 Data encryption method and device based on information security, terminal equipment and medium
CN112822004A (en) * 2021-01-14 2021-05-18 山西财经大学 Belief network-based targeted privacy protection data publishing method
WO2022199473A1 (en) * 2021-03-25 2022-09-29 支付宝(杭州)信息技术有限公司 Service analysis method and apparatus based on differential privacy
CN115329898A (en) * 2022-10-10 2022-11-11 国网浙江省电力有限公司杭州供电公司 Distributed machine learning method and system based on differential privacy policy

Also Published As

Publication number Publication date
CN107871087B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN107871087A (en) The personalized difference method for secret protection that high dimensional data is issued under distributed environment
US20170250796A1 (en) Trans Vernam Cryptography: Round One
Zhang et al. Identifying influential nodes in complex networks with community structure
Chen et al. Differentially private transit data publication: a case study on the montreal transportation system
Gupta et al. Centrality measures for networks with community structure
Task et al. A guide to differential privacy theory in social network analysis
Navarro-Arribas et al. Information fusion in data privacy: A survey
CN101916256A (en) Community discovery method for synthesizing actor interests and network topology
Zeng et al. Stackelberg game under asymmetric information in critical infrastructure system: From a complex network perspective
Ahmed et al. A random matrix approach to differential privacy and structure preserved social network graph publishing
Rabelo et al. Multigraph approach to quantum non-locality
CN107729767A (en) Community network data-privacy guard method based on figure primitive
Kumar et al. A novel architecture to identify locations for Real Estate Investment
CN110413652A (en) A kind of big data privacy search method based on edge calculations
Wei et al. Differential privacy-based trajectory community recommendation in social network
Le et al. Full autonomy: A novel individualized anonymity model for privacy preserving
Lv et al. Edge-fog-cloud secure storage with deep-learning-assisted digital twins
Miller Equivalence of several generalized percolation models on networks
Kaleli et al. SOM-based recommendations with privacy on multi-party vertically distributed data
Wang et al. Differential privacy for weighted network based on probability model
CN115438227A (en) Network data publishing method based on difference privacy and compactness centrality
Dhanalakshmi et al. Privacy preserving data mining techniques-survey
da Silva et al. Inference in distributed data clustering
Mazalov et al. Game-theoretic centrality measures for weighted graphs
CN103200034B (en) Network user structure disturbance method based on spectral constraint and sensitive area partition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant