CN107871087A

CN107871087A - The personalized difference method for secret protection that high dimensional data is issued under distributed environment

Info

Publication number: CN107871087A
Application number: CN201711092850.9A
Authority: CN
Inventors: 李先贤; 赵华兴; 王利娥; 刘鹏; 于东然
Original assignee: Guangxi Normal University
Current assignee: Guangxi Normal University
Priority date: 2017-11-08
Filing date: 2017-11-08
Publication date: 2018-04-03
Anticipated expiration: 2037-11-08
Also published as: CN107871087B

Abstract

The present invention disclose a kind of personalized difference method for secret protection of the issue of high dimensional data under distributed environment, and by the correlation between mutual information quantified property, the mutual information of respective attributes pair is calculated using mutual information formula.Approximate k degree Bayesian network is built according to mutual information, Bayesian network can be good at embodying the dependence between attribute and attribute.According to the personalized distribution privacy budget of the quantity of Sensitive Attributes and the non-quick attribute for meeting condition.According to the privacy budget distributed, carry out plus make an uproar to data each participant processing, is carried out plus is made an uproar using random response mechanism.Adding the data after making an uproar to be sent to manager, integrated data collection, then externally issue are synthesized by manager's aggregation.The present invention ensure that privacy requirement when issuing data, and the data of processing are largely reduced, and so as to which the change to data reduces, improve the utility of data, is advantageous to data analysis person and carries out correlation analysis.

Description

The personalized difference method for secret protection that high dimensional data is issued under distributed environment

Technical field

The present invention relates to network data security technical field, and in particular to high dimensional data is issued under a kind of distributed environment Personalized difference method for secret protection.

Background technology

In recent years, as the fast development of internet and information technology, secret protection data publication are of considerable interest, People for protect individual privacy, realize data sharing idea understand it is more and more deep.Many specific data owners The initial data (such as user data of the medical data of certain hospital, certain social network sites etc.) for needing to be possessed oneself is issued out Go, so that other mechanisms are researched and analysed or for other purposes.It may be included in the initial data to be issued substantial amounts of Sensitive individual privacy information (such as wages, disease condition, individual deposit etc.), directly issue may cause privacy information Leakage, therefore data owner needs initial data carrying out secret protection and then issued again.These secret protection technologies are substantially 3 classes can be divided into：1) distribution technology based on restrictive condition：According to the characteristic of initial data, selectively issue is local special Property data.Guard method or model have：K- anonymities method for secret protection, l- diversity indexes, t- approximate models, m- consistency Deng.2) distribution technology based on data distortion：Make private data distortion but keep some characteristics of legacy data.Guard method Or model has：Randomized technique, block techniques, switching technology, condensation technique etc..3) distribution technology based on data encryption：It is The method of sensitive data is hidden in data mining process using encryption technology.Data encryption technology now development very into Ripe, guard method or model have：Multi-party computations (SMC), des encryption technology, rsa encryption technique etc..

Difference privacy (differential privacy) is that Dwork carries in the privacy leakage problem for staqtistical data base A kind of new privacy definition gone out.It is the distribution technology based on data distortion, and the technology using addition noise makes data distortion But keep some data or data attribute constant simultaneously.Difference secret protection technology can ensure, when the addition in data set or Person deletes a data record and does not interfere with inquiry output result, therefore even if in the worst case, mesh known to attacker All sensitive datas outside mark record, still can ensure that target record will not be compromised.The protection technique is hidden unlike tradition Private protection technique is the same, and it provides very strong theoretical guarantee on the privacy concern of data publication, i.e., not to the back of the body of attacker Scape knowledge does any ad hoc hypothesis.The technology is not only that the issue of private data provides higher safety guarantee, and in reality It is widely used in trampling.But existing difference secret protection technology can not handle high dimensional data under distributed environment well RELEASE PROBLEM, particularly when including substantial amounts of attribute field in the data of issue, existing technology is injected greatly into data The noise information of amount, it is possible to cause the data of issue to lose due effectiveness.

The content of the invention

To be solved by this invention is the deficiency present in existing privacy dissemination method, there is provided high under a kind of distributed environment The personalized difference method for secret protection of dimension data issue.

To solve the above problems, the present invention is achieved by the following technical solutions：

The personalized difference method for secret protection that high dimensional data is issued under distributed environment, including step are as follows：

The each participant of step 1. by local data carry out plus make an uproar processing after be sent to manager；

Step 2. manager collects the data that each participant is sent, and the attribute of all participants is integrated, and goes Except the attribute repeated, property set X is formed；

Step 3. participant cooperates with manager carries out sensitivity mark to the attribute in property set X, by property set X Attribute be divided into Sensitive Attributes and the class of non-sensitive attribute two；

The attribute pair that each attribute is formed with other all properties in addition to itself in step 4. computation attribute collection X Mutual information；

Step 5. initializes Bayesian networkAnd set

An attribute x is arbitrarily chosen in step 6. dependence collection X₀And its parent attribute is set to beAttribute to (x₀,) It is added in Bayesian network N, attribute x₀It is added in set V；

Step 7. is chosen in the remaining attribute in property set X and chooses an attribute as current attribute, and judges whether same When meet following two conditions：

1. current attribute belongs to property set X, but is not belonging to set V；

2. the number of all subset elements is min (k, | V |) in set V, wherein | V | the number of attribute in set V is represented, K represents that the maximum of any attribute in the Bayesian network of setting enters angle value, min represent k and | V | in smaller；

If it is, go to step 8；Otherwise, current Bayesian network N is returned, and goes to step 10；

Step 8. selects the maximum attribute pair of mutual information from the attribute centering of all about current attribute, and the maximum Another attribute of attribute centering is added in Bayesian network N；

Step 9. judge current Bayesian network N any attribute node enter angle value whether be respectively less than be equal to k：If so, Current Bayesian network N is then returned, and goes to step 10；Otherwise, return to step 7, until all properties addition property set X Into Bayesian network N；

The attribute pair of step 10. all Sensitive Attributes and non-sensitive attribute composition from the current Bayesian network N of return In, the attribute pair that mutual information is more than given threshold θ is selected, and the attribute selected by statistics is to the quantity of the Sensitive Attributes included N and non-sensitive attribute quantity m；

Step 11. calculates each respectively according to the quantity n of the Sensitive Attributes counted and the quantity m of non-sensitive attribute The privacy budget ε ' of participant:

Wherein, ε represents given total privacy budget, and α represents the weights of given Sensitive Attributes, and n ' represents the participant The number of middle Sensitive Attributes, m ' represent the number of non-sensitive attribute in the participant；

For step 12. when user is inquired about the local data of some participant, manager is first with the participant Privacy budget ε ' to Query Result carry out difference privacy add make an uproar after be then forwarded to user.

In above-mentioned steps 4, attribute is to (x_i, ∏_j) mutual phase information MI (x_i, ∏_j) calculation formula is：

Wherein, Pr [x_i, Π_j] represent attribute to (x_i, Π_j) joint probability distribution, Pr [x_i] represent attribute x_iEdge Probability distribution, Pr [Π_j] represent attribute ∏_jMarginal probability distribution；

In above-mentioned steps 4, it is also necessary to according to mutual information size to all properties to being ranked up.

In above-mentioned steps 4, by all properties to carrying out descending sort according to mutual information, i.e. the big attribute of mutual information is to coming Before, after the small attribute of mutual information is to coming.

In above-mentioned steps 5, constructed k- degree Bayesian network is directed acyclic graph.

Consider that existing difference privacy methods use same privacy budget to all participants, the standard of being uniformly processed can not What is avoided causes workload to maximize.Also privacy budget allocation when do not account for personalization, the budget of mean allocation privacy certainly will Cause some local data sets protections weaker, and the problem of some local data sets protections are too strong.The present invention passes through Correlation between mutual information quantified property, the mutual information of respective attributes pair is calculated using mutual information formula.According to mutual information structure Approximate k- degree Bayesian network is built, Bayesian network can be good at embodying the dependence between attribute and attribute.According to The personalized distribution privacy budget of quantity of Sensitive Attributes and the non-quick attribute for the condition that meets.Each participant is according to the individual character distributed Change privacy budget, carry out plus make an uproar to data processing, is carried out plus is made an uproar using random response mechanism.When user is to the sheet of some participant When ground data are inquired about, manager is carried out after difference privacy adds and make an uproar first with the privacy budget ε ' of the participant to Query Result It is then forwarded to user.The present invention ensure that privacy requirement when issuing data, and the data of processing are largely reduced, so as to data Change reduce, improve the utilities of data, be advantageous to data analysis person and carry out correlation analysis.

Brief description of the drawings

Fig. 1 is the flow chart of the preferred embodiment of the present invention.

Fig. 2 is the 1- degree Bayesian networks N of 8 attribute nodes in the preferred embodiment of the present invention.

Embodiment

For the object, technical solutions and advantages of the present invention are more clearly understood, below in conjunction with instantiation, and with reference to attached Figure, the present invention is described in more detail.

The present invention considers that data attribute has juxtaposition situation in distributed environment, according to the different sensitivitys of attribute Relevance between and is distributed to realize that privacy budget is personalized, to solve the problems of prior art.Base of the present invention It is credible in half it is assumed that each local data base is carried out plus made an uproar so that it meets ε according to the privacy budget of distribution_i- differential privacy(ε_iFor the privacy budget of distribution), after be sent to data manager and polymerize, data aggregate Meet total ε-differential privacy afterwards (whereinK participant), while preferably keep original The utility of data, be advantageous to the purpose of data analysis person carries out data analysis.

The system model of the present invention includes three roles：What k participant (i.e. local data owner), one and half trusted Data manager and user's (data analysis).Each participant P_kIt is owned by a local data base D_k.Manager is half credible , assist k participant to issue integrated data collection D, the purpose of data of issue are directed to different data analyses for user.

Assuming that total property set：X={ x₁,x₂,x₃,x₄,x₅,x₆,x₇,x₈, wherein x_i(i=1,2 .., 8) represents different Attribute.When some attribute x in the local data of participant being present_iWhen, represented with " 1 ", when not deposited in the local data of participant In some attribute x_iWhen, represented with " 0 ", P_k(k=1,2,3,4) represents participant, then 4 participants：P₁:1,1,0,1,0,0, 0,0 }, P₂:{ 1,1,0,0,0,1,0,0 }, P₃:{ 1,0,1,0,1,0,1,1 }, P₄:{ 0,0,1,0,0,1,1,0 }.

The personalized difference method for secret protection that high dimensional data is issued under a kind of distributed environment, as shown in figure 1, combining tool The present invention will be described in further detail for the example of body.It is as follows that it includes step：

Step 1：Each participant is sent to manager after local data to be added to processing of making an uproar.

Because manager is half believable, it is impossible to real result of calculation directly is sent into it, then will be to local Data carry out adding processing of making an uproar to be then sent to manager.Here plus make an uproar processing be with one kind generation random number (scope is typically 0 To between 1) mode the marginal probability distribution of attribute is handled.Pr [x are used in example_i] represent plus make an uproar the attribute after handling x_iMarginal probability distribution.

Step 2：Manager collects the data that each participant is sent, and the attribute progress to all participants of collection is whole Close, the attribute of repetition is removed during integration, to form property set X.

Step 3：Participant cooperates with manager and carries out sensitivity mark to the property set X for integrating generation, category Property is labeled as Sensitive Attributes and non-sensitive attribute.

Property set X={ x in example₁,x₂,x₃,x₄,x₅,x₆,x₇,x₈, attribute is divided into Sensitive Attributes and non-sensitive category Property, attribute { x in example₄,x₇Sensitive Attributes are defined as, remaining attribute is non-sensitive attribute.So any attribute is to (x_i, Π_j) (x here_iRepresent any attribute in property set X, Π_jRepresent x_iAny parent attribute, except attribute x_iIt is any outside itself Attribute is all probably its parent attribute) composition can be divided into three kinds of situations, i.e., Sensitive Attributes and Sensitive Attributes, Sensitive Attributes with it is non-quick The attribute pair that sense attribute, non-sensitive attribute are formed with non-sensitive attribute.

Step 4：The joint probability point of each attribute and other all properties in addition to itself in computation attribute collection X Cloth, it is expressed as Pr [x_i, Π_j], and then the association relationship MI of all properties pair after integration is calculated, and by all properties to being stored in Sorted in array candidate and in a manner of descending.

The mutual information of attribute pair can be obtained by mutual information calculation formula, wherein an attribute is to (x_i, Π_i) mutual information MI(x_i, Π_j) calculation formula be：

Wherein, x_iFor any attribute in property set X, Π_jFor attribute x_iParent attribute collection Π in any parent attribute, Pr [x_i, Π_j] for attribute to (x_i, Π_j) joint probability distribution, Pr [x_i] it is to add the attribute x after processing of making an uproar_iMarginal probability point Cloth, Pr [Π_j] it is to add the parent attribute Π after processing of making an uproar_iMarginal probability distribution.Wherein x_i∈ X, Π_j∈ X, i ≠ j.

Step 5：Initialize Bayesian networkAnd set

The empty list of Bayesian network N to one AP pairs of initialization, wherein AP is to referring to any attribute to (x_i, ∏_j), x_i∈ X, ∏_j∈ ∏, i ≠ j, ∏ are attribute x_iThe set of all parent attributes, initialization| ∏ |≤k's, i.e. some attribute Parent attribute quantity is less than or equal to k, it is ensured that Bayesian network N is a k- degree Bayesian network.To ensure Bayesian network N simultaneously For directed acyclic graph (DAG).

Set V deposits the set of the attribute of its known parent attribute, such as known attribute x when being structure N₁Parent attribute for category Property x₂, then just attribute x₁It is stored in V；

Step 6：An attribute x is randomly selected in dependence collection X (assuming that including d attribute altogether)₀And its parent attribute is set ForAttribute to (x₀,) be added in Bayesian network N, x₀It is added in set V.

For each selected properties x_i∈ X V and eachX V represent attribute x_iBelong to property set X but do not belong to In set V, if for example, x_aIt is added in V, then subsequently choose x_iWhen will exclude x_a。Represent all in V (subset refers to that all properties are combined formed set in V to subset here, for example, it is assumed that V={ x₁,x₂, institute It is { x to have subset₁}、{x₂}、{x₁,x₂) in attribute number be min (k, | V |).

Step 7：For the remaining attribute in property set X, d-1 iteration is carried out using greedy algorithm, until property set X All properties be added in Bayesian network N；

In each iteration, attribute x is judged_iWhether following two conditions are met simultaneously：

(1) attribute x_iBelong to property set X and be not belonging to set V；

(2) number of all subset elements is min (k, | V |) in set V, wherein | V | represent the individual of attribute in set V Number；The maximum of any attribute enters angle value in the Bayesian network that k represents to set；

If it is satisfied, then go in next step；Otherwise, current Bayesian network is returned, and goes to step 10；

In fact, if being unsatisfactory for above-mentioned 2 conditions, it is on stricti jurise that the Bayesian network of return, which can not can be regarded as, Bayesian network, but in order to allow algorithm to be smoothed out, still exported as Bayesian network, in the case of this can pair Last Query Result, which adds to make an uproar, produces larger error.

Step 8：According to array candidate in step 2 as attribute to Candidate Set.Regard the attribute in V as parent attribute, Chosen from X V has the attribute of larger association relationship next in Bayesian network N as being added to these parent attributes Attribute.By d-1 iteration, each iteration, to adding in N, finally returns to shellfish as far as possible the attribute for possessing larger mutual information This network of leaf.Such as establish 1- degree Bayesian networks, then selection is exactly and attribute x_aThere is the attribute of maximum mutual information value, The attribute being added to as next time in Bayesian network N.It is identical if there is the mutual information of multiple attributes pair, preferentially select during selection The attribute pair related to Sensitive Attributes is taken, is added in Bayesian network N.

Step 9：Whether the Bayesian network N that judgment step 8 is built meets k- degree, and meeting the standard of k- degree is：Current shellfish Any attribute node of this network N of leaf enters whether angle value is respectively less than equal to k, if k- degree Bayesian networks are then returned to, in reality The Bayesian network returned in example is as shown in Figure 2.Otherwise iterations adds 1, return to step 7.

The correlation of mutual information quantified property in information theory, using the GreedyBayes algorithms in PrivBayes Build the Bayesian network of approximate k- degree (it is k that the maximum of any attribute node, which enters angle value, in the Bayesian network built). Node represents attribute in Bayesian network, and directed edge or directed walk represent certain dependence between 2 attributes.Such as In Fig. 2, attribute x₁With attribute x₂There is a directed edge, represent attribute x₂Depend directly on attribute x₁.Attribute x₁With attribute x₄Have one Bar directed walk, represent attribute x₄Indirectly depend on attribute x₁.Bayesian network can be good at embodying between attribute and attribute Dependence.

Step 10：Count in the Bayesian network returned and meet the Sensitive Attributes of condition and the quantity of non-sensitive attribute.

First, all attributes pair being made up of Sensitive Attributes and non-sensitive attribute are chosen from the Bayesian network of return. In example, return is 1- degree Bayesian networks, and Sensitive Attributes are { x₄,x₇The attribute chosen is to for { x₁,x₄}、{x₂,x₄}、 {x₃,x₄}、{x₅,x₄}、{x₆,x₄}、{x₈,x₄}、{x₁,x₇}、{x₂,x₇}、{x₃,x₇}、{x₅,x₇}、{x₆,x₇}、{x₈,x₇}。

Then, the mutual information MI of these Sensitive Attributes and the attribute pair of non-sensitive attribute is compared with set threshold θ Compared with selecting attributes pair of the mutual information MI more than those Sensitive Attributes and the non-sensitive attribute of threshold θ.Setting for threshold θ, Depend on the circumstances, found by test of many times optimal.Such as in instances, threshold θ=0.2 is set, it is assumed that there are MI (x₂,x₄)>θ, MI(x₈,x₇)>θ, therefore the non-sensitive attribute for needing exist for carrying out secret protection has 2, respectively x₂、x₈。

Finally, the selected quantity n of attribute centering Sensitive Attributes and the quantity m of non-sensitive attribute taken out of statistics.With it is set Determine the size of threshold θ, if MI is more than θ, count the quantity of associated sensitive and non-sensitive attribute, it is otherwise insensitive to these Handled with Sensitive Attributes.N=2, m=2 in example.

Step 11：Personalization distribution privacy budget.

Because existing difference privacy methods use same privacy budget to all participants, the standard of being uniformly processed can not What is avoided causes workload to maximize.Also privacy budget allocation when do not account for personalization, the budget of mean allocation privacy certainly will Cause some local data sets protections weaker, and some local data sets protections are too strong.So we use individual character The privacy budget allocation scheme of change, the quantity n of the Sensitive Attributes counted according to step 10 and the quantity m of non-sensitive attribute, The privacy budget ε ' of each participant is calculated respectively：

Pay attention to, the ratio i.e. α of the privacy budget distributed for Sensitive Attributes and non-sensitive attribute, to depend on the circumstances, Found by test of many times optimal, but distribute total privacy budget and should be less than being equal to ε, so as to meet ε-difference privacy.

In instances, total privacy budget ε=0.1, α=0.6 are set.Now, ε₁=α ε=0.6 ε, ε₂=(1- α) ε= 0.4 ε, then the privacy budget for distributing to Sensitive Attributes are respectively ε (x₄)=ε (x₇ε/2 of)=0.6, distribute to the hidden of non-sensitive attribute Private budget is respectively ε (x₂)=ε (x₈ε/2 of)=0.4.

Step 12：The process of making an uproar is added to the difference privacy of Query Result.When user is inquired about data, manager according to The privacy budget ε ' distributed carries out adding processing of making an uproar to data.Added using random response (RR) mechanism for meeting ε '-difference privacy Plus noise.

It is main to be applied to the inquiry of logarithm value type and tie because universal in existing difference privacy plus mode of making an uproar be Laplace mechanism The protection of fruit, add used here as random response (RR) mechanism and make an uproar, be better than Laplace mechanism in some scenes.Random response machine System is made up of following algorithm：S is that a random algorithm is used to generate a problem q.Res be using q and correct option tA as input, Answer nA with noise is the random algorithm of output, is expressed as Res (q, tA), generally omits q, is expressed as Res (tA).Eval (nA₁..., nA_n) it is with the answer with noise, nA₁..., nA_nAs input, export as valuation.So given ε >=0, with Machine response mechanism M=(S, Res, Eval) meets ε-difference privacy, if for any two correct option tA₀∈R、tA₁∈ R, For arbitrary element s ∈ R, Pr [Res (tA₀)=s]≤e^ε×Pr[Res(tA₁)=s].Wherein (S, Res, Eval) is represented respectively Three random algorithms, ε are secret protection budget.In some scenarios by adding caused mean square error of making an uproar to be less than at random Mean square error caused by Laplace mechanism, someone demonstrate this point.

Step 13：Terminate.

The present invention calculates the mutual of respective attributes pair by the correlation between mutual information quantified property using mutual information formula Information.Approximate k- degree Bayesian network is built according to mutual information, Bayesian network can be good at embodying attribute and attribute it Between dependence.According to the personalized distribution privacy budget of the quantity of Sensitive Attributes and the non-quick attribute for meeting condition.It is each to participate in According to the privacy budget distributed, carry out plus make an uproar to data root processing, is carried out plus is made an uproar using random response mechanism.After adding and making an uproar Data are sent to manager, and integrated data collection, then externally issue are synthesized by manager's aggregation.The present invention protects when issuing data Privacy requirement is demonstrate,proved, the data of processing are largely reduced, and so as to which the change to data reduces, improve the utility of data, favorably Correlation analysis is carried out in data analysis person.

It should be noted that although embodiment of the present invention is illustrative above, but it is to the present invention that this, which is not, Limitation, therefore the invention is not limited in above-mentioned embodiment.Without departing from the principles of the present invention, it is every The other embodiment that those skilled in the art obtain under the enlightenment of the present invention, it is accordingly to be regarded as within the protection of the present invention.

Claims

1. the personalized difference method for secret protection that high dimensional data is issued under distributed environment, it is characterized in that, including step is as follows：

Step 2. manager collects the data that each participant is sent, and the attribute of all participants is integrated, and removes weight Multiple attribute, form property set X；

Step 3. participant cooperates with manager carries out sensitivity mark to the attribute in property set X, by the category in property set X Property is divided into Sensitive Attributes and the class of non-sensitive attribute two；

The attribute pair that each attribute is formed with other all properties in addition to itself in step 4. computation attribute collection X it is mutual Information；

Step 5. initializes Bayesian networkAnd set

An attribute x is arbitrarily chosen in step 6. dependence collection X₀And its parent attribute is set to beAttribute pairAdd It is added in Bayesian network N, attribute x₀It is added in set V；

Step 7. is chosen in the remaining attribute in property set X and chooses an attribute as current attribute, and judges whether simultaneously full The following two conditions of foot：

2. the number of all subset elements is min (k, | V |) in set V, wherein | V | represent the number of attribute in set V, k tables Show that the maximum of any attribute in the Bayesian network of setting enters angle value；

Step 8. selects the maximum attribute pair of mutual information from the attribute centering of all about current attribute, and the maximum attribute Another attribute of centering is added in Bayesian network N；

Step 9. judge current Bayesian network N any attribute node enter angle value whether be respectively less than be equal to k：If so, then return Current Bayesian network N is returned, and goes to step 10；Otherwise, return to step 7, until property set X all properties are added to shellfish In this network N of leaf；

The attribute centering of step 10. all Sensitive Attributes and non-sensitive attribute composition from the current Bayesian network N of return, choosing Go out the attribute pair that mutual information is more than given threshold θ, and count selected attribute to the quantity n of Sensitive Attributes that is included and non- The quantity m of Sensitive Attributes；

Step 11. calculates each participate in respectively according to the quantity n of the Sensitive Attributes counted and the quantity m of non-sensitive attribute The privacy budget ε ' of side:

Wherein, ε represents given total privacy budget, and α represents the weights of given Sensitive Attributes, and n ' represents quick in the participant Feel the number of attribute, m ' represents the number of non-sensitive attribute in the participant；

Step 12. when user is inquired about the local data of some participant, manager first with the participant privacy Budget ε ' to Query Result carry out difference privacy add make an uproar after be then forwarded to user.

2. the personalized difference method for secret protection that high dimensional data is issued under distributed environment according to claim 1, it is special Sign is, in step 4, attribute is to (x_i, Π_j) mutual phase information MI (x_i, Π_j) calculation formula is：

Wherein, Pr [x_i, Π_j] represent attribute to (x_i, Π_j) joint probability distribution, Pr [x_i] represent attribute x_iMarginal probability Distribution, Pr [Π_j] represent attribute Π_jMarginal probability distribution.

3. the personalized difference method for secret protection that high dimensional data is issued under distributed environment according to claim 1 or claim 2, its It is characterized in, in step 4, it is also necessary to according to mutual information size to all properties to being ranked up.

4. the personalized difference method for secret protection that high dimensional data is issued under distributed environment according to claim 3, it is special Sign is, in step 4, by all properties to according to mutual information carry out descending sort, i.e., the big attribute of mutual information to coming before, mutual trust After ceasing small attribute to coming.

5. the personalized difference method for secret protection that high dimensional data is issued under distributed environment according to claim 1, it is special Sign is, in step 5, constructed k- degree Bayesian network is directed acyclic graph.