CN107070932B

CN107070932B - Anonymous method for preventing label neighbor attack in social network dynamic release

Info

Publication number: CN107070932B
Application number: CN201710270625.3A
Authority: CN
Inventors: 李先贤; 胡晓依; 雷聪; 许元馨; 王利娥
Original assignee: Guangxi Normal University
Current assignee: Guangxi Normal University
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2020-02-07
Anticipated expiration: 2037-04-24
Also published as: CN107070932A

Abstract

The invention discloses an anonymization method for preventing label-neighbor attacks in dynamic social network publishing. An algorithm designed on the basis of neighbor label similarity assigns nodes to suitable groups while also taking into account the neighbor structure and degree information, in the original social network, of the nodes in the same group. On this basis the StruSim method is improved to meet the privacy requirement and to raise the structural similarity of the nodes within a group in preparation for the next operation, which yields the grouped social network graph at time t. The graph structure is then changed by random perturbation, in which edges are randomly added and deleted, followed by fuzzification, in which each edge exists in the social network with an associated probability, so that an attacker cannot uniquely identify an individual with a sensitive label with probability higher than 1/l. The invention perturbs only the label-neighbor information of individuals with sensitive labels, thereby reducing the number of uncertain graphs and improving data utility.

Description

Anonymous method for preventing label neighbor attack in social network dynamic release

Technical Field

The invention relates to the technical field of data privacy protection, in particular to an anonymous method for preventing label neighbor attack in social network dynamic release.

Background

In recent years, with the rapid growth of internet technology, social network applications such as microblogs, WeChat, and Facebook have become popular. These applications provide a convenient communication platform and also generate a great amount of information about users. Such information has a wide range of uses, such as advertising, merchandise recommendation, and social behavior prediction. Social network data contains a large amount of sensitive information, including attribute information of individuals (such as occupation and salary) and behavior information of individuals (such as their social relationships). If this information is published and shared without processing, it may invade the privacy of users. Privacy protection in social network data publishing has therefore been a focus of attention for many researchers.

In social network data, personal information of a user, such as name, gender, age, address, and occupation, is generally represented by labels. Users can choose which attributes to hide; hidden attributes are sensitive and disclosed attributes are non-sensitive, so labels are classified as sensitive or non-sensitive. One of the most common and intuitive anonymization methods is simple anonymization, i.e. removing the explicit identifier attributes that uniquely identify a user and leaving only the attributes represented by labels. This reduces some privacy disclosure, but the protection is not strong enough: an attacker can still uniquely identify an individual through background knowledge such as degree and neighbor structure information. Social network analysis therefore requires further measures to ensure that privacy is not compromised, and existing work is primarily concerned with the disclosure of individuals and of relationships.

Existing work mainly focuses on static network analysis, but many applications involve networks that evolve over time. Compared with static networks, privacy protection for dynamic network data imposes higher requirements: it must ensure not only that the data at a given moment meets the anonymity requirement, but also that privacy remains protected across multiple releases. Because data released at different moments is intrinsically correlated, an attacker can obtain more private information by comparing data released at successive moments. Existing privacy protection methods for static social network analysis are therefore not suitable for dynamic publishing. To solve this problem, a privacy protection model for preventing label-neighbor attacks in dynamic social network publishing is provided.

Disclosure of Invention

The invention aims to solve the problem that existing privacy protection methods are designed only for static social network publishing and cannot be applied to dynamic social network publishing, and provides an anonymization method for preventing label-neighbor attacks in dynamic social network publishing that meets the privacy requirements of a dynamic network when data is released.

In order to solve the problems, the invention is realized by the following technical scheme:

the anonymous method for preventing the label neighbor attack in the social network dynamic release comprises the following steps:

step 1, initializing an original social network graph at the current moment;

step 2, grouping the nodes with the sensitive labels in the social network graph according to the structural similarity;

step 3, matching the neighbor nodes of the nodes with the sensitive labels after grouping so as to enable the neighbor label information of the nodes in the group to be the same;

step 4, randomizing the label-neighbor graph which is subjected to the node label matching operation to obtain a randomized social network graph;

step 5, releasing the randomized social network graph.

The specific process of step 2 is as follows:

step 2.1, arranging the set V_s^t of nodes with sensitive labels in the social network data in descending order of degree to obtain a new node set;

step 2.2, selecting the node with the sensitive label with the maximum degree in the new node set, and removing the selected node from the new node set;

step 2.3, calculating the structural similarity between the selected node and each node in the new node set, and placing the selected node in a group together with the nodes most structurally similar to it until the number of nodes in the group reaches the privacy level l;

step 2.4, repeating steps 2.2-2.3 until the new node set no longer contains nodes with sensitive labels, which completes the grouping.

In step 2.1, the node set is arranged in descending order of degree to obtain a new node sequence.
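The greedy grouping of step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `greedy_group` is hypothetical, the structural similarity is passed in as a caller-supplied function because its formula is not given in this passage, and the handling of a final group smaller than l is left out.

```python
from typing import Callable, Dict, List


def greedy_group(
    degrees: Dict[str, int],
    similarity: Callable[[str, str], float],
    l: int,
) -> List[List[str]]:
    """Group sensitive-label nodes greedily (hypothetical helper).

    degrees:    degree of every node that carries a sensitive label
    similarity: assumed structural-similarity function supplied by the caller
    l:          privacy level, i.e. the target group size
    """
    # Sort the sensitive-label nodes by descending degree.
    remaining = sorted(degrees, key=lambda n: degrees[n], reverse=True)
    groups: List[List[str]] = []
    while remaining:
        # Take the remaining node with the maximum degree as the seed.
        seed = remaining.pop(0)
        group = [seed]
        # Add the most similar remaining nodes until the group holds l nodes.
        while len(group) < l and remaining:
            best = max(remaining, key=lambda n: similarity(seed, n))
            remaining.remove(best)
            group.append(best)
        groups.append(group)
    return groups


# Usage with illustrative values (only the 0.8 for A and C appears in the text).
sim_table = {
    frozenset(("A", "C")): 0.8,
    frozenset(("A", "F")): 0.5,
    frozenset(("C", "F")): 0.6,
}
example_groups = greedy_group(
    {"A": 4, "C": 3, "F": 3},
    lambda u, v: sim_table[frozenset((u, v))],
    2,
)
```

With these inputs the last group holds a single node; the detailed description handles such undersized groups by redistributing their nodes, which this sketch does not attempt.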

The specific process of step 4 is as follows:

step 4.1, randomly adding and/or deleting edges in the label-neighbor graph that has completed the node label matching operation: any 2 nodes in the label-neighbor graph are selected at random, and if the edge between these 2 nodes exists in the original social network graph, the edge is deleted from the label-neighbor graph; otherwise, the edge is added to the label-neighbor graph;

step 4.2, randomly generating a probability in [0, 1] for each edge in the label-neighbor graph after the random addition and/or deletion of edges, and taking it as the probability that the edge exists in the label-neighbor graph.
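A minimal sketch of the two operations in step 4, assuming edges are stored as a set of `frozenset` pairs; the function names `toggle_edge` and `fuzzify` are hypothetical and not part of the patent.

```python
import random
from typing import Dict, FrozenSet, Set

Edge = FrozenSet[str]


def toggle_edge(current: Set[Edge], original: Set[Edge], u: str, v: str) -> None:
    """Apply the add/delete rule to one randomly chosen node pair (u, v)."""
    pair = frozenset((u, v))
    if pair in original:
        # The edge exists in the original graph: delete it from the working set.
        current.discard(pair)
    else:
        # The edge does not exist in the original graph: add it.
        current.add(pair)


def fuzzify(edges: Set[Edge], rng: random.Random) -> Dict[Edge, float]:
    """Attach a uniformly drawn existence probability to every edge.

    Note: random() draws from [0.0, 1.0), a close stand-in for [0, 1].
    """
    return {edge: rng.random() for edge in edges}


# Usage on a tiny illustrative graph.
original_edges = {frozenset(("A", "B")), frozenset(("B", "C"))}
working_edges = set(original_edges)
toggle_edge(working_edges, original_edges, "A", "B")  # existing edge: deleted
toggle_edge(working_edges, original_edges, "A", "C")  # missing edge: added
uncertain_edges = fuzzify(working_edges, random.Random(0))
```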

Compared with the prior art, the invention has the following characteristics:

1. An algorithm GSGA (Global Similarity-based group clustering), designed on the basis of neighbor label similarity, assigns nodes to suitable groups while also taking into account the neighbor structure and degree information, in the original social network, of the nodes in the same group. On this basis the StruSim method is improved to meet the privacy requirement and to raise the structural similarity of the nodes within a group in preparation for the next operation, which yields the grouped social network graph at time t.

2. The graph structure is changed by random perturbation, in which edges are randomly added and deleted, followed by fuzzification, in which each edge exists in the social network with an associated probability, so that an attacker cannot uniquely identify an individual with a sensitive label with probability higher than 1/l.

3. Only the label-neighbor information of individuals with sensitive labels is perturbed, which reduces the number of uncertain graphs and improves data utility.

Drawings

FIG. 1 is a schematic diagram of an anonymization method for preventing label neighbor attacks in social network dynamic publishing.

Fig. 2 is a data anonymization grouping process.

Fig. 3 is a data anonymization random processing process.

Fig. 4 shows the original social network graphs input at different times, where (a) is the original social network graph input at time t = 1, and (b) is the original social network graph input at time t = 2.

Fig. 5 is a diagram of a grouping process.

Fig. 6 is a reconstruction process diagram and a release diagram.

Detailed Description

The social network data used by the invention is a simple undirected graph with labels, and the attacker's background knowledge may be specific subgraph information, namely the neighbor label information, of any node. Before the social network data is released, preliminary anonymization is carried out: explicit identifying attributes that uniquely identify a node, such as names, are removed, and attribute information is represented by labels instead. The published graph G_t = (V^t, E^t, L^t) represents the dynamic network at time t, where V^t is the set of nodes, representing individuals or other entities in the social network; E^t is the set of edges, representing relationships between individuals or entities, such as friendships or partnerships; and L^t is the set of labels of the individuals.

The invention proposes a privacy model for multiple releases of a dynamic network, namely Dynamic-l-diversity. The model is given a labeled dynamic network G_t = (V^t, V_s^t, E^t, L^t) at time t and a privacy threshold l, where V^t is the set of nodes, E^t is the set of edges, V_s^t is the set of nodes with sensitive labels, L^t is the set of labels of the individuals, and γ is the node-to-label mapping, γ: V^t → L^t. In this privacy model for dynamic social networks, the label-neighbor information of a node serves as the attacker's background knowledge and the nodes with sensitive labels are what must be protected. The original network G_t is converted into a randomized anonymous social network G_t' that satisfies l-diversity in the dynamic release of the anonymous network. That is, for any node u ∈ V^t with a sensitive label, there are at least l-1 other nodes that have different sensitive labels but the same label-neighbor graph, so that after the social networks G₁, G₂, G₃, ..., G_t are anonymized, the social network graphs G'₁, G'₂, G'₃, ..., G'_t satisfy the l-diversity requirement in dynamic publishing. This effectively prevents an attacker from using background knowledge to re-identify users in the published data. Here, for an arbitrary node u, the label-neighbor graph of u is G_l(u) = (G_u, NLS_u), where G_u is the one-hop neighbor graph of u and NLS_u is the one-hop neighbor label information of u.
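As an illustration of the label-neighbor graph G_l(u) = (G_u, NLS_u), the sketch below extracts both parts from an adjacency map. It assumes that the one-hop neighbor graph is the subgraph induced by u and its direct neighbors and that NLS_u is the multiset of neighbor labels; the function name and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, Set, Tuple


def label_neighbor_graph(
    adj: Dict[str, Set[str]], labels: Dict[str, str], u: str
) -> Tuple[Set[FrozenSet[str]], Counter]:
    """Return (G_u, NLS_u) for node u (assumed representation)."""
    members = {u} | adj[u]
    # G_u: every edge whose two endpoints both lie in the one-hop neighborhood.
    g_u = {frozenset((a, b)) for a in members for b in adj[a] if b in members}
    # NLS_u: multiset of the labels carried by the direct neighbors of u.
    nls_u = Counter(labels[v] for v in adj[u])
    return g_u, nls_u


# Usage: node A with four neighbors whose labels are B, B, D and T.
adjacency = {
    "A": {"R1", "R2", "R3", "R4"},
    "R1": {"A", "R2"},
    "R2": {"A", "R1"},
    "R3": {"A"},
    "R4": {"A"},
}
node_labels = {"A": "S", "R1": "B", "R2": "B", "R3": "D", "R4": "T"}
g_a, nls_a = label_neighbor_graph(adjacency, node_labels, "A")
```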

Referring to Fig. 1, the data processing method designed in the invention mainly comprises the following. First, the data is initialized, i.e. explicit identifying attributes are removed and represented by labels instead. In addition, the set V_s^t of nodes with sensitive labels is arranged in descending order of degree to obtain a new node sequence. Next, the nodes with sensitive labels are grouped according to structural similarity. In this process, the first node with a sensitive label, i.e. the one with the maximum degree, is selected and removed from the current sensitive-label node set V_s^t, and its structural similarity to each remaining node in V_s^t is calculated. The most similar nodes with sensitive labels are placed in the same group until the number of nodes with sensitive labels in the group reaches the privacy level l. The next sensitive-label node is then selected, and so on until the sensitive-label node set V_s^t is empty. If a group g of individuals with sensitive labels contains fewer than l individuals, g is removed from the group set and its nodes are distributed to other groups according to structural similarity. Then the neighbor nodes of the grouped nodes with sensitive labels are matched, so that the nodes in a group end up with the same neighbor label information. Finally, the matched label-neighbor graph is randomized: edges are randomly added and deleted such that the number of edges does not exceed a certain threshold, and the edges of each node's label-neighbor graph are turned into uncertain edges whose existence probabilities are drawn uniformly from [0, 1]. The result is graph data that meets the requirements of dynamic social network publishing.

(1) Grouping process

When a new snapshot G_t adds or deletes edges and the subgraph structure of nodes with sensitive labels changes as a result, the changed nodes are grouped, i.e. L-Grouping(G_t) is called, and the label-neighbor graphs of the grouped nodes are then randomized. If new nodes with sensitive labels are added to the social network graph, all newly added nodes with sensitive labels are grouped, and the label-neighbor graphs of the grouped nodes are then randomized. If the label of a node with a sensitive label changes at time t+1, for example from B to T, the change is published at time t+2; that is, its release is delayed so that the node is not identified by an attacker. The invention thus provides an anonymization method that satisfies the privacy of dynamic social network data, aimed at the specific application of preventing label-neighbor attacks when a dynamic social network is released repeatedly. The main implementation of the grouping is shown in Fig. 2.

To assign nodes to suitable groups, the problem is solved with GSGA (Global Similarity-based Group optimization), which is designed around neighbor label similarity, so as to meet the privacy requirement. For two nodes v₁ and v₂, let NLS_v1 be the neighbor label information of v₁ and NLS_v2 the neighbor label information of v₂. The greater the neighbor label similarity, the more similar the neighbor label information of the two nodes. Their neighbor label similarity can be calculated as follows:
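The similarity formula itself is not reproduced in this text. Purely as a stand-in, and not as the patent's own definition, the sketch below scores two neighbor label multisets with a multiset Jaccard index, which has the stated property that more similar neighbor label information gives a larger value. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable


def neighbor_label_similarity(nls_a: Iterable[str], nls_b: Iterable[str]) -> float:
    """Multiset Jaccard index of two neighbor label lists.

    This is an assumed substitute measure, not the formula used by the patent.
    """
    a, b = Counter(nls_a), Counter(nls_b)
    union_size = sum((a | b).values())  # per-label maximum counts
    if union_size == 0:
        return 1.0  # two empty neighborhoods are treated as identical
    overlap_size = sum((a & b).values())  # per-label minimum counts
    return overlap_size / union_size


# Usage with the neighbor label information used later in the worked example.
score_a_c = neighbor_label_similarity(["B", "B", "D", "T"], ["B", "B", "D"])
score_c_f = neighbor_label_similarity(["B", "B", "D"], ["B", "B", "D"])
```

Under this stand-in measure, identical multisets score 1.0 and disjoint ones score 0.0.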

However, this method does not consider the neighbor structure information of the nodes in the original social network during grouping. On the basis of this algorithm, the StruSim method proposed in the literature is therefore improved, and the structural similarity (StruSim) is defined as follows:

Definition (StruSim): for two nodes v₁ and v₂, Value_1i denotes the degree of the i-th neighbor node of the labeled node v₁, and Value_2i denotes the degree of the i-th neighbor node of v₂. The greater the similarity, the more similar the neighbor information of the two nodes. Their similarity can be calculated as follows:

Step 1: Initialize the data set; input the social network G^t and the privacy threshold parameter l.

Step 2: Set the group set C to empty, and sort the set V_s^t of nodes with sensitive labels in descending order of degree to ease node selection in the subsequent steps.

Step 3: Determine whether the size of the current sensitive-label node set V_s^t is greater than l. If so, sort in descending order of label similarity, create a new group g, add the first node with a sensitive label, u_s, to the group, and go to step 4. Otherwise go to step 7.

Step 4: Determine whether the number of nodes in the current group is less than l. If it is less than l, go to step 5. Otherwise go to step 6.

Step 5: Subject to the neighbor label similarity being high, find the node u_max with the greatest structural similarity (StruSim) to u_s, add it to the group, and remove u_max from the remaining set of candidate sensitive-label nodes. When the number of nodes in the group is greater than or equal to l, end the loop and go to step 6.

In step 5, attacks on dynamically published neighbor label information are prevented, and data utility is preserved, as follows: when selecting nodes for a sensitive-label group, the node with the greatest structural similarity is chosen among those with high neighbor label similarity, so that the information loss incurred by the randomization method is as small as possible.

Step 6: Merge the current group g into the group set C, and determine whether the number of nodes in the remaining sensitive-label node set is greater than l. If so, go to step 4. Otherwise go to step 7.

Step 7: Determine whether the set of sensitive-label nodes that have not yet been grouped is empty. If so, go to step 9. Otherwise go to step 8.

Step 8: Distribute the nodes in g to other groups according to structural similarity.

Step 9: Obtain all the groups in the group set.

Step 10: Determine whether the current group set is empty. If so, go to step 11. Otherwise go to step 12.

Step 11: Take a sensitive-label node u_f of the first group in the group set and, for every u_i in the group, compute all the neighbor label information in the group. Then, for all nodes in the group, find all unmatched nodes in the social network and merge them into the label-neighbor graph Cl(v) of the current sensitive-label node.

The first task of step 11 is, for each group, to combine the neighbor label information of all nodes in the group; NLSg denotes this combined neighbor label information. The neighbor label information of the label-neighbor graph of each sensitive-label node in the group is then compared with NLSg, unmatched nodes are found for the current cluster Cl(v), and they are merged in. Node matching first requires that the labels be the same and the degrees be similar. Matching is complete when, for any node v in the group, the neighbor label information of the node in cluster Cl(v) is the same as NLSg. Here, the label-neighbor graph of a node that has completed node label matching is referred to simply as the cluster Cl(v).

Step 12: Output the social network graph after node label matching and the group set C.

Step 13: End.
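The label-combining part of step 11 can be illustrated with multisets. The sketch assumes that NLSg keeps, for each label, the largest count seen in any group member, which agrees with the worked example where {B, B, D, T} and {B, B, D} combine to {B, B, D, T}. The function names are hypothetical, and the degree-similarity condition for choosing which concrete node to merge in is not modeled.

```python
from collections import Counter
from typing import Dict, List


def group_neighbor_labels(nls_by_node: Dict[str, List[str]]) -> Counter:
    """Combine the neighbor labels of all group members into NLSg (assumed rule)."""
    nlsg: Counter = Counter()
    for nls in nls_by_node.values():
        nlsg |= Counter(nls)  # keep the per-label maximum count
    return nlsg


def missing_labels(nls_by_node: Dict[str, List[str]]) -> Dict[str, Counter]:
    """For each member, list the neighbor labels still needed to equal NLSg."""
    nlsg = group_neighbor_labels(nls_by_node)
    return {node: nlsg - Counter(nls) for node, nls in nls_by_node.items()}


# Usage with the neighbor label information of nodes A and C from the example.
group_nls = {"A": ["B", "B", "D", "T"], "C": ["B", "B", "D"]}
combined = group_neighbor_labels(group_nls)
needed = missing_labels(group_nls)
```

Here node C lacks one neighbor with label T, which corresponds to the extra T-labeled node that is matched into its label-neighbor graph in the example.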

(2) Randomization process

The input is the social network graph after node label matching and the group set C. All edges of the cluster Cl(v) are put into an edge set E_s, and nodes u, v ∈ V_Cl(v) are selected at random in the cluster Cl(v). If the edge (u, v) originally exists in E_Cl(v), (u, v) is deleted from E_s; if the edge (u, v) does not exist in E_Cl(v), the edge (u, v) is added. This continues until |E_s| = β|E_Cl(v)|. The value of β is greater than 0; when the subgraph is originally a complete graph its range is (0, 1), because no more edges can be added in that case. After the subgraph with edge set E_s is obtained, a probability uniformly distributed between 0 and 1 is generated for each of its edges to obscure whether the edge exists. Fig. 3 is a flow chart of the randomization.

Step 1: Input the social network graph after node label matching, the group set C, and the parameter β.

Step 2: Determine whether Gs is non-empty. If so, go to step 3. Otherwise go to step 9.

Step 3: Determine whether the set of label-neighbor graphs of the current sensitive-label nodes is non-empty. If it is empty, go to step 2; otherwise go to step 9.

Step 4: Put the edges of the matched node's label-neighbor graph into E_s. Go to step 5.

Step 5: Determine whether |E_s| equals β|E_Cl(v)|. If so, go to step 6. Otherwise go to step 7.

Step 6: After the subgraph with edge set E_s is obtained, generate a probability uniformly distributed between 0 and 1 for each edge of the subgraph.

Step 7: Randomly select nodes u, v ∈ V_Cl(v) in the label-neighbor graph Cl(v) of the matched node. Go to step 8.

Step 8: Determine whether the edge (u, v) belongs to the edge set E_Cl(v). If the edge (u, v) already exists in the edge set E_Cl(v), delete (u, v) from E_s; if the edge (u, v) does not exist in E_Cl(v), add the edge (u, v). Continue until |E_s| equals β|E_Cl(v)|.

Step 9: Output the anonymous social network graph G_t' at time t.

Step 10: End.
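The inner loop of this flow (steps 4 to 8) might look like the sketch below for a single cluster. Several points are assumptions rather than statements of the patent: the target size is taken as β|E_Cl(v)| rounded down, as in the worked example; a `max_steps` cap is added because the random process is not guaranteed to hit the target exactly; and the function name and the sample cluster are hypothetical.

```python
import math
import random
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

Edge = FrozenSet[str]


def perturb_cluster(
    nodes: Iterable[str],
    cluster_edges: Set[Edge],
    beta: float,
    rng: random.Random,
    max_steps: int = 10_000,
) -> Dict[Edge, float]:
    """Toggle random node pairs until the edge set reaches the target size."""
    original = set(cluster_edges)
    target = math.floor(beta * len(original))  # assumed rounding rule
    edge_set = set(original)  # working edge set, initially all cluster edges
    pairs = [frozenset(p) for p in combinations(sorted(nodes), 2)]
    steps = 0
    while len(edge_set) != target and steps < max_steps:
        pair = rng.choice(pairs)
        if pair in original:
            edge_set.discard(pair)  # edge of the original cluster: delete
        else:
            edge_set.add(pair)  # not an original edge: add
        steps += 1
    # Turn every remaining edge into an uncertain edge with a uniform probability.
    return {edge: rng.random() for edge in edge_set}


# Usage: an illustrative six-edge cluster, so that beta = 0.9 gives a target of 5.
cluster_nodes = ["A", "R1", "R2", "R3", "R4"]
cluster = {
    frozenset(p)
    for p in [("A", "R1"), ("A", "R2"), ("A", "R3"), ("A", "R4"),
              ("R1", "R2"), ("R3", "R4")]
}
uncertain = perturb_cluster(cluster_nodes, cluster, 0.9, random.Random(7))
```

The cap is a defensive design choice: if the loop stops at the cap, the returned edge set may differ in size from the target, and a caller would need to decide how to handle that case.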

The effect of the invention is further illustrated below by means of a specific example:

Original graph: G_t, with l = 2 and t = 1, 2. Fig. 4 shows the original social network graphs input at time t = 1 and time t = 2, where nodes A and C are nodes with sensitive labels.

In order for the nodes within each group to have the same neighbor label information, the nodes are grouped so that the 1-neighbor graphs and label information within the same group are isomorphic. Nodes with non-sensitive labels need to be inserted into the graph so that the label-neighbor graphs of the nodes within each group are isomorphic and each group contains no fewer than l nodes. The nodes are grouped using the following measure: neighbor label similarity (NLSS). For two nodes v₁ and v₂ with neighbor label information NLS_v1 and NLS_v2, the neighbor label similarity can be calculated as:

Data initialization: the degrees of the nodes with sensitive labels at time t = 1, sorted, are V: {4, 3, 3}.

Sensitive-label node grouping L-Grouping(G_t):

Let l = 2. After the nodes are sorted in descending order of degree, the neighbor label information of node A is {B, B, D, T}, that of node C is {B, B, D}, and that of node F is {B, B, D}. Node A, which has a sensitive label, is considered first, and the sensitive-label node sequence sorted in descending order of neighbor label similarity is {A, C, F}. A group g is created for node A. For this example, the adjacency matrix is built as shown in Table 1, and StruSim gives:

Table 1: adjacency matrix W of the one-hop neighborhood of the node

The structural similarity of nodes A and C is s(A, C) = 0.8.

The structural similarity between node A and node C is the greatest, so the current group of sensitive-label nodes is g = {A, C}. For node F, if there are no other sensitive-label nodes left to group, F is assigned to the existing group g. Next, node merging is performed on the label-neighbor graphs of the nodes in the group set C. By traversing NLSg = {B, B, D, T} of the nodes in the group, node label matching is performed on the label-neighbor graphs of nodes A and C: nodes R1, R2, R3 are matched with nodes X1, X2, X3, but node R4 finds no matching node in the label-neighbor graph of node C, so a node X4 that has not been anonymized in the social network, with label T, is matched with node R4. The loop ends when the neighbor graphs of all sensitive-label nodes have been matched. See Fig. 5.

Randomization: the randomization process is applied to the label-neighbor graphs of nodes in the same group that have completed node label matching. Node F is not considered at first. For example, for G₁ in Fig. 6 at t = 1, with the 2-grouped social network graph and β = 0.9, the matched label-neighbor graphs of the sensitive-label nodes {A, C} in the group are randomized, i.e. the clusters Cl(A) and Cl(C) are randomized. For the subgraph Cl(A), an edge (u, v) between two nodes is selected at random and checked against the cluster Cl(A); if the edge does not exist it is added, otherwise it is deleted, until the number of edges in Cl(A) is β|E_Cl(v)| = 5 (rounded down). For the cluster Cl(C), two nodes (u, v) are likewise selected at random and checked against Cl(C); if the edge does not exist it is added, otherwise it is deleted, until the number of edges in cluster Cl(C) is 4. G₁' in Fig. 6 is the result of randomizing the first grouping of graph G₁. Since edges change at time t = 2, the changed subgraphs of sensitive-label nodes are regrouped and the nodes are then randomized; graph G₂ at time t = 2 is reconstructed by the same method, and G₂' in Fig. 6 is the anonymous graph that is finally published.

During randomization only a few edges are randomly added and deleted, and the graph is obfuscated by the randomization method. In addition, every group contains at least l nodes. Therefore, for any node v in the social network graph, at least l-1 other nodes are indistinguishable from it in the published anonymous graph, so the probability that an attacker successfully re-identifies a target node in the published social network graph does not exceed 1/l; that is, the privacy requirement of l-anonymity is met. Moreover, because uncertain edges are introduced into the subgraphs, an attacker cannot uniquely determine an individual even under dynamic release. At the same time, the method for preventing label-neighbor attacks across multiple releases of a dynamic social network allows users to analyze the published data directly and clearly helps preserve the structure of the original social network graph.

Anonymization processing: when a new snapshot G_t arrives, the social network graph is processed according to the situation. In case 1, edges are added or deleted in the subgraph of a node with a sensitive label; if the subgraph structure of the node changes, the changed nodes are grouped, i.e. the sensitive-node grouping algorithm L-Grouping(G_t) is called, and the 1-neighbor subgraphs of the grouped nodes are then randomized. In case 2, new nodes with sensitive labels are added to the social network graph; all newly added nodes with sensitive labels are grouped, and the label-neighbor graphs of the grouped nodes are then randomized. In case 3, the label of a node with a sensitive label changes at time t+1, for example from B to T; the change is published at time t+2, i.e. its release is delayed, so that the node is not identified by an attacker.
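Cases 1 and 2 both need to know which sensitive-label nodes must be regrouped when a new snapshot arrives. A rough sketch follows, assuming each snapshot is an adjacency map and that a node's subgraph structure means the edges among the node and its direct neighbors; the function names are hypothetical, and the delayed release of case 3 is not modeled.

```python
from typing import Dict, FrozenSet, Set, Tuple


def one_hop_edges(adj: Dict[str, Set[str]], u: str) -> Set[FrozenSet[str]]:
    """Edges among u and its direct neighbors (assumed subgraph structure)."""
    members = {u} | adj[u]
    return {frozenset((a, b)) for a in members for b in adj[a] if b in members}


def nodes_to_regroup(
    adj_prev: Dict[str, Set[str]],
    adj_curr: Dict[str, Set[str]],
    sensitive: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (changed, added) sensitive-label nodes of the new snapshot."""
    changed: Set[str] = set()
    added: Set[str] = set()
    for u in sensitive:
        if u not in adj_prev:
            added.add(u)  # case 2: newly added sensitive-label node
        elif one_hop_edges(adj_prev, u) != one_hop_edges(adj_curr, u):
            changed.add(u)  # case 1: subgraph structure changed
    return changed, added


# Usage: edge B1-B2 is added near A, and a new sensitive-label node F appears.
snapshot_1 = {"A": {"B1", "B2"}, "B1": {"A"}, "B2": {"A"},
              "C": {"B3"}, "B3": {"C"}}
snapshot_2 = {"A": {"B1", "B2"}, "B1": {"A", "B2"}, "B2": {"A", "B1"},
              "C": {"B3"}, "B3": {"C", "F"}, "F": {"B3"}}
changed_nodes, added_nodes = nodes_to_regroup(snapshot_1, snapshot_2, {"A", "C", "F"})
```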

Anonymity test: the following shows how the algorithm designed by the invention protects the individuals with sensitive labels before and after the social network graph data in this example is anonymized.

Suppose the attacker knows the label-neighbor graph of individual A at times t = 1 and t = 2. After randomization, the anonymously published label-neighbor graph of individual A is no longer the label-neighbor graph in the original graph. Individual C also has a similar label-neighbor graph, and its label-neighbor graph also changes at time t = 2. The attacker cannot identify individual A with probability higher than 1/2. By comparing the two published social network graphs, the attacker still cannot uniquely determine the attribute L of individual A: the node degree of A is not unique and no correctly matching label-neighbor graph can be found, so A cannot be located and A's occupation is not revealed.

Claims

1. The anonymous method for preventing the label neighbor attack in the social network dynamic release is characterized by comprising the following steps:

step 1, initializing an original social network graph at the current moment;

step 2.1, arranging the set V_t^s of nodes with sensitive labels in the social network data in descending order of degree to obtain a new node set;

2.2, selecting the node with the sensitive label with the maximum degree in the new node set, and removing the selected node from the new node set;

step 2.3, calculating the structural similarity of the selected node with the sensitive label and each node in the new node set, and grouping the selected node with the sensitive label and the node with the most similar structural similarity into a group until the number of the nodes contained in the group reaches the privacy level l;

step 2.4, repeating the steps 2.2-2.3 until the new node set does not contain the nodes with the sensitive labels any more, namely finishing grouping work;

step 5, releasing the randomized social network graph.

2. The anonymous method for preventing label neighbor attacks in dynamic social network distribution according to claim 1, wherein in step 2.1, the node sets are arranged in descending order according to degree to obtain a new node sequence.

3. The anonymous method for preventing the label neighbor attack in the social network dynamic publishing as claimed in claim 1, wherein the specific process of step 4 is as follows: