CN107240026B - Community discovery method suitable for noise network - Google Patents


Info

Publication number
CN107240026B
Authority
CN
China
Prior art keywords
node
network
community
value
nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710260472.4A
Other languages
Chinese (zh)
Other versions
CN107240026A (en
Inventor
杨清海
蒋群利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Original Assignee
Xidian University
Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University and Xian Cetc Xidian University Radar Technology Collaborative Innovation Research Institute Co Ltd
Priority to CN201710260472.4A
Publication of CN107240026A
Application granted
Publication of CN107240026B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of social networks and discloses a community discovery method suitable for a noise network. The method comprises the following steps: calculating the importance value of each node in the network and establishing a core point set and a boundary point set; selecting core representative points to construct prior information; selecting boundary representative points to construct prior information; incorporating the prior information into an extremum optimization process; randomly dividing the network into two parts with approximately equal numbers of nodes according to its topological structure to form an initial community structure; calculating the contribution of each node to the community module density, moving the node with the smallest contribution to the other part as a self-organizing optimization step, and repeating this step until the module density of the network no longer increases; and removing the connecting edges between the two resulting communities, continuing until the module density of the whole network reaches its maximum. The invention improves the accuracy of community division at lower cost and improves the robustness of community division in a noisy environment.

Description

Community discovery method suitable for noise network
Technical Field
The invention belongs to the technical field of social networks, and particularly relates to a community discovery method suitable for a noise network.
Background
Many real-world networks, such as telephone networks, mail networks and crime networks, contain wrong or missing connections between individuals because accurate and complete information about the network structure is difficult to obtain; such networks are called noise networks. Most current community discovery methods find the community structure from the connection relationships between nodes in the network. Because these methods depend entirely on the topological structure of the network, they cannot be applied to networks with noise: as the noise ratio in the network increases, their ability to find the real community structure drops rapidly. In a real network environment, however, part of the prior knowledge about the community division is often known. For example, we may know that certain users belong to a certain community, or that two particular users belong to the same community or to different communities. Integrating this prior information into community division improves the accuracy of the division and its robustness in a noisy environment. However, existing methods either do not state where the prior information comes from, or form it by randomly extracting some nodes from the network. In general, prior information is obtained by having experts in the relevant field label nodes selected from the network. Such labeling is time-consuming, labor-intensive and costly, while random extraction is too blind: the resulting prior information may provide little guidance, so the quality of community division cannot be improved effectively at low cost.
In summary, the problems of the prior art are as follows: existing community discovery methods are not suitable for networks with noise, and their ability to discover the real community structure drops rapidly as the noise ratio in the network increases; obtaining prior information is costly, and the prior information obtained may provide little guidance; and high-quality prior information cannot be obtained at low labor cost, which reduces the accuracy of community discovery.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a community discovery method suitable for a noise network.
The community discovery method suitable for the noise network organically combines a community division method based on extremum optimization of module density with an active learning method;
the community division method based on extremum optimization of module density incorporates the prior information into the extremum optimization of module density, optimizes the local variables and the global variable by using a pairwise constraint set, and guides community discovery while optimizing the objective function;
the active learning method constructs the pairwise constraint set by actively selecting, from the network, core nodes that represent local community structures and nodes at community boundaries, thereby generating high-quality prior information.
Further, the community division method based on extremum optimization of module density comprises the following steps:
step one, calculating the importance value of each node in the network, and establishing a core point set and a boundary point set;
step two, selecting core representative points to construct prior information;
step three, selecting boundary representative points to construct prior information;
step four, incorporating the prior information into the extremum optimization process;
step five, randomly dividing the network into two parts with approximately equal numbers of nodes according to its topological structure to form an initial community structure;
step six, calculating the contribution of each node to the community module density, moving the node with the smallest contribution to the other part as a self-organizing optimization step, and repeating this step until the module density of the network no longer increases;
step seven, removing the connecting edges between the two resulting communities, and then executing step five and step six on each sub-network until the module density of the whole network reaches its maximum.
Further, step one specifically includes: the importance value of a node is calculated with an index that comprehensively measures node importance based on the degree and the clustering coefficient, expressed as:
$p_i = f(k_i) + g(c_i)$;
where $k_i$ [equation image BDA0001274569260000031] is the degree of node $i$, $c_i$ [equation image BDA0001274569260000032] is the clustering coefficient of node $i$, and $E_i$ is the number of edges actually present among these nodes. $f(k_i)$ is the normalization of $k_i$: the ratio of the difference between the degree of node $i$ and the minimum node degree in the network to the difference between the maximum and minimum node degrees in the network, that is, $f(k_i) = (k_i - k_{\min}) / (k_{\max} - k_{\min})$. $g(c_i)$ is the normalization of $c_i$: the ratio of the difference between the maximum clustering coefficient in the network and the clustering coefficient of node $i$ to the difference between the maximum and minimum clustering coefficients in the network, that is, $g(c_i) = (c_{\max} - c_i) / (c_{\max} - c_{\min})$.
The core point set and the boundary point set are determined according to a given parameter [symbol image BDA0001274569260000033]: all nodes in the network whose importance value is greater than the given parameter constitute the core point set CS, and the boundary point set BS is the set of non-core nodes.
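The importance index and the core/boundary split described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the clustering coefficient is taken to be the standard local clustering coefficient $2E_i / (k_i(k_i - 1))$ (the original formula is an image), nodes of degree below 2 are assigned a coefficient of 0, and the function names `importance_values` and `split_core_boundary` and the `threshold` argument are hypothetical.

```python
from itertools import combinations


def importance_values(adj):
    """Return p_i = f(k_i) + g(c_i) for every node of an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    clustering = {}
    for v, nbrs in adj.items():
        k = degree[v]
        # E_i: edges actually present among the neighbours of v.
        e = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # Assumption: standard local clustering coefficient 2*E_i / (k_i*(k_i-1)).
        clustering[v] = 2.0 * e / (k * (k - 1)) if k > 1 else 0.0

    k_min, k_max = min(degree.values()), max(degree.values())
    c_min, c_max = min(clustering.values()), max(clustering.values())

    def f(k):  # min-max normalised degree: larger degree gives a larger value
        return (k - k_min) / (k_max - k_min) if k_max > k_min else 0.0

    def g(c):  # reversed min-max: smaller clustering gives a larger value
        return (c_max - c) / (c_max - c_min) if c_max > c_min else 0.0

    return {v: f(degree[v]) + g(clustering[v]) for v in adj}


def split_core_boundary(importance, threshold):
    """Core set: importance strictly above the threshold; boundary set: the rest."""
    core = {v for v, p in importance.items() if p > threshold}
    return core, set(importance) - core
```

For example, on a star with centre 0 and leaves 1, 2, 3 plus the edge 1-2, `importance_values` gives the centre the largest value, and a threshold of 1.2 puts only node 0 in the core set.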
Further, step two specifically includes: if the representative point set RS is empty, the node $k$ with the maximum importance value is selected from the core point set CS and added to the representative point set RS; otherwise, the node $i$ with the minimum similarity to the representative point set RS is selected from the core point set CS as the candidate representative point, where the similarity between a node $i$ and a set $C$ is expressed as:
$S(i, C) = \max(\mathrm{Sim}(i, j) \mid j \in C)$;
where $\mathrm{Sim}(i, j)$ is given by [equation image BDA0001274569260000035], $N_i^{+}$ is the set formed by node $i$ and its neighbor nodes, and $\delta$ is a gain value, set to 1;
for each pair of representative points in the representative point set RS, prior information <i, j> is constructed, and a domain expert labels its constraint type.
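The selection rule for core representative points can be sketched as a greedy "farthest-first" loop. This is a minimal sketch under stated assumptions: the pairwise similarity Sim(i, j) is an image in the original, so the Jaccard overlap of the closed neighbourhoods $N_i^{+}$ and $N_j^{+}$ is used here purely as a hypothetical stand-in; the function names, the `count` argument, and the use of `sorted` to make ties deterministic are also assumptions.

```python
def closed_neighbourhood_similarity(adj, i, j):
    """Hypothetical stand-in for Sim(i, j): Jaccard overlap of N+_i and N+_j."""
    ni = adj[i] | {i}
    nj = adj[j] | {j}
    return len(ni & nj) / len(ni | nj)


def set_similarity(adj, i, group):
    """S(i, C) = max(Sim(i, j) | j in C), as stated in the text."""
    return max(closed_neighbourhood_similarity(adj, i, j) for j in group)


def select_core_representatives(adj, core, importance, count):
    """Pick up to `count` representatives from the core point set CS."""
    representatives = []
    candidates = set(core)
    while candidates and len(representatives) < count:
        ordered = sorted(candidates)  # deterministic tie-breaking (assumption)
        if not representatives:
            # RS is empty: start from the most important core node.
            pick = max(ordered, key=lambda v: importance[v])
        else:
            # Otherwise take the core node least similar to RS.
            pick = min(ordered, key=lambda v: set_similarity(adj, v, representatives))
        representatives.append(pick)
        candidates.remove(pick)
    return representatives
```

The pairs <i, j> handed to the domain expert would then be every pair drawn from the returned list, for instance via `itertools.combinations(representatives, 2)`.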
Further, step three specifically includes: the boundary point b1 with the maximum similarity to node i is selected from the boundary point set BS; if several nodes meet this condition, the one with the minimum importance value is selected as the representative point; prior information <i, b1> is constructed and submitted to a domain expert, who labels its constraint type;
the boundary point b2 with the minimum similarity to node i is then selected from the boundary point set BS; if several nodes meet this condition, the one with the maximum importance value is selected as the representative point; prior information <i, b2> is constructed and submitted to a domain expert, who labels its constraint type.
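The two boundary choices and their tie-breaking rules can be sketched compactly with tuple keys. This is a minimal sketch: `similarity` is a caller-supplied function standing in for Sim(i, b), whose exact definition is not reproduced here, and the function name `select_boundary_pair` is hypothetical.

```python
def select_boundary_pair(i, boundary, similarity, importance):
    """Return (b1, b2) for a core representative i.

    b1: boundary node most similar to i, ties broken by smallest importance.
    b2: boundary node least similar to i, ties broken by largest importance.
    """
    nodes = sorted(boundary)  # deterministic order for any remaining ties
    # Negating importance lets one tuple key serve both rules.
    b1 = max(nodes, key=lambda b: (similarity(i, b), -importance[b]))
    b2 = min(nodes, key=lambda b: (similarity(i, b), -importance[b]))
    return b1, b2
```

Both pairs <i, b1> and <i, b2> would then go to the domain expert for labeling, as the text describes.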
Further, step four specifically includes: the module density $D$ of the network (the global variable) is related to the contribution $q_i$ of each node to the module density (the local variables). Using the known pairwise constraint information, the module density $D$ is optimized by penalizing violations of the constraints; the general form of the penalty function $U(C)$ is defined as:
[equation image BDA0001274569260000041]
where $\alpha_1$ and $\alpha_2$ are balance factors between penalty and reward; $\langle i, j, w, type \rangle \in C$ represents the community membership relation of nodes $i$ and $j$; [symbol image BDA0001274569260000042] represents the non-negative cost of violating a constraint; $C_i$ is the community to which node $i$ belongs; and $\delta(C_i, C_j) = 1$ when $C_i = C_j$, otherwise $\delta(C_i, C_j) = 0$;
a partition that does not satisfy the constraints is penalized, that is, the module density contribution of node $i$ is reduced. In this case $\alpha_1 = 0$ and $\alpha_2 = 1$ in $U(C)$, so the local variable $q'_i$ optimized in combination with the prior information is expressed as:
[equation image BDA0001274569260000043]
where [symbol image BDA0001274569260000048] represents the number of edges connecting node $i$ to other nodes inside community $C_i$, [symbol image BDA0001274569260000046] represents the number of edges connecting node $i$ to nodes outside community $C_i$, and $|C_i|$ denotes the number of nodes in community $C_i$;
a partition that satisfies the constraints is rewarded, that is, the value of the global variable $D$ is increased. In this case $\alpha_1 = 1$ and $\alpha_2 = 0$ in $U(C)$, so the global variable $D'$ optimized in combination with the prior information is expressed as:
[equation image BDA0001274569260000047]
where [symbol image BDA0001274569260000051] represents the number of edges between $C_1$ and $C_2$; [symbol image BDA0001274569260000052] represents 2 times the sum of the number of edges inside $C_1$; and [symbol image BDA0001274569260000053] represents the total number of edges connecting nodes inside $V_1$ to nodes outside it, where [equation image BDA0001274569260000054].
Here a division of the network $G$ is given as $G_1(C_1, E_1), G_2(C_2, E_2), \ldots, G_m(C_m, E_m)$, where $C_i$ and $E_i$ are the vertex set and edge set of $G_i$ ($i = 1, 2, \ldots, m$), and $|C_i|$ is the number of nodes in community $C_i$.
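The bookkeeping behind the penalty and reward terms can be illustrated as follows. This is only a sketch of how violated and satisfied pairwise constraints might be tallied; it does not reproduce the patent's formulas for $U(C)$, $q'_i$ or $D'$, which are images in the original. The constraint labels `"must"` (same community) and `"cannot"` (different communities), the tuple layout `(i, j, w, kind)` and all function names are hypothetical.

```python
def constraint_is_satisfied(kind, same_community):
    """'must': nodes should share a community; 'cannot': they should not."""
    return same_community if kind == "must" else not same_community


def violation_cost_per_node(constraints, membership):
    """Sum the cost w of every violated constraint onto both of its endpoints."""
    cost = {}
    for i, j, w, kind in constraints:
        same = membership[i] == membership[j]  # delta(C_i, C_j) == 1
        if not constraint_is_satisfied(kind, same):
            cost[i] = cost.get(i, 0.0) + w
            cost[j] = cost.get(j, 0.0) + w
    return cost


def satisfied_weight(constraints, membership):
    """Total weight of the constraints that the current partition satisfies."""
    return sum(
        w
        for i, j, w, kind in constraints
        if constraint_is_satisfied(kind, membership[i] == membership[j])
    )
```

One plausible use, stated as an assumption rather than as the patented formula, is to subtract a node's violation cost from its contribution $q_i$ (the penalty case) and to add the satisfied weight to $D$ (the reward case).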
Further, in step five: the network $G$ is randomly divided into two parts $G_1$ and $G_2$ according to its topological structure, each part having approximately the same number of nodes; the nodes connected by edges within each part form a community, which gives the initial community structure.
Further, in step six: the contribution $q'_i$ of each node to the community module density is calculated, and the node with the smallest contribution is moved to the other part as a self-organizing optimization step; the contribution of every node is recalculated after each move; this self-organizing optimization process is repeated until the module density value $D'$ of the network no longer increases.
Further, in step seven: the connecting edges between the two resulting communities are removed to obtain several connected sub-networks; steps five and six are then performed on each sub-network until the module density of the whole network reaches its maximum.
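Steps five and six can be sketched as a single bisection routine. This is a minimal sketch under stated assumptions, not the patented procedure: it uses the commonly cited module density, the sum over communities of (2 x internal edges - outgoing edges) / community size, with node contribution (edges into own community - edges out of it) / community size; it ignores the prior-information terms; and it treats each of the two parts as one community instead of splitting parts into connected components. The function names and the `seed` argument are hypothetical.

```python
import random


def node_contributions(adj, membership):
    """q_i = (edges from i into its community - edges from i leaving it) / |C_i|."""
    sizes = {}
    for c in membership.values():
        sizes[c] = sizes.get(c, 0) + 1
    q = {}
    for v, nbrs in adj.items():
        inside = sum(1 for u in nbrs if membership[u] == membership[v])
        q[v] = (inside - (len(nbrs) - inside)) / sizes[membership[v]]
    return q


def module_density(adj, membership):
    """Sum of all node contributions (assumed module density definition)."""
    return sum(node_contributions(adj, membership).values())


def bisect_by_extremal_optimization(adj, seed=0):
    """Random balanced split, then repeatedly move the worst node across."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    rng.shuffle(nodes)
    half = len(nodes) // 2
    membership = {v: (0 if idx < half else 1) for idx, v in enumerate(nodes)}
    best = module_density(adj, membership)
    while True:
        q = node_contributions(adj, membership)
        worst = min(sorted(q), key=q.get)  # node with the smallest contribution
        trial = dict(membership)
        trial[worst] = 1 - trial[worst]
        if len(set(trial.values())) < 2:
            break  # do not empty one of the two parts
        score = module_density(adj, trial)
        if score <= best:
            break  # module density no longer increases
        membership, best = trial, score
    return membership, best
```

Step seven would then drop the edges running between the two parts and call the same routine on each resulting sub-network, keeping a split only while the overall module density keeps rising.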
Another object of the present invention is to provide a social network applying the community discovery method applicable to a noise network.
The invention has the advantages and positive effects that:
1. Because the prior information is integrated into community division for semi-supervised community discovery, the influence of noise is effectively compensated and the robustness of community division in a noisy environment is improved.
2. The prior information is obtained by an active learning technique, so prior information that effectively improves the quality of community division can be obtained at lower labor cost.
3. Module density is used as the community evaluation function, which overcomes the resolution limit of modularity-based optimization methods.
With the noise ratio set to 2%, 4%, 6%, 8% and 10%, adding only 10 pairwise constraints in the Dolphins network improves the NMI value by 1-7%, and adding 20 pairwise constraints improves it by 3-14%; similar results are obtained in the Football network, where adding 10 pairwise constraints improves the NMI value by 2-7% and adding 20 pairwise constraints improves it by 6-13%.
Drawings
Fig. 1 is a flowchart of a community discovery method suitable for a noise network according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart illustrating an implementation of the community discovery method suitable for the noise network according to the embodiment of the present invention.
Fig. 3 is a performance evaluation graph of the invention at different noise ratios on the Dolphins network, provided by an embodiment of the invention.
Fig. 4 is a performance evaluation graph of the invention at different noise ratios on a Football network, provided by an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of the principles of the invention is provided in connection with the accompanying drawings.
The community discovery method suitable for the noise network provided by the embodiment of the invention organically combines a community division method based on extremum optimization of module density with an active learning method. The prior information is incorporated into the extremum optimization of module density, the local variables and the global variable are optimized by using a pairwise constraint set, and community discovery is guided while the objective function is optimized. Core nodes that represent local community structures and nodes at community boundaries are actively selected from the network to construct the pairwise constraint set, generating high-quality prior information.
As shown in fig. 1, the community discovery method applicable to a noise network according to an embodiment of the present invention includes the following steps:
S101: calculating the importance value of the nodes in the network, and establishing a core point set and a boundary point set;
S102: selecting core representative points to construct prior information;
S103: selecting boundary representative points to construct prior information;
S104: incorporating the prior information into an extremum optimization process;
S105: randomly dividing the network into two parts with approximately equal numbers of nodes according to its topological structure to form an initial community structure;
S106: calculating the contribution of each node to the community module density, moving the node with the smallest contribution to the other part as a self-organizing optimization step, and repeating this step until the module density of the network no longer increases;
S107: removing the connecting edges between the two resulting communities, and executing step S105 and step S106 on each sub-network until the module density of the whole network reaches its maximum.
The community discovery method suitable for the noise network provided by the embodiment of the invention specifically comprises the following steps:
Step 1: Calculate the importance value of the nodes in the network and determine the core point set and the boundary point set.
Further, the importance of each node in the network is evaluated with a node importance index; according to a given parameter, all nodes whose importance value is greater than the parameter form the core point set, and the boundary point set is the set of non-core points.
Step 2: Select core representative points to construct prior information.
Further, if the representative point set is empty, the node k with the largest importance value is selected from the core point set and added to the representative point set; otherwise, the node i with the minimum similarity to the representative point set is selected from the core point set as a candidate representative point;
further, for each pair of representative points in the representative point set, prior information is constructed, and a domain expert labels its constraint type.
Step 3: Select boundary representative points to construct prior information.
Further, the boundary point b1 with the maximum similarity to node i is selected from the boundary point set; if several nodes meet this condition, the one with the minimum importance value is selected as the representative point; prior information <i, b1> is constructed and submitted to a domain expert, who labels its constraint type;
further, the boundary point b2 with the minimum similarity to node i is selected from the boundary point set; if several nodes meet this condition, the one with the maximum importance value is selected as the representative point; prior information <i, b2> is constructed and submitted to a domain expert, who labels its constraint type.
Step 4: Judge whether the amount of acquired prior information has reached the specified number; if so, continue with step 5, otherwise return to step 2.
Step 5: Incorporate the prior information into the extremum optimization process.
Further, the module density of the network (the global variable) is related to the contribution of each node to the module density (the local variables); using the known pairwise constraint information, the module density is optimized by penalizing violations of the constraints and rewarding their satisfaction;
further, partitions that do not satisfy the constraints are penalized, that is, the value of the local variable is reduced;
further, partitions that satisfy the constraints are rewarded, that is, the value of the global variable is increased.
Step 6: Initialization: the whole network is randomly divided into two parts with approximately equal numbers of nodes, and the nodes connected by edges within each part form a community, which gives the initial community structure.
Step 7: Iteration: the node with the smallest contribution to the community module density is moved to the other part as a self-organizing optimization step, the contribution of every node is recalculated after each move, and this process is repeated until the module density of the network no longer increases.
Step 8: Optimization: the connecting edges between the two resulting communities are removed to obtain several connected sub-networks, and step 6 and step 7 are then executed on each sub-network until the module density of the whole network reaches its maximum.
The application of the principles of the present invention will now be described in further detail with reference to the accompanying drawings.
As shown in fig. 2, the implementation steps of the present invention are as follows:
Step 1: Calculate the importance value of each node, and establish the core point set and the boundary point set.
Further, the importance value of a node is calculated with an index that comprehensively measures node importance based on the degree and the clustering coefficient, expressed as:
$p_i = f(k_i) + g(c_i)$;
where $k_i$ [equation image BDA0001274569260000091] is the degree of node $i$, $c_i$ [equation image BDA0001274569260000092] is the clustering coefficient of node $i$, and $E_i$ is the number of edges actually present among these nodes. $f(k_i)$ is the normalization of $k_i$: the ratio of the difference between the degree of node $i$ and the minimum node degree in the network to the difference between the maximum and minimum node degrees in the network, that is, $f(k_i) = (k_i - k_{\min}) / (k_{\max} - k_{\min})$. $g(c_i)$ is the normalization of $c_i$: the ratio of the difference between the maximum clustering coefficient in the network and the clustering coefficient of node $i$ to the difference between the maximum and minimum clustering coefficients in the network, that is, $g(c_i) = (c_{\max} - c_i) / (c_{\max} - c_{\min})$.
Further, the core point set and the boundary point set are determined according to a given parameter [symbol image BDA0001274569260000093]: all nodes in the network whose importance value is greater than the given parameter constitute the core point set CS, and the boundary point set BS is the set of non-core nodes.
Step 2: Select representative points from the core point set to construct prior information.
Further, if the representative point set RS is empty, the node $k$ with the maximum importance value is selected from the core point set CS and added to the representative point set RS; otherwise, the node $i$ with the minimum similarity to the representative point set RS is selected from the core point set CS as the candidate representative point, where the similarity between a node $i$ and a set $C$ is expressed as:
$S(i, C) = \max(\mathrm{Sim}(i, j) \mid j \in C)$;
where $\mathrm{Sim}(i, j)$ is given by [equation image BDA0001274569260000095], $N_i^{+}$ is the set formed by node $i$ and its neighbor nodes, and $\delta$ is a gain value, set to 1;
further, for each pair of representative points in the representative point set RS, prior information <i, j> is constructed, and a domain expert labels its constraint type.
Step 3: Select representative points from the boundary point set to construct prior information.
Further, the boundary point b1 with the maximum similarity to node i is selected from the boundary point set BS; if several nodes meet this condition, the one with the minimum importance value is selected as the representative point; prior information <i, b1> is constructed and submitted to a domain expert, who labels its constraint type;
further, the boundary point b2 with the minimum similarity to node i is selected from the boundary point set BS; if several nodes meet this condition, the one with the maximum importance value is selected as the representative point; prior information <i, b2> is constructed, and a domain expert labels its constraint type.
Step 4: Incorporate the prior information into the extremum optimization process.
Further, the module density $D$ of the network (the global variable) is related to the contribution $q_i$ of each node to the module density (the local variables). Using the known pairwise constraint information, the module density $D$ is optimized by penalizing violations of the constraints and rewarding their satisfaction; the general form of the penalty (reward) function $U(C)$ is defined as:
[equation image BDA0001274569260000101]
where $\alpha_1$ and $\alpha_2$ are balance factors between penalty and reward; $\langle i, j, w, type \rangle \in C$ represents the community membership relation of nodes $i$ and $j$; [symbol image BDA0001274569260000102] represents the non-negative cost of violating a constraint; $C_i$ is the community to which node $i$ belongs; and $\delta(C_i, C_j) = 1$ when $C_i = C_j$, otherwise $\delta(C_i, C_j) = 0$;
further, a partition that does not satisfy the constraints is penalized, that is, the module density contribution of node $i$ is reduced. In this case $\alpha_1 = 0$ and $\alpha_2 = 1$ in $U(C)$, so the local variable $q'_i$ optimized in combination with the prior information is expressed as:
[equation image BDA0001274569260000103]
where [symbol image BDA0001274569260000107] represents the number of edges connecting node $i$ to other nodes inside community $C_i$, [symbol image BDA0001274569260000106] represents the number of edges connecting node $i$ to nodes outside community $C_i$, and $|C_i|$ denotes the number of nodes in community $C_i$;
further, a partition that satisfies the constraints is rewarded, that is, the value of the global variable $D$ is increased. In this case $\alpha_1 = 1$ and $\alpha_2 = 0$ in $U(C)$, so the global variable $D'$ optimized in combination with the prior information is expressed as:
[equation image BDA0001274569260000111]
where [symbol image BDA0001274569260000112] represents the number of edges between $C_1$ and $C_2$; [symbol image BDA0001274569260000113] represents 2 times the sum of the number of edges inside $C_1$; and [symbol image BDA0001274569260000114] represents the total number of edges connecting nodes inside $V_1$ to nodes outside it, where [equation image BDA0001274569260000115].
Here a division of the network $G$ is given as $G_1(C_1, E_1), G_2(C_2, E_2), \ldots, G_m(C_m, E_m)$, where $C_i$ and $E_i$ are the vertex set and edge set of $G_i$ ($i = 1, 2, \ldots, m$), and $|C_i|$ is the number of nodes in community $C_i$.
Step 5: The network $G$ is randomly divided into two parts $G_1$ and $G_2$ according to its topological structure, each part having approximately the same number of nodes; the nodes connected by edges within each part form a community, which gives the initial community structure.
Step 6: The contribution $q'_i$ of each node to the community module density is calculated, and the node with the smallest contribution is moved to the other part as a self-organizing optimization step; the contribution of every node is recalculated after each move; this process is repeated until the module density value $D'$ of the network no longer increases.
Step 7: The connecting edges between the two resulting communities are removed to obtain several connected sub-networks; steps 5 and 6 are then performed on each sub-network until the module density of the whole network reaches its maximum.
The effects of the present invention will be described in detail below with reference to performance evaluation.
Fig. 3 is a performance evaluation graph of the invention at different noise ratios on the Dolphins network.
Fig. 4 is a performance evaluation graph of the invention at different noise ratios on the Football network.
The Dolphins network was constructed by D. Lusseau et al. from up to 7 years of observation of a dolphin population living in Doubtful Sound, New Zealand. The network includes 62 nodes and 159 edges; each node represents one dolphin in the population, and an edge indicates that the two connected dolphins are in frequent contact.
The Football network was built by Girvan and Newman from the American college football league for the 2000 season. The network includes 115 nodes and 613 edges; each node represents a football team, and an edge indicates that the two connected teams played each other during the season. Games between teams in the same conference are more frequent than games between teams in different conferences.
As can be seen from Figs. 3 and 4, adding prior information improves the performance of the algorithm at every noise ratio on both networks. In the Dolphins network, adding 10 pairwise constraints improves the NMI value by 1-7%, and adding 20 pairwise constraints improves it by 3-14%; similar results are obtained in the Football network, where adding 10 pairwise constraints improves the NMI value by 2-7% and adding 20 pairwise constraints improves it by 6-13%.
As the noise ratio in the network increases, the performance of community discovery methods that rely entirely on the network topology degrades rapidly; integrating prior information into the community discovery process effectively compensates for the influence of noise and maintains a higher community division accuracy.
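For reference, the NMI score used in the evaluation can be computed along the following lines. The text does not state which normalization is used; this sketch assumes the common form 2·I(A;B) / (H(A) + H(B)), and the function name `nmi` is hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two community labelings of the same nodes."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # Mutual information from the joint and marginal label frequencies.
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in one community
    return 2 * mutual / (h_a + h_b)


same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical partitions, labels renamed
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # partitions that share no information
```

A value near 1 means the detected communities match the reference division; a value near 0 means they are unrelated.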
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A community discovery system applicable to a noise network, for implementing a community discovery method applicable to a noise network, characterized in that the community discovery method applicable to a noise network organically combines a community division method based on extremum optimization of module density with an active learning method;
the community division method based on extremum optimization of module density incorporates prior information into the extremum optimization of module density, optimizes the local and global variables by means of a pairwise constraint set, and guides community discovery while the objective function is being optimized;
the active learning method actively selects, from the network, core nodes that can represent local community structures and nodes located at community boundaries to construct the pairwise constraint set, thereby generating high-quality prior information;
the community division method based on extremum optimization of module density, as used in the community discovery method applicable to a noise network, comprises the following steps:
step one, calculating the importance value of each node in the network, and establishing a core point set and a boundary point set;
step two, selecting core representative points to construct prior information;
step three, selecting boundary representative points to construct prior information;
step four, incorporating the prior information into the extremum optimization process;
step five, randomly dividing the network, according to its topological structure, into two parts with approximately equal numbers of nodes to form an initial community structure;
step six, calculating the contribution value of each node to the community module density, moving the node with the minimum contribution to the other part for self-organizing optimization, and repeating the self-organizing optimization process until the module density value of the network no longer increases;
and step seven, removing the connecting edges between the two communities finally obtained, and then performing step five and step six on each sub-network until the module density value of the whole network reaches its maximum.
2. The community discovery system for a noise network of claim 1, wherein step one specifically comprises: calculating the importance value of a node using an index that comprehensively measures node importance based on the degree and the clustering coefficient, expressed as:
p_i = f(k_i) + g(c_i);
wherein
Figure FDA0002793202090000021
is the degree of node i, and
Figure FDA0002793202090000022
is the clustering coefficient of node i, where E_i is the number of edges actually present between the neighbor nodes of node i; f(k_i) is the normalized value of k_i, namely the ratio of the difference between the degree of node i and the minimum node degree in the network to the difference between the maximum node degree and the minimum node degree in the network; g(c_i) is the normalized value of c_i, namely the ratio of the difference between the maximum clustering coefficient of the nodes in the network and the clustering coefficient of node i to the difference between the maximum clustering coefficient and the minimum clustering coefficient of the nodes in the network;
according to a given parameter
Figure FDA0002793202090000024
a core point set and a boundary point set are determined: all nodes in the network whose importance value is greater than the given parameter
Figure FDA0002793202090000025
constitute the core point set CS, and the boundary point set BS is the node set formed by the non-core nodes.
3. The community discovery system for a noise network of claim 1, wherein step two specifically comprises: if the representative point set RS is empty, selecting the node k with the maximum importance value from the core point set CS and adding it to the representative point set RS; otherwise, selecting from the core point set CS the node i with the minimum similarity to the representative point set RS as the candidate representative point, where the similarity between node i and the set CS is expressed as:
S(i,CS)=max(Sim(i,j)|j∈CS);
wherein,
Figure FDA0002793202090000023
N_i^+ is the set formed by node i and its neighbor nodes, and δ is a gain value, here chosen as 1;
and constructing prior information <i, j> for each pair of representative points in the representative point set RS, and labeling the constraint type of the prior information <i, j>.
4. The community discovery system for a noise network of claim 1, wherein step three specifically comprises: selecting from the boundary point set BS the boundary point b1 with the maximum similarity to node i; if several nodes satisfy this condition, selecting the one with the minimum importance value as the representative point; constructing prior information <i, b1> and labeling the constraint type of the prior information <i, b1>;
and selecting from the boundary point set BS the boundary point b2 with the minimum similarity to node i; if several nodes satisfy this condition, selecting the one with the maximum importance value as the representative point; constructing prior information <i, b2> and submitting it to a domain expert to label the constraint type.
5. The community discovery system for a noise network of claim 1, wherein step four specifically comprises: the module density D of the network (a global variable) is related to the contribution q_i of each node to the module density (a local variable); using the known pairwise constraint information, the module density D is optimized by penalizing violations of the constraints, and the general form of the penalty function is defined as:
Figure FDA0002793202090000031
wherein α_1 and α_2 are balance factors between penalty and reward, <i,j,w,type> ∈ C represents the community-membership relation between nodes i and j,
Figure FDA0002793202090000032
represents the non-negative cost of violating a constraint, C_i is the community to which node i belongs, and δ(C_i, C_j) = 1 when C_i = C_j, otherwise δ(C_i, C_j) = 0;
partitions that do not satisfy the constraints are penalized, that is, the module density contribution value of node i is reduced; in this case α_1 = 0 and α_2 = 1 are set in U(C), so the local variable q'_i optimized in combination with the prior information is expressed as:
Figure FDA0002793202090000033
wherein,
Figure FDA0002793202090000034
Figure FDA0002793202090000035
represents the number of edges connecting node i in community C_i to the other nodes inside community C_i,
Figure FDA0002793202090000036
represents the number of edges connecting node i in community C_i to nodes outside community C_i, and |C_i| denotes the number of nodes in community C_i;
partitions that satisfy the constraints are rewarded, that is, the value of the global variable D is increased; in this case α_1 = 1 and α_2 = 0 are set in U(C), so the global variable D' optimized in combination with the prior information is expressed as:
Figure FDA0002793202090000037
wherein,
Figure FDA0002793202090000041
Figure FDA0002793202090000042
represents the number of edges between C_1 and C_2;
Figure FDA0002793202090000043
represents twice the total number of edges inside C_1;
Figure FDA0002793202090000044
represents the total number of edges connecting nodes inside V_1 to nodes outside it, wherein
Figure FDA0002793202090000045
given a partition of the network G: G_1(C_1, E_1), G_2(C_2, E_2), …, G_m(C_m, E_m), where C_i and E_i are the vertex set and the edge set of G_i (i = 1, 2, …, m), and |C_i| is the number of nodes in community C_i.
6. The community discovery system for a noise network of claim 1, wherein in step five: the network G is randomly divided, according to its topological structure, into two parts G_1 and G_2 with approximately equal numbers of nodes, and the nodes connected by edges within each part form a community, giving an initial community structure.
7. The community discovery system for a noise network of claim 1, wherein in step six: the contribution value q'_i of each node to the community module density is calculated, and the node with the minimum contribution to the community module density is moved to the other part for self-organizing optimization; the contribution value of every node is recalculated after each move; this self-organizing optimization process is repeated until the module density value D' of the network no longer increases.
8. The community discovery system for a noise network of claim 1, wherein in step seven: the connecting edges between the two communities finally obtained are removed, which yields several connected sub-networks; step five and step six are performed on each sub-network until the module density of the whole network is maximized.
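As a rough illustration of the importance index described in claim 2, the sketch below computes p_i = f(k_i) + g(c_i) on an adjacency-set graph. It is a sketch under stated assumptions: the clustering coefficient is assumed to take the standard form 2·E_i / (k_i·(k_i − 1)), since the formula itself appears only as a figure; nodes with fewer than two neighbors are given a coefficient of 0; and the names `importance`, `split_core_boundary` and `threshold` are hypothetical stand-ins for the given parameter.

```python
def importance(adj):
    """Return p_i = f(k_i) + g(c_i) for every node of an undirected graph."""
    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    clustering = {}
    for n, nbrs in adj.items():
        k = len(nbrs)
        # E_i: edges that actually exist between the neighbors of n.
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        clustering[n] = 2 * links / (k * (k - 1)) if k > 1 else 0.0

    def scale(values, invert):
        lo, hi = min(values.values()), max(values.values())
        span = hi - lo
        if span == 0:
            return {n: 0.0 for n in values}  # all values equal: nothing to rank
        # f uses (value - min) / span; g uses (max - value) / span.
        return {n: ((hi - v) if invert else (v - lo)) / span
                for n, v in values.items()}

    f = scale(degree, invert=False)
    g = scale(clustering, invert=True)
    return {n: f[n] + g[n] for n in adj}


def split_core_boundary(p, threshold):
    """Core set: importance above the threshold; boundary set: all other nodes."""
    core = {n for n, value in p.items() if value > threshold}
    return core, set(p) - core


# Hypothetical toy graph: a triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
p = importance(adj)
core, boundary = split_core_boundary(p, threshold=1.5)
```

In this toy graph node 0 has the highest degree and a low clustering coefficient, so it receives the largest importance value and is the only node placed in the core set at a threshold of 1.5.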
CN201710260472.4A 2017-04-20 2017-04-20 Community discovery method suitable for noise network Active CN107240026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710260472.4A CN107240026B (en) 2017-04-20 2017-04-20 Community discovery method suitable for noise network

Publications (2)

Publication Number Publication Date
CN107240026A CN107240026A (en) 2017-10-10
CN107240026B true CN107240026B (en) 2021-01-29

Family

ID=59983069

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710260472.4A Active CN107240026B (en) 2017-04-20 2017-04-20 Community discovery method suitable for noise network

Country Status (1)

Country Link
CN (1) CN107240026B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108880935B (en) * 2018-06-05 2020-09-15 广州杰赛科技股份有限公司 Method and device for obtaining importance of network node, equipment and storage medium
CN111694862A (en) * 2019-03-11 2020-09-22 北京京东尚科信息技术有限公司 Data stream processing method and system, electronic device and storage medium
CN113326064A (en) * 2021-06-10 2021-08-31 深圳前海微众银行股份有限公司 Method for dividing business logic module, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103810288A (en) * 2014-02-25 2014-05-21 西安电子科技大学 Method for carrying out community detection on heterogeneous social network on basis of clustering algorithm
CN103905246A (en) * 2014-03-06 2014-07-02 西安电子科技大学 Link prediction method based on grouping genetic algorithm
CN106327357A (en) * 2016-08-17 2017-01-11 深圳先进技术研究院 Load identification method based on improved probabilistic neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8768001B2 (en) * 2009-12-10 2014-07-01 The Chinese University Of Hong Kong Apparatus and methods for generating and processing manga-style images

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
一种基于因子图模型的半监督社区发现方法;黄立威;《自动化学报》;20161031;全文 *

Also Published As

Publication number Publication date
CN107240026A (en) 2017-10-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant