CN105302823A - Overlapped community parallel discovery method and system - Google Patents


Info

Publication number: CN105302823A
Authority: CN (China)
Application number: CN201410302016.8A (priority application)
Legal status: Pending (as listed by Google Patents; not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 徐敏, 周修庄, 刘卉, 吴敏华, 周丽娟
Current assignee: Capital Normal University
Original assignee: Capital Normal University (applicant)


Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present invention discloses a method and system for fast parallel discovery of overlapping communities. First, a node distribution network graph is read, a community set is created, and the graph is associated with the community set. Next, the n nodes with the highest degree are taken as the center nodes of n initial communities, which are stored in the community set. The following procedure is then performed simultaneously on the n initial communities: the direct neighbors of each center node are taken as candidate member nodes of the corresponding community; each candidate is judged in turn as to whether it belongs to the community; nodes that belong are incorporated into the community, their direct neighbors become new candidate member nodes, and the step is repeated; nodes that do not belong are removed from the candidate member set. For nodes that have not been incorporated into any community, the above steps are repeated until every node in the graph belongs to a community. Finally, communities are merged according to their community overlap degree and according to the effect that merging two communities has on their community aggregation degrees.

Description

Method and system for fast parallel discovery of overlapping communities
Technical field
The present invention relates to the field of complex networks, and in particular to a MapReduce-based method and system for fast parallel discovery of overlapping communities (Map: mapping; Reduce: reduction; MapReduce is a software framework proposed by Google for parallel computation over large-scale datasets).
Background
A virtual community is an Internet-based social clustering phenomenon. It forms a personal relationship network of a certain scale, gathers users with the same interests in the virtual network, and enables them to propose, express, and exchange their own views with one another; users in different communities may also interact. Communities can provide users with timely, reliable, and valuable information, and they also help merchants find customers accurately. However, communities are hidden behind numerous complex relationships and connections rather than existing explicitly, so researchers need related methods and techniques to discover and mine these hidden communities; the discovered community relationships can then be used to provide personalized applications and services to individuals.
In recent years, with the rapid development and widespread application of network technology, community discovery has become an interdisciplinary research hotspot, and researchers in sociology, education, psychology, and other disciplines have carried out a series of studies on it from different perspectives. More broadly, most real-world systems can be described as networks, where a network consists of a set of nodes and the relationships among them, i.e., the set of edges connecting the nodes. A large body of real data shows that complex networks are usually heterogeneous: they consist of many different types of nodes, with dense connections among nodes of the same type and relatively sparse connections between nodes of different types.
For large, complex networks, the design concepts and operating modes of conventional community discovery methods cannot complete discovery quickly or in real time.
Summary of the invention
The invention provides a MapReduce-based method and system for parallel discovery of overlapping communities, in order to find communities in large, complex networks quickly.
To achieve the above object, the invention provides a fast parallel discovery method for overlapping communities, comprising the following steps:
S1: Read the node distribution network graph from the document dataset; create a community set; and associate the node distribution network graph with the community set.
S2: Set the number of initial communities to n, where n is a natural number greater than or equal to 1. Compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degree as the center nodes of n initial communities. At this point the aggregation degree of each initial community is 0. Store the n initial communities in the community set.
S3: Perform the following procedure simultaneously for each of the n initial communities: select the direct neighbors of the center node as candidate member nodes of this community and add them to the community's candidate member set; judge whether each candidate member node in the candidate member set belongs to this community; incorporate the nodes that belong to this community into it, select the direct neighbors of those nodes as new candidate member nodes of this community, and repeat this step; remove the nodes that do not belong to this community from the candidate member set without further processing; and store this community's data in the community set.
S4: Judge whether any node in the node distribution network graph has not been incorporated into any community; if so, repeat steps S2, S3, and S4 for the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community.
S5: Compute the community overlap degree of every pair of communities in the community set; if the overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data in the community set.
S6: For any two communities in the community set that have common nodes, compute the aggregation degree of the new community formed by merging them and compare it with the aggregation degree of each of the two communities before merging. If the new community's aggregation degree is greater than the aggregation degrees of both communities before merging, merge the two communities into the new community, take the new aggregation degree as the aggregation degree of the merged community, and update the corresponding data in the community set.
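Step S2 and the start of S3 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: it assumes the graph is an undirected adjacency map, breaks degree ties by `str(node)` (the text does not say how ties are resolved), and the function and field names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as an adjacency map (assumed)


def top_n_hubs(graph: Graph, n: int) -> List[Hashable]:
    """S2: the n nodes with the highest degree, ties broken by str(node) (an assumption)."""
    if n < 1:
        raise ValueError("n must be a natural number >= 1")
    ranked = sorted(graph, key=lambda v: (-len(graph[v]), str(v)))
    return ranked[:n]


def initial_communities(graph: Graph, n: int) -> List[dict]:
    """S2 plus the first part of S3: each initial community holds only its center
    node, has aggregation degree 0, and starts with the center's direct neighbours
    as its candidate member set."""
    return [
        {"hub": hub, "members": {hub}, "M": 0.0, "candidates": set(graph[hub])}
        for hub in top_n_hubs(graph, n)
    ]
```

Candidate sets of different initial communities may contain each other's center nodes; that is intended, since the method allows communities to overlap.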
In step S3, the procedure performed simultaneously for each of the n initial communities is implemented with multiple threads in parallel, and whether a node in the candidate member set belongs to this community is judged as follows: compute the node contribution degree of each candidate member node; incorporate the nodes whose contribution degree is 1 into this community, and compute the initial community aggregation degree of this community; sort the nodes whose contribution degree is not 1 by contribution degree from high to low and, starting from the node with the highest contribution degree, successively assume that each node in the candidate member set belongs to this community and compute the resulting first intermediate community aggregation degree (the aggregation degree of the community without the candidate node is the initial aggregation degree). If the first intermediate aggregation degree is greater than the initial community aggregation degree, incorporate the node into this community; if it is less, do not incorporate the node or any node after it. Discovery of this community is complete when no first intermediate aggregation degree exceeds the initial community aggregation degree, and the initial community aggregation degree is then the aggregation degree of this community.
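A minimal Python sketch of this multithreaded variant follows, written under several assumptions that the text leaves open: D(V_i, V_0) is the shortest-path hop distance; L_in and L_out in the aggregation degree are counted over the whole community (internal versus boundary edges); after each accepted node the baseline is raised to the new aggregation degree; and a tie (no strict improvement) ends growth. All function names are hypothetical. Note that CPython threads interleave rather than run CPU-bound work truly in parallel, so this shows the structure of the per-community parallelism rather than a speed-up.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def bfs_distances(graph: Graph, hub: Hashable) -> Dict[Hashable, int]:
    """Hop distance D(V_i, V_0) from the center node (shortest path: an assumption)."""
    dist = {hub: 0}
    queue = deque([hub])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def contribution(graph: Graph, v: Hashable, members: Set[Hashable]) -> float:
    """c = L_in / (L_in + L_out): share of v's edges that lead into the community."""
    degree = len(graph[v])
    return len(graph[v] & members) / degree if degree else 0.0


def aggregation(graph: Graph, members: Set[Hashable], dist: Dict[Hashable, int]) -> float:
    """M = 1 / sum(D) * L_in / (L_in + L_out), L_in/L_out counted community-wide
    (an assumption); a community holding only its center node has M = 0."""
    d_sum = sum(dist[v] for v in members)
    if d_sum == 0:
        return 0.0
    ends_inside = sum(1 for v in members for w in graph[v] if w in members)
    l_out = sum(1 for v in members for w in graph[v] if w not in members)
    l_in = ends_inside // 2  # every internal edge is seen from both of its endpoints
    return l_in / (l_in + l_out) / d_sum


def grow_community(graph: Graph, hub: Hashable) -> Tuple[Set[Hashable], float]:
    """Greedy growth of one community following the multithreaded S3 variant."""
    dist = bfs_distances(graph, hub)
    members, m = {hub}, 0.0
    candidates = set(graph[hub])
    while candidates:
        # Candidates whose contribution degree is exactly 1 join unconditionally.
        full = {v for v in candidates if contribution(graph, v, members) == 1.0}
        if full:
            members |= full
            candidates = (candidates | set().union(*(graph[v] for v in full))) - members
            m = aggregation(graph, members, dist)
            continue
        # Otherwise test the candidate with the highest contribution degree.
        best = min(candidates, key=lambda v: (-contribution(graph, v, members), str(v)))
        trial = aggregation(graph, members | {best}, dist)
        if trial <= m:
            break  # no improvement: discovery of this community is complete
        members.add(best)
        m = trial
        candidates = (candidates | graph[best]) - members
    return members, m


def discover_parallel(graph: Graph, hubs: List[Hashable]) -> List[Tuple[Set[Hashable], float]]:
    """Grow every initial community in its own worker thread (results keep hub order)."""
    with ThreadPoolExecutor(max_workers=max(1, len(hubs))) as pool:
        return list(pool.map(lambda h: grow_community(graph, h), hubs))
```

On the six-node graph used in the test (a triangle 0-1-2, plus the chain 0-3-4-5), hub 0 grows to {0, 1, 2} with M = 0.375 and hub 4 to {4, 5} with M = 0.5; node 3 stays unassigned, which is the case step S4 handles. Each thread only reads the shared graph, so no locking is needed.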
Alternatively, in step S3, the procedure performed simultaneously for each of the n initial communities is implemented under the MapReduce framework, and whether a node in the candidate member set belongs to this community is judged as follows:
Assume, for each node in the candidate member set separately, that it belongs to this community, and compute simultaneously the second intermediate community aggregation degree of this community with that node incorporated. Sort all nodes in the candidate member set by their second intermediate aggregation degree from largest to smallest; the aggregation degree of the community without any candidate node is the initial aggregation degree. Compare each node's second intermediate aggregation degree with the initial community aggregation degree in turn: if a node's second intermediate aggregation degree is greater than the initial community aggregation degree, incorporate the node into this community; if it is less, do not incorporate the node or any node after it. Discovery of this community is complete when no second intermediate aggregation degree exceeds the initial community aggregation degree, and the initial community aggregation degree is then the aggregation degree of this community.
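The selection rule of the MapReduce variant differs from the multithreaded one: every candidate is scored independently against the same baseline, and the sorted prefix that beats the baseline is accepted in one pass. A minimal sketch, assuming the per-candidate scores (the second intermediate aggregation degrees) are supplied by a caller-provided function; `select_candidates` is a hypothetical name, and treating a score equal to the baseline as a rejection is an assumption.

```python
from typing import Callable, Hashable, Iterable, List


def select_candidates(candidates: Iterable[Hashable],
                      score: Callable[[Hashable], float],
                      baseline: float) -> List[Hashable]:
    """Accept candidates in descending score order while they beat the baseline.

    score(v) stands in for the community's aggregation degree with v alone added;
    the first candidate that does not beat the baseline, and every one after it,
    is rejected.
    """
    # Sort by score descending; str(v) only makes ties deterministic.
    scored = sorted(((score(v), str(v), v) for v in candidates), key=lambda t: (-t[0], t[1]))
    accepted = []
    for s, _, v in scored:
        if s <= baseline:
            break
        accepted.append(v)
    return accepted
```

Because all scores are computed against the same starting community, they can be produced in parallel (for example by one mapper each) and only the final sort-and-cut needs to see them together.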
The node contribution degree is computed as follows:
Let c be the node contribution degree of node V_i, let N be the number of nodes in community n_j, and suppose node V_i belongs to community n_j. Let L_in be the number of edges connecting V_i to other nodes inside n_j, and L_out the number of edges connecting V_i to nodes outside n_j. Then c is computed as:
$$c = \frac{L_{in}}{L_{in} + L_{out}}$$
where i is a natural number from 1 to N-1 and j is a natural number from 1 to n.
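The contribution degree is simply the fraction of a node's edges that stay inside the community. A minimal Python sketch, assuming an undirected adjacency-map graph; the function name is hypothetical, and returning 0 for an isolated node is an assumption because the text does not define that case.

```python
from typing import Dict, Hashable, Set


def contribution_degree(graph: Dict[Hashable, Set[Hashable]], v: Hashable,
                        community: Set[Hashable]) -> float:
    """c = L_in / (L_in + L_out) for node v with respect to a community.

    L_in: edges from v to other nodes of the community.
    L_out: edges from v to nodes outside the community.
    """
    l_in = sum(1 for w in graph[v] if w in community and w != v)
    l_out = sum(1 for w in graph[v] if w not in community)
    if l_in + l_out == 0:
        return 0.0  # isolated node; the source does not define this case
    return l_in / (l_in + l_out)
```

For example, in the graph with edges 0-1, 0-2, 0-3, 1-2, 3-4, and 4-5, node 2 has contribution degree 1 with respect to {0, 1} (both of its edges lead into the community), while node 3 has 0.5 with respect to {0, 1, 2}.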
The community aggregation degree is computed as follows:
Let M be the community aggregation degree of community n_j, let N be the number of nodes in n_j, and suppose node V_i belongs to n_j. Let D(V_i, V_0) be the distance from V_i to the community center node V_0, L_in the number of edges connecting V_i to other nodes inside n_j, and L_out the number of edges connecting V_i to nodes outside n_j. Then the community aggregation degree M of n_j is computed as:
$$M = \frac{1}{\sum_{i=1}^{N-1} D(V_i, V_0)} \times \frac{L_{in}}{L_{in} + L_{out}}$$
where i is a natural number from 1 to N-1 and j is a natural number from 1 to n.
The initial community aggregation degree, the intermediate community aggregation degrees, and the new community aggregation degree are all computed in the same way as the community aggregation degree.
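A minimal Python sketch of the aggregation degree, under stated assumptions: D is the shortest-path hop distance from the center node, and because the formula applies a single ratio to the whole community, L_in and L_out are taken as the community's internal and boundary edge counts. A community holding only its center node has a zero distance sum, so M is set to 0, matching the statement that each initial community starts at 0. The function names are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def hop_distances(graph: Graph, center: Hashable) -> Dict[Hashable, int]:
    """Breadth-first hop distance from the center node to every reachable node."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def aggregation_degree(graph: Graph, community: Set[Hashable], center: Hashable) -> float:
    """M = 1 / sum_i D(V_i, V_0) * L_in / (L_in + L_out) (community-wide counts assumed).

    All community members must be reachable from the center node.
    """
    dist = hop_distances(graph, center)
    d_sum = sum(dist[v] for v in community if v != center)
    if d_sum == 0:
        return 0.0  # only the center node: the source sets M = 0
    # Internal edges are seen from both endpoints, hence the halving.
    l_in = sum(1 for u in community for w in graph[u] if w in community) // 2
    l_out = sum(1 for u in community for w in graph[u] if w not in community)
    return (1 / d_sum) * l_in / (l_in + l_out)
```

On the example graph with edges 0-1, 0-2, 0-3, 1-2, 3-4, and 4-5 and center 0, M is 0.25 for {0, 1}, rises to 0.375 for {0, 1, 2}, and falls to 4/15 (about 0.267) for {0, 1, 2, 3}, so a rule that keeps only improving nodes stops at the triangle.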
The community overlap degree of two communities is computed as follows:
Let the two communities be A and B, let C_A and C_B be their node sets, and let O_AB denote their community overlap degree. Then O_AB is computed as:
$$O_{AB} = \frac{|C_A \cap C_B|}{|C_A \cup C_B|}$$
where A and B are two different communities.
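The overlap degree is the Jaccard index of the two node sets. The sketch below pairs it with the threshold merge of step S5; repeating the pairwise scan until no pair exceeds the threshold is an assumption (the text does not say whether merged communities are re-examined), and the function names are hypothetical.

```python
from itertools import combinations
from typing import Hashable, List, Set


def overlap_degree(a: Set[Hashable], b: Set[Hashable]) -> float:
    """O_AB: size of the intersection divided by size of the union of two node sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_by_overlap(communities: List[Set[Hashable]], threshold: float) -> List[Set[Hashable]]:
    """S5: merge any pair whose overlap degree exceeds the threshold, until none does."""
    comms = [set(c) for c in communities]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(comms)), 2):
            if overlap_degree(comms[i], comms[j]) > threshold:
                other = comms.pop(j)  # j > i, so index i is unaffected
                comms[i] |= other
                merged = True
                break  # indices changed: rescan all pairs
    return comms
```

For instance, {1, 2, 3, 4} and {2, 3, 4, 5} share three of five distinct nodes, so their overlap degree is 0.6 and they merge under a threshold of 0.5.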
In addition, the invention provides a fast parallel discovery system for overlapping communities, comprising:
Community discovery preparation module, configured to: read the node distribution network graph from the document dataset; create a community set; and associate the node distribution network graph with the community set.
Initial community determination module, configured to: set the number of initial communities to n, where n is a natural number greater than or equal to 1; compute the degree of each node in the node distribution network graph and take the n nodes with the highest degree as the center nodes of n initial communities, the aggregation degree of each initial community being 0 at this point; and store the n initial communities in the community set.
Parallel community discovery module, configured to, for each of the n initial communities: select the direct neighbors of the center node (that is, the nodes connected to it by an edge) as candidate member nodes of this community and add them to the community's candidate member set; compute the node contribution degree of each candidate member node; incorporate the nodes whose contribution degree is 1 into this community and compute the initial community aggregation degree of this community; sort the nodes whose contribution degree is not 1 by contribution degree from high to low and, starting from the node with the highest contribution degree, successively assume that the node belongs to this community and compute the intermediate community aggregation degree; if the intermediate aggregation degree is greater than the initial community aggregation degree, incorporate the node into this community and repeat this step; if it is less, do not incorporate the node and judge that discovery of this community is complete, the initial community aggregation degree then being the aggregation degree of this community; and store this community's data in the community set.
Judgment control module, configured to: judge whether any node in the node distribution network graph has not been incorporated into any community and, if so, invoke the initial community determination module and the parallel community discovery module to perform community discovery again on the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community.
First community merging module, configured to: compute the community overlap degree of every pair of communities in the community set and, if the overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data in the community set.
Second community merging module, configured to: for any two communities in the community set that have common nodes, compute the aggregation degree of the new community formed by merging them and compare it with the aggregation degree of each of the two communities before merging; if the new community's aggregation degree is greater than the aggregation degrees of both communities before merging, merge the two communities into the new community, take the new aggregation degree as the aggregation degree of the merged community, and update the corresponding data in the community set.
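The rule applied by the second community merging module (step S6) can be sketched as below. The aggregation function is passed in as a parameter because the text does not say which center node a merged community uses when its aggregation degree is evaluated; repeating the scan until no pair qualifies is also an assumption, and `merge_by_aggregation` is a hypothetical name.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Set

Community = FrozenSet[Hashable]


def merge_by_aggregation(communities: Iterable[Set[Hashable]],
                         aggregation: Callable[[Community], float]) -> List[Set[Hashable]]:
    """S6: merge two communities that share nodes when the merged community's
    aggregation degree exceeds the aggregation degree of each original community."""
    comms: List[Community] = [frozenset(c) for c in communities]
    changed = True
    while changed:
        changed = False
        for i in range(len(comms)):
            for j in range(i + 1, len(comms)):
                a, b = comms[i], comms[j]
                if not a & b:
                    continue  # only pairs with common nodes are considered
                merged = a | b
                m_new = aggregation(merged)
                if m_new > aggregation(a) and m_new > aggregation(b):
                    comms[i] = merged
                    del comms[j]
                    changed = True
                    break
            if changed:
                break  # list changed: rescan from the start
    return [set(c) for c in comms]
```

The test exercises the rule with a toy stand-in score (internal edges per node) rather than the patent's aggregation degree: on a 4-clique {1, 2, 3, 4} with a pendant node 5, the communities {1, 2, 3} and {2, 3, 4} merge, while {4, 5} does not join the result because merging would lower the larger community's score.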
In the system, the node contribution degree, the community aggregation degree (and with it the initial, intermediate, and new community aggregation degrees), and the community overlap degree of two communities are computed with the same definitions and formulas as in the method described above.
Compared with the prior art, the invention has the following beneficial effects:
The invention proposes a parallel discovery method for overlapping communities based on local aggregation degree, which can be implemented with multithreading or under the MapReduce parallel computing framework; the local-aggregation-degree strategy allows the algorithm to discover multiple communities in parallel. In this strategy communities may overlap, i.e., different communities may share nodes, and the data used in detecting one community neither depends on nor affects the detection of another. The MapReduce-based parallel community detection module can therefore iteratively detect the communities of different center nodes in parallel until the data converge. The parallelization strategy proposed under the MapReduce framework is reasonably designed and effective and makes full use of the computing power of the computing nodes. For networks with a small amount of data, the advantage of this parallel model is not obvious, because the communication cost between computing nodes grows as computing nodes are added; when processing data of large orders of magnitude, however, the advantage of the parallel strategy is especially significant and the efficiency of the algorithm improves markedly.
Description of the drawings
To describe the technical solutions of the embodiments of the invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a program flowchart of the fast parallel discovery method for overlapping communities according to an embodiment of the invention;
Fig. 2 is a flowchart of the steps of the fast parallel discovery method for overlapping communities according to an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the invention.
Figs. 1 and 2 show the fast parallel discovery method for overlapping communities provided by the invention.
The first embodiment of the invention comprises the following steps:
S1: Read the node distribution network graph mgraph from the document dataset; create the community set Cset and initialize it as an empty set; and associate mgraph with Cset.
S2: Set the number of initial communities to n; compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degree (topn) as the center nodes of n initial communities. At this point each initial community contains only its center node (hubnode), and its community aggregation degree M is 0. Store the n initial communities in Cset.
S3: Perform the following procedure simultaneously for each of the n initial communities (as shown in Fig. 1, initial communities 1 through n are processed in parallel): select the direct neighbors of the center node hubnode (that is, the nodes connected to the center node by an edge) as candidate member nodes of this community and add them to its candidate member set; compute the node contribution degree of each candidate member node; incorporate the nodes whose contribution degree is 1 into this community and compute the initial community aggregation degree; sort the nodes whose contribution degree is not 1 by contribution degree from high to low and, starting from the node with the highest contribution degree, successively assume that the node belongs to this community and compute the intermediate community aggregation degree; if the intermediate aggregation degree is greater than the initial community aggregation degree, incorporate the node into this community and repeat this step; if it is less, do not incorporate the node and judge that discovery of this community is complete, the initial community aggregation degree then being the aggregation degree of this community; and store this community's data in Cset.
S4: Judge whether any node in the node distribution network graph has not been incorporated into any community; if so, repeat steps S2, S3, and S4 for the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community.
S5: Compute the community overlap degree of every pair of communities in the community set; if the overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data in Cset.
S6: For any two communities in the community set that have common nodes, compute the aggregation degree of the new community formed by merging them and compare it with the aggregation degree of each of the two communities before merging. If the new community's aggregation degree is greater than the aggregation degrees of both communities before merging, merge the two communities into the new community, take the new aggregation degree as the aggregation degree of the merged community, and update the corresponding data in the community set.
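The outer loop formed by steps S2 to S4 can be sketched as below: seed up to n new communities from the highest-degree uncovered nodes, grow each one, and repeat until every node is covered. Assumptions: degree is measured in the full graph, growth runs over the whole graph (so new communities may overlap earlier ones), the hubs of one round are grown sequentially here although the patent grows them in parallel, and the growth routine is passed in. The test uses a trivial ego-network grow function purely for illustration; all names are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]
GrowFn = Callable[[Graph, Hashable], Set[Hashable]]


def cover_all_nodes(graph: Graph, n: int, grow: GrowFn) -> List[Set[Hashable]]:
    """S2-S4 loop: repeat seeding and growth until every node is in a community."""
    communities: List[Set[Hashable]] = []
    covered: Set[Hashable] = set()
    while len(covered) < len(graph):
        uncovered = [v for v in graph if v not in covered]
        # S2 restricted to uncovered nodes: highest degree first, ties by str(node).
        hubs = sorted(uncovered, key=lambda v: (-len(graph[v]), str(v)))[:n]
        for hub in hubs:  # run in parallel in the patent; sequential here
            members = set(grow(graph, hub)) | {hub}  # the hub always joins its own community
            communities.append(members)
            covered |= members
    return communities
```

Because every round covers at least its own hubs, the loop always terminates, whatever growth routine is supplied.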
The second embodiment of the invention comprises the following steps:
S1: Read the node distribution network graph mgraph from the document dataset; create the community set Cset and initialize it as an empty set; and associate mgraph with Cset.
S2: Set the number of initial communities to n; compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degree (topn) as the center nodes of n initial communities. At this point each initial community contains only its center node (hubnode), and its community aggregation degree M is 0. Store the n initial communities in Cset.
S3: Perform the following procedure simultaneously for each of the n initial communities (as shown in Fig. 1, initial communities 1 through n are processed in parallel): select the direct neighbors of the center node hubnode (that is, the nodes connected to the center node by an edge) as candidate member nodes of this community and add them to its candidate member set; then iteratively run a job consisting of two MapReduce tasks, which judges whether each candidate member node in the candidate member set belongs to this community, incorporates the nodes that belong into this community, selects their direct neighbors as new candidate member nodes, and repeats this step; nodes that do not belong to this community are removed from the candidate member set without further processing; and this community's data is stored in the community set. Specifically:
Set up global variables, including the number of internal edges of each community, the number of nodes currently in the community, the total degree of the nodes currently in the community, and the sum of the distances between the community's current nodes and its center node, and store them in a file in HDFS (Hadoop Distributed File System) so that all computing nodes can read their values.
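These four counters are enough to evaluate the community aggregation degree without scanning the community, provided L_in and L_out are read as community-wide edge counts (an assumption): every internal edge adds 2 to the total degree and every boundary edge adds 1, so L_out = total degree - 2 * L_in. A minimal sketch with hypothetical field names:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommunityStats:
    """Per-community global counters, as they might be kept in the shared HDFS file."""
    internal_edges: int = 0   # L_in: edges with both endpoints in the community
    node_count: int = 1       # starts with the center node only
    total_degree: int = 0     # sum of the members' degrees
    distance_sum: int = 0     # sum of D(V_i, V_0) over the members

    def aggregation(self) -> float:
        """M from the counters alone, using L_out = total_degree - 2 * L_in."""
        if self.distance_sum == 0:
            return 0.0  # center node only
        l_out = self.total_degree - 2 * self.internal_edges
        return self.internal_edges / (self.internal_edges + l_out) / self.distance_sum

    def with_node(self, degree: int, edges_into_community: int, distance: int) -> "CommunityStats":
        """Counters after one more node joins (the update a reducer could write back)."""
        return CommunityStats(
            internal_edges=self.internal_edges + edges_into_community,
            node_count=self.node_count + 1,
            total_degree=self.total_degree + degree,
            distance_sum=self.distance_sum + distance,
        )
```

For a center node of degree 3 whose neighbors 1 and 2 form a triangle with it, adding node 1 gives M = 0.25 and then adding node 2 gives M = 0.375, the same values a full scan of the community produces.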
Mapper1 reads the node data in the node distribution network graph mgraph. Each record has the form [<Node, HubNode>, <Flag Distance M {adjNodes}>], where HubNode is the center node of an initial community and Node is the current node. The pair <Node, HubNode> is the key, used to look up the node; <Flag Distance M {adjNodes}>, the text generated from the node's information, is the value (the stored content). Flag is the node's status flag and takes one of four values: 0 means the node is untreated, 1 means the node is a candidate node of some community, 2 means the node has just been added to some community, and 3 means the node has been fully processed. Distance is the distance between the current node and HubNode, and M is the community aggregation degree of HubNode's community after the current node is added to it. Once the initial communities are determined, only the center nodes HubNode have an initial Flag of 2; all other nodes have an initial Flag of 0. When a node is added to a candidate set, its Flag is set to 1; when a node is determined to belong to a community, its Flag is set to 2; and after all of the node's adjacent nodes have been added to the candidate set, its Flag is set to 3. When Mapper1 reads a node's data block and the node's Flag equals 2, the node has just been incorporated into a community: all nodes connected to it by an edge are added to the community's candidate set, and the corresponding <k', v'> pairs are output (k is the key and v the value; this form denotes a new data block). If the Flag of the node in the data block is not 2, the original data block is output unchanged.
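A minimal Python sketch of Mapper1 as a generator over one record, written without Hadoop so that the logic can be run locally. Assumptions: mgraph holds one record per <Node, HubNode> pair, adjNodes is a tuple, an untreated node's Distance is infinity, and a candidate announcement carries no adjacency (None) because the mapper does not know the neighbor's edges. The constant and function names are hypothetical.

```python
from typing import Hashable, Iterator, Optional, Tuple

UNTREATED, CANDIDATE, NEW_MEMBER, DONE = 0, 1, 2, 3  # Flag values from the text

Key = Tuple[Hashable, Hashable]                   # <Node, HubNode>
Value = Tuple[int, float, float, Optional[tuple]]  # <Flag, Distance, M, {adjNodes}>


def mapper1(key: Key, value: Value) -> Iterator[Tuple[Key, Value]]:
    node, hub = key
    flag, distance, m, adj = value
    if flag == NEW_MEMBER:
        for neighbour in adj:
            # Announce each neighbour as a candidate of this hub's community; the
            # mapper does not know the neighbour's adjacency, so adjNodes is None.
            yield (neighbour, hub), (CANDIDATE, distance + 1, 0.0, None)
        # All neighbours have now been added to the candidate set.
        yield key, (DONE, distance, m, adj)
    else:
        yield key, value  # pass every other record through unchanged
```

Announcements may target nodes that are already members (such as the center node itself); resolving those duplicates is left to Reducer1.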
The task of Reducer1 is to consolidate the data generated by Mapper1. Because Mapper1 runs in parallel for the n initial communities, there may be overlapping nodes (nodes processed by several communities at the same time), so a node may have several records in Mapper1's output; Reducer1 merges and aggregates their information. The output of Reducer1 keeps the format [<Node, HubNode>, <Flag Distance M {adjNodes}>], but with only one record per <Node, HubNode> key.
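A minimal sketch of Reducer1 under the same assumptions as the Mapper1 sketch (exactly one full record per key carries adjNodes; candidate announcements carry None). The merge rule is an assumption, since the text only says the records are merged: untreated or candidate nodes become candidates with the shortest announced distance, and nodes that are already members keep their own record.

```python
from typing import Hashable, Iterable, Optional, Tuple

UNTREATED, CANDIDATE, NEW_MEMBER, DONE = 0, 1, 2, 3

Key = Tuple[Hashable, Hashable]
Value = Tuple[int, float, float, Optional[tuple]]


def reducer1(key: Key, values: Iterable[Value]) -> Tuple[Key, Value]:
    """Collapse every record for one <Node, HubNode> key into a single record."""
    records = list(values)
    full = [r for r in records if r[3] is not None]  # the record read from mgraph
    if len(full) != 1:
        raise ValueError(f"expected exactly one full record for {key}, got {len(full)}")
    flag, distance, m, adj = full[0]
    announced = [r[1] for r in records if r[3] is None and r[0] == CANDIDATE]
    if announced and flag in (UNTREATED, CANDIDATE):
        # Untreated nodes become candidates; keep the shortest known distance.
        flag, distance = CANDIDATE, min([distance, *announced])
    # Members (Flag 2 or 3) keep their own record; duplicate announcements are dropped.
    return key, (flag, distance, m, adj)
```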
Mapper2 reads the node data blocks [<Node, HubNode>, <Flag Distance M {adjNodes}>] output by Reducer1. If the node's Flag is 1, the node is a member of the community's candidate set, so Mapper2 assumes that the node is incorporated into the community whose center node is HubNode, computes the resulting community aggregation degree, and assigns it to the node's M; this is the intermediate community aggregation degree. After updating M, Mapper2 outputs the node again in the format [<HubNode, Flag>, <Node Distance M {adjNodes}>]. The pair <HubNode, Flag> is the output key, so all candidate nodes of the same community share the same key and are sent to the same reduce node.
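A minimal sketch of Mapper2's re-keying step. The scoring is delegated to a hypothetical `score(hub, node, distance, adj)` callable; in a real job it would combine the community's global counters read from HDFS with the number of the candidate's neighbors that are already members, which this sketch does not model.

```python
from typing import Callable, Hashable, Optional, Tuple

CANDIDATE = 1

Key = Tuple[Hashable, Hashable]                   # <Node, HubNode>
Value = Tuple[int, float, float, Optional[tuple]]  # <Flag, Distance, M, {adjNodes}>
Score = Callable[[Hashable, Hashable, float, tuple], float]


def mapper2(key: Key, value: Value, score: Score):
    """Score candidates and re-key every record as <HubNode, Flag>."""
    node, hub = key
    flag, distance, m, adj = value
    if flag == CANDIDATE:
        # Intermediate aggregation degree if this node alone joined hub's community.
        m = score(hub, node, distance, adj)
    # Same <HubNode, Flag> key, so all candidates of one community meet in one reducer.
    return (hub, flag), (node, distance, m, adj)
```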
Reducer2 reads in all candidate nodes of a community and sorts them by M in descending order. Starting from the node with the largest M, it compares each node's M with the community's initial community aggregation degree. If the node's M is greater than the initial community aggregation degree, the node's Flag is set to 2 (that is, the node is included in the community) and the comparison continues with the next node. If the node's M is less than the initial community aggregation degree, no further comparison is needed: this node and all nodes after it do not belong to the community, discovery of this community is complete, and the global variable Aggregate is updated. The global variable Aggregate records whether the discovery process of the communities has finished. Before each MapReduce iteration, Aggregate is written (set) to 0; while MapReduce is running, if any node is incorporated into any community, Aggregate is set to 1. At the end of each round of Mapper1-Reducer1-Mapper2-Reducer2 tasks, the value of Aggregate is checked: if it is still 0, all communities being detected in parallel have been completely discovered; if it is 1, a new round of Mapper1-Reducer1-Mapper2-Reducer2 tasks is executed, until the algorithm converges (that is, Aggregate remains 0).
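The Reducer2 decision and the Aggregate-driven iteration can be sketched as follows. The text only specifies the "greater than" and "less than" cases, so this sketch treats a tie as a stop. `one_round` is a hypothetical callable that runs one full Mapper1-Reducer1-Mapper2-Reducer2 round and reports whether any node joined a community; `max_rounds` is an added safety bound.

```python
from typing import Callable, List, NamedTuple, Tuple


class CandidateValue(NamedTuple):
    node: int
    distance: int
    m: float


def reducer2(candidates: List[CandidateValue], initial_m: float) -> Tuple[List[int], bool]:
    """Return the nodes to set to Flag 2 and whether Aggregate must become 1."""
    accepted = []
    for cand in sorted(candidates, key=lambda c: c.m, reverse=True):
        if cand.m > initial_m:
            accepted.append(cand.node)
        else:
            # The text stops on "less than"; a tie is also treated as a stop here.
            # This node and every later (smaller-M) node are rejected.
            break
    return accepted, bool(accepted)


def run_until_converged(one_round: Callable[[], bool], max_rounds: int = 1000) -> int:
    """Repeat rounds until Aggregate stays 0; return the number of rounds run."""
    for round_no in range(1, max_rounds + 1):
        aggregate = 0              # Aggregate is written as 0 before each round
        if one_round():            # some node joined some community
            aggregate = 1
        if aggregate == 0:
            return round_no
    raise RuntimeError("no convergence within max_rounds")
```

For example, with candidate M values 0.2, 0.6 and 0.4 and an initial community aggregation degree of 0.3, the nodes with 0.6 and 0.4 are accepted and the node with 0.2 is rejected.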
S4: Determine whether any node in the node distribution network graph has not been incorporated into any community; if so, repeat steps S2, S3 and S4 for the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community;
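A sketch of the S4 outer loop, assuming degrees are taken from the whole graph and ranked only among still-unassigned nodes, with ties broken by node id. `grow` is a hypothetical callable that grows one community from a center (step S3); the n communities of a round are grown concurrently with a thread pool. Note that CPython threads illustrate the structure but do not speed up CPU-bound Python code because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def discover_all(adj: Dict[int, Set[int]], n: int,
                 grow: Callable[[int], Set[int]]) -> List[Set[int]]:
    """Repeat S2 and S3 until every node belongs to at least one community."""
    communities: List[Set[int]] = []
    assigned: Set[int] = set()
    while True:
        unassigned = [v for v in adj if v not in assigned]
        if not unassigned:
            return communities
        # S2 on the remaining nodes: the n highest-degree unassigned nodes
        # become centers (ties broken by node id).
        centres = sorted(unassigned, key=lambda v: (-len(adj[v]), v))[:n]
        # S3: grow one community per center concurrently.
        with ThreadPoolExecutor(max_workers=len(centres)) as pool:
            grown = list(pool.map(grow, centres))
        for centre, members in zip(centres, grown):
            members = set(members) | {centre}  # a center always belongs to its community
            communities.append(members)
            assigned |= members                # overlap between communities is allowed
```

Because each round assigns at least its centers, the loop always terminates.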
S5: Compute the community overlap degree for every pair of communities in the community set. If the community overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data of the community set Cset;
The community overlap detection and merging module runs map and reduce tasks iteratively. The main function first sets the target community currently being considered for merging and writes it into the global data. Each map node reads in community records, reads the target community's information from the global data, and computes the community overlap degree between the target community and the community it has read. If the community overlap degree is greater than the threshold (set to 0.7 in one embodiment of the invention), the community's OverlapFlag (overlap flag) is set to 1; otherwise OverlapFlag is set to 0. Map uses the community ID of the target community as the key and the information of the community and the target community (including the overlap flag) as the value, and outputs <key, value> pairs. Reduce examines the <key, list<value>> data output by map (the list of information on all communities in relation to the target community) and checks whether list<value> contains a record whose OverlapFlag equals 1. The first community whose OverlapFlag equals 1 is merged with the target community to generate a new community; Reduce outputs the corresponding <key, list<value>> data of the new community, writes the new community into the global data as the target community, and sets the global MergeFlag (merge flag) to 1. MergeFlag is set to 0 before each iteration task and is set to 1 whenever a community merge occurs. If every record in list<value> has OverlapFlag equal to 0, the community overlap degree between the target community and every other community is below the set threshold of 0.7, the target community does not need to be merged with any community, and MergeFlag stays 0. Community records whose OverlapFlag equals 1 are reformatted and output as <key', value'> pairs. In short, the Map task determines whether the community overlap degree between the target community and each community exceeds the threshold, and the Reduce task takes one community that exceeds the threshold and merges it with the target community; if no community exceeds the threshold, it assigns the merge flag MergeFlag accordingly. After each Map-Reduce run, the main function checks the termination condition (community aggregation processing in the global data is complete and the current MergeFlag is 0). If the condition is met, the program ends and all merged community data are obtained; otherwise, a new round of map-reduce tasks is executed to check the overlap degree of the new target community.
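The iteration above can be sketched sequentially in Python: the "map" phase flags every community whose overlap with the target exceeds the threshold, and the "reduce" phase merges the first flagged one and makes the result the new target. Comparing the target against already-finished communities as well as pending ones, and processing targets in list order, are assumptions of this sketch; `overlap_merge` is a hypothetical name.

```python
from typing import List, Set


def overlap_degree(a: Set[int], b: Set[int]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_merge(communities: List[Set[int]], threshold: float = 0.7) -> List[Set[int]]:
    pending = [set(c) for c in communities]  # not yet used as target community
    finished: List[Set[int]] = []
    while pending:
        target = pending.pop(0)              # main function picks the target
        while True:
            merge_flag = 0                   # MergeFlag reset before each round
            others = pending + finished
            # "Map": OverlapFlag = 1 where overlap with the target exceeds threshold.
            flags = [overlap_degree(target, c) > threshold for c in others]
            if any(flags):
                # "Reduce": merge the first flagged community into the target.
                idx = flags.index(True)
                hit = others[idx]
                if idx < len(pending):
                    pending.pop(idx)
                else:
                    finished.pop(idx - len(pending))
                target = target | hit        # merged community is the new target
                merge_flag = 1
            if merge_flag == 0:
                break                        # no merge this round: target is final
        finished.append(target)
    return finished
```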
S6: For any two communities in the community set that share common nodes, compute the new community aggregation degree of the community obtained by merging them, and compare it with the community aggregation degree of each of the two communities before merging. If the new community aggregation degree is greater than the community aggregation degree of both communities before merging, merge the two communities into a new community, take the new community aggregation degree as the community aggregation degree of the merged community, and update the corresponding data of the community set.
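A sketch of the S6 rule, assuming pairs are scanned in index order and the scan restarts after every merge (the text does not fix an order). `aggregation` is a hypothetical callable returning the community aggregation degree of a node set.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List


def merge_by_aggregation(communities: Iterable[Iterable[int]],
                         aggregation: Callable[[FrozenSet[int]], float]) -> List[FrozenSet[int]]:
    comms = [frozenset(c) for c in communities]
    scores = [aggregation(c) for c in comms]
    merged = True
    while merged:
        merged = False
        for i, j in combinations(range(len(comms)), 2):
            if not comms[i] & comms[j]:
                continue                     # only pairs with common nodes qualify
            union = comms[i] | comms[j]
            new_m = aggregation(union)       # new community aggregation degree
            if new_m > scores[i] and new_m > scores[j]:
                keep = [k for k in range(len(comms)) if k not in (i, j)]
                comms = [comms[k] for k in keep] + [union]
                scores = [scores[k] for k in keep] + [new_m]
                merged = True
                break                        # indices changed: rescan all pairs
    return comms
```

Each merge removes one community from the list, so the loop terminates.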
In one embodiment of the invention, the node contribution degree is calculated as follows:
Let the node contribution degree of node V_i be c, let the number of nodes in community n_j be N, and let node V_i belong to community n_j. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The node contribution degree c of node V_i is then calculated by the following formula:
c = \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n.
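A direct Python rendering of the node contribution degree, assuming the graph is an adjacency dictionary `{node: set_of_neighbours}`; returning 0 for a node with no edges is an added assumption, since the formula is undefined there.

```python
from typing import Dict, Set


def node_contribution(adj: Dict[int, Set[int]], node: int, community: Set[int]) -> float:
    """c = L_in / (L_in + L_out) for `node` treated as a member of `community`."""
    neighbours = adj[node] - {node}           # ignore a self-loop, if any
    l_in = len(neighbours & community)        # edges to other nodes inside
    l_out = len(neighbours - community)       # edges to nodes outside
    if l_in + l_out == 0:
        return 0.0                            # assumption for an isolated node
    return l_in / (l_in + l_out)
```

On the path 1-2-3 with community {1, 2}, node 2 has one inside and one outside edge, so c = 0.5; node 3 has its only edge inside, so c = 1, which is the case the method admits into a community directly.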
In one embodiment of the invention, the community aggregation degree is calculated as follows:
Let the community aggregation degree of community n_j be M, let the number of nodes in community n_j be N, let node V_i belong to community n_j, and let D(V_i, V_0) be the distance from node V_i to the community center node V_0. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The community aggregation degree M of community n_j is then calculated by the following formula:
M = \frac{1}{\sum_{i=1}^{N-1} D(V_i, V_0)} \times \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n;
The initial community aggregation degree, the intermediate community aggregation degree and the new community aggregation degree are all calculated in the same way as the community aggregation degree.
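A sketch of the community aggregation degree M under stated assumptions: D(V_i, V_0) is taken as the breadth-first hop count from the center in the whole graph, L_in and L_out (which appear outside the sum) are read as the per-node edge counts summed over all members, and a community that holds only its center has M = 0, matching the statement that each initial community starts at 0. The sketch assumes every member is reachable from the center.

```python
from collections import deque
from typing import Dict, Set


def hop_distances(adj: Dict[int, Set[int]], source: int) -> Dict[int, int]:
    """Breadth-first shortest-path hop counts from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def community_aggregation(adj: Dict[int, Set[int]], members: Set[int], centre: int) -> float:
    others = [v for v in members if v != centre]
    if not others:
        return 0.0                 # a community holding only its center starts at 0
    dist = hop_distances(adj, centre)
    total_distance = sum(dist[v] for v in others)      # sum of D(V_i, V_0)
    # Assumed reading: L_in / L_out summed over all members' edges.
    l_in = sum(1 for v in members for u in adj[v] if u != v and u in members)
    l_out = sum(1 for v in members for u in adj[v] if u not in members)
    return (1.0 / total_distance) * l_in / (l_in + l_out)
```

For a triangle {1, 2, 3} with center 1, the two other nodes are one hop away and every edge is internal, so M = (1/2) × 1 = 0.5.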
In one embodiment of the invention, the community overlap degree of two communities is calculated as follows:
Let the two communities be A and B, let the node set of community A be C_A and the node set of community B be C_B, and denote the community overlap degree of communities A and B by O_AB. O_AB is then calculated by the following formula:
O_{AB} = \frac{|C_A \cap C_B|}{|C_A \cup C_B|}
where A and B are two different communities.
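The community overlap degree is the Jaccard index of the two node sets. A minimal sketch, with 0 returned for two empty sets as an added convention:

```python
from typing import Set


def overlap_degree(c_a: Set[int], c_b: Set[int]) -> float:
    """O_AB = |C_A ∩ C_B| / |C_A ∪ C_B|."""
    union = c_a | c_b
    if not union:
        return 0.0
    return len(c_a & c_b) / len(union)
```

For example, {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct nodes, giving 0.5, which is below the 0.7 threshold used in one embodiment, so these two communities would not be merged in step S5.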
In addition, the present invention provides an overlapping community parallel fast discovery system, comprising:
A community discovery preparation module, configured to: read the node distribution network graph from a document data set; create a community set; and associate the node distribution network graph with the community set;
An initial community determination module, configured to: set the number of initial communities to n; compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degrees as the center nodes of n initial communities, the aggregation degree of each initial community being 0 at this point; and store the n initial communities in the community set;
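A sketch of the initial community determination, assuming an adjacency dictionary; the dictionary layout of each initial community (`centre`, `members`, `m`) is a hypothetical choice for illustration. `heapq.nlargest` is documented as equivalent to sorting in descending order and keeping the first n items.

```python
import heapq
from typing import Dict, List, Set


def initial_centres(adj: Dict[int, Set[int]], n: int) -> List[int]:
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # The n nodes with the highest degree become center nodes.
    return heapq.nlargest(n, degree, key=degree.__getitem__)


def initial_communities(adj: Dict[int, Set[int]], n: int) -> List[dict]:
    # Each initial community holds only its center and has aggregation degree 0.
    return [{"centre": c, "members": {c}, "m": 0.0} for c in initial_centres(adj, n)]
```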
A community parallel discovery module, configured to: for each of the n initial communities, select the direct neighbor nodes of the center node (that is, nodes connected to it by an edge) as candidate member nodes of the community and add them to the community's candidate member set; compute the node contribution degree of each candidate member node; incorporate the nodes whose node contribution degree is 1 into the community and compute the community's initial community aggregation degree; sort the nodes whose node contribution degree is not 1 by node contribution degree from highest to lowest and, starting from the node with the highest node contribution degree, assume in turn that it belongs to the community and compute the community's intermediate community aggregation degree; if the intermediate community aggregation degree is greater than the initial community aggregation degree, incorporate the corresponding node into the community and repeat this step; if the intermediate community aggregation degree is less than the initial community aggregation degree, do not incorporate the corresponding node, judge that discovery of this community is complete, and take the initial community aggregation degree as the community aggregation degree of the community; and store the community data in the community set;
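A single-community sketch of the growth procedure above. Several details are assumptions rather than statements of the text: M is read as (1 / sum of hop distances to the center) times the internal share of the members' edges; after each accepted node, the accepted M becomes the new baseline and the node's neighbours join the candidate set (following step S3); and ties in contribution are broken by node id. `grow_community` and its helpers are hypothetical names.

```python
from collections import deque
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]


def node_contribution(adj: Graph, node: int, members: Set[int]) -> float:
    nbrs = adj[node] - {node}
    return len(nbrs & members) / len(nbrs) if nbrs else 0.0


def aggregation(adj: Graph, members: Set[int], centre: int) -> float:
    # Assumed reading of M: internal edge share divided by the sum of hop
    # distances from the center; 0 for a community holding only its center.
    others = [v for v in members if v != centre]
    if not others:
        return 0.0
    dist, queue = {centre: 0}, deque([centre])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    l_in = sum(1 for v in members for u in adj[v] if u != v and u in members)
    l_out = sum(1 for v in members for u in adj[v] if u not in members)
    return l_in / (l_in + l_out) / sum(dist[v] for v in others)


def grow_community(adj: Graph, centre: int) -> Tuple[Set[int], float]:
    members = {centre}
    candidates = set(adj[centre]) - members
    # Candidates whose contribution degree is 1 join immediately.
    full = {v for v in candidates if node_contribution(adj, v, members) == 1.0}
    members |= full
    for v in full:
        candidates |= adj[v]
    candidates -= members
    baseline = aggregation(adj, members, centre)   # initial aggregation degree
    while candidates:
        # Highest contribution first; node id breaks ties deterministically.
        best = min(candidates, key=lambda v: (-node_contribution(adj, v, members), v))
        trial = aggregation(adj, members | {best}, centre)  # intermediate degree
        if trial <= baseline:
            break              # discovery of this community is complete
        members.add(best)
        candidates |= adj[best]
        candidates -= members
        baseline = trial       # assumption: the accepted value becomes the new baseline
    return members, baseline
```

On two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4, growing from center 3 accepts nodes 1 and 2 and then rejects node 4, because adding it lowers M from 3/7 to about 0.27.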
A judgment control module, configured to: determine whether any node in the node distribution network graph has not been incorporated into any community; if so, call the initial community determination module and the community parallel discovery module to perform community discovery again on the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community;
A first community merging module, configured to: compute the community overlap degree for every pair of communities in the community set; if the community overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data of the community set;
A second community merging module, configured to: for any two communities in the community set that share common nodes, compute the new community aggregation degree of the community obtained by merging them, and compare it with the community aggregation degree of each of the two communities before merging; if the new community aggregation degree is greater than the community aggregation degree of both communities before merging, merge the two communities into a new community, take the new community aggregation degree as the community aggregation degree of the merged community, and update the corresponding data of the community set.
In one embodiment of the invention, the node contribution degree is calculated as follows:
Let the node contribution degree of node V_i be c, let the number of nodes in community n_j be N, and let node V_i belong to community n_j. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The node contribution degree c of node V_i is then calculated by the following formula:
c = \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n.
In one embodiment of the invention, the community aggregation degree is calculated as follows:
Let the community aggregation degree of community n_j be M, let the number of nodes in community n_j be N, let node V_i belong to community n_j, and let D(V_i, V_0) be the distance from node V_i to the community center node V_0. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The community aggregation degree M of community n_j is then calculated by the following formula:
M = \frac{1}{\sum_{i=1}^{N-1} D(V_i, V_0)} \times \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n;
The initial community aggregation degree, the intermediate community aggregation degree and the new community aggregation degree are all calculated in the same way as the community aggregation degree.
In one embodiment of the invention, the community overlap degree of two communities is calculated as follows:
Let the two communities be A and B, let the node set of community A be C_A and the node set of community B be C_B, and denote the community overlap degree of communities A and B by O_AB. O_AB is then calculated by the following formula:
O_{AB} = \frac{|C_A \cap C_B|}{|C_A \cup C_B|}
where A and B are two different communities.
Those of ordinary skill in the art will appreciate that the accompanying drawing is a schematic diagram of one embodiment, and the modules or processes in the drawing are not necessarily required to implement the present invention.
Those of ordinary skill in the art will appreciate that the modules of the device in an embodiment may be distributed in the device as described in the embodiment, or may be changed accordingly and located in one or more devices different from this embodiment. The modules of the above embodiment may be combined into one module or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An overlapping community parallel fast discovery method, characterized in that it comprises the following steps:
S1: read the node distribution network graph from a document data set; create a community set; associate the node distribution network graph with the community set;
S2: set the number of initial communities to n; compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degrees as the center nodes of n initial communities, the aggregation degree of each initial community being 0 at this point; store the n initial communities in the community set;
S3: perform the following procedure simultaneously for each of the n initial communities: select the direct neighbor nodes of the center node as candidate member nodes of the community, and add the candidate member nodes to the community's candidate member set; determine whether each candidate member node in the candidate member set belongs to the community; incorporate the nodes that belong to the community into the community, select the direct neighbor nodes of those nodes as candidate member nodes of the community, and repeat this step in turn; remove the nodes that do not belong to the community from the candidate member set; store the community data in the community set;
S4: determine whether any node in the node distribution network graph has not been incorporated into any community; if so, repeat steps S2, S3 and S4 for the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community;
S5: compute the community overlap degree for every pair of communities in the community set; if the community overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data of the community set;
S6: for any two communities in the community set that share common nodes, compute the new community aggregation degree of the community obtained by merging them, and compare it with the community aggregation degree of each of the two communities before merging; if the new community aggregation degree is greater than the community aggregation degree of both communities before merging, merge the two communities into a new community, take the new community aggregation degree as the community aggregation degree of the merged community, and update the corresponding data of the community set.
2. The overlapping community parallel fast discovery method according to claim 1, characterized in that, in step S3, the procedure performed simultaneously for each of the n initial communities is implemented in a multi-threaded parallel manner, and whether a node in the candidate member set belongs to the community is determined as follows: compute the node contribution degree of each candidate member node; incorporate the nodes whose node contribution degree is 1 into the community and compute the community's initial community aggregation degree; sort the nodes whose node contribution degree is not 1 by node contribution degree from highest to lowest and, starting from the node with the highest node contribution degree, assume in turn that the node in the candidate member set belongs to the community and compute the community's first intermediate community aggregation degree, the aggregation degree of the community when the node in the candidate member set is assumed not to be added being the initial aggregation degree; if the first intermediate community aggregation degree is greater than the initial community aggregation degree, incorporate the corresponding node into the community; if the first intermediate community aggregation degree is less than the initial community aggregation degree, do not incorporate the corresponding node or the nodes after it into the community; when no first intermediate community aggregation degree is greater than the initial community aggregation degree, judge that discovery of this community is complete, the initial community aggregation degree being the community aggregation degree of the community.
3. The overlapping community parallel fast discovery method according to claim 1, characterized in that, in step S3, the procedure performed simultaneously for each of the n initial communities is implemented under the MapReduce framework, and whether a node in the candidate member set belongs to the community is determined as follows:
Assume that each node in the candidate member set belongs to the community, and compute in parallel the second intermediate community aggregation degree of the community obtained when each node in the candidate member set belongs to it; sort all nodes in the candidate member set by second intermediate community aggregation degree from largest to smallest, the aggregation degree of the community when the nodes in the candidate member set are assumed not to be added being the initial aggregation degree; compare the second intermediate community aggregation degree of each node in turn with the initial community aggregation degree; if a node's second intermediate community aggregation degree is greater than the initial community aggregation degree, incorporate the corresponding node into the community; if the second intermediate community aggregation degree is less than the initial community aggregation degree, do not incorporate the corresponding node or the nodes after it into the community; when no second intermediate community aggregation degree is greater than the initial community aggregation degree, judge that discovery of this community is complete, the initial community aggregation degree being the community aggregation degree of the community.
4. The overlapping community parallel fast discovery method according to claim 1, characterized in that the node contribution degree is calculated as follows:
Let the node contribution degree of node V_i be c, let the number of nodes in community n_j be N, and let node V_i belong to community n_j. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The node contribution degree c of node V_i is then calculated by the following formula:
c = \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n.
5. The overlapping community parallel fast discovery method according to claim 1, characterized in that the community aggregation degree is calculated as follows:
Let the community aggregation degree of community n_j be M, let the number of nodes in community n_j be N, let node V_i belong to community n_j, and let D(V_i, V_0) be the distance from node V_i to the community center node V_0. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The community aggregation degree M of community n_j is then calculated by the following formula:
M = \frac{1}{\sum_{i=1}^{N-1} D(V_i, V_0)} \times \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n;
The initial community aggregation degree, the intermediate community aggregation degree and the new community aggregation degree are all calculated in the same way as the community aggregation degree.
6. The overlapping community parallel fast discovery method according to claim 1, characterized in that the community overlap degree of two communities is calculated as follows:
Let the two communities be A and B, let the node set of community A be C_A and the node set of community B be C_B, and denote the community overlap degree of communities A and B by O_AB. O_AB is then calculated by the following formula:
O_{AB} = \frac{|C_A \cap C_B|}{|C_A \cup C_B|}
where A and B are two different communities.
7. An overlapping community parallel fast discovery system, characterized in that it comprises:
A community discovery preparation module, configured to: read the node distribution network graph from a document data set; create a community set; and associate the node distribution network graph with the community set;
An initial community determination module, configured to: set the number of initial communities to n; compute the degree of each node in the node distribution network graph, and take the n nodes with the highest degrees as the center nodes of n initial communities, the aggregation degree of each initial community being 0 at this point; and store the n initial communities in the community set;
A community parallel discovery module, configured to: for each of the n initial communities, select the direct neighbor nodes of the center node as candidate member nodes of the community and add them to the community's candidate member set; compute the node contribution degree of each candidate member node; incorporate the nodes whose node contribution degree is 1 into the community and compute the community's initial community aggregation degree; sort the nodes whose node contribution degree is not 1 by node contribution degree from highest to lowest and, starting from the node with the highest node contribution degree, assume in turn that it belongs to the community and compute the community's intermediate community aggregation degree; if the intermediate community aggregation degree is greater than the initial community aggregation degree, incorporate the corresponding node into the community and repeat this step; if the intermediate community aggregation degree is less than the initial community aggregation degree, do not incorporate the corresponding node, judge that discovery of this community is complete, and take the initial community aggregation degree as the community aggregation degree of the community; and store the community data in the community set;
A judgment control module, configured to: determine whether any node in the node distribution network graph has not been incorporated into any community; if so, call the initial community determination module and the community parallel discovery module to perform community discovery again on the nodes not incorporated into any community, until every node in the node distribution network graph belongs to a community;
A first community merging module, configured to: compute the community overlap degree for every pair of communities in the community set; if the community overlap degree of two communities is greater than a set threshold, merge the two communities into one community and update the corresponding data of the community set;
A second community merging module, configured to: for any two communities in the community set that share common nodes, compute the new community aggregation degree of the community obtained by merging them, and compare it with the community aggregation degree of each of the two communities before merging; if the new community aggregation degree is greater than the community aggregation degree of both communities before merging, merge the two communities into a new community, take the new community aggregation degree as the community aggregation degree of the merged community, and update the corresponding data of the community set.
8. The overlapping community parallel fast discovery system according to claim 7, characterized in that the node contribution degree is calculated as follows:
Let the node contribution degree of node V_i be c, let the number of nodes in community n_j be N, and let node V_i belong to community n_j. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The node contribution degree c of node V_i is then calculated by the following formula:
c = \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n.
9. The overlapping community parallel fast discovery system according to claim 7, characterized in that the community aggregation degree is calculated as follows:
Let the community aggregation degree of community n_j be M, let the number of nodes in community n_j be N, let node V_i belong to community n_j, and let D(V_i, V_0) be the distance from node V_i to the community center node V_0. Let L_in be the number of edges connecting V_i to other nodes inside community n_j, and L_out the number of edges connecting V_i to nodes outside community n_j. The community aggregation degree M of community n_j is then calculated by the following formula:
M = \frac{1}{\sum_{i=1}^{N-1} D(V_i, V_0)} \times \frac{L_{in}}{L_{in} + L_{out}}
where i is a natural number from 1 to N-1, and j is a natural number from 1 to n;
The initial community aggregation degree, the intermediate community aggregation degree and the new community aggregation degree are all calculated in the same way as the community aggregation degree.
10. The overlapping community parallel fast discovery system according to claim 7, characterized in that the community overlap degree of two communities is calculated as follows:
Let the two communities be A and B, let the node set of community A be C_A and the node set of community B be C_B, and denote the community overlap degree of communities A and B by O_AB. O_AB is then calculated by the following formula:
O_{AB} = \frac{|C_A \cap C_B|}{|C_A \cup C_B|}
where A and B are two different communities.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410302016.8A CN105302823A (en) 2014-06-27 2014-06-27 Overlapped community parallel discovery method and system

Publications (1)

Publication Number Publication Date
CN105302823A 2016-02-03

Family

ID=55200098

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410302016.8A Pending CN105302823A (en) 2014-06-27 2014-06-27 Overlapped community parallel discovery method and system

Country Status (1)

Country Link
CN (1) CN105302823A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107222334A (en) * 2017-05-24 2017-09-29 Nanjing University Suitable for the local Combo discovering method based on core triangle of social networks
CN108459909A (en) * 2018-02-27 2018-08-28 Beijing Near Space Vehicle *** Engineering Research Institute A kind of Multi-bodies Separation mesh overlay method and system suitable for parallel processing
CN108459909B (en) * 2018-02-27 2021-02-09 Beijing Near Space Vehicle *** Engineering Research Institute Multi-body separation grid overlapping method and system suitable for parallel processing
CN108600013A (en) * 2018-04-26 2018-09-28 Beijing University of Posts and Telecommunications The overlapping community discovery method and device of dynamic network
CN109345239A (en) * 2018-09-10 2019-02-15 Hohai University A kind of organization overlapping parallelization community discovery method
CN109635074A (en) * 2018-11-13 2019-04-16 Ping An Technology (Shenzhen) Co., Ltd. A kind of entity relationship analysis method and terminal device based on public feelings information
CN109635074B (en) * 2018-11-13 2024-05-07 Ping An Technology (Shenzhen) Co., Ltd. Entity relationship analysis method and terminal equipment based on public opinion information

Similar Documents

Publication Publication Date Title
Yu et al. Hierarchical clustering in minimum spanning trees
Wang et al. Effective lossless condensed representation and discovery of spatial co-location patterns
CN105302823A (en) Overlapped community parallel discovery method and system
CN103902988B (en) A kind of sketch shape matching method based on Modular products figure with Clique
CN101320370B (en) Deep layer web page data source sort management method based on query interface connection drawing
CN107391542A (en) A kind of open source software community expert recommendation method based on document knowledge collection of illustrative plates
CN102456062B (en) Community similarity calculation method and social network cooperation mode discovery method
CN102708327A (en) Network community discovery method based on spectrum optimization
Fel et al. Xplique: A deep learning explainability toolbox
Ortmann et al. Efficient orbit-aware triad and quad census in directed and undirected graphs
Lin et al. A frequent itemset mining algorithm based on the Principle of Inclusion–Exclusion and transaction mapping
CN113609345B (en) Target object association method and device, computing equipment and storage medium
CN102033947A (en) Region recognizing device and method based on retrieval word
CN104463129A (en) Fingerprint registration method and device
CN105335368A (en) Product clustering method and apparatus
CN104700311B (en) A kind of neighborhood in community network follows community discovery method
Tri et al. Exploiting geotagged resources to spatial ranking by extending hits algorithm
CN102496033B (en) Image SIFT feature matching method based on MR computation framework
CN106097090A (en) A kind of taxpayer interests theoretical based on figure associate group&#39;s recognition methods
Zhang et al. Automated detecting and placing road objects from street-level images
Fu et al. Stereo matching confidence learning based on multi-modal convolution neural networks
CN114913321A (en) Object attention mining method and system based on local-to-global knowledge migration
CN102819581B (en) Method for generating polygonal chain with concentrated topology of geographic information system
CN102193928B (en) Method for matching lightweight ontologies based on multilayer text categorizer
CN104899283A (en) Frequent sub-graph mining and optimizing method for single uncertain graph

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160203