CN115809243A - B-tree-based overlapping community discovery method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115809243A
Authority
CN
China
Prior art keywords
tree
graph
target
community
vertex
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211449751.2A
Other languages
Chinese (zh)
Inventor
郑志高
杜博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN202211449751.2A priority Critical patent/CN115809243A/en
Publication of CN115809243A publication Critical patent/CN115809243A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a B-tree-based overlapping community discovery method, device, equipment and storage medium in the technical field of computer software and theory. The method comprises: constructing a search array based on all graph vertices in a target community, and storing the neighbor graph vertices outside the target community in a B-tree; when a target graph vertex in the search array is accessed, mapping the target graph vertex to a contiguous memory area so that all neighbor graph vertices in the B-tree are traversed in parallel; calculating, for each neighbor graph vertex, the change in modularity that results from adding it to the target community; and adding to the target community each neighbor graph vertex whose change in modularity is greater than 0. By organizing the graph data with a B-tree, the application effectively eliminates branch divergence during computation, mitigates the poor memory access efficiency caused by irregular memory access, and improves the computing capability, computing efficiency and memory access efficiency of the GPU in overlapping community detection.

Description

Overlapping community discovery method, device, equipment and storage medium based on B tree
Technical Field
The present application relates to the field of computer software and theoretical technologies, and in particular, to a method, an apparatus, a device, and a storage medium for discovering overlapping communities based on a B-tree.
Background
Community detection algorithms mainly comprise discrete (non-overlapping) community detection algorithms and overlapping community detection algorithms. In a discrete algorithm, each vertex of the network graph data is assigned to exactly one community, whereas in an overlapping algorithm one vertex can belong to multiple communities simultaneously. In real network graph data, one vertex may have different identities in different scenes, which means that the same vertex may belong to multiple communities at the same time, so different communities may share vertices. For example, in a social relationship network divided along different dimensions such as professional relationships, family relationships and classmate relationships, a person may belong to multiple communities at the same time; in an academic paper citation network, a researcher may be interested in both computer architecture and network security. Therefore, compared with traditional discrete communities, overlapping communities reflect the basic structure of a complex network more truly and comprehensively. The overlapping vertices act as "bridges" between different communities and reflect, to a certain extent, potential relations among those communities, so they have higher research value than ordinary vertices.
In order to detect overlapping communities on large-scale graph data, the GPU (Graphics Processing Unit) is widely used. However, the detection process contains a large number of divergent branch operations, which leave many threads waiting while the overlapping community detection algorithm executes on the GPU and severely limit the computing capability and computing efficiency of the GPU. In addition, the GPU reaches its peak performance only under a regular memory access pattern, but overlapping community detection involves a large amount of irregular memory access, which seriously degrades the memory access efficiency of the GPU.
Disclosure of Invention
The application provides a B-tree-based overlapping community discovery method, device, equipment and storage medium, which are used to solve the problems in the related art of poor GPU computing capability, poor computing efficiency and poor memory access efficiency caused by branch divergence and irregular memory access in overlapping community detection.
In a first aspect, a method for discovering overlapping communities based on a B-tree is provided, which includes the following steps:
constructing a search array based on all graph vertexes in the target community, and storing neighbor graph vertexes outside the target community in a B tree;
when one target graph vertex in the search array is accessed, mapping the target graph vertex to a continuous memory area so as to perform parallel traversal on all neighbor graph vertices in the B tree;
respectively calculating the variable quantity of the corresponding modularity after each neighbor graph vertex is added into the target community;
and adding the vertex of the neighbor graph with the variable quantity of the modularity larger than 0 into the target community.
In some embodiments, the storing neighbor graph vertices outside of the target community in the B-tree comprises:
storing the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, splitting the target node to obtain a new node for storing other neighbor graph vertices, wherein N represents the number of ways of the B-tree.
In some embodiments, the method further comprises:
and when the number of neighbor graph vertices on the parent node of the target node reaches N-1, adding at least one child node below the parent node for storing other neighbor graph vertices.
In some embodiments, before the step of constructing a lookup array based on all graph vertices within the target community and storing neighbor graph vertices outside the target community in the B-tree, the method further includes:
and setting two buffer areas, so that the action of loading the graph vertex in the target community and the neighbor graph vertex outside the target community on the CPU into the GPU is completed in one buffer area, and the action of reading the graph vertex in the target community on the GPU into the search array and reading the neighbor graph vertex outside the target community into the B tree is completed in the other buffer area.
In some embodiments, the number of ways of the B-tree is 32.
In a second aspect, an apparatus for B-tree based overlapping community discovery is provided, including:
the storage unit is used for constructing a search array based on all graph vertexes in the target community and storing neighbor graph vertexes outside the target community in the B tree;
the access unit is used for mapping the target graph vertex to a continuous memory area when one of the target graph vertices in the lookup array is accessed so as to perform parallel traversal on all neighbor graph vertices in the B tree;
the computing unit is used for respectively computing the variable quantity of the corresponding modularity after each neighbor graph vertex is added into the target community;
a discovery unit for adding neighbor graph vertices with a modularity variation greater than 0 to the target community.
In some embodiments, the memory unit is specifically configured to:
storing the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, splitting the target node to obtain a new node for storing other neighbor graph vertices, wherein N represents the number of ways of the B-tree.
In some embodiments, the memory unit is further specifically configured to:
and when the number of neighbor graph vertices on the parent node of the target node reaches N-1, adding at least one child node below the parent node for storing other neighbor graph vertices.
In a third aspect, an apparatus for B-tree based overlapping community discovery is provided, including: a memory and a processor, the memory having stored therein at least one instruction, the at least one instruction being loaded and executed by the processor to implement the aforementioned overlapping community discovery method based on B-trees.
In a fourth aspect, a computer-readable storage medium is provided, which stores a computer program that, when executed by a processor, implements the aforementioned B-tree based overlapping community discovery method.
The beneficial effects brought by the technical solution provided in this application include: the computing capability, computing efficiency and memory access efficiency of the GPU in overlapping community detection can be effectively improved.
The application provides a B-tree-based overlapping community discovery method, device, equipment and storage medium, comprising: constructing a search array based on all graph vertices in a target community, and storing neighbor graph vertices outside the target community in a B-tree; when a target graph vertex in the search array is accessed, mapping the target graph vertex to a contiguous memory area so as to traverse all neighbor graph vertices in the B-tree in parallel; calculating, for each neighbor graph vertex, the change in modularity after it is added to the target community; and adding each neighbor graph vertex whose change in modularity is greater than 0 to the target community. The application takes full account of the memory space size and high parallelism of the GPU and organizes the graph data with a B-tree, which eliminates branch divergence during computation. Because the B-tree is a balanced tree structure and each of its nodes holds multiple neighbor graph vertices, a thread group can obtain the complete content of a B-tree node with a single read. This reduces the number of memory accesses, improves GPU cache utilization, and thereby mitigates the poor memory access efficiency caused by irregular memory access.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a B-tree-based overlapping community discovery method according to an embodiment of the present disclosure;
FIG. 2 is a diagram illustrating memory branch mapping on a GPU in the prior art;
FIG. 3 is a schematic diagram of a node splitting operation in the prior art;
FIG. 4 is a diagram of the neighbor graph vertices of community c_1 and their B-tree representation, provided in this embodiment of the present application;
FIG. 5 is a schematic structural diagram of an overlapping community discovery device based on a B-tree according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The embodiment of the application provides a B-tree-based overlapping community discovery method, device, equipment and storage medium, which can solve the problems in the related art of poor GPU computing capability, poor computing efficiency and poor memory access efficiency caused by branch divergence and irregular memory access in overlapping community detection.
Fig. 1 is a method for discovering overlapping communities based on a B-tree according to an embodiment of the present application, including the following steps:
Step S10: constructing a search array based on all graph vertices in the target community, and storing neighbor graph vertices outside the target community in a B-tree, wherein the number of ways of the B-tree is 32;
exemplarily, it should be understood that although the GPU is widely applied to accelerate various graph data processing algorithms, current research on overlapping community detection algorithms mainly focuses on serial algorithm research and parallel algorithm research based on a multi-core CPU, and thus there are a series of research challenges for GPU-based overlapping community detection.
Firstly, the overlapping community detection process contains a large number of divergent branch operations, which leave many threads waiting while the overlapping community detection algorithm executes on the GPU and severely limit the computing capability, throughput, memory bandwidth utilization and the like of the GPU. During overlapping community detection, the vertex division process includes a large number of branch operations whose main purpose is to confirm that a vertex is assigned reasonably. Their main form is "if [condition, formula image BDA0003951117360000051 not reproduced] then move vertex into c_2", where the symbol in formula image BDA0003951117360000052 (not reproduced) represents the modularity of community c_1 (the modularity is used to measure the accuracy of the overlapping community partition) and the symbol in formula image BDA0003951117360000053 (not reproduced) represents the corresponding quantity for community c_2. It can be seen that current overlapping community detection algorithms need a large number of IF-THEN statements to implement vertex movement. In addition, referring to FIG. 2, assume that the size of a warp (i.e., a thread group) is 4. When the groups X, Y and Z are loaded into a contiguous memory area, the calculation of branch 2 depends on the calculation result a of branch 1, that is, branch 2 can start only after branch 1 has finished executing. Moreover, if the threads executing branch 1 and branch 2 are in the same thread group, all threads access the same memory region; this access pattern causes some threads to perform empty memory operations and requires multiple memory accesses to obtain the data needed by the active threads. Therefore, avoiding branch divergence will greatly improve the execution efficiency of the GPU in overlapping community detection.
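To make the cost of branch divergence concrete, the following is a minimal sketch under a deliberately simplified SIMT model: when the threads of one thread group disagree on an IF-THEN condition, the group executes both paths one after the other, and the threads that did not take a path sit idle during it. The function name `simt_cost` and the cycle costs are hypothetical and are not taken from the application; the model assumes a non-empty thread group and positive path costs.

```python
def simt_cost(takes_branch, cost_then, cost_else):
    """Return (cycles, lane_utilization) for one thread group.

    Simplified model (an assumption): a path is executed by the whole
    group if at least one thread takes it, and the two paths are
    serialized. Requires a non-empty group and positive costs.
    """
    n = len(takes_branch)
    n_then = sum(1 for t in takes_branch if t)
    n_else = n - n_then
    # Each path that has at least one active thread costs its full length.
    cycles = (cost_then if n_then else 0) + (cost_else if n_else else 0)
    # Useful work: every thread only does the work of its own path.
    useful = n_then * cost_then + n_else * cost_else
    return cycles, useful / (cycles * n)


# A thread group of size 4, as in the FIG. 2 discussion.
uniform = simt_cost([True, True, True, True], 10, 10)
divergent = simt_cost([True, False, True, False], 10, 10)
print(uniform, divergent)
```

In this model a uniform group finishes in one pass with every lane busy, while the divergent group needs both passes and keeps only half of its lane-cycles busy.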
In addition, because communities differ greatly in scale, the amount of computation needed to detect them and the speed of the modularity calculation also differ greatly, so the thread or thread group responsible for a small-scale community finishes its calculation before the thread or thread group responsible for a large-scale community. Within the same iteration round, however, the thread responsible for the small-scale community needs to start calculating the modularity while the thread responsible for the large-scale community still needs to execute vertex moving operations. This causes memory access conflicts during the calculation and further limits GPU efficiency. Experimental results show that the branch efficiency on the GPU is only 66.89%, so the large number of divergent branch operations also greatly reduces thread utilization.
Secondly, the GPU reaches its peak performance only under a regular memory access pattern. Memory bandwidth utilization is high only when all threads in a thread group access one contiguous memory area and each thread obtains the data it requires; in this regular pattern, the threads of a thread group obtain all required data with a single memory access. However, because of the complex association relationships in graph data, the data required during memory access is scattered over multiple memory segments, so a thread group must access memory many times to obtain the data required by all of its threads. The overlapping community detection process therefore involves a large amount of irregular memory access, which seriously degrades the memory access efficiency of the GPU.
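The difference between regular and irregular access can be illustrated by counting how many aligned memory segments one thread group touches. The sketch below is illustrative only; the 128-byte segment size, the 4-byte element size and the name `memory_transactions` are assumptions, not values stated in the application.

```python
def memory_transactions(byte_addresses, segment_bytes=128):
    """Count the distinct aligned segments touched by one thread group.

    Assumption: one memory transaction serves one aligned segment of
    `segment_bytes` bytes, regardless of how many threads read from it.
    """
    return len({addr // segment_bytes for addr in byte_addresses})


WARP = 32
# Regular pattern: thread i reads the i-th 4-byte word of one contiguous area.
coalesced = [4 * lane for lane in range(WARP)]
# Irregular pattern: every thread reads a word from a different, distant segment.
scattered = [4096 * lane for lane in range(WARP)]

print(memory_transactions(coalesced))   # 1
print(memory_transactions(scattered))   # 32
```

Under these assumptions the regular pattern is served by a single transaction, whereas the scattered pattern needs one transaction per thread.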
Therefore, in order to solve the parallelism problem caused by branch divergence and the low memory access efficiency caused by irregular memory access, this embodiment provides a B-tree-based overlapping community detection algorithm. The algorithm improves its overall parallelism through the concurrent queries that the B-tree supports, manages the graph data on the GPU with a B-tree storage structure, and maps the B-tree nodes to the cache lines of the GPU, so as to reduce memory access overhead while improving GPU cache utilization.
It can be understood that the B-tree is a multi-way balanced search tree in which the elements of each node are ordered and all leaf nodes are on the same level. The elements of a non-leaf node serve as partition points (pivots) that divide its child nodes into subtrees, all record values are stored in the leaf nodes, and each node contains pointers to its child nodes (the pointers of leaf nodes are null); traversal of the B-tree proceeds mainly by following these pointers between nodes. For an N-way B-tree, no node holds more than N-1 elements (and therefore no node has more than N child nodes), and every node other than the root has at least N/2 child nodes.
Specifically, in this embodiment, a plurality of vertices are randomly selected as representatives of different communities, and the selected vertices and their neighbor vertices are used as the input of the algorithm. During execution, Lugger uses the graph vertices in the target community as a search array (that is, a lookup array) and manages the neighbor graph vertices outside the target community with a B-tree structure, so as to eliminate branch divergence during computation. In order to improve GPU memory throughput, Lugger in this embodiment uses a 32-way balanced tree (i.e., N = 32) to organize the neighbor vertices outside the target community. With this setting, one B-tree node contains at most 31 graph vertices, and a thread group can obtain the complete content of a B-tree node with a single read, which effectively reduces the number of memory accesses during execution; the setting also allows the cache to hold one or more B-tree nodes, which reduces the number of memory load operations during execution. In addition, because the B-tree is a balanced tree structure and all record values are stored in leaf nodes, the irregular memory access problem is alleviated to a certain extent, and allocating threads on the basis of the B-tree avoids the load imbalance caused by differences in vertex degree.
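A short sketch may help show why a 32-way node suits a 32-thread group. It assumes 4-byte vertex IDs and a 128-byte cache line (neither size is stated in the application) and considers only the keys of a node, leaving child pointers aside. The function `child_index_lanewise` is a hypothetical emulation in which each lane performs the same comparison on one key and the votes are summed, so no lane takes a different branch.

```python
from bisect import bisect_left

WAYS = 32                # N in the text
MAX_KEYS = WAYS - 1      # at most 31 graph vertices per node
KEY_BYTES = 4            # assumption: 32-bit vertex IDs
LINE_BYTES = 128         # assumption: 128-byte cache line


def node_keys_fit_in_line():
    """31 keys * 4 bytes = 124 bytes, which fits in one 128-byte line."""
    return MAX_KEYS * KEY_BYTES <= LINE_BYTES


def child_index_lanewise(keys, target):
    """Emulate one thread group searching a sorted node.

    Lane i compares keys[i] with the target; the number of keys that are
    smaller than the target is the index of the child to descend into.
    """
    votes = [1 if key < target else 0 for key in keys]  # one comparison per lane
    return sum(votes)


keys = list(range(10, 320, 10))      # 31 sorted keys: 10, 20, ..., 310
print(node_keys_fit_in_line())
print(child_index_lanewise(keys, 95), bisect_left(keys, 95))
```

For a sorted node the lane-wise count agrees with an ordinary binary search (`bisect_left`), but every lane executes the same instruction sequence.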
Further, before the step of constructing a lookup array based on all graph vertices in the target community and storing neighbor graph vertices outside the target community in the B-tree, the method further includes:
and setting two buffer areas, so that the action of loading the graph vertexes in the target community and the neighbor graph vertexes outside the target community on the CPU into the GPU is completed in one buffer area, and the action of reading the graph vertexes in the target community to the lookup array and reading the neighbor graph vertexes outside the target community to the B tree on the GPU is completed in the other buffer area.
Exemplarily, it should be understood that current graph data applications must store a large number of temporary variables in addition to the basic graph data, and the storage overhead of these temporary variables further increases the pressure on GPU memory. Therefore, the present embodiment initializes the graph data in CSR format (Compressed Sparse Row, a compact sparse-matrix storage format that minimizes storage space) and stores the neighbor vertices outside the target community in the B-tree structure. Because the B-tree is traversed in order, the present embodiment stores the nodes of the B-tree in a contiguous memory space in ascending traversal order; based on this storage layout, it accelerates memory access with GPU prefetching during communication between the host (i.e., the CPU) and the GPU.
GPU prefetching is an important technique for handling data transfer from the host to the GPU, and this embodiment uses data prefetching in Lugger to transfer large-scale graph data. Specifically, Lugger's data prefetching in this embodiment sets up separate buffer areas for data loading and data reading, and each buffer area works independently. The action of loading the graph vertices in the target community and the neighbor graph vertices outside the target community from the CPU to the GPU is executed in one buffer area, and the action of reading, on the GPU, the graph vertices in the target community into the lookup array and the neighbor graph vertices outside the target community into the B-tree is executed in the other buffer area. Data reading is therefore not affected by data loading, so the loading of candidate active data can overlap with the computation on the currently active vertices, which reduces pipeline stalls caused by data transfer.
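As an illustration of the CSR layout and of how the neighbor vertices outside a target community can be collected from it, consider the following minimal sketch. The helper names `build_csr` and `outside_neighbors` and the small example graph are hypothetical; the sketch assumes an undirected graph with vertices numbered from 0.

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (row_ptr, col_idx) for an undirected graph.

    The neighbors of vertex u are col_idx[row_ptr[u]:row_ptr[u + 1]].
    """
    adjacency = [[] for _ in range(num_vertices)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    row_ptr, col_idx = [0], []
    for neighbors in adjacency:
        col_idx.extend(sorted(neighbors))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def outside_neighbors(row_ptr, col_idx, community):
    """Return the sorted neighbor vertices that lie outside the community."""
    members = set(community)
    found = set()
    for u in community:
        for v in col_idx[row_ptr[u]:row_ptr[u + 1]]:
            if v not in members:
                found.add(v)
    return sorted(found)


edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
row_ptr, col_idx = build_csr(5, edges)
print(row_ptr)                                    # [0, 2, 4, 6, 9, 10]
print(outside_neighbors(row_ptr, col_idx, [0, 1]))  # [2, 3]
```

In this sketch the community members would form the lookup array, and the returned outside neighbors are the keys that would be inserted into the B-tree.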
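The two-buffer idea can be sketched on the host side as follows. This is only an analogy in plain Python, not GPU code: a background worker stands in for the CPU-to-GPU loading action, and the main loop stands in for the reading and computation on the GPU. The function name `process_with_double_buffer` and the `load` and `compute` callbacks are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_with_double_buffer(chunks, load, compute):
    """Overlap load(chunk i+1) with compute(chunk i) using two buffer slots."""
    results = []
    if not chunks:
        return results
    buffers = [None, None]
    with ThreadPoolExecutor(max_workers=1) as loader:
        pending = loader.submit(load, chunks[0])
        for i in range(len(chunks)):
            # Wait until the current chunk is loaded and place it in slot i % 2.
            buffers[i % 2] = pending.result()
            if i + 1 < len(chunks):
                # Start loading the next chunk in the background; it will be
                # placed in the other slot on the next iteration.
                pending = loader.submit(load, chunks[i + 1])
            # Compute on the current slot while the next load is in flight.
            results.append(compute(buffers[i % 2]))
    return results


chunks = [[1, 2], [3], [4, 5, 6]]
print(process_with_double_buffer(chunks, load=list, compute=sum))  # [3, 3, 15]
```

The point of the design is that computation on one buffer never waits for the load that is filling the other buffer, except at the moment the buffers swap roles.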
Further, the storing the vertices of the neighbor graph outside the target community in the B-tree includes:
storing the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, splitting the target node to obtain a new node for storing other neighbor graph vertices, wherein N represents the number of ways of the B-tree.
Illustratively, it should be understood that any node in the B-tree has at most N children, and the number of elements in the node does not exceed N-1. Therefore, node splitting is required when more than N-1 elements are contained in a node. In the prior art, node splitting is often completed by adopting an atomic operation mode, so that data consistency in the node splitting process is ensured. However, in the node splitting process based on atomic operation, the nodes of the current layer where the node to be split is located and the nodes of the layer where the parent node is located need to be locked, and a large number of locking operations can seriously slow down the speed of building the B-tree; on the other hand, when the parent node of the split node itself stores N-1 elements, the split operation of the current node may be passed upwards to cause the parent node to need to be further split, such passing of node split may cause more nodes to need to be locked, and the child nodes of the split node are in an idle state, which may further reduce the GPU performance.
For example, inserting vertex 10 into the 3-way balanced tree shown in FIG. 3 causes node (7, 9) to split; since the parent node of node (7, 9) already holds two elements (4, 6), the split of node (7, 9) in turn causes node (4, 6) to split. As a result, the nodes containing vertices 4, 6, 7 and 9 are locked, while the nodes containing vertices 3 and 5 remain idle until the node splitting operation is finished.
Therefore, in order to reduce the performance overhead caused by the node splitting operation as much as possible, the embodiment proposes an aggressive node splitting policy, and this policy advances the node splitting process to the traversal process: when the node contains N-2 elements (namely the node is not full of elements), the splitting of the node is started, so that all the splitting of the node is limited to be performed at the current layer, and the possibility of transmitting the splitting operation of the node is eliminated, so that the lock on the parent node of the node needing to be split can be removed, and the GPU performance and the B-tree creating speed are improved.
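The idea of splitting during the downward traversal can be sketched with a sequential B-tree that splits any node holding N-2 elements before descending into it. This is the classic top-down (preemptive) insertion scheme with the split threshold lowered to N-2; it is an illustrative sketch only and does not reproduce the locking behavior, the restart mechanism or the GPU implementation of Lugger. The class and function names are hypothetical, and the sketch assumes N >= 5 and distinct keys.

```python
from bisect import bisect_left


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list marks a leaf


class EagerSplitBTree:
    def __init__(self, ways=32):
        assert ways >= 5, "the sketch needs a non-empty right half after a split"
        self.limit = ways - 2            # split threshold N-2 from the text
        self.root = Node()

    def _split_child(self, parent, i):
        """Split parent.children[i]; its median key moves up into parent."""
        child = parent.children[i]
        mid = len(child.keys) // 2
        right = Node(child.keys[mid + 1:], child.children[mid + 1:])
        parent.keys.insert(i, child.keys[mid])
        parent.children.insert(i + 1, right)
        child.keys = child.keys[:mid]
        child.children = child.children[:mid + 1]

    def insert(self, key):
        if len(self.root.keys) >= self.limit:
            # Grow the tree at the root so the descent starts from a node with room.
            self.root = Node([], [self.root])
            self._split_child(self.root, 0)
        node = self.root
        while node.children:
            i = bisect_left(node.keys, key)
            if len(node.children[i].keys) >= self.limit:
                # Split before descending: the current node has room for the
                # median, so the split never propagates further up.
                self._split_child(node, i)
                if key > node.keys[i]:
                    i += 1
            node = node.children[i]
        node.keys.insert(bisect_left(node.keys, key), key)


def inorder(node):
    """Return all keys of the subtree in ascending order."""
    if not node.children:
        return list(node.keys)
    out = []
    for i, child in enumerate(node.children):
        out.extend(inorder(child))
        if i < len(node.keys):
            out.append(node.keys[i])
    return out


tree = EagerSplitBTree(ways=5)
for vertex in [3, 4, 5, 6, 7, 9, 10, 11]:
    tree.insert(vertex)
print(inorder(tree.root))   # [3, 4, 5, 6, 7, 9, 10, 11]
```

Because every node on the descent path has fewer than N-2 elements by the time its child is split, only the child and its immediate parent are modified by any single split.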
Further, the method further comprises:
and when the number of neighbor graph vertices on the parent node of the target node reaches N-1, adding at least one child node below the parent node for storing other neighbor graph vertices.
Exemplarily, in the present embodiment, when the parent node of the current node contains N-1 elements, a restart mechanism adds a new child node to the parent node, which avoids the subtree adjustment that splitting the parent node would cause. Although the restart mechanism increases the overhead of creating child nodes to some extent, no more than 2 log_N M restart operations occur in total during the whole B-tree construction (assuming there is one upward-propagating split node per layer), where M represents the number of nodes in the B-tree. Moreover, the restart mechanism does not increase the height of the tree, whereas a conventional parent split operation would. Therefore, through the restart mechanism and the aggressive node splitting mechanism, this embodiment reduces the overhead caused by atomic operations and lock operations during node splitting, and the restart mechanism also increases the parallelism of the algorithm to a certain extent.
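To give a feel for the size of the stated bound of 2 log_N M restart operations, the following small sketch evaluates it for a 32-way tree. The function name `restart_bound` and the example node counts are illustrative assumptions.

```python
import math


def restart_bound(num_nodes, ways=32):
    """Evaluate the bound 2 * log_N(M) stated in the text."""
    return 2 * math.log(num_nodes, ways)


# For a 32-way tree with one million nodes the bound is just under 8.
print(restart_bound(10**6))
# For exactly 32**4 nodes the bound is 8 (up to floating-point rounding).
print(restart_bound(32**4))
```

Because the bound grows with the logarithm to base N, it stays small even for very large trees when N = 32.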
Step S20: when one target graph vertex in the search array is accessed, the target graph vertex is mapped to a continuous memory area so as to perform parallel traversal on all neighbor graph vertices in the B tree;
Exemplarily, it should be understood that all graph vertices in the target community are organized into a lookup array, and since the nodes of the B-tree are stored in a contiguous memory space, the lookup array can be mapped to a contiguous memory area when it is accessed, which enables contiguous access to the lookup array. The graph vertices in the lookup array can be accessed either in parallel or serially, as determined by actual requirements. When a graph vertex in the lookup array is accessed, all elements in the B-tree (i.e., the neighbor graph vertices) are searched in parallel, level by level, to compute whether the neighbor vertices outside the target community can be added to the target community. For example, referring to FIG. 4, graph vertex 1 and graph vertex 2 form an initial community c_1, whose elements are stored in array form; the neighbor graph vertices of community c_1, namely 3, 4, 5, 6, 7, 9, 10 and 11, are stored in a B-tree, which is a 3-way balanced tree. Lugger searches for each neighbor graph vertex in the B-tree in parallel, starting from the root node.
Step S30: respectively calculating the variable quantity of the corresponding modularity after each neighbor graph vertex is added into the target community;
Exemplarily, in the present embodiment, the change in modularity caused by adding each neighbor graph vertex to the target community is calculated, so that the vertex division process is adjusted dynamically and overlapping communities are discovered. Specifically, the modularity EQ that results after the neighbor graph vertex v is added to the target community C_i is calculated according to the following formula:
[Formula image BDA0003951117360000111 not reproduced]
wherein m represents the number of edges, C_i denotes the i-th community, v denotes a neighbor graph vertex that may need to be added to C_i, w represents a graph vertex in C_i, O_v represents the number of communities to which the neighbor graph vertex v belongs (i.e., the number of communities that have been discovered), O_w represents the number of communities to which graph vertex w belongs, A_vw represents the entry of the adjacency matrix of graph G, A_vw = 1 indicates that there is an edge between the neighbor graph vertex v and graph vertex w, A_vw = 0 indicates that there is no edge between the neighbor graph vertex v and graph vertex w, k_v represents the degree of the neighbor graph vertex v, i.e., the number of edges connected to v, and k_w represents the degree of the graph vertex w.
After the modularity EQ that results from adding the neighbor graph vertex v to the target community C_i is calculated, the change in modularity ΔEQ is obtained from this value and the modularity calculated before v was added to C_i, and whether v needs to be added to C_i is determined according to the magnitude of ΔEQ.
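The formula image itself is not available in this text. The symbols defined above (m, C_i, O_v, O_w, A_vw, k_v, k_w) coincide with those of the extended modularity that is widely used for overlapping communities. Assuming that this is the intended formula (an assumption, not confirmed by the source), it and the resulting change in modularity can be written as follows, where the labels "after" and "before" are introduced here only for illustration:

```latex
EQ = \frac{1}{2m} \sum_{i} \sum_{v \in C_i,\; w \in C_i}
     \frac{1}{O_v O_w} \left[ A_{vw} - \frac{k_v k_w}{2m} \right],
\qquad
\Delta EQ = EQ_{\text{after}} - EQ_{\text{before}}
```

Here EQ_after is the value with v added to C_i and EQ_before is the value without it, so ΔEQ > 0 means that adding v improves the partition.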
Step S40: adding each neighbor graph vertex whose change in modularity is greater than 0 to the target community.
Illustratively, in this embodiment, when ΔEQ > 0 is detected, it is determined that the neighbor graph vertex v needs to be added to the target community C_i; if ΔEQ ≤ 0, it is determined that v does not need to be added to C_i. Overlapping communities are discovered in this way.
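Steps S30 and S40 can be illustrated with a small CPU-side sketch. It assumes that EQ has the standard extended-modularity form, (1/2m) Σ_i Σ_{v,w∈C_i} (1/(O_v·O_w))·[A_vw − k_v·k_w/(2m)], because the patent's own formula is not reproduced in this text. The function names `extended_modularity` and `delta_eq` are hypothetical, and the code recomputes EQ from scratch instead of using the B-tree and GPU traversal described in the embodiment.

```python
from itertools import product


def extended_modularity(edges, communities):
    """EQ under the assumed extended-modularity form (not taken from the patent figure)."""
    m = len(edges)
    adjacent = set()
    degree = {}
    for u, v in edges:
        adjacent.add((u, v))
        adjacent.add((v, u))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # O_x: number of communities that contain vertex x
    membership = {}
    for community in communities:
        for x in community:
            membership[x] = membership.get(x, 0) + 1
    total = 0.0
    for community in communities:
        for v, w in product(community, repeat=2):
            a_vw = 1.0 if (v, w) in adjacent else 0.0
            expected = degree.get(v, 0) * degree.get(w, 0) / (2.0 * m)
            total += (a_vw - expected) / (membership[v] * membership[w])
    return total / (2.0 * m)


def delta_eq(edges, communities, i, v):
    """Change in EQ if vertex v were added to community i (step S30)."""
    before = extended_modularity(edges, communities)
    trial = [set(c) for c in communities]
    trial[i].add(v)
    return extended_modularity(edges, trial) - before


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
communities = [{0, 1}, {3, 4, 5}]
for candidate in (2, 5):
    gain = delta_eq(edges, communities, 0, candidate)
    # Step S40: accept the candidate only when the change is positive.
    print(candidate, "add" if gain > 0 else "skip")
```

In this toy graph, vertex 2 completes the first triangle and is accepted, while vertex 5 has no edges into the first community and is rejected.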
The Lugger algorithm is as follows:
[Algorithm listing images BDA0003951117360000112 and BDA0003951117360000121; not reproduced in this text]
Specifically, in this embodiment, the principle of overlapping community discovery is explained with reference to the Lugger algorithm above. A number of graph vertices in graph G are randomly selected as representatives of different communities, and the selected graph vertices and their neighbor graph vertices are taken as the input of the algorithm. During execution, Lugger stores the graph vertices inside community C_i in a lookup array and manages the neighbor graph vertices outside C_i in a B-tree structure. The algorithm searches the elements of the B-tree in parallel starting from the root node and determines step by step whether each neighbor graph vertex outside C_i can be added to C_i; that is, it calculates the change in modularity after each neighbor graph vertex joins C_i, and adds the corresponding neighbor graph vertex to C_i if and only if ΔEQ > 0. The time complexities of building and updating the B-tree are O(M log_32(M)) and O(log_32(M)), respectively, and the time complexity of identifying and collecting non-overlapping communities is
[Formula image BDA0003951117360000122; not reproduced in this text]
Thus, the time complexity of the algorithm is
[Formula image BDA0003951117360000123; not reproduced in this text]
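The base-32 logarithm in the B-tree bounds can be made concrete with a short calculation. The sketch below is an idealised illustration, assuming a fully packed tree in which every level multiplies the reachable capacity by the fan-out; the helper name `levels_needed` is hypothetical and is not part of the Lugger algorithm.

```python
def levels_needed(num_keys, fanout=32):
    """Smallest h such that fanout**h >= num_keys (idealised, fully packed tree)."""
    if num_keys < 1:
        raise ValueError("num_keys must be positive")
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


# One million entries: a 32-way tree versus a binary tree.
print(levels_needed(1_000_000, fanout=32))
print(levels_needed(1_000_000, fanout=2))
```

Because 32^4 = 1,048,576 = 2^20, a 32-way tree covers about a million entries in 4 levels where a binary tree needs 20, which is the intuition behind the O(log_32(M)) update cost.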
In addition, this embodiment can also apply a warp-centric thread organization strategy to achieve load balancing on the GPU: different numbers of warps are allocated according to the differing degrees of the B-tree nodes, and each warp comprises 32 threads, so that this embodiment can allocate
[Formula image BDA0003951117360000124; not reproduced in this text]
warps to achieve load balancing during the computation.
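The allocation formula is available only as an image reference, so the rule used below is an assumption: a node of degree d is given ceil(d / 32) warps, which is the smallest number of 32-thread warps that gives each entry its own thread. The sketch is plain Python that only plans the assignment; `warps_for_node` and `assign_warps` are hypothetical names, and no GPU code is launched.

```python
WARP_SIZE = 32  # threads per warp, as stated in the text


def warps_for_node(degree, warp_size=WARP_SIZE):
    """Assumed rule: ceil(degree / warp_size) warps for a node of the given degree."""
    if degree <= 0:
        return 0
    return (degree + warp_size - 1) // warp_size


def assign_warps(node_degrees):
    """Give each node a contiguous range of warp ids; return the plan and the total."""
    plan = {}
    next_warp = 0
    for node, degree in enumerate(node_degrees):
        count = warps_for_node(degree)
        plan[node] = range(next_warp, next_warp + count)
        next_warp += count
    return plan, next_warp


plan, total = assign_warps([5, 64, 33])
for node, warp_ids in plan.items():
    print(node, list(warp_ids))
print("total warps:", total)
```

Scaling the warp count with the node degree keeps the work per warp roughly even, so a heavily populated node does not leave the threads of lightly loaded warps idle.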
In summary, this embodiment takes full account of the memory capacity and the high parallelism of the GPU. Graph data is organized with a B-tree, which eliminates branch divergence during computation. Because the B-tree is a balanced tree and each of its nodes holds multiple neighbor graph vertices, a thread group can obtain the complete content of a B-tree node with a single read. This effectively reduces the number of memory accesses, improves GPU cache utilization, and addresses the poor memory access efficiency caused by irregular memory access.
The embodiment of the present application further provides a B-tree-based overlapping community discovery apparatus, including:
a storage unit, configured to construct a lookup array based on all graph vertices in the target community and to store the neighbor graph vertices outside the target community in a B-tree;
an access unit, configured to, when a target graph vertex in the lookup array is accessed, map the target graph vertex to a contiguous memory region so that all neighbor graph vertices in the B-tree are traversed in parallel;
a computing unit, configured to calculate, for each neighbor graph vertex, the change in modularity after that vertex is added to the target community;
a discovery unit, configured to add each neighbor graph vertex whose change in modularity is greater than 0 to the target community.
Further, the storage unit is specifically configured to:
store the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, split the target node to obtain a new node for storing other neighbor graph vertices, where N denotes the number of ways (the order) of the B-tree.
Further, the storage unit is specifically further configured to:
when the number of neighbor graph vertices on the parent node of the target node reaches N-1, add at least one child node below the parent node for storing other neighbor graph vertices.
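The N-2 split trigger can be illustrated with a deliberately simplified sketch. It keeps the nodes in a single flat, ordered list instead of a multi-level tree, so the parent rule (N-1) is not modelled, and it assumes that a split divides the node into two halves, which the text does not specify. The class name `SplitOnThresholdStore` is hypothetical.

```python
import bisect


class SplitOnThresholdStore:
    """Flat list of sorted nodes; a node is split once it holds N-2 entries."""

    def __init__(self, ways=32):
        if ways < 4:
            raise ValueError("ways must be at least 4 so a split yields two non-empty nodes")
        self.threshold = ways - 2  # split trigger taken from the text
        self.nodes = [[]]

    def insert(self, vertex_id):
        if self.nodes[0]:
            # Pick the last node whose smallest entry is <= vertex_id.
            firsts = [node[0] for node in self.nodes]
            idx = max(0, bisect.bisect_right(firsts, vertex_id) - 1)
        else:
            idx = 0
        node = self.nodes[idx]
        bisect.insort(node, vertex_id)
        if len(node) >= self.threshold:
            # Assumption: split into two halves and keep both in order.
            mid = len(node) // 2
            self.nodes[idx:idx + 1] = [node[:mid], node[mid:]]


store = SplitOnThresholdStore(ways=32)
for i in range(101):
    store.insert((i * 37) % 101)  # a fixed permutation of 0..100
print([len(node) for node in store.nodes])
```

With N = 32 no node ever holds 30 or more entries, so each node stays small enough to be read by one group of 32 threads.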
Further, the apparatus further comprises a setting unit configured to:
and setting two buffer areas, so that the action of loading the graph vertex in the target community and the neighbor graph vertex outside the target community on the CPU into the GPU is completed in one buffer area, and the action of reading the graph vertex in the target community on the GPU into the search array and reading the neighbor graph vertex outside the target community into the B tree is completed in the other buffer area.
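The two-buffer arrangement can be sketched as a producer and a consumer that swap buffers. This is a CPU-only analogy under stated assumptions: a Python thread stands in for the CPU-to-GPU loading step, a list copy stands in for the transfer, and `double_buffered` is a hypothetical helper rather than anything defined in the application.

```python
import queue
import threading


def double_buffered(chunks, consume):
    """Load chunk k+1 into one buffer while chunk k is consumed from the other."""
    free = queue.Queue()
    ready = queue.Queue()
    for _ in range(2):  # exactly two buffers, as in the text
        free.put([])

    def loader():
        for chunk in chunks:
            buf = free.get()   # wait until a buffer is free
            buf[:] = chunk     # stands in for the CPU-to-GPU copy
            ready.put(buf)
        ready.put(None)        # sentinel: no more data

    worker = threading.Thread(target=loader, daemon=True)
    worker.start()
    results = []
    while True:
        buf = ready.get()
        if buf is None:
            break
        results.append(consume(buf))  # stands in for filling the lookup array / B-tree
        free.put(buf)                 # hand the buffer back to the loader
    worker.join()
    return results


print(double_buffered([[1, 2], [3], [4, 5, 6]], sum))
```

Because a buffer is returned to the loader only after it has been consumed, loading and reading never touch the same buffer at the same time, while the two stages can still overlap.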
Furthermore, the number of ways of the B-tree is 32.
It should be noted that, as is clear to those skilled in the art, for convenience and simplicity of description, for the specific working processes of the apparatus and each unit described above, reference may be made to the corresponding process in the foregoing overlapping community discovery method embodiment based on a B tree, and details are not described herein again.
The apparatus provided by the above embodiments may be implemented in the form of a computer program that can run on a B-tree based overlapping community discovery device as shown in fig. 5.
The embodiment of the present application further provides a device for discovering overlapping communities based on a B-tree, including: the system comprises a memory, a processor and a network interface which are connected through a system bus, wherein at least one instruction is stored in the memory, and the at least one instruction is loaded and executed by the processor so as to realize all or part of the steps of the overlapping community discovery method based on the B tree.
The network interface is used for performing network communication, such as sending distributed tasks. Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
The processor may be a CPU or another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor or any conventional processor; the processor is the control center of the computer device and connects the various parts of the whole computer device through various interfaces and lines.
The memory may be used to store computer programs and/or modules, and the processor may implement various functions of the computer device by running or executing the computer programs and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a video playing function, an image playing function, etc.), and the like; the data storage area may store data (such as video data, image data, etc.) created according to the use of the cellular phone, etc. Further, the memory may include high speed random access memory, and may include non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, all or part of the steps of the aforementioned B-tree-based overlapping community discovery method are implemented.
All or part of the processes of the foregoing methods may also be implemented in the embodiments of the present application by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing methods can be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunications signals.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, server, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or system comprising the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A B-tree-based overlapping community discovery method is characterized by comprising the following steps:
constructing a search array based on all graph vertexes in the target community, and storing neighbor graph vertexes outside the target community in a B tree;
when one target graph vertex in the search array is accessed, the target graph vertex is mapped to a continuous memory area so as to perform parallel traversal on all neighbor graph vertices in the B tree;
respectively calculating the variable quantity of the corresponding modularity after each neighbor graph vertex is added into the target community;
and adding the vertex of the neighbor graph with the variable quantity of the modularity larger than 0 into the target community.
2. The method of claim 1, wherein storing neighbor graph vertices outside of the target community in the B-tree comprises:
storing the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, splitting the target node to obtain a new node for storing other neighbor graph vertices, wherein N represents the number of ways of the B-tree.
3. The B-tree based overlapping community discovery method of claim 2, wherein the method further comprises:
when the number of neighbor graph vertices on the parent node of the target node reaches N-1, adding at least one child node below the parent node for storing other neighbor graph vertices.
4. The method of claim 1, wherein before the step of constructing a lookup array based on all graph vertices within the target community and storing neighbor graph vertices outside the target community in the B-tree, the method further comprises:
and setting two buffer areas, so that the action of loading the graph vertex in the target community and the neighbor graph vertex outside the target community on the CPU into the GPU is completed in one buffer area, and the action of reading the graph vertex in the target community on the GPU into the search array and reading the neighbor graph vertex outside the target community into the B tree is completed in the other buffer area.
5. The B-tree based overlapping community discovery method of claim 1, wherein: the number of ways of the B-tree is 32.
6. An apparatus for B-tree based overlapping community discovery, comprising:
the storage unit is used for constructing a search array based on all graph vertexes in the target community and storing neighbor graph vertexes outside the target community in the B tree;
the access unit is used for mapping the target graph vertex to a continuous memory area when one target graph vertex in the lookup array is accessed so as to traverse all neighbor graph vertices in the B tree in parallel;
the computing unit is used for respectively computing the variable quantity of the corresponding modularity after each neighbor graph vertex is added into the target community;
a discovery unit for adding neighbor graph vertices with a modularity variation greater than 0 to the target community.
7. The B-tree based overlapping community discovery apparatus of claim 6, wherein said storage unit is specifically configured to:
storing the neighbor graph vertices on nodes in a B-tree;
when the number of neighbor graph vertices on a target node in the B-tree reaches N-2, splitting the target node to obtain a new node for storing other neighbor graph vertices, wherein N represents the number of ways of the B-tree.
8. The B-tree based overlapping community discovery apparatus of claim 7, wherein said storage unit is further configured to:
when the number of neighbor graph vertices on the parent node of the target node reaches N-1, adding at least one child node below the parent node for storing other neighbor graph vertices.
9. A B-tree based overlapping community discovery device, comprising: a memory and a processor, the memory having stored therein at least one instruction that is loaded and executed by the processor to implement the B-tree based overlapping community discovery method of any of claims 1-5.
10. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program that, when executed by a processor, implements the B-tree based overlapping community discovery method of any one of claims 1 to 5.
CN202211449751.2A 2022-11-18 2022-11-18 B-tree-based overlapping community discovery method, device, equipment and storage medium Pending CN115809243A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211449751.2A CN115809243A (en) 2022-11-18 2022-11-18 B-tree-based overlapping community discovery method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211449751.2A CN115809243A (en) 2022-11-18 2022-11-18 B-tree-based overlapping community discovery method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115809243A true CN115809243A (en) 2023-03-17

Family

ID=85483504

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211449751.2A Pending CN115809243A (en) 2022-11-18 2022-11-18 B-tree-based overlapping community discovery method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115809243A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117609227A (en) * 2023-11-09 2024-02-27 北京火山引擎科技有限公司 Data processing method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination