CN112988064B - Concurrent multitask-oriented disk graph processing method

Concurrent multitask-oriented disk graph processing method

Info

Publication number
CN112988064B
CN112988064B (application CN202110175548.XA)
Authority
CN
China
Prior art keywords
vertex
graph
edge data
edge
data block
Prior art date
Legal status
Active
Application number
CN202110175548.XA
Other languages
Chinese (zh)
Other versions
CN112988064A (en)
Inventor
王芳
冯丹
徐湘灏
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN202110175548.XA
Publication of CN112988064A
Application granted
Publication of CN112988064B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G06F16/9024: Graphs; Linked lists
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601: Interfaces specially adapted for storage systems
    • G06F3/0602: Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0625: Power saving in storage systems
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806: Task transfer initiation or dispatching
    • G06F9/4843: Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system


Abstract

The invention provides a concurrent multitask-oriented disk graph processing method, which belongs to the technical field of computer big data processing and comprises the following steps: storing the edge data blocks and the vertex value set converted from the input graph data on a disk; when a plurality of graph tasks are executed, loading the vertex value set into memory and loading the edge data blocks into memory one at a time in a swap-in, swap-out manner; updating destination vertex values with the task update functions based on the edge data blocks and the vertex value set accessed concurrently by the plurality of graph tasks; when the destination vertex values of all accessed edge data blocks have been updated and the convergence condition is met, outputting the final vertex values; otherwise, continuing to cyclically load edge data blocks into memory and update destination vertex values. The invention can reduce disk I/O access overhead.

Description

Concurrent multitask-oriented disk graph processing method
Technical Field
The invention belongs to the technical field of computer big data processing, and particularly relates to a concurrent multitask-oriented disk graph processing method.
Background
With the increasing demand for graph computation in the real world, a graph computing system is required in many scenarios to execute a plurality of graph computation tasks concurrently. However, existing graph computing systems oriented toward concurrent multitasking typically rely on large-scale distributed systems or single-machine shared-memory systems. These systems face high hardware cost and communication overhead, or scale poorly when processing concurrent graph tasks on large-scale graph data. These problems are further exacerbated by the large volume of intermediate results that concurrent graph tasks produce during execution. In this context, cost-effective and scalable out-of-core (disk-based) graph computation is a potentially feasible option.
However, existing out-of-core graph computing systems face the following challenges when handling concurrent graph tasks. First, because their I/O access characteristics differ, concurrent graph tasks access the graph data on disk along different traversal paths during execution. These accesses tend to produce many random and redundant data reads, which greatly degrade system performance. Second, concurrent graph tasks issue I/O requests to the operating system at the same time, which intensifies competition for the already limited disk bandwidth and causes severe I/O conflicts that hurt system throughput.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to provide a concurrent multitask-oriented disk graph processing method, so as to solve the problem of high I/O (input/output) overhead when existing concurrent graph tasks are executed.
In order to achieve the above object, the present invention provides a concurrent multitask-oriented disk graph processing method, which includes the following steps:
storing the edge data blocks and the vertex value set converted from the input graph data on a disk;
when a plurality of graph tasks are executed, loading the vertex value set into memory and loading the edge data blocks into memory one at a time in a swap-in, swap-out manner;
updating destination vertex values with the task update functions based on the edge data blocks and the vertex value set accessed concurrently by the plurality of graph tasks;
when the destination vertex values of all accessed edge data blocks have been updated and the convergence condition is met, outputting the final vertex values; otherwise, continuing to cyclically load edge data blocks into memory and update destination vertex values;
wherein the graph data sub-blocks comprise the edge data blocks; an edge data block is used for storing the outgoing edge data of its vertices.
Preferably, the manner in which the plurality of graph tasks concurrently access the edge data blocks is as follows:
during access, the graph tasks skip edge data blocks in the inactive state by means of selective data access and only access edge data blocks containing active edges; an edge data block in the inactive state is one that contains no active edge data.
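For illustration, a minimal C++ sketch of this selective data access is given below; the activeness bitmaps, the block range type and the function name are hypothetical helpers assumed for the example and are not taken from the patent. An edge data block has to be loaded only if at least one concurrent graph task still has an active source vertex inside the block's vertex subinterval; otherwise every task skips it and the disk read is saved.

    #include <cstdint>
    #include <vector>

    // Hypothetical per-task activeness bitmaps: active_vertex[t][v] is true when
    // task t updated vertex v in the previous iteration.
    using ActiveBitmaps = std::vector<std::vector<bool>>;

    // An edge data block covers the vertex subinterval [first_vertex, last_vertex].
    struct BlockRange {
        uint32_t first_vertex;
        uint32_t last_vertex;  // inclusive
    };

    // A block is in the active state (and must be loaded) if any concurrent task
    // still has an active source vertex inside the block's subinterval.
    bool block_is_active(const BlockRange& blk, const ActiveBitmaps& active_vertex) {
        for (const auto& task_bitmap : active_vertex)
            for (uint32_t v = blk.first_vertex; v <= blk.last_vertex; ++v)
                if (task_bitmap[v]) return true;
        return false;
    }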
Preferably, the specific steps include:
(1) Converting the input graph data into P edge data blocks and a vertex value set; wherein each vertex in the input graph data is assigned a vertex value;
(2) Loading the vertex value set into memory;
(3) Loading the kth edge data block into memory; the initial value of k is 1;
(4) When the kth edge data block is in the active state, updating destination vertex values with the task update functions based on the kth edge data block and the vertex value set accessed concurrently by the plurality of graph tasks; when the kth edge data block is in the inactive state, going to step (5);
(5) Returning the kth edge data block to the disk;
(6) Judging whether k = P; if so, going to step (7), otherwise letting k = k + 1 and returning to step (3);
(7) Judging whether the convergence condition is met; if so, outputting the final vertex values; otherwise, letting k = 1 and returning to step (3).
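The iteration described in steps (1) to (7) can be summarised by the following C++ sketch; the two callable parameters stand in for operations the steps describe but do not name, so they are assumptions made for illustration rather than part of the patented implementation.

    #include <cstdint>
    #include <functional>

    // Minimal sketch of the outer loop of steps (3)-(7).
    void run_iterations(uint32_t P,
                        // steps (3)-(5): load block k, let every graph task whose
                        // edges in the block are active process it, return it to disk
                        const std::function<void(uint32_t)>& load_and_process_block,
                        // step (7): true once the vertex values of every subinterval
                        // have stopped changing
                        const std::function<bool()>& converged)
    {
        bool done = false;
        while (!done) {                       // one pass over all P blocks = one iteration
            for (uint32_t k = 0; k < P; ++k)  // step (6): k runs over all edge data blocks
                load_and_process_block(k);
            done = converged();               // step (7)
        }
    }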
Preferably, the graph data sub-blocks further comprise index structures; each index structure corresponds to one edge data block and records the offset, within that edge data block, of the first outgoing edge of each vertex belonging to the block; when a plurality of graph tasks are executed, the graph data sub-blocks, i.e. the index structures and their corresponding edge data blocks, are loaded into memory.
Preferably, the method for loading the currently processed edge data block into memory includes:
calculating, respectively, the disk read-write overhead of sequentially loading all edge data and the disk read-write overhead of randomly loading only the active edges;
wherein a vertex is defined as an active vertex if and only if its vertex value was updated in the previous iteration; an edge is defined as an active edge if and only if its source vertex is an active vertex; the read-write overhead is calculated by dividing the total amount of graph data to be read and written by the access bandwidth of the disk;
judging whether the disk read-write overhead of sequentially loading all edge data is less than the disk read-write overhead of randomly loading the active edges; if so, choosing to load all edge data sequentially, otherwise choosing to load the active edge data randomly.
Preferably, the specific steps of converting the input graph data into edge data blocks and vertex values are as follows:
allocating a vertex value to each vertex in the input graph data, and storing the vertex value set on a disk;
dividing the vertices into P disjoint subintervals, and setting each subinterval to correspond to one edge data block; the value of P is chosen so that the size of each edge data block is smaller than the memory capacity;
and storing the edge data blocks and the vertex value set on the disk.
Preferably, the method for updating the destination vertices is as follows: reading source vertices with a push model, and updating destination vertex values with atomic operations according to the update functions of the plurality of graph tasks.
Preferably, the convergence condition is that the vertex values in each subinterval no longer change.
Generally, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:
The method converts the input graph data into graph data sub-blocks and a vertex value set, so that all tasks access one unified copy of the graph data and the vertex value set; the index structure inside each graph data sub-block supports fast access by multiple tasks, and the graph data is stored on disk and loaded into memory only when needed, which reduces storage overhead.
The method for loading edge data blocks into memory provided by the invention computes the disk read-write overhead of sequentially loading all current edge data and the disk read-write overhead of randomly loading only the active edges, and then decides which loading mode to use, thereby reducing the disk read-write overhead.
In the invention, the graph tasks skip edge data blocks in the inactive state by means of selective data access and only access edge data blocks containing active edges, which avoids loading useless disk data and wasting disk reads and writes.
The invention updates destination vertices based on shared graph data, which relieves the redundant access and storage overhead of the processing and avoids competition for disk bandwidth.
Drawings
Fig. 1 is a schematic diagram of a concurrent multitask-oriented disk graph processing method according to an embodiment of the present invention;
fig. 2 (a) is a schematic diagram of a directed graph G provided by the embodiment of the present invention;
fig. 2 (b) is a schematic process diagram of organizing a directed graph G into a CSR structure according to an embodiment of the present invention;
fig. 3 is a schematic diagram of concurrent graph tasks processing the vertices and edges of subinterval 1 in the directed graph G according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Examples
As shown in fig. 1, the present invention provides a concurrent multitask-oriented disk graph processing method, including the following steps:
(1) Converting input graph data into P graph data sub-blocks and vertex value sets;
wherein the graph data sub-blocks comprise CSR (Compressed Sparse Row) based edge data blocks and an index structure; each vertex in the input graph data is assigned a vertex value; the edge data block is used for storing the outgoing edge data of its corresponding vertices; the index structure is used for recording the offset of the first outgoing edge of each vertex within the edge data block;
(2) Loading the vertex value set into a memory;
(3) By comparing the disk read-write overhead of sequentially loading all edge data of the kth edge data block with that of randomly loading only its active edge data, selecting whether to load all edge data sequentially or to load the active edge data randomly into memory according to the index structure; the initial value of k is 1;
(4) When the kth edge data block is in the active state, updating destination vertex values with the task update functions based on the kth edge data block and the vertex value set accessed concurrently by the plurality of graph tasks; when the kth edge data block is in the inactive state, going to step (5);
(5) Returning the kth graph data sub-block to the disk;
(6) Judging whether k = P, if so, turning to the step (7), otherwise, enabling k = k +1, and returning to the step (3);
(7) Judging whether the convergence condition is met, and if so, outputting a final vertex value; otherwise, let k =1, return to step (3).
Preferably, step (1) specifically comprises the following steps:
allocating a vertex value to each vertex in the input graph data, and storing the vertex value set on a disk;
dividing the vertices into P disjoint subintervals, and setting each subinterval to correspond to one edge data block on the disk; the value of P is chosen so that the size of each edge data block is smaller than the memory capacity; each edge data block is used for storing the outgoing edge data of its corresponding vertices;
each edge data block is provided with a corresponding index structure; the index structure records the offset, within the edge data block, of the first outgoing edge of each vertex belonging to that block;
an edge data block and its corresponding index structure together form a graph data sub-block, and the graph data sub-block is stored on the disk;
the graph data sub-blocks are stored on disk, and each graph data sub-block is loaded into memory in turn during the computation.
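For illustration, a minimal C++ sketch of the in-memory layout of one graph data sub-block and of the per-task vertex values is given below; the struct and field names, and the choice of value type, are assumptions made for the example rather than definitions taken from the patent.

    #include <cstdint>
    #include <vector>

    // One graph data sub-block: a CSR edge data block plus its index structure.
    struct GraphSubBlock {
        uint32_t first_vertex;   // first vertex ID of the subinterval
        uint32_t last_vertex;    // last vertex ID of the subinterval (inclusive)

        // Index structure: index[i] is the offset of the first outgoing edge of
        // vertex (first_vertex + i) inside `edges`; one extra entry marks the end,
        // so the outgoing edges of that vertex occupy edges[index[i] .. index[i+1]).
        std::vector<uint64_t> index;

        // Edge data block (CSR): destination vertex IDs of all outgoing edges of
        // the subinterval's vertices, stored contiguously.
        std::vector<uint32_t> edges;
    };

    // Vertex values are kept separately and stay in memory for the whole run:
    // one value array per concurrent graph task, covering every vertex.
    struct VertexValues {
        std::vector<std::vector<double>> per_task;  // per_task[t][v] = value of v for task t
    };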
Fig. 2 (a) is a schematic diagram of a directed graph G provided by the embodiment of the present invention, and fig. 2 (b) is a schematic diagram of the structure obtained by organizing the directed graph G into CSR-based edge data blocks; as shown in fig. 2 (b), the specific process is as follows:
(1.1) partitioning the vertices in the directed graph G into two disjoint subintervals: subinterval 1 (comprising vertices 1, 2, 3) and subinterval 2 (comprising vertices 4, 5, 6);
(1.2) creating an edge block structure (edge block) on the disk for each subinterval to store the outgoing edge data of the vertices of that subinterval; the graph of fig. 2 (a) is thus divided into edge data block 1 and edge data block 2;
(1.3) creating an index structure for each edge data block, the index structure storing the offset of the first outgoing edge of each vertex within the edge block; an edge data block and its index structure together form a graph data sub-block;
(1.4) storing the 2 graph data sub-blocks on a magnetic disk.
Preferably, in step (3), whether all edge data of the kth edge data block are loaded into memory sequentially or only the active edge data are loaded randomly according to the index structure is decided by comparing the disk read-write overheads; the decision specifically comprises the following steps:
(3.1) calculating, respectively, the current disk read-write overhead of sequentially loading all edge data and the current disk read-write overhead of randomly loading only the active edges;
a vertex is defined as an active vertex if and only if its vertex value was updated in the previous iteration; an edge is defined as an active edge if and only if its source vertex is an active vertex;
the read-write overhead is calculated by dividing the total amount of graph data to be read and written by the access bandwidth of the disk;
(3.2) judging whether the disk read-write overhead of sequentially loading all edge data is smaller than the disk read-write overhead of randomly loading the active edges; if so, choosing to load all edge data sequentially, otherwise choosing to load the active edge data randomly;
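The cost comparison of steps (3.1) and (3.2) can be sketched in C++ as follows; the two bandwidth parameters and the byte counts are assumptions made for the example, since the patent only states that the overhead is the total amount of data divided by the disk access bandwidth.

    #include <cstdint>

    // Estimated disk read-write overheads (in seconds) of the two loading modes.
    struct LoadCost {
        double sequential_cost;  // stream the whole edge data block
        double random_cost;      // fetch only the active edges
    };

    LoadCost estimate_costs(uint64_t all_edge_bytes,     // size of the full edge data block
                            uint64_t active_edge_bytes,  // size of the active edge data only
                            double seq_bandwidth,        // sequential disk bandwidth (bytes/s)
                            double rand_bandwidth)       // effective random-read bandwidth (bytes/s)
    {
        return { static_cast<double>(all_edge_bytes)    / seq_bandwidth,
                 static_cast<double>(active_edge_bytes) / rand_bandwidth };
    }

    // Step (3.2): load sequentially when streaming everything is the cheaper option.
    bool use_sequential_load(const LoadCost& c) {
        return c.sequential_cost < c.random_cost;
    }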
In step (4), the edge data block is loaded according to the selected disk I/O access mode so that the multiple concurrent graph tasks can access the graph data sub-block and the source vertices and update the destination vertex values; this specifically comprises the following steps:
(4.1) Edge data block access: in each iteration, the edge data blocks are loaded into memory one after another to enable shared access by the concurrent graph processing (CGP) tasks; in addition, the vertex values of each CGP task have already been loaded in step (2); during access, a given edge data block may no longer contain any active edge data, i.e., no CGP task still needs to access the edge data of that block; in this case, selective data access skips the inactive edge data blocks, and only edge data blocks containing active edges are accessed;
(4.2) Parallel processing of data blocks: after an edge data block is loaded into memory, the related CGP tasks (those for which an active edge exists in the edge block) start to access the edge data in the block concurrently and perform the update of the destination vertex values; after the edge block has been processed by all related CGP tasks, the next edge block is loaded into memory;
(4.3) Update propagation: when processing the edge data in each edge data block, a push model is adopted to read the source vertex data and update the destination vertices; the vertex update is carried out according to the specific update function of each CGP task; meanwhile, atomic operations are used when updating destination vertices to guarantee the consistency of the computation results;
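A minimal C++ sketch of such an atomic push-model update is given below; the minimum-taking rule is only an illustrative update function (it matches SSSP- or CC-style tasks), and the function name is a hypothetical helper, not the patent's API.

    #include <atomic>
    #include <cstdint>

    // Push-model update of one destination vertex value: the candidate value computed
    // from the source vertex is installed with a compare-and-swap loop, so concurrent
    // threads and CGP tasks obtain a consistent result without locks.
    void push_update_min(std::atomic<uint32_t>& dst_value, uint32_t candidate) {
        uint32_t current = dst_value.load(std::memory_order_relaxed);
        // Retry until the stored value is already no larger than the candidate,
        // or the compare-and-swap succeeds; on failure `current` is reloaded.
        while (candidate < current &&
               !dst_value.compare_exchange_weak(current, candidate,
                                                std::memory_order_relaxed)) {
        }
    }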
Fig. 3 is a schematic diagram of concurrent graph tasks processing the vertices and edges of subinterval 1 in the directed graph G according to an embodiment of the present invention; the system needs to process three CGP tasks: a PageRank task, a Connected Components (CC) task and a Single Source Shortest Path (SSSP) task; the system decouples the graph structure from the application-specific vertex attribute values so that the multiple CGP tasks can share one copy of the graph data; meanwhile, each CGP task maintains its own application-specific vertex values, which are continuously updated during the computation until the corresponding CGP task reaches the convergence state;
as shown in fig. 3, after edge data block 1 is loaded into memory, the CGP tasks concurrently access and process it as a shared subgraph; subsequently, each CGP task updates its application-specific vertex values according to the push update model, i.e., data are read from the source vertex of each edge and the corresponding update function is called to update the destination vertex value; after all edge data blocks have been processed by the CGP tasks, the system starts the next iteration, until all CGP tasks reach the convergence state;
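To make the sharing concrete, the C++ sketch below drives the three tasks of this embodiment over one loaded edge data block with task-specific value arrays; the edge layout and the update rules are textbook forms assumed for illustration (unit edge weights for SSSP, label propagation for CC), not the patent's exact formulas, and the atomic updates shown earlier are omitted for brevity.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct Edge { uint32_t src, dst; };

    // Task-specific vertex values maintained by each CGP task.
    struct TaskValues {
        std::vector<double>   rank_sum;  // PageRank: contributions accumulated this iteration
        std::vector<uint32_t> label;     // Connected Components: current component label
        std::vector<uint32_t> dist;      // SSSP: current distance (unit edge weights assumed)
    };

    // One shared scan over the loaded edge data block drives all three tasks; in the
    // real system each task would only process edges whose source vertex is active for it.
    void process_block(const std::vector<Edge>& block, TaskValues& v,
                       const std::vector<double>& prev_rank,
                       const std::vector<uint32_t>& out_degree) {
        for (const Edge& e : block) {
            // PageRank: push the source vertex's contribution to the destination.
            if (out_degree[e.src] > 0)
                v.rank_sum[e.dst] += prev_rank[e.src] / out_degree[e.src];
            // Connected Components: propagate the smaller label along the edge.
            v.label[e.dst] = std::min(v.label[e.dst], v.label[e.src]);
            // SSSP: relax the edge if the source vertex has been reached.
            if (v.dist[e.src] != UINT32_MAX)
                v.dist[e.dst] = std::min(v.dist[e.dst], v.dist[e.src] + 1);
        }
    }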
In step (7), it is judged whether the convergence condition is reached; if so, the final vertex values are output; otherwise, k is set to 1 and the process returns to step (3); the convergence condition is preset by the user; in this embodiment, the convergence condition is reached when the vertex values of the vertices {1,2,3} and {4,5,6} of the two subintervals no longer change;
when the system reaches the convergence condition, the iterative processing ends and the vertex values of the graph data are output.
Compared with the prior art, the invention has the following advantages:
The method converts the input graph data into graph data sub-blocks and a vertex value set, so that all tasks access one unified copy of the graph data and the vertex value set; the index structure inside each graph data sub-block supports fast access by multiple tasks, and the graph data are stored on disk and loaded into memory only when needed, which reduces the disk I/O access overhead.
The method for loading edge data blocks into memory computes the disk read-write overhead of sequentially loading all current edge data and the disk read-write overhead of randomly loading only the active edges, and then decides which loading mode to use, thereby reducing the disk read-write overhead.
In the invention, the graph tasks skip edge data blocks in the inactive state by means of selective data access and only access edge data blocks containing active edges, which avoids loading useless disk data and wasting disk reads and writes.
The invention updates destination vertices based on shared graph data, which relieves the redundant access and storage overhead of the processing and avoids competition for disk bandwidth.
It will be understood by those skilled in the art that the foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and scope of the invention shall fall within the protection scope of the appended claims.

Claims (2)

1. A concurrent multitask-oriented disk graph processing method, characterized by comprising the following steps:
storing the edge data blocks and the vertex value set converted from the input graph data on a disk;
when a plurality of graph tasks are executed, loading the vertex value set into memory and loading the edge data blocks into memory one at a time in a swap-in, swap-out manner;
updating destination vertex values with the task update functions based on the edge data blocks and the vertex value set accessed concurrently by the plurality of graph tasks;
when the destination vertex values of all accessed edge data blocks have been updated and the convergence condition is met, outputting the final vertex values; otherwise, continuing to cyclically load edge data blocks into memory and update destination vertex values;
wherein graph data sub-blocks comprise the edge data blocks; an edge data block is used for storing the outgoing edge data of its vertices;
the specific steps of converting the input graph data into edge data blocks and vertex values are as follows:
allocating a plurality of vertex values to each vertex in the input graph data, and storing the vertex value set on a disk; the number of vertex values allocated to one vertex equals the number of graph tasks;
dividing the vertices into P disjoint subintervals, and setting each subinterval to correspond to one edge data block; the value of P is chosen so that the size of each edge data block is smaller than the memory capacity;
storing the edge data blocks and the vertex value set on the disk;
wherein the graph data sub-blocks further comprise index structures; each index structure records the offset, within the corresponding edge data block, of the first outgoing edge of each vertex belonging to that edge data block; when the plurality of graph tasks are executed, the graph data sub-blocks, i.e. the index structures and their corresponding edge data blocks, are loaded into memory;
the manner in which the plurality of graph tasks concurrently access the edge data blocks is as follows:
during access, the plurality of graph tasks skip edge data blocks in the inactive state by means of selective data access and only access edge data blocks containing active edges; wherein an edge data block in the inactive state is one that contains no active edge data;
the specific execution steps of the disk graph processing method comprise:
(1) Converting the input graph data into P edge data blocks and a vertex value set; each vertex in the input graph data is assigned a plurality of vertex values, and the number of vertex values assigned to one vertex equals the number of graph tasks;
(2) Loading the vertex value set into memory;
(3) Loading the kth edge data block into memory; the initial value of k is 1;
(4) When the kth edge data block is in the active state, updating the destination vertex values of each graph task with the task update functions based on the kth edge data block and the vertex value set accessed concurrently by the plurality of graph tasks; when the kth edge data block is in the inactive state, going to step (5);
(5) Returning the kth edge data block to the disk;
(6) Judging whether k = P; if so, going to step (7), otherwise letting k = k + 1 and returning to step (3);
(7) Judging whether the convergence condition is met; if so, outputting the final vertex values; otherwise, letting k = 1 and returning to step (3);
the method for loading an edge data block into memory comprises the following steps:
calculating, respectively, the disk read-write overhead of sequentially loading all edge data and the disk read-write overhead of randomly loading only the active edges;
wherein a vertex is defined as an active vertex if and only if its vertex value was updated in the previous iteration; an edge is defined as an active edge if and only if its source vertex is an active vertex; the read-write overhead is calculated by dividing the total amount of graph data to be read and written by the access bandwidth of the disk;
judging whether the disk read-write overhead of sequentially loading all edge data is less than the disk read-write overhead of randomly loading the active edges; if so, choosing to load all edge data sequentially, otherwise choosing to load the active edge data randomly;
the method for updating the destination vertices comprises the following steps:
reading source vertices with a push model based on the edge data block and the vertex value set;
and inputting the source vertices into the update functions of the plurality of graph tasks, and updating the destination vertex values with atomic operations.
2. The disk graph processing method according to claim 1, wherein the convergence condition is that the vertex values in each subinterval no longer change.
CN202110175548.XA 2021-02-09 2021-02-09 Concurrent multitask-oriented disk graph processing method Active CN112988064B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110175548.XA CN112988064B (en) 2021-02-09 2021-02-09 Concurrent multitask-oriented disk graph processing method

Publications (2)

Publication Number Publication Date
CN112988064A CN112988064A (en) 2021-06-18
CN112988064B true CN112988064B (en) 2022-11-08

Family

ID=76392475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110175548.XA Active CN112988064B (en) 2021-02-09 2021-02-09 Concurrent multitask-oriented disk graph processing method

Country Status (1)

Country Link
CN (1) CN112988064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116414733B (en) * 2023-03-03 2024-02-20 港珠澳大桥管理局 Data processing method, device, computer equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952032A (en) * 2015-06-19 2015-09-30 清华大学 Graph processing method and device as well as rasterization representation and storage method
CN106095552A (en) * 2016-06-07 2016-11-09 华中科技大学 A kind of Multi-Task Graph processing method based on I/O duplicate removal and system
CN106777351A (en) * 2017-01-17 2017-05-31 中国人民解放军国防科学技术大学 Computing system and its method are stored based on ART tree distributed systems figure
CN109240600A (en) * 2018-07-24 2019-01-18 华中科技大学 A kind of disk figure processing method based on mixing more new strategy
CN109254725A (en) * 2018-07-26 2019-01-22 华中科技大学 A kind of disk figure processing method and system based on subgraph building
CN109522428A (en) * 2018-09-17 2019-03-26 华中科技大学 A kind of external memory access method of the figure computing system based on index positioning
CN110737804A (en) * 2019-09-20 2020-01-31 华中科技大学 graph processing memory access optimization method and system based on activity level layout

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10204174B2 (en) * 2015-12-15 2019-02-12 Oracle International Corporation Efficient method for subgraph pattern matching
CN107122244B (en) * 2017-04-25 2020-02-14 华中科技大学 Multi-GPU-based graph data processing system and method

Also Published As

Publication number Publication date
CN112988064A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
JP5575997B1 (en) Semiconductor device and entry address writing / reading method for semiconductor device
CN109522428B (en) External memory access method of graph computing system based on index positioning
US20170300592A1 (en) Bucketized Hash Tables with Remap Entries
CN105117417A (en) Read-optimized memory database Trie tree index method
CN110516810B (en) Quantum program processing method and device, storage medium and electronic device
CN102662869B (en) Memory pool access method in virtual machine and device and finger
CN111832065A (en) Software implemented using circuitry and method for key-value storage
CN111126625A (en) Extensible learning index method and system
CN112988064B (en) Concurrent multitask-oriented disk graph processing method
CN110688055B (en) Data access method and system in large graph calculation
CN109189994B (en) CAM structure storage system for graph computation application
CN114444274A (en) Method, medium and device for reconstructing original structure grid from non-structure grid
CN116431080B (en) Data disc-dropping method, system, equipment and computer readable storage medium
CN104794102A (en) Embedded system on chip for accelerating Cholesky decomposition
CN109254725B (en) Disk graph processing method and system based on subgraph construction
CN108021678B (en) Key value pair storage structure with compact structure and quick key value pair searching method
CN109240600B (en) Disk map processing method based on mixed updating strategy
CN112035380B (en) Data processing method, device and equipment and readable storage medium
CN110377601B (en) B-tree data structure-based MapReduce calculation process optimization method
KR102354343B1 (en) Spatial indexing method and apparatus for blockchain-based geospatial data
CN112068948B (en) Data hashing method, readable storage medium and electronic device
CN114237903A (en) Memory allocation optimization method, memory allocation optimization device, electronic equipment, memory allocation optimization medium and program product
CN113065035A (en) Single-machine out-of-core attribute graph calculation method
JP2023503034A (en) Pattern-based cache block compression
US20240176984A1 (en) Data processing device and method, and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant