CN111177482A - Method, device and equipment for parallel processing of graph data and readable storage medium - Google Patents


Info

Publication number: CN111177482A
Application number: CN201911402930.9A
Authority: CN (China)
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN111177482B (en)
Inventors: 梅国强, 郝锐, 王江为, 阚宏伟
Current Assignee: Inspur Beijing Electronic Information Industry Co Ltd
Original Assignee: Inspur Beijing Electronic Information Industry Co Ltd
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201911402930.9A
Publication of CN111177482A
Application granted
Publication of CN111177482B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0811Multiuser, multiprocessor or multiprocessing cache systems with multilevel cache hierarchies

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Image Generation (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application discloses a method for parallel processing of graph data, comprising: acquiring graph data; screening the source addresses in the graph data to obtain independent source addresses; determining the corresponding destination addresses from the independent source addresses, and screening those destination addresses to obtain independent destination addresses; and reading the corresponding independent source data and independent destination data according to the independent source addresses and independent destination addresses, then processing the independent source data and independent destination data in parallel to obtain a parallel processing result. With this method, multiple processing units working simultaneously are not required: a single processing unit can complete the parallel processing of the graph data, which greatly reduces the cache overhead and communication overhead of graph-data parallel processing. The application also provides a device, equipment and a readable storage medium for parallel processing of graph data, with the same beneficial effects.

Description

Method, device and equipment for parallel processing of graph data and readable storage medium
Technical Field
The present application relates to the field of graph data processing, and in particular, to a method, an apparatus, a device, and a readable storage medium for graph data parallel processing.
Background
In the big data era, the graph, as a fundamental data representation, is widely used in algorithms such as deep learning and user recommendation. Today graphs commonly reach tens of millions to hundreds of millions of nodes, with edge (node-to-node connection) counts on the order of billions. Real-world graphs are large and sparse, their node numbering is discontinuous, and their node degrees follow a power-law distribution; these characteristics pose great challenges for designing effective graph processing algorithms.
In the prior art, parallel processing of graph data is achieved by increasing the number of processing units. However, in this approach the processing power of a single unit is limited, and because multiple blocks must be processed simultaneously, a cache larger than a single unit's is required. Moreover, as the number of blocks grows, the communication overhead between blocks takes an increasing share of the total cost, which limits how many blocks can run in parallel at once.
Therefore, how to reduce the cache overhead and communication overhead of graph-data parallel processing is a technical problem that those skilled in the art currently need to solve.
Disclosure of Invention
The application aims to provide a method, a device and equipment for graph data parallel processing and a readable storage medium, which are used for reducing cache overhead and communication overhead in the graph data parallel processing process.
In order to solve the above technical problem, the present application provides a method for parallel processing of graph data, including:
acquiring graph data;
screening the source address in the graph data to obtain an independent source address; wherein the independent source addresses are source addresses with different values;
determining a corresponding destination address according to the independent source address, and screening the destination address to obtain an independent destination address; wherein the independent destination addresses are destination addresses with different values;
and acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
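The four steps above can be sketched in software. The following is a minimal illustration, assuming edges arrive as (source address, destination address) pairs and the two memories are plain lookup tables; the function names `select_independent_edges` and `process` are hypothetical, not taken from the patent:

```python
def select_independent_edges(edges):
    """Greedily keep edges so that no two kept edges share a source
    address or a destination address; the rest wait for a later round."""
    picked, deferred = [], []
    seen_src, seen_dst = set(), set()
    for s, d in edges:
        if s in seen_src or d in seen_dst:
            deferred.append((s, d))      # conflicting address: defer
        else:
            seen_src.add(s)
            seen_dst.add(d)
            picked.append((s, d))
    return picked, deferred

def process(edges, src_mem, dst_mem, op):
    """Process rounds of conflict-free edges until all edges are done."""
    results = []
    while edges:
        batch, edges = select_independent_edges(edges)
        # all addresses in `batch` are distinct, so one multi-port
        # memory can serve every read of the round simultaneously
        results.extend(op(src_mem[s], dst_mem[d]) for s, d in batch)
    return results
```

Because every batch contains only mutually distinct addresses, a single processing unit with a multi-port memory can execute each round fully in parallel, which is the point of the screening.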
Optionally, the screening the source addresses in the graph data to obtain independent source addresses includes:
storing each source address into a corresponding input FIFO respectively;
selecting a screening channel according to the source address; the two ends of the screening channel are the input FIFO and the output FIFO respectively, and only one source address is allowed to pass through the screening channel at each moment;
determining a source address in each of the output FIFOs as the independent source address.
Optionally, selecting a screening channel according to the source address includes:
determining the most significant bit of the source address as the tag data;
selecting a first-stage FIFO according to the tag data, and moving the source address from the input FIFO to the first-stage FIFO;
judging whether the tag data is the last bit of the address;
if not, updating the tag data to the next bit of the source address, selecting the next-stage FIFO according to the updated tag data, moving the source address to that next-stage FIFO, and returning to the step of judging whether the tag data is the last bit;
and if so, moving the source address to the corresponding output FIFO.
Optionally, obtaining corresponding independent source data and independent destination data according to the independent source address and the independent destination address includes:
simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and simultaneously reading independent destination data corresponding to each independent destination address by using a destination static random access memory with the preset number of ports.
The present application further provides an apparatus for parallel processing of graph data, the apparatus comprising:
the acquisition module is used for acquiring graph data;
the first screening module is used for screening the source address in the graph data to obtain an independent source address; wherein the independent source addresses are source addresses with different values;
the second screening module is used for determining a corresponding destination address according to the independent source address and screening the destination address to obtain an independent destination address; wherein the independent destination addresses are destination addresses with different values;
and the parallel processing module is used for acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
Optionally, the first screening module includes:
the storage submodule is used for respectively storing each source address into the corresponding input FIFO;
the selection submodule is used for selecting a screening channel according to the source address; the two ends of the screening channel are the input FIFO and the output FIFO respectively, and only one source address is allowed to pass through the screening channel at each moment;
a determining submodule for determining a source address in each of the output FIFOs as the independent source address.
Optionally, the selecting sub-module includes:
a determination unit, configured to determine the most significant bit of the source address as the tag data;
a selection unit, configured to select a first-stage FIFO according to the tag data, and move the source address from the input FIFO to the first-stage FIFO;
a judging unit, configured to judge whether the tag data is the last bit of the address;
an updating unit, configured to, when the tag data is not the last bit, update the tag data to the next bit of the source address, select the next-stage FIFO according to the updated tag data, move the source address to that FIFO, and return to the step of judging whether the tag data is the last bit;
and a moving unit, configured to move the source address to the corresponding output FIFO when the tag data is the last bit.
Optionally, the parallel processing module includes:
the first reading submodule is used for simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and the second reading submodule is used for simultaneously reading the independent destination data corresponding to each independent destination address by using the destination static random access memory with the preset number of ports.
The present application also provides a graph data parallel processing apparatus, including:
a memory for storing a computer program;
a processor for implementing the steps of the method for graph data parallel processing according to any one of the above when executing the computer program.
The present application also provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of graph data parallel processing according to any one of the above.
The method for parallel processing of graph data provided by the application comprises the following steps: acquiring graph data; screening source addresses in the graph data to obtain independent source addresses; wherein, the independent source addresses are source addresses with different values; determining a corresponding destination address according to the independent source address, and screening the destination address to obtain an independent destination address; wherein, the independent destination addresses are destination addresses with different values; and acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
According to the technical scheme, the source address in the graph data is screened to obtain the independent source address, the corresponding destination address is determined according to the independent source address, the destination address is screened to obtain the independent destination address, the corresponding independent source data and the independent destination data are finally obtained, parallel processing of the graph data is completed, a plurality of processing units are not needed to process simultaneously, only one processing unit is needed to complete parallel processing of the graph data, and cache overhead and communication overhead in the parallel processing process of the graph data are greatly reduced. The application also provides a device, equipment and a readable storage medium for parallel processing of the graph data, which have the beneficial effects and are not described herein again.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart illustrating a method for parallel processing of graph data according to an embodiment of the present disclosure;
FIG. 2 is a schematic structural diagram of a 64-port SRAM according to an embodiment of the present application;
FIG. 3 is a flowchart of a concrete implementation of S102 in the graph data parallel processing method of FIG. 1;
FIG. 4 is a flowchart of a concrete implementation of S302 in the graph data parallel processing method of FIG. 3;
fig. 5 is a structural diagram of an 8-channel parallel data screening architecture according to an embodiment of the present disclosure;
FIG. 6 is a block diagram of an apparatus for parallel processing of graph data according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a graph data parallel processing device according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a method, a device and equipment for graph data parallel processing and a readable storage medium, which are used for reducing cache overhead and communication overhead in the graph data parallel processing process.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart illustrating a method for parallel processing of graph data according to an embodiment of the present disclosure.
The method specifically comprises the following steps:
s101: acquiring graph data;
the graph is an abstract data structure for representing associations between objects, described using vertices and edges: vertices represent objects and edges represent relationships between objects. Data that can be abstracted into a graph description is graph data. The graph calculation is the process of expressing and solving the problem by taking the graph as a data model.
In the prior art, a graph is generally decomposed into many small blocks according to source nodes and destination nodes, with different blocks handled by different processing units; parallelism of graph-data processing is obtained by increasing the number of processing units. However, in this approach the processing power of a single unit is limited, and because multiple blocks must be processed simultaneously, a cache larger than a single unit's is required. Moreover, as the number of blocks grows, the communication overhead between blocks takes an increasing share of the total cost, which limits how many blocks can run in parallel at once. The present application therefore provides a method for parallel processing of graph data that addresses these problems.
S102: screening source addresses in the graph data to obtain independent source addresses;
Fig. 2 shows the structure of a 64-port static random access memory (SRAM). For a single processing unit to process multiple lanes of data in parallel, the parallel processing apparatus must be able to read multiple lanes of data simultaneously. Taking 64-way parallel processing as an example, the 64 ports of the SRAM can read the corresponding data at the same time for processing. The SRAM bank is addressed by the low bits of an edge's destination address, or by a hash of that destination address, so as to separate the different lanes. For a given set of graph data, however, several lanes may attempt to read the same SRAM bank simultaneously, which prevents parallel processing. The source addresses and destination addresses in the graph data are therefore screened so that no SRAM bank is read by more than one lane at a time.
The independent source addresses are source addresses with different values, and the meaning of screening the source addresses in the graph data is to ensure that two same source addresses do not exist at the same time;
optionally, the screening of the source address in the graph data may be implemented by a software program, or may also be implemented by a hardware device.
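The bank-selection rule described above, and the read conflict that motivates the screening, can be shown in a few lines. This is an illustrative sketch assuming 64 banks selected by the low 6 bits of the destination address; the function names are not from the patent:

```python
NUM_BANKS = 64  # one bank per SRAM port in the 64-way example

def bank_of(dest_addr):
    """Select an SRAM bank from the low bits of the destination address."""
    return dest_addr & (NUM_BANKS - 1)  # low 6 bits for 64 banks

def has_bank_conflict(dest_addrs):
    """True if two lanes would read the same bank in the same cycle."""
    banks = [bank_of(a) for a in dest_addrs]
    return len(banks) != len(set(banks))
```

Addresses 1 and 65 map to the same bank (their low 6 bits are equal), so they cannot be read in the same cycle; screening defers one of them to a later round.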
S103: determining a corresponding destination address according to the independent source address, and screening the destination address to obtain an independent destination address;
The independent destination addresses mentioned here are destination addresses with mutually distinct values; the point of screening the destination addresses is to ensure that no two identical destination addresses exist at the same time;
optionally, the filtering of the destination address may be implemented by a software program, or may also be implemented by a hardware device.
S104: and acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
Optionally, parallel processing of independent source data and independent destination data may be completed by a multi-path floating point parallel processing unit;
preferably, the obtaining of the corresponding independent source data and independent destination data according to the independent source address and the independent destination address mentioned here may specifically be:
simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and simultaneously reading independent destination data corresponding to each independent destination address by using a destination static random access memory with a preset number of ports.
In this embodiment, the independent source data and independent destination data are read in parallel using source and destination static random access memories each having a preset number of ports, so that a single processing unit achieves parallel processing of the graph data, further reducing the cache overhead and communication overhead of the process.
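As a toy software model of such a preset-number-of-ports memory — an illustrative sketch, not the patent's hardware — one can require that a single-cycle parallel read serve only distinct addresses:

```python
class MultiPortSram:
    """Toy model: one read per port per 'cycle', valid only when the
    requested addresses are pairwise distinct (i.e. already screened)."""

    def __init__(self, data, num_ports=64):
        self.data = list(data)
        self.num_ports = num_ports

    def read_parallel(self, addrs):
        if len(addrs) > self.num_ports:
            raise ValueError("more requests than ports")
        if len(set(addrs)) != len(addrs):
            raise ValueError("duplicate address: the reads would serialize")
        return [self.data[a] for a in addrs]
```

The screening steps of S102 and S103 guarantee exactly the precondition this model checks: every address presented in one cycle is unique, so all ports can fire simultaneously.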
Based on the above technical solution, the method for parallel processing of graph data provided by the application screens the source addresses in the graph data to obtain independent source addresses, determines the corresponding destination addresses from them, screens those destination addresses to obtain independent destination addresses, and finally reads the corresponding independent source data and independent destination data to complete the parallel processing of the graph data. No additional processing units are needed: a single processing unit completes the parallel processing, which greatly reduces the cache overhead and communication overhead of the process.
For step S102 of the previous embodiment, screening the source addresses in the graph data to obtain independent source addresses may specifically follow the steps shown in fig. 3, described below with reference to fig. 3.
Referring to fig. 3, fig. 3 is a flowchart illustrating an actual representation of S102 in the graph data parallel processing method of fig. 1.
The method specifically comprises the following steps:
s301: storing each source address into a corresponding input FIFO respectively;
the FIFO (First in First out, First in First out buffer) has the characteristic of First in First out, and the First stored source address is preferentially output.
S302: selecting a screening channel according to a source address;
the two ends of the screening channel are respectively an input FIFO and an output FIFO, and only one source address is allowed to pass through the screening channel at each moment;
Optionally, selecting the screening channel according to the source address may specifically mean selecting the intermediate FIFOs of the screening channel stage by stage according to the address bits of the source address, taken in order;
preferably, the selection of the screening channel according to the source address mentioned in step S302 may specifically be a step shown in fig. 4, which is described below with reference to fig. 4.
Referring to fig. 4, fig. 4 is a flowchart illustrating an actual representation of S302 in the graph data parallel processing method of fig. 3.
The method specifically comprises the following steps:
S401: determining the most significant bit of the source address as the tag data;
S402: selecting a first-stage FIFO according to the tag data, and moving the source address from the input FIFO to the first-stage FIFO;
S403: judging whether the tag data is the last bit of the address;
if not, go to step S404; if yes, go to step S405.
S404: updating the tag data to the next bit of the source address, selecting the next-stage FIFO according to the updated tag data, moving the source address to that FIFO, and returning to the step of judging whether the tag data is the last bit;
S405: moving the source address to the corresponding output FIFO.
In this embodiment, the tag data selects the next-stage FIFO in which the source address is stored; when the tag data reaches the last bit of the address, the source address is moved to the corresponding output FIFO, completing the source-address screening.
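Under the assumption (following S401) that the address bits are consumed most-significant-bit first, the staged selection can be sketched as a software routine; the names `route` and `output_fifos` are illustrative, and each intermediate FIFO is represented only by the index it contributes to:

```python
from collections import deque

def route(source_addr, num_bits, output_fifos):
    """Move an address stage by stage: at stage k the current tag bit
    selects the next-stage FIFO; after the last bit, the address lands
    in the output FIFO whose index equals the full address value."""
    path = []        # FIFO index chosen at each stage
    index = 0
    for k in range(num_bits):                  # most significant bit first
        bit = (source_addr >> (num_bits - 1 - k)) & 1
        index = (index << 1) | bit             # next-stage FIFO number
        path.append(index)
    output_fifos[index].append(source_addr)    # final stage: output FIFO
    return path
```

For the 3-bit address 110, the chosen FIFO indices are 1, 3 and 6, so the address ends in output FIFO No. 6; two addresses can only meet in a FIFO once all the bits routed so far agree, which is what lets the final output FIFOs serialize duplicates.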
S303: the source address in each output FIFO is determined to be an independent source address.
As shown in fig. 5, an 8-channel parallel data screening architecture, which is composed of an input FIFO, a three-level intermediate FIFO, a three-level processing, and an output FIFO, is taken as an example for detailed description.
When data screening is performed with this parallel screening architecture, the source addresses input on the eight channels D0 to D7 can be screened simultaneously. Suppose the input source addresses are 1, 2, 3, 4, 5, 6, 7 and 7, i.e., the binary addresses 001, 010, 011, 100, 101, 110, 111 and 111. The source addresses stored in input FIFO No. 6 and input FIFO No. 7 are then both 111, and the first two processing stages treat them identically:
In the first-stage processing, the tag data is 1, so the addresses enter the No. 2 and No. 3 first-stage FIFOs of the second part along the direction indicated by the dotted line; in the second-stage processing, the tag data is 1, so they enter the No. 0 and No. 1 second-stage FIFOs of the fourth part along the direction indicated by the dotted line;
In the third-stage processing, the tag data is again 1. The source address in channel D6 enters the S7 output FIFO along the direction indicated by the dotted line, and the source address in channel D7 enters the S7 output FIFO along the direction indicated by the solid line. Since both select the S7 output FIFO, the source address from channel D6, stored first, is output first, and the source address from channel D7 is output in the next cycle, so that only one source address passes through at any one time.
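The serialization in this example can be replayed with a simplified model that collapses the three-stage network into a direct address-to-output-FIFO mapping and drains at most one address per output FIFO per cycle (an illustration of the behavior, not the hardware):

```python
from collections import deque

def screen(addresses, num_outputs=8):
    """Route each source address to the output FIFO given by its low
    bits, then emit one address per non-empty FIFO per cycle, so the
    addresses emitted in any one cycle are pairwise distinct."""
    outputs = [deque() for _ in range(num_outputs)]
    for a in addresses:
        outputs[a % num_outputs].append(a)
    cycles = []
    while any(outputs):
        cycles.append([f.popleft() for f in outputs if f])
    return cycles
```

For the inputs 1 through 7 plus a second 7, the first cycle emits 1–7 and the duplicate 111 is emitted alone in the second cycle, matching the behavior described for channels D6 and D7.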
Referring to fig. 6, fig. 6 is a structural diagram of an apparatus for parallel processing of graph data according to an embodiment of the present disclosure.
The apparatus may include:
an obtaining module 100, configured to obtain graph data;
the first screening module 200 is configured to screen a source address in the graph data to obtain an independent source address; wherein, the independent source addresses are source addresses with different values;
the second screening module 300 is configured to determine a corresponding destination address according to the independent source address, and screen the destination address to obtain an independent destination address; wherein, the independent destination addresses are destination addresses with different values;
the parallel processing module 400 is configured to obtain corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and perform parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
Optionally, the first screening module 200 may include:
the storage submodule is used for respectively storing each source address into the corresponding input FIFO;
the selection submodule is used for selecting a screening channel according to a source address; the two ends of the screening channel are respectively an input FIFO and an output FIFO, and only one source address is allowed to pass through the screening channel at each moment;
a determining submodule for determining the source address in each output FIFO to be an independent source address.
Further, the selection sub-module may include:
a determination unit configured to determine the most significant bit of the source address as the tag data;
a selection unit configured to select a first-stage FIFO according to the tag data and move the source address from the input FIFO to the first-stage FIFO;
a judging unit configured to judge whether the tag data is the last bit of the address;
an updating unit configured to, when the tag data is not the last bit, update the tag data to the next bit of the source address, select the next-stage FIFO according to the updated tag data, move the source address to that FIFO, and return to the step of judging whether the tag data is the last bit;
and a moving unit configured to move the source address to the corresponding output FIFO when the tag data is the last bit.
The parallel processing module 400 may include:
the first reading submodule is used for simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and the second reading submodule is used for simultaneously reading the independent destination data corresponding to each independent destination address by using the destination static random access memory with the preset number of ports.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
Referring to fig. 7, fig. 7 is a structural diagram of a graph data parallel processing device according to an embodiment of the present application.
The graph data parallel processing apparatus 700 may have a relatively large difference due to different configurations or performances, and may include one or more processors (CPUs) 722 (e.g., one or more processors) and a memory 732, one or more storage media 730 (e.g., one or more mass storage devices) storing an application 742 or data 744. Memory 732 and storage medium 730 may be, among other things, transient storage or persistent storage. The program stored in the storage medium 730 may include one or more modules (not shown), each of which may include a sequence of instruction operations for the device. Further, the processor 722 may be configured to communicate with the storage medium 730 to execute a series of instruction operations in the storage medium 730 on the graph data parallel processing apparatus 700.
The graph data parallel processing apparatus 700 may also include one or more power supplies 727, one or more wired or wireless network interfaces 750, one or more input/output interfaces 757, and/or one or more operating systems 741, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps in the method for graph data parallel processing described in fig. 1 to 5 above are implemented by a graph data parallel processing apparatus based on the structure shown in fig. 7.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses, apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is merely a division by logical function, and an actual implementation may use another division; for example, a plurality of modules or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or modules, and may be electrical, mechanical or in other forms.
Modules described as separate parts may or may not be physically separate, and parts shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of a software functional module.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
A method, an apparatus, a device and a readable storage medium for parallel processing of graph data provided by the present application have been described in detail above. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.
It is further noted that, in this specification, relational terms such as first and second are used solely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other identical elements in the process, method, article or apparatus that comprises the element.

Claims (10)

1. A method for parallel processing of graph data, comprising:
acquiring graph data;
screening the source address in the graph data to obtain an independent source address; wherein the independent source addresses are source addresses with different values;
determining a corresponding destination address according to the independent source address, and screening the destination address to obtain an independent destination address; wherein the independent destination addresses are destination addresses with different values;
and acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
2. The method of claim 1, wherein screening the source addresses in the graph data to obtain independent source addresses comprises:
storing each source address into a corresponding input FIFO respectively;
selecting a screening channel according to the source address; the two ends of the screening channel are the input FIFO and the output FIFO respectively, and only one source address is allowed to pass through the screening channel at each moment;
determining a source address in each of the output FIFOs as the independent source address.
3. The method of claim 2, wherein selecting a screening channel based on the source address comprises:
determining the most significant bit address of the source address as mark data;
selecting a first-stage FIFO according to the mark data, and moving the source address from the input FIFO to the first-stage FIFO;
judging whether the mark data is the last bit address;
if not, updating the mark data to the next bit address of the source address, selecting a next-stage FIFO according to the updated mark data, moving the source address to the next-stage FIFO, and returning to the step of judging whether the mark data is the last bit address;
and if so, moving the source address to the corresponding output FIFO.
4. The method of claim 1, wherein obtaining corresponding independent source data and independent destination data according to the independent source address and the independent destination address comprises:
simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and simultaneously reading independent destination data corresponding to each independent destination address by using a destination static random access memory with the preset number of ports.
5. An apparatus for parallel processing of graph data, comprising:
the acquisition module is used for acquiring graph data;
the first screening module is used for screening the source address in the graph data to obtain an independent source address; wherein the independent source addresses are source addresses with different values;
the second screening module is used for determining a corresponding destination address according to the independent source address and screening the destination address to obtain an independent destination address; wherein the independent destination addresses are destination addresses with different values;
and the parallel processing module is used for acquiring corresponding independent source data and independent destination data according to the independent source address and the independent destination address, and performing parallel processing on the independent source data and the independent destination data to obtain a parallel processing result.
6. The apparatus of claim 5, wherein the first screening module comprises:
the storage submodule is used for respectively storing each source address into the corresponding input FIFO;
the selection submodule is used for selecting a screening channel according to the source address; the two ends of the screening channel are the input FIFO and the output FIFO respectively, and only one source address is allowed to pass through the screening channel at each moment;
a determining submodule for determining a source address in each of the output FIFOs as the independent source address.
7. The apparatus of claim 6, wherein the selection submodule comprises:
a determination unit, configured to determine the most significant bit address of the source address as mark data;
a selection unit, configured to select a first-stage FIFO according to the mark data, and move the source address from the input FIFO to the first-stage FIFO;
a judging unit, configured to judge whether the mark data is the last bit address;
an updating unit, configured to, when the mark data is not the last bit address, update the mark data to the next bit address of the source address, select a next-stage FIFO according to the updated mark data, move the source address to the next-stage FIFO, and return to the step of judging whether the mark data is the last bit address;
and a moving unit, configured to move the source address to the corresponding output FIFO when the mark data is the last bit address.
8. The apparatus of claim 5, wherein the parallel processing module comprises:
the first reading submodule is used for simultaneously reading independent source data corresponding to each independent source address by using a source static random access memory with a preset number of ports;
and the second reading submodule is used for simultaneously reading the independent destination data corresponding to each independent destination address by using the destination static random access memory with the preset number of ports.
9. A graph data parallel processing apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for parallel processing of graph data according to any one of claims 1 to 4 when executing the computer program.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of a method for parallel processing of graph data according to any one of claims 1 to 4.
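The claimed flow can be illustrated with a small software model. The sketch below is an illustrative approximation only, not the patented hardware implementation: the multi-stage FIFO screening channel of claims 2 and 3 is modeled by routing each address bit by bit (most significant bit first) to an output FIFO that admits only one address per value, and the multi-port static random access memory reads of claim 4 are modeled as concurrent dictionary lookups. All names (`screen`, `process`, `ADDR_WIDTH`) are invented for illustration.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

ADDR_WIDTH = 4  # illustrative address width; the claims do not fix one


def screen(addresses):
    """Model of the screening channel (claims 2-3): each address is routed
    through FIFO stages selected by its successive bits, most significant
    bit first. Because a channel admits only one address of a given value,
    duplicates are dropped and each output FIFO holds one independent
    address."""
    outputs = {}
    for addr in addresses:
        # The bit path plays the role of the per-stage FIFO selection.
        path = tuple((addr >> (ADDR_WIDTH - 1 - i)) & 1 for i in range(ADDR_WIDTH))
        fifo = outputs.setdefault(path, deque())
        if not fifo:  # only one address per value may pass the channel
            fifo.append(addr)
    return [fifo[0] for fifo in outputs.values()]


def process(edges, src_mem, dst_mem):
    """Model of claim 1: screen the source addresses, derive and screen the
    corresponding destination addresses, then fetch both data sets.  The
    concurrent lookups stand in for the multi-port SRAM reads of claim 4."""
    dst_of = dict(edges)  # maps source address -> destination address
    independent_src = screen([s for s, _ in edges])
    independent_dst = screen([dst_of[s] for s in independent_src])
    with ThreadPoolExecutor() as pool:
        src_data = list(pool.map(src_mem.__getitem__, independent_src))
        dst_data = list(pool.map(dst_mem.__getitem__, independent_dst))
    return src_data, dst_data
```

In this model, screening `[1, 2, 1, 3]` yields `[1, 2, 3]`: the duplicate source address is filtered out before any data is read, so the subsequent reads never touch the same location twice and can proceed in parallel without conflicts.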
CN201911402930.9A 2019-12-30 2019-12-30 Method, device and equipment for parallel processing of graph data and readable storage medium Active CN111177482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911402930.9A CN111177482B (en) 2019-12-30 2019-12-30 Method, device and equipment for parallel processing of graph data and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911402930.9A CN111177482B (en) 2019-12-30 2019-12-30 Method, device and equipment for parallel processing of graph data and readable storage medium

Publications (2)

Publication Number Publication Date
CN111177482A true CN111177482A (en) 2020-05-19
CN111177482B CN111177482B (en) 2022-04-22

Family

ID=70650529

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911402930.9A Active CN111177482B (en) 2019-12-30 2019-12-30 Method, device and equipment for parallel processing of graph data and readable storage medium

Country Status (1)

Country Link
CN (1) CN111177482B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101556534A (en) * 2009-04-21 2009-10-14 浪潮电子信息产业股份有限公司 Large-scale data parallel computation method with many-core structure
US20160026677A1 (en) * 2014-07-23 2016-01-28 Battelle Memorial Institute System and method of storing and analyzing information
CN106161254A (en) * 2016-07-18 2016-11-23 中国科学院计算技术研究所 A kind of many purposes data transmission network road route device, method, chip, router
CN110134664A (en) * 2019-04-12 2019-08-16 中国平安财产保险股份有限公司 Acquisition methods, device and the computer equipment in Data Migration path

Also Published As

Publication number Publication date
CN111177482B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
EP3369045B1 (en) Determining orders of execution of a neural network
US10650047B2 (en) Dense subgraph identification
JP5950285B2 (en) A method for searching a tree using an instruction that operates on data having a plurality of predetermined bit widths, a computer for searching a tree using the instruction, and a computer thereof program
US20140188893A1 (en) Data retrieval apparatus, data storage method and data retrieval method
CN103970604A (en) Method and device for realizing image processing based on MapReduce framework
CN112734034A (en) Model training method, calling method, device, computer equipment and storage medium
WO2023138188A1 (en) Feature fusion model training method and apparatus, sample retrieval method and apparatus, and computer device
CN116822422B (en) Analysis optimization method of digital logic circuit and related equipment
CN105302536A (en) Configuration method and apparatus for related parameters of MapReduce application
CN111274455B (en) Graph data processing method and device, electronic equipment and computer readable medium
CN115358397A (en) Parallel graph rule mining method and device based on data sampling
CN111177482B (en) Method, device and equipment for parallel processing of graph data and readable storage medium
US9740797B2 (en) Counting bloom filter
CN113157695B (en) Data processing method and device, readable medium and electronic equipment
Fischer et al. Unrooted non-binary tree-based phylogenetic networks
CN113626650A (en) Service processing method and device and electronic equipment
CN105677801A (en) Data processing method and system based on graph
CN111049988A (en) Intimacy prediction method, system, equipment and storage medium for mobile equipment
CN116541421B (en) Address query information generation method and device, electronic equipment and computer medium
CN111915002B (en) Operation method, device and related product
JP6485594B2 (en) Memory control device and memory control method
CN116933880A (en) Quantum circuit depth optimization method and system based on genetic algorithm and electronic equipment
CN114298203A (en) Method, device, equipment and computer readable medium for data classification
Morshed et al. SOF: An Efficient String Graph Construction Algorithm
CN118229509A (en) Image processing optimization method, system, equipment and medium suitable for DSP

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant