CN114266281A - Method, device and system for training graph neural network - Google Patents

Method, device and system for training graph neural network

Info

Publication number
CN114266281A
Authority
CN
China
Prior art keywords
training
vertex
graph
partition
vertexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010970736.7A
Other languages
Chinese (zh)
Inventor
林郅琦
李�诚
王云鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Science and Technology of China USTC
Huawei Technologies Co Ltd
Original Assignee
University of Science and Technology of China USTC
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC, Huawei Technologies Co Ltd filed Critical University of Science and Technology of China USTC
Priority to CN202010970736.7A priority Critical patent/CN114266281A/en
Priority to PCT/CN2021/096588 priority patent/WO2022057310A1/en
Publication of CN114266281A publication Critical patent/CN114266281A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method for training a graph neural network, applied to a distributed or parallel system. The method comprises the following steps: a central device acquires a first relational graph and determines N different second relational graphs according to the first relational graph, wherein each second relational graph is a subgraph of the first relational graph, the difference between the numbers of training vertices included in any two second relational graphs is smaller than a preset threshold, and each second relational graph includes the neighbor vertices of its training vertices; the central device then sends the information of the N second relational graphs to N training execution devices, which carry out the training of the graph neural network. In the scheme of the application, the number of training vertices in each second relational graph is approximately equal, and each training vertex and its corresponding neighbor vertices are, as far as possible, divided into the same second relational graph, so that the computation of the training execution devices is balanced, the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.

Description

Method, device and system for training graph neural network
Technical Field
The application relates to the technical field of computers, in particular to a method, a device and a system for training a graph neural network.
Background
Real-world data is often organized in graphs, where the relationships between entities imply strong causal relationships; these graphs may be collectively referred to as relationship graphs, such as social network graphs, user-commodity relationship graphs, knowledge graphs, and protein structure graphs. The data of these graphs are used to train a graph neural network (GNN), so that the trained GNN can be used to infer causal relationships in graphs of the same type, for example the goods suited to a type of user, or the group of users for whom a type of goods is suitable. A GNN is a multi-layer neural network that operates on graph-structured data, where each layer performs aggregation and update centered on the vertices. Aggregation: the feature information of a vertex's neighbor vertices is collected and fused through aggregation operations such as summation or averaging to obtain aggregated information that incorporates the neighbor vertex information. Update: the aggregated information is passed through a fully connected layer to generate a new output, which serves as the input feature information for the next GNN layer.
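For illustration only (this sketch is not part of the patent's disclosure), a single GNN layer with mean aggregation followed by a fully connected update could be written as follows; the function name, array shapes, and the choice of mean aggregation and ReLU are assumptions:

```python
import numpy as np

def gnn_layer(features, neighbors, weight, bias):
    """One illustrative GNN layer: mean-aggregate neighbor features, then
    update through a fully connected layer (names and shapes are assumptions).

    features:  (num_vertices, in_dim) feature matrix, one row per vertex
    neighbors: dict mapping vertex id -> list of neighbor vertex ids
    weight:    (2 * in_dim, out_dim) fully connected layer weight
    bias:      (out_dim,) fully connected layer bias
    """
    num_vertices, in_dim = features.shape
    out = np.zeros((num_vertices, weight.shape[1]))
    for v in range(num_vertices):
        nbrs = neighbors.get(v, [])
        # Aggregation: fuse the neighbor vertices' feature information, here by averaging.
        agg = features[nbrs].mean(axis=0) if nbrs else np.zeros(in_dim)
        # Update: concatenate the vertex's own and aggregated features, pass them
        # through a fully connected layer; the output feeds the next GNN layer.
        h = np.concatenate([features[v], agg])
        out[v] = np.maximum(h @ weight + bias, 0.0)  # ReLU activation
    return out
```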
In the real world, the relationship graphs in many scenarios are very large; a single relationship graph is often composed of hundreds of millions of vertices and hundreds of billions of edges, more than a single training execution device can handle, so distributed or parallel training is adopted. That is, copies of the GNN are placed on a plurality of training execution devices; the GNN on each training execution device is the same, but the GNNs on different training execution devices are trained with the data of different vertices in the relationship graph and are then fused, so that the GNN is effectively trained on the whole relationship graph and an applicable GNN model is obtained.
The Deep Graph Library (DGL) is an open-source framework customized for graph neural network training that supports a training mode based on large-scale graph sampling. In the graph-sampling training process, DGL stores the whole graph data structure and the graph data feature information of the entire relationship graph in a shared memory (graph store) managed by a central processing unit (CPU). Each graphics processing unit (GPU), acting as a training execution device, must request a subgraph for training by accessing the whole relationship graph in the shared memory, and the CPU allocates a training subgraph to each GPU in response to its request. As a result, loading data from the CPU to the GPU takes a long time, and the overhead of the whole training process is large.
Disclosure of Invention
The embodiment of the application provides a method for training a graph neural network, which can improve the training efficiency of the graph neural network. The embodiment of the application also provides a corresponding device, a corresponding system, a corresponding computer readable storage medium, a corresponding computer program product and the like.
A first aspect of the present application provides a method for graph neural network training, including: acquiring a first relational graph for graph neural network training, wherein the first relational graph comprises a plurality of vertices and a plurality of edges, each edge connects two vertices, and the plurality of vertices include training vertices for training a graph neural network; determining N different second relational graphs according to the first relational graph, wherein each second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, N is an integer greater than 1, the difference between the numbers of training vertices included in any two second relational graphs is smaller than a preset threshold, and each second relational graph includes the neighbor vertices of its training vertices; and sending information of the N second relational graphs to N training execution devices, wherein the N training execution devices correspond to the N second relational graphs one to one, and the N second relational graphs are respectively used for training the graph neural networks on the corresponding training execution devices.
In the first aspect, the method may be applied to a central device of a neural network training system, where the neural network training system may be a distributed system or a parallel system, the neural network training system further includes a plurality of training execution devices, the central device may be an independent physical machine, a Virtual Machine (VM), a container, or a Central Processing Unit (CPU), and the training execution devices may be an independent physical machine, a VM, a container, a Graphics Processing Unit (GPU), a field-programmable gate array (FPGA), or a dedicated chip.
Graph neural networks (GNNs) are applied in a very wide range of domains, such as graph-based recommendation systems (e-commerce, social relationships, etc.), and are also applicable in scenarios such as traffic management or chemistry. GNNs are used to process graph data, i.e. to process relationship graphs. Relationship graphs represent the connections between entities in the real world, such as friendships in a social network or the relationship between consumers and goods in e-commerce. The first relational graph is a relationship graph used for graph neural network training; each vertex in the first relational graph corresponds to one piece of sample data, and each edge represents the relationship between sample data. Vertices directly connected by one edge have a direct association relationship, which may also be called a one-hop connection; correspondingly, two vertices connected through other vertices have an indirect association relationship, which may also be called a multi-hop connection.
The second relation graph is divided from the first relation graph, the central device divides N second relation graphs from the first relation graph according to the number of the training execution devices, and the vertexes in each second relation graph may be partially overlapped but not completely overlapped. Vertices in the second relationship graph have a direct relationship (one-hop relationship), or have both a direct relationship and an indirect relationship (multi-hop relationship). The multi-hop in the present application includes two hops, or more than two hops.
The neighbor vertices of the training vertices in this application may also be training vertices.
The graph neural network included in the training execution apparatus in the present application refers to a graph neural network Model (GNN Model).
As can be seen from the first aspect, when the central device partitions the first relational graph, it tries to keep the number of vertices in each second relational graph as equal as possible and to partition the neighbor vertices of each training vertex into the same second relational graph as the training vertex. This not only balances the computation in each training execution device, but also reduces the need for a training execution device to frequently read the sample data of related neighbor vertices from other training execution devices during graph neural network training, thereby reducing the network overhead across training execution devices and improving the training efficiency of the graph neural network.
In a possible implementation manner of the first aspect, the step of determining N different second relational graphs according to the first relational graph includes: dividing a target vertex and a plurality of neighbor vertices of the target vertex into the partition for which the target vertex has the highest evaluation score, according to the evaluation scores of the target vertex for each of N partitions, wherein the target vertex is a training vertex in the first relational graph, the evaluation score indicates the degree of correlation between the target vertex and the vertices already allocated to each partition before the target vertex is allocated, each of the N partitions corresponds to one training execution device, and after every training vertex in the first relational graph has been allocated, the vertices in each partition are included in the second relational graph of the training execution device corresponding to that partition.
In this possible implementation, a plurality of neighbor vertices of the target vertex in the first relationship graph may also be determined, and the evaluation score of each partition of the N partitions corresponding to the target vertex is determined according to the set of neighbor vertices and the assigned vertices of the N partitions, where the set of neighbor vertices includes the plurality of neighbor vertices of the target vertex.
In the application, only the vertex is partitioned, and the edge for connecting the two vertexes is not changed, namely, the edge is not added, deleted or changed.
In this possible implementation, the target vertex may be any one of the training vertices in the first relational graph. The set of neighbor vertices may be referred to as a neighbor vertex set, that is, every vertex in the set is a neighbor vertex of the target vertex. The training vertices in the first relational graph may be partitioned one by one, in a round-robin manner. Before the training vertices in the first relational graph are divided, a partition may be configured for each training execution device; the partitions may be located in the storage space of the central device or in storage space corresponding to the central device. Then, starting from the first training vertex, the training vertices are divided into the partitions one by one, and the partition into which each training vertex is placed is determined according to the evaluation scores. After all the training vertices in the first relational graph have been divided, the vertices in each partition form one second relational graph.
In the present application, the neighbor vertices included in the neighbor vertex set can be defined in two ways. In the first, only the vertices directly connected to the training vertex through one edge are called neighbor vertices; that is, the definition of a neighbor vertex is the one-hop relationship, and only vertices connected to the training vertex by a single edge can belong to the neighbor vertex set. In the second, besides the one-hop relationship, vertices that reach the training vertex through one or more intermediate vertices, i.e. vertices connected to the training vertex through at least two edges, may also be called neighbor vertices; that is, the definition of a neighbor vertex is the multi-hop relationship. For example, vertices reached by "two hops" or "three hops" from a target vertex can all be attributed to the neighbor vertices of the target vertex. Typically, the number of hops between a neighbor vertex and the target vertex is less than a threshold, such as less than five hops. In some implementations of the present application, hop count information may be specified so that vertices whose number of hops from the target vertex is less than or equal to the value of the hop count information are used as the neighbor vertices of the target vertex.
In the present application, the degree of correlation reflects how many of the target vertex's neighbor vertices are among the vertices already assigned in each partition. The evaluation score is a numerical indicator whose value reflects the degree of closeness between the target vertex and the vertices already assigned to a partition, that is, the proportion of the target vertex's neighbor vertices among the assigned vertices of that partition. The higher the evaluation score, the more of the target vertex's neighbor vertices are among the assigned vertices of the partition, and the more suitable the partition is for the target vertex. Vertices in the neighbor vertex set that have already been assigned to the highest-scoring partition do not need to be assigned again. With this possible implementation, vertices with a high degree of correlation are allocated to the same partition and therefore belong to the same second relational graph, which effectively avoids the network overhead of scheduling the data of correlated vertices across training execution devices during training.
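A minimal sketch of this greedy, vertex-by-vertex assignment is given below; it assumes a `score` function that returns the evaluation score of a training vertex for a partition, and the function and variable names are illustrative rather than taken from the patent:

```python
def partition_training_vertices(train_vertices, neighbor_sets, num_partitions, score):
    """Greedily assign each training vertex (and its neighbor vertex set) to the
    partition with the highest evaluation score. 'score' is assumed to compute
    the correlation between the vertex and a partition's already-assigned vertices."""
    partitions = [set() for _ in range(num_partitions)]
    for v in train_vertices:
        nbrs = neighbor_sets[v]                     # neighbor vertex set of v
        scores = [score(v, nbrs, p) for p in partitions]
        best = scores.index(max(scores))            # partition with the highest score
        partitions[best].add(v)
        partitions[best].update(nbrs)               # place the neighbor vertices with v
    return partitions
```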
In a possible implementation manner of the first aspect, the method further includes: a plurality of neighbor vertices of the target vertex are obtained according to the hop count information, which indicates a maximum number of edges in a path from the target vertex to each of the plurality of neighbor vertices.
In this possible implementation, the hop count information refers to the aforementioned one-hop relationship or multi-hop relationship. If the hop count information is 1, the neighbor vertex set includes the vertices that have a direct association relationship with the target vertex; if the hop count information is 2, the neighbor vertex set also includes, in addition to the directly associated vertices, the vertices connected through those directly associated vertices; similarly, if the hop count information is 3, the neighbor vertex set may further include vertices associated with the target vertex through three hops, and so on. When a target vertex has a plurality of neighbor vertices, the hop count information indicates the hop count of the farthest neighbor vertex, that is, the maximum number of edges contained in a single path from the target vertex to any of its neighbor vertices. As can be seen from this possible implementation, vertices can be allocated by controlling the hop count information, which helps meet the application-specific requirements of the graph neural network, such as finding a user's closest friends or the goods a user is most interested in.
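For illustration, collecting all vertices within the number of hops given by the hop count information can be done with a breadth-first expansion such as the sketch below; the adjacency representation and function names are assumptions:

```python
from collections import deque

def k_hop_neighbors(adj, target, hops):
    """Return all vertices reachable from 'target' within 'hops' edges,
    i.e. the neighbor vertex set selected by the hop count information.
    'adj' is assumed to map each vertex id to its directly connected vertices."""
    visited = {target}
    frontier = deque([(target, 0)])
    neighbors = set()
    while frontier:
        v, depth = frontier.popleft()
        if depth == hops:          # do not expand beyond the hop count limit
            continue
        for u in adj.get(v, []):
            if u not in visited:
                visited.add(u)
                neighbors.add(u)
                frontier.append((u, depth + 1))
    return neighbors
```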
In a possible implementation manner of the first aspect, the evaluation score of the target vertex in the first partition is positively correlated with the coincidence number of the first partition, the coincidence number of the first partition is used to indicate the number of the multiple neighbor vertices coinciding with the allocated vertex in the first partition, and the first partition is any one of the N partitions.
In this possible implementation, this may also be described as follows: first, the coincidence number between the neighbor vertex set and the vertices already allocated in the first partition of the N partitions is determined; then, the evaluation score of the target vertex for each of the N partitions is determined according to the coincidence number between the neighbor vertex set and the allocated vertices of each partition.
Vertex coincidence in this application means that there is a vertex in the plurality of neighboring vertices that is the same as the vertex that has been assigned in the first partition.
In this possible implementation, some or all of the neighbor vertices of the target vertex may have been allocated to the partition, so that there are vertices in the set of neighbor vertices that coincide with the allocated vertices in the partition, and the correlation between the target vertex and the partition may be determined by the number of the coinciding vertices. The larger the number of coincidence, the higher the correlation between the target vertex and the partition. The method for determining the evaluation score through the coincidence number of the neighbor vertex and the distributed vertex can effectively divide the vertex with high correlation degree into one partition, and further effectively avoid network overhead of frequently scheduling data of the associated vertex across the training execution device in the training process.
In a possible implementation manner of the first aspect, the evaluation score of the first partition is a product of a coincidence number of the first partition and an equilibrium ratio of the first partition, where the equilibrium ratio is used to indicate a probability that the target vertex is partitioned into the first partition, the equilibrium ratio is a ratio of a first difference value and a number of vertices of the first partition after adding multiple neighbor vertices, and the first difference value is a difference value between a preconfigured upper limit value of the number of vertices of the first partition and a number of vertices already allocated in the first partition.
In this possible implementation, this may also be described as follows: first, the balance ratio of the first partition is determined, where the balance ratio is the ratio of a first difference to the number of vertices of the first partition after the neighbor vertex set is added, and the first difference is the difference between the preconfigured upper limit on the number of vertices of the first partition and the number of vertices already allocated in the first partition; then, the evaluation score of the target vertex for the first partition is determined as the product of the coincidence number between the plurality of neighbor vertices and the allocated vertices of the first partition and the balance ratio of the first partition.
In this possible implementation, the balance ratio is obtained by first taking the difference between the preconfigured upper limit on the number of vertices of the first partition and the number of vertices already allocated in the first partition, and then dividing this difference by the number of vertices the first partition would contain after the neighbor vertex set is added. The more vertices have already been allocated to a partition, the smaller this ratio becomes, which indicates, from the viewpoint of storage balance, that it is not suitable to keep placing vertices in that partition. Therefore, weighting the coincidence number by the balance ratio takes both computation balance and storage balance into account more comprehensively.
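Under one reading of the passage above (the notation is ours, not the patent's), with N(v) denoting the neighbor vertex set of target vertex v, V_i the vertices already allocated to partition i, and C_i the preconfigured upper limit on the number of vertices of partition i, the evaluation score could be written as follows; whether the target vertex itself is counted in the denominator is not specified above:

```latex
\mathrm{score}_i(v) \;=\;
\underbrace{\lvert N(v) \cap V_i \rvert}_{\text{coincidence number}}
\times
\underbrace{\frac{C_i - \lvert V_i \rvert}{\lvert V_i \cup N(v) \rvert}}_{\text{balance ratio}}
```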
In a possible implementation manner of the first aspect, the out-degree of the vertices in the N second relational graphs satisfies a first preset condition, and the out-degree represents the number of edges connected by one vertex.
In this possible implementation, considering that the number of vertices divided into each partition is large and the storage space of the central device may be limited, vertices whose out-degree is higher than the first preset condition may be preferentially placed in the second relational graph, and vertices whose out-degree is lower than the first preset condition may be discarded. The first preset condition may be preconfigured or dynamically generated, and may be a specific value.
In a possible implementation manner of the first aspect, the method further includes: and sending sample data corresponding to the vertex with the out-degree meeting the second preset condition in the second relation graph to the training execution device, wherein the out-degree represents the number of edges connected with one vertex.
In this possible implementation manner, considering that the storage space of the training execution device is limited, when the number of vertices in the second relational graph is large, the sample data of vertices with a high out-degree (satisfying the second preset condition), that is, vertices that will be used frequently during training, may be sent to the training execution device preferentially. Sample data of vertices with a low out-degree (not satisfying the second preset condition), that is, vertices that are used infrequently, may be stored in the central device; where the central device is a CPU, the sample data may be stored in a disk or memory corresponding to the CPU, and if the CPU is located in a server, the sample data may be stored in the hard disk or memory of the server. When a vertex with a low out-degree is used, its sample data is retrieved from the central device. The second preset condition may be that the vertices of the second relational graph are sorted by out-degree and, subject to the storage space of the training execution device, the sample data of the higher-ranked vertices is sent to the training execution device until its available storage space reaches the upper limit. The second preset condition may also be a preset threshold; the second preset condition can be set in various ways, which is not specifically limited in this application. The second preset condition may be the same as or different from the first preset condition; usually the second preset condition is stricter than the first preset condition. In this possible implementation, the cache space of the training execution device is used with an out-degree-first policy, which effectively improves the cache hit rate and reduces the time consumed in loading sample data.
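An out-degree-first selection of which vertices' sample data to push to a training execution device might be sketched as follows; the cache budget accounting and all names are assumptions for illustration:

```python
def select_cached_vertices(subgraph_vertices, out_degree, sample_size, cache_bytes):
    """Sort the vertices of a second relational graph by out-degree and keep
    selecting sample data for the highest-out-degree vertices until the
    training execution device's available cache space is filled.

    out_degree:  dict vertex id -> out-degree (number of connected edges)
    sample_size: dict vertex id -> size in bytes of that vertex's sample data
    cache_bytes: available cache space reported by the training execution device
    """
    cached, used = [], 0
    for v in sorted(subgraph_vertices, key=lambda x: out_degree[x], reverse=True):
        if used + sample_size[v] > cache_bytes:
            break                  # cache budget reached
        cached.append(v)
        used += sample_size[v]
    return cached  # remaining vertices stay in the central device's memory or disk
```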
In a possible implementation manner of the first aspect, the method further includes: receiving information sent by a training execution device that indicates its available cache space; and determining, according to the information indicating the available cache space, the vertices whose out-degree satisfies the second preset condition.
In this possible implementation manner, the training execution device may perform a round of test first, determine the size of the available cache space that can be used for caching the sample data through the test, and then send the size of the available cache space to the central device, so that the central device may determine the vertex whose degree of occurrence satisfies the second preset condition.
A second aspect of the present application provides a method of graph neural network training, comprising: receiving information of a second relational graph obtained from a first relational graph, wherein the first relational graph comprises a plurality of vertices and a plurality of edges, each edge connects two vertices that have a direct association relationship, the plurality of vertices include training vertices used for training a graph neural network, and the second relational graph includes neighbor vertices that have a target association relationship with the training vertices; retrieving, according to the information of the second relational graph, the sample data corresponding to the vertices in the second relational graph; and training the graph neural network according to the sample data.
The method of the second aspect may be applied to a training execution device of a graph neural network training system, and as can be seen from the introduction of the first aspect to the graph neural network training system, the graph neural network is trained on the training execution device, the central device may determine N second relationship graphs from the first relationship graph, each training execution device corresponds to one second relationship graph, and after each training execution device trains the graph neural network, a target graph neural network (which may also be described as a target graph neural network model) for inference may be obtained.
The target association relationship of the vertices in the second relationship graph received by the training performing apparatus means that the vertices in the second relationship graph have a direct association relationship (one-hop relationship), or have both a direct association relationship and an indirect association relationship (multi-hop relationship). The multi-hop in the present application includes two hops, or more than two hops.
As can be seen from the second aspect, the second relational graph includes the training vertices and the corresponding neighbor vertices, so that a process of frequently crossing the training execution device to other training execution devices to read sample data of the related neighbor vertices is not required during the training of the graph neural network, thereby reducing the network overhead of the crossing training execution device and improving the training efficiency of the graph neural network.
In one possible implementation manner of the second aspect, the method further includes: receiving sample data corresponding to the vertices whose out-degree satisfies a second preset condition in the second relational graph, and caching this sample data locally. The step of retrieving, according to the second relational graph, the sample data corresponding to the vertices in the second relational graph then includes: retrieving, from the local cache, the sample data corresponding to the vertices whose out-degree satisfies the second preset condition; and retrieving, from the central device, the sample data corresponding to the vertices whose out-degree does not satisfy the second preset condition.
In this possible implementation manner, considering that the storage space of the training execution device is limited, when the number of vertices in the second relational graph is large, the sample data of vertices with a high out-degree (satisfying the second preset condition), that is, vertices that will be used frequently during training, may be sent to the training execution device preferentially. Sample data of vertices with a low out-degree (not satisfying the second preset condition), that is, vertices that are used infrequently, may be stored in the central device; where the central device is a CPU, the sample data may be stored in a disk or memory corresponding to the CPU, and if the CPU is located in a server, the sample data may be stored in the hard disk or memory of the server. When a vertex with a low out-degree is used, its sample data is retrieved from the central device. The second preset condition may be that the vertices of the second relational graph are sorted by out-degree and, subject to the storage space of the training execution device, the sample data of the higher-ranked vertices is sent to the training execution device until its available storage space reaches the upper limit. The second preset condition may also be a preset threshold; the second preset condition can be set in various ways, which is not specifically limited in this application. In this possible implementation, the cache space of the training execution device is used with an out-degree-first policy, which effectively improves the cache hit rate and reduces the time consumed in loading sample data.
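On the training execution device side, loading sample data could then follow a cache-first pattern such as the sketch below; `fetch_from_central` stands in for whatever RPC or memory-copy path the system actually uses, which is an assumption:

```python
def load_samples(vertices, local_cache, fetch_from_central):
    """Fetch sample data for the given vertices: hit the local cache for
    high-out-degree vertices, fall back to the central device otherwise."""
    samples = {}
    missing = []
    for v in vertices:
        if v in local_cache:           # out-degree satisfied the second preset condition
            samples[v] = local_cache[v]
        else:
            missing.append(v)          # low out-degree, kept on the central device side
    if missing:
        samples.update(fetch_from_central(missing))
    return samples
```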
In one possible implementation manner of the second aspect, the method further includes: performing a round of testing on the graph neural network to determine an available cache space for storing sample data; and sending information for indicating the available cache space to the central device, wherein the information for indicating the available cache space is used for indicating the central device to send sample data corresponding to the vertex with the output degree meeting the second preset condition.
A third aspect of the present application provides an apparatus for neural network training, which has the function of implementing the method according to the first aspect or any one of the possible implementations of the first aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above functions, such as: the device comprises an acquisition unit, a processing unit and a sending unit.
A fourth aspect of the present application provides an apparatus for neural network training, which has the function of implementing the method according to the second aspect or any one of the possible implementations of the second aspect. The function can be realized by hardware, and can also be realized by executing corresponding software by hardware. The hardware or software includes one or more modules corresponding to the above functions, such as: the device comprises a receiving unit, a first processing unit and a second processing unit.
A fifth aspect of the present application provides a computer device comprising at least one processor, a memory, an input/output (I/O) interface, and computer executable instructions stored in the memory and executable on the processor, wherein when the computer executable instructions are executed by the processor, the processor performs the method according to the first aspect or any one of the possible implementation manners of the first aspect.
A sixth aspect of the present application provides a computer device comprising at least one processor, a memory, an input/output (I/O) interface, and computer executable instructions stored in the memory and executable on the processor, wherein when the computer executable instructions are executed by the processor, the processor performs the method according to any one of the possible implementation manners of the second aspect or the second aspect.
A seventh aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions that, when executed by a processor, perform a method according to the first aspect or any one of the possible implementations of the first aspect.
An eighth aspect of the present application provides a computer-readable storage medium storing one or more computer-executable instructions that, when executed by a processor, perform a method according to any one of the possible implementations of the second aspect or the second aspect as described above.
A ninth aspect of the present application provides a computer program product storing one or more computer executable instructions that, when executed by a processor, perform a method as described in the first aspect or any one of the possible implementations of the first aspect.
A tenth aspect of the present application provides a computer program product storing one or more computer executable instructions that, when executed by a processor, perform a method as set forth in any one of the possible implementations of the second aspect or the second aspect.
An eleventh aspect of the present application provides a chip system, which includes at least one processor, where the at least one processor is configured to implement the functions recited in the first aspect or any one of the possible implementations of the first aspect. In one possible design, the system-on-chip may further include a memory, the memory being configured to store program instructions and data necessary for the apparatus for neural network training. The chip system may be constituted by a chip, or may include a chip and other discrete devices.
A twelfth aspect of the present application provides a chip system, where the chip system includes at least one processor, and the at least one processor is configured to implement the functions recited in the second aspect or any one of the possible implementations of the second aspect. In one possible design, the system-on-chip may further include a memory, the memory being configured to store program instructions and data necessary for the apparatus for neural network training. The chip system may be constituted by a chip, or may include a chip and other discrete devices.
A thirteenth aspect of the present application provides a distributed system, which includes a central device and a plurality of training execution devices, wherein the central device is configured to execute the method according to the first aspect or any one of the possible implementations of the first aspect, and any one of the plurality of training execution devices is configured to execute the method according to the second aspect or any one of the possible implementations of the second aspect.
In the embodiments of the application, when the central device divides the first relational graph, it tries to keep the number of vertices in each second relational graph as equal as possible and to divide the neighbor vertices of each training vertex into the same second relational graph as the training vertex, so that the computation in each training execution device is balanced, the need for a training execution device to frequently read the sample data of related neighbor vertices from other training execution devices during graph neural network training is reduced, the network overhead across training execution devices is reduced, and the training efficiency of the graph neural network is improved.
Drawings
FIG. 1 is a schematic diagram of an embodiment of a distributed system provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a server provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a method for neural network training provided by an embodiment of the present application;
FIG. 4 is an exemplary diagram of a first relationship diagram provided by an embodiment of the application;
FIG. 5A is an exemplary diagram of a second relationship diagram provided by an embodiment of the application;
FIG. 5B is another exemplary diagram of a second relationship diagram provided by an embodiment of the application;
FIG. 6 is a schematic diagram illustrating an example scenario of graph partitioning provided by an embodiment of the present application;
FIG. 7A is a graph comparing experimental results provided by the examples of the present application;
FIG. 7B is a graph comparing the results of another experiment provided by the examples of the present application;
FIG. 8 is a graph comparing the results of another experiment provided by the examples of the present application;
FIG. 9 is a diagram illustrating an example of a scenario for neural network training provided by an embodiment of the present application;
FIG. 10 is a graph comparing the results of another experiment provided by the examples of the present application;
FIG. 11 is a graph comparing the results of another experiment provided by the examples of the present application;
FIG. 12 is a schematic diagram of an embodiment of an apparatus for neural network training provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of an embodiment of an apparatus for neural network training provided by an embodiment of the present application;
fig. 14 is a schematic diagram of an embodiment of a computer device provided in an embodiment of the present application.
Detailed Description
Embodiments of the present application will now be described with reference to the accompanying drawings, and it is to be understood that the described embodiments are merely illustrative of some, but not all, embodiments of the present application. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the application provides a method for training a graph neural network, which can improve the training efficiency of the graph neural network. The embodiment of the application also provides a corresponding device, a corresponding system, a corresponding computer readable storage medium, a corresponding computer program product and the like. The following are detailed below.
With the development of Artificial Intelligence (AI), deep neural networks have been effectively applied in various aspects such as image processing, voice recognition, or language translation. However, real-world data is often organized in graphs, where entity relationships imply strong causal relationships, and these graphs with causal relationships may be collectively referred to as relationship graphs, such as: social network diagrams, user commodity relationship diagrams, knowledge maps, protein structure diagrams, and the like. Relationship graphs typically include a plurality of vertices and a plurality of edges, each edge connecting two vertices, two vertices connected by the same edge having a direct associative relationship.
Because graphs are irregular, each graph has many vertices (data samples), and each vertex has a different number of adjacent vertices; as a result, some important operations (e.g., convolution) that are easy to compute on images are not directly applicable to graphs. In addition, a core assumption of the learning algorithms of existing deep neural networks is that data samples are independent of one another. For a relationship graph, however, each vertex has edges that associate it with other vertices in the graph, and the information carried by these edges can be used to capture the interdependence between the vertices representing different entities. For example, in e-commerce, the entities represented by the vertices can be users and goods, so the edges can be used to infer the goods a user likes or the users for whom a piece of goods is suitable.
To be suitable for relationship graphs, graph neural networks (GNNs) have emerged, which draw on the ideas of convolutional networks, recurrent networks, and deep autoencoders; GNNs are used to process graph data, i.e. to process relationship graphs. GNNs are used in a wide variety of fields, such as graph-based recommendation systems (e-commerce, social relationships, etc.). GNNs are also applicable elsewhere, for example in traffic: each sensor on a road is a vertex in the graph, the edges are determined by whether the distance between pairs of vertices exceeds a threshold, and each vertex contains a time series as its features. The aim is to predict the average speed of a road within a time interval, which can also be applied to taxi demand prediction and thus helps intelligent traffic systems use resources effectively and save energy. GNNs can also be applied in chemistry, for example to study the structure of molecules: in the graph structure, atoms are the vertices and chemical bonds are the edges. Graph classification and graph generation are the main tasks on molecular graphs, and can be used to learn molecular fingerprints, predict molecular properties, infer protein structures, and synthesize compounds.
GNNs have also been explored to be applicable to other areas, such as: program verification, program reasoning, social impact prediction, adversarial attack prevention, electronic health record modeling, brain networks, event detection or combinatorial optimization, and the like.
The developer can construct an initial GNN according to the application requirements, then train the initial GNN by adopting a relational graph corresponding to the corresponding requirements, obtain a target GNN suitable for the requirements, and then correspondingly reason the target GNN.
The neural network training system for the process of training GNNs may be a distributed system or a parallel system, which may be an architecture as shown in fig. 1, including a central device and a plurality of training performing devices, such as: training executive device 1 to training executive device N, N is an integer greater than 1. Each training execution device is loaded with an initial GNN, and the central device or a corresponding storage device (e.g., a disk or a memory corresponding to a CPU in a server) stores a first relation graph for training the initial GNN. In this embodiment of the application, the central apparatus may determine N different second relationship graphs according to the first relationship graph, and then send the N different second relationship graphs to the N training execution apparatuses, so that the N training execution apparatuses may train the initial GNN on the training execution apparatus using the respective second relationship graphs, and after each training execution apparatus has trained the respective initial GNN, the GNNs trained by the N training execution apparatuses may be fused by one training execution apparatus or by the central apparatus, so as to obtain the target GNN.
The graph neural network included in the training execution apparatus in the present application refers to a graph neural network Model (GNN Model).
The central device may be a separate physical machine, a Virtual Machine (VM), a container, or a Central Processing Unit (CPU), and the training execution device may be a separate physical machine, VM, container, Graphics Processing Unit (GPU), field-programmable gate array (FPGA), or a dedicated chip.
As shown in fig. 2, taking the neural network training system as a server as an example, the central device may be a CPU of a hardware portion of the server, and the training executing device may be a GPU of the hardware portion of the server, such as GPU1 to GPU (n), and of course, the GPU portion may also be implemented by an FPGA or a dedicated chip. The part realized by software in the embodiment of the application can comprise a graph partition and a graph cache, wherein the graph partition can be realized by a CPU, the graph cache can be realized by a GPU, and the GPU part can also be realized by an FPGA or a special chip. The first relational graph can be stored in a memory or a disk of the server, when graph partitioning is needed, the CPU obtains the first relational graph from the memory or the disk, then determines N second relational graphs according to the first relational graph, then respectively sends information of the N second relational graphs to N GPUs, sample data corresponding to each vertex in the first relational graph can also be stored in the memory or the disk, and sample data corresponding to part of the vertices in the corresponding second relational graph can also be sent to the GPU according to a storage space of the GPU.
If the neural network training system comprises a plurality of servers, each server comprises a plurality of GPUs, distributed training can be performed among the servers, and the GPUs in each server can perform parallel training.
In the embodiment of the present application, the process of graph neural network training may include the following two parts: first, the central device divides the first relational graph into second relational graphs; second, the training execution device caches sample data according to an out-degree-first strategy. These are described separately below.
First, the central device divides the first relational graph into second relational graphs.
As shown in fig. 3, an embodiment of a method for neural network training provided in the embodiment of the present application includes:
101. the central device obtains a first relationship graph for graph neural network training.
The first relational graph includes a plurality of vertices and a plurality of edges, wherein each edge connects two vertices, and two vertices connected by the same edge have a direct association relationship; that is, each edge connects two vertices that have a direct association relationship.
The structure of the first relationship diagram can be understood with reference to fig. 4. Vertices 1 through 18 in FIG. 4 are examples only, and the number of vertices in an actual graph typically ranges from thousands or even hundreds of millions of vertices. The number of edges is also typically thousands or even billions.
Most vertices in the first relational graph of the present application are related to one another: some relationships are direct (one-hop relationships) through a shared edge, and some are indirect (multi-hop relationships) relayed through commonly connected vertices. For example, vertex 1 and vertex 5 have an indirect relationship through either vertex 3 or vertex 6.
A one-hop relationship is a direct connection through one edge, while a multi-hop relationship refers to a relationship established through at least two edges, relayed via other vertices.
102. The central device determines N different second relational graphs according to the first relational graph.
The second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1; the difference value of the number of training vertexes included in any two second relational graphs is smaller than a preset threshold, and the second relational graphs include neighbor vertexes of the training vertexes.
The training vertices refer to the vertices participating in the neural network training of the graph.
In the embodiment of the present application, the vertices in the second relationship graph have a direct association relationship (one-hop relationship), or have both a direct association relationship and an indirect association relationship (multi-hop relationship). The multi-hop in the present application includes two hops, or more than two hops.
The second relation graph is divided from the first relation graph, the central device divides N second relation graphs from the first relation graph according to the number of the training execution devices, and the vertexes in each second relation graph may have overlapped parts.
Taking the example of dividing the first relationship diagram into two second relationship diagrams, the two second relationship diagrams can be understood by referring to fig. 5A and 5B in combination with the above example of fig. 4.
As shown in fig. 5A, 10 vertices including vertex 1 to vertex 8, and vertex 11 and vertex 12, and edges between the 10 vertices are included. As shown in fig. 5B, 10 vertices from vertex 7 to vertex 18, and edges between the 10 vertices are included. Three types of vertices may be included in fig. 5A and 5B: one is training vertices used for training, such as vertices 1 through 3, vertices 5, and vertices 6 in FIG. 5A, such as vertices 8, 9, 13, 14, 16 through 18 in FIG. 5B. One is verification vertices for verification, such as vertex 4, vertex 7, and vertex 12 in FIG. 5A, such as vertex 10, vertex 11, and vertex 15 in FIG. 5B. One is redundant vertices, such as vertex 8 and vertex 11 in FIG. 5A, and vertex 7 and vertex 12 in FIG. 5B. As can be seen from fig. 5A and 5B, there are individual duplicate vertices in the two second relationship graphs, and these redundant vertices in different second relationship graphs can be understood as mirror vertices, which can avoid frequent cross-partition accesses.
In the embodiment of the application, the number of the training vertexes in each second relational graph is basically equivalent, so that the calculation balance can be ensured, in addition, frequent visits across the training execution devices can be avoided in a redundant vertex mode, and the training efficiency can be improved.
If only one-hop relations are considered when dividing the second relation graphs, the vertexes in each second relation graph have direct association relations, and if multi-hop relations are considered, the vertexes in the second relation graphs also have indirect association relations besides the direct association relations. "one hop" refers to a direct connection, e.g., vertex 1 and vertex 3 are directly connected, which is a one hop relationship. "Multi-hop" refers to indirect connections, such as vertex 1 to vertex 5, which need to be connected through vertex 3 or vertex 6, and which need to go through two hops to go from vertex 1 to vertex 5, and this kind of relationship that needs to go through two or more hops is called a multi-hop relationship.
Alternatively, considering that the number of vertices divided into each partition is large and the storage space of the central device may be limited, in this case, vertices with higher out degrees than the first preset condition may be preferentially placed in the second relationship diagram, so that the out degrees of the vertices in the second relationship diagram satisfy the first preset condition, and the out degrees represent the number of edges connected by one vertex. Otherwise, vertices with out degrees less than the first predetermined condition may be discarded. The first preset condition may be pre-configured or dynamically generated, and may be a specific value, such as: the first predetermined condition is that the out-degree is greater than 50, although this is by way of example only.
103. The central device transmits the information of the N second relational graphs to the N training execution devices, and correspondingly, the training execution devices receive the information of the second relational graphs.
The information of the second relation graph may be a summary or metadata of the second relation graph, and the information of the second relation graph includes the identification of the vertices in the second relation graph and the relation between the vertices.
The N training execution devices are in one-to-one correspondence with the N second relational graphs, and the N second relational graphs are respectively used for neural network training of the corresponding training execution device graphs.
Optionally, in step 102, the second relation graph may be determined by performing vertex-by-vertex polling on the training vertices in the first relation graph, and in the process of dividing the training vertices by vertex-by-vertex, the second relation graph is not yet formed, and a corresponding partition may be mapped to each training execution device in the storage space of the central device. In the training vertex dividing process, firstly, dividing the training vertices into corresponding partitions, and after all the training vertices are divided, forming a second relational graph according to the training vertices in the partitions and the relations of the training vertices in the partitions in the first relational graph.
The process of determining the second relational graphs according to the first relational graph may include: dividing a target vertex and a plurality of neighbor vertices of the target vertex into the partition with the highest evaluation score for the target vertex, according to the evaluation score of the target vertex corresponding to each of the N partitions. The target vertex is a training vertex in the first relational graph, and the evaluation score indicates the degree of correlation between the target vertex and the vertices already assigned in each partition before the target vertex is assigned. Each of the N partitions corresponds to one training execution device, and after every training vertex in the first relational graph has been assigned, the vertices in each partition are included in the second relational graph of the training execution device corresponding to that partition.
In this application, only the vertices are partitioned; the edges connecting vertices are not changed, that is, no edge is added, deleted, or modified.
The process of determining the second relational graphs from the first relational graph can also be described as: determining a plurality of neighbor vertices of a target vertex in the first relational graph; determining an evaluation score of the target vertex for each of the N partitions according to the plurality of neighbor vertices and the vertices already assigned in the N partitions, where the evaluation score indicates the degree of correlation between the target vertex and the vertices assigned in each partition before the target vertex is assigned; and dividing the target vertex and the plurality of neighbor vertices into the partition with the highest evaluation score. Each of the N partitions corresponds to one training execution device, and after every training vertex in the first relational graph has been assigned, the vertices in each partition are included in the second relational graph of the training execution device corresponding to that partition.
In the embodiment of the present application, the target vertex may be any training vertex in the first relational graph. The set of the plurality of neighbor vertices may be referred to as a neighbor vertex set, that is, every vertex in the set is a neighbor vertex of the target vertex. The plurality of neighbor vertices of the target vertex may be obtained according to hop count information, which indicates the number of edges in the path from the target vertex to the corresponding neighbor vertex. Then, the coincidence number between the neighbor vertex set and the vertices already assigned in a first partition of the N partitions may be determined, where the coincidence number of the first partition indicates how many vertices the neighbor vertex set has in common with the vertices already assigned in the first partition, the first partition is any one of the N partitions, and the coincidence number is positively correlated with the degree of correlation. The evaluation score of the target vertex for each of the N partitions is then determined according to the coincidence number between the neighbor vertex set and the assigned vertices in that partition.
That is to say, the evaluation score of the target vertex for the first partition is positively correlated with the coincidence number of the first partition, where the coincidence number of the first partition indicates how many vertices of the neighbor vertex set coincide with the vertices already assigned in the first partition, and the first partition is any one of the N partitions.
Vertex coincidence in this application means that a vertex among the plurality of neighbor vertices is the same as a vertex already assigned in the first partition.
If a neighbor vertex is defined by a one-hop relation, only the vertices connected to the target vertex by one edge, that is, vertices with a direct association relation, belong to the neighbor vertex set. If a neighbor vertex is defined by a multi-hop relation, then in addition to the vertices with a direct association relation, vertices reached from the target vertex by "two hops" or "three hops" can also be attributed to the plurality of neighbor vertices of the target vertex. For example, in a social relationship, a one-hop relation finds the friends of a target user, and a two-hop relation finds the friends of those friends. Hop count information is a numerical description of the neighbor relation. If the hop count information is 1, the neighbor vertex set includes the vertices having a direct association relation with the target vertex; if the hop count information is 2, the neighbor vertex set additionally includes the vertices connected through those directly associated vertices; similarly, if the hop count information is 3, the neighbor vertex set may further include vertices associated with the target vertex through three hops, and so on. When a target vertex has a plurality of neighbor vertices, the hop count information indicates the hop count of the farthest vertex, that is, the maximum number of edges included in a single path from the target vertex to any neighbor vertex.
As shown in fig. 4, if the target vertex is vertex 3 and the hop count information L is 1, the neighbor vertex set is {vertex 1, vertex 2, vertex 4, vertex 5}; if the hop count information L is 2, the neighbor vertex set is {vertex 1, vertex 2, vertex 4, vertex 5, vertex 6, vertex 7, vertex 12}.
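For illustration only, the following Python sketch computes such an L-hop neighbor vertex set with a breadth-first traversal; the adjacency-list representation and the function name are assumptions made here, not taken from the patent text:

```python
from collections import deque
from typing import Dict, List, Set

def l_hop_neighbors(adj: Dict[int, List[int]], target: int, hops: int) -> Set[int]:
    """Return every vertex reachable from `target` within `hops` edges (the target itself excluded)."""
    visited = {target}
    frontier = deque([(target, 0)])
    neighbors: Set[int] = set()
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == hops:
            continue                      # do not expand beyond the hop count L
        for nxt in adj.get(vertex, []):
            if nxt not in visited:
                visited.add(nxt)
                neighbors.add(nxt)
                frontier.append((nxt, depth + 1))
    return neighbors
```

For a graph encoded like fig. 4, calling this helper for vertex 3 with hops set to 1 or 2 would return the one-hop and two-hop neighbor sets listed above.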
The degree of correlation reflects, for each partition, how many of the vertices already assigned to the partition are neighbor vertices of the target vertex. The evaluation score is a numerical index that reflects, through a specific value, how close the target vertex is to the vertices already assigned in the partition, that is, how large a proportion of the target vertex's neighbor vertices the partition already contains. The higher the evaluation score, the larger the proportion of the target vertex's neighbor vertices among the vertices already assigned in the partition, and the more suitable the partition is for the target vertex.
As shown in fig. 6, suppose there are two partitions, vertex 1 and vertex 2 have been assigned to the first partition, vertex 7 has been assigned to the second partition, and the hop count information L is 2, so the neighbor vertex set is {vertex 1, vertex 2, vertex 4, vertex 5, vertex 6, vertex 7, vertex 12}. The neighbor vertex set then has two vertices in common with the assigned vertices of the first partition and only one vertex in common with the assigned vertex of the second partition. The coincidence number of the first partition is therefore higher than that of the second partition, which indicates that the neighbor vertex set is more strongly correlated with the vertices in the first partition than with those in the second partition.
For a vertex in the neighbor vertex set that has already been assigned to the partition with the highest evaluation score, the assignment does not need to be repeated. In this possible implementation, vertices with a high degree of correlation are assigned to the same partition and therefore belong to the same second relational graph, which effectively avoids the network overhead of fetching the data of correlated vertices across training execution devices during training.
The process of determining the evaluation score may also take balanced distribution of the vertices into account. A balance ratio is used in determining the evaluation score, where the balance ratio indicates the probability that the target vertex is divided into the first partition. The balance ratio is the ratio of a first difference to the number of vertices in the first partition after the plurality of neighbor vertices are added, and the first difference is the difference between a preconfigured upper limit on the number of vertices of the first partition and the number of vertices already assigned in the first partition.
Thus, the evaluation score of the target vertex for the first partition can be determined as the product of the coincidence number between the plurality of neighbor vertices and the assigned vertices in the first partition and the balance ratio corresponding to the first partition.
That is to say, the evaluation score of the first partition is the product of the coincidence number of the first partition and the balance ratio of the first partition, where the balance ratio indicates the probability that the target vertex is divided into the first partition, the balance ratio is the ratio of the first difference to the number of vertices in the first partition after the plurality of neighbor vertices are added, and the first difference is the difference between the preconfigured upper limit on the number of vertices of the first partition and the number of vertices already assigned in the first partition.
The above evaluation score can be expressed by the following formula:
$$\mathrm{score}_i(v_t) \;=\; \left| TV_i \cap N_L(v_t) \right| \cdot \frac{TV_{avg} - \left| TV_i \right|}{\left| PV_i \right|}$$

wherein $TV_i$ represents the set of training vertices already assigned to the $i$-th partition, and $|TV_i|$ is its size; $N_L(v_t)$ represents the L-hop neighbor vertex set of the target vertex $v_t$, the target vertex being a training vertex; $\left| TV_i \cap N_L(v_t) \right|$ is the coincidence number, that is, the number of the target vertex's neighbor vertices that coincide with the vertices already assigned in the $i$-th partition; and $\frac{TV_{avg} - |TV_i|}{|PV_i|}$ is the balance ratio, where $|PV_i|$, used to control memory balancing, represents the total number of vertices already allocated to the $i$-th partition, including the joined neighbor vertices, and $TV_{avg}$ is the desired number of training vertices for each partition. To achieve computational balance, the present application may set

$$TV_{avg} = \frac{\left| TV \right|}{N}$$

where $N$ represents the number of partitions and $|TV|$ represents the total number of training vertices in the first relational graph. This ensures that each partition obtains substantially the same number of training vertices, so that the computation of each partition is balanced.
In the above process of the embodiment of the present application, let the first relational graph be denoted by G and the set of training vertices in the first relational graph by TV. Taking the first relational graph G as input, and given the hop count L, the value of |TV|, and the number of partitions N, the process starts with every partition being an empty set. For each target vertex, the neighbor vertex set is determined in the manner described above, the evaluation score of each partition is calculated, and the target vertex together with its neighbor vertex set is divided into the partition with the highest evaluation score, until all training vertices have been divided into corresponding partitions. A second relational graph is then formed according to the vertices in each partition and the relations of those vertices in the first relational graph, that is, the second relational graphs {G_1, G_2, …, G_N} shown in fig. 5A and fig. 5B.
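The following Python sketch shows one possible reading of this vertex-by-vertex partitioning loop. It reuses the `l_hop_neighbors` helper sketched earlier; the data structures, the tie-breaking rule, and the exact form of the balance-ratio denominator are assumptions for illustration, not the authoritative implementation:

```python
from typing import Dict, List, Set

def partition_training_vertices(
    adj: Dict[int, List[int]],      # first relational graph G as an adjacency list
    train_vertices: List[int],      # training vertices TV of the first relational graph
    num_partitions: int,            # N, one partition per training execution device
    hops: int,                      # hop count information L
) -> List[Dict[str, Set[int]]]:
    tv_avg = len(train_vertices) / num_partitions   # desired training vertices per partition
    # each partition tracks its training vertices TV_i and all of its vertices PV_i (incl. mirrors)
    parts: List[Dict[str, Set[int]]] = [{"tv": set(), "pv": set()} for _ in range(num_partitions)]

    for vt in train_vertices:
        neighbors = l_hop_neighbors(adj, vt, hops)          # L-hop neighbor vertex set of the target vertex
        scores = []
        for p in parts:
            overlap = len(p["tv"] & neighbors)              # coincidence number
            cand_size = len(p["pv"] | neighbors | {vt})     # partition size after adding the neighbors
            balance = (tv_avg - len(p["tv"])) / max(cand_size, 1)   # balance ratio
            scores.append(overlap * balance)                # evaluation score
        # assign to the partition with the highest evaluation score;
        # ties are broken toward the partition with fewer training vertices to keep the division balanced
        best = max(range(num_partitions), key=lambda i: (scores[i], -len(parts[i]["tv"])))
        parts[best]["tv"].add(vt)
        parts[best]["pv"].update(neighbors | {vt})          # mirror the neighbor vertices into the partition
    return parts
```

Each returned partition then induces one second relational graph from the edges of the first relational graph over its vertex set.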
104. The training execution device calls the sample data corresponding to the vertices in the second relational graph according to the information of the second relational graph.
The sample data relates to different types of application data according to the application requirements of the graph neural network, such as: in electronic commerce, the sample data may be data of consumers and data of commodities, in social relations, the sample data may be information of users having a friendship, and in chemistry, the sample data may be molecules or atoms.
The sample data may be stored in the memory or hard disk of the center device, or may be cached in the cache of the training execution device.
105. The training execution device trains the graph neural network according to the sample data.
In the embodiment of the application, when the central device divides the first relational graph, it tries to keep the number of vertices in each second relational graph as equal as possible and to place the neighbor vertices of a training vertex in the same second relational graph as that training vertex. This balances the computation across the training execution devices and reduces the need for a training execution device to frequently fetch the sample data of related neighbor vertices from other training execution devices during graph neural network training, thereby reducing the network overhead across training execution devices and improving the training efficiency of the graph neural network.
Experiments were carried out on a single GPU and on multiple GPUs, respectively.
Fig. 7A and fig. 7B show the acceleration effect on a single accelerator. Within one training period (epoch), compared with the training process of the existing Deep Graph Library (DGL), the scheme provided in this embodiment of the application improves training efficiency by 1.6 to 4.8 times on datasets 1 to 6 (these six datasets may be reddit, wiki-talk, livejournal, lj-link, lj-large, and enwiki, in that order).
Fig. 8 shows the speed-up ratio on multiple GPUs. Compared with the existing DGL, the scheme of the application achieves higher throughput and a higher training speed-up ratio. Because a cache mechanism is introduced, the application can even achieve a super-linear speed-up ratio; for example, on the en-wiki dataset, a speed-up of 4.9 times that of a single accelerator is obtained with 4 accelerators. Fig. 8 uses one training set as an example; the overall trend on other training sets is the same as in fig. 8, with slightly different specific values.
Second, the sample data is cached in the training execution device according to an out-degree-first strategy.
The cache of the training execution device is usually limited, especially when the training execution device is a GPU, an FPGA, or a dedicated chip; in that case the training execution device usually cannot store the sample data corresponding to every vertex in the second relational graph. The central device therefore sends to the training execution device the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the second preset condition, where the out-degree represents the number of edges connected to one vertex.
The training execution device receives the sample data corresponding to the vertices in the second relational graph whose out-degree satisfies the preset condition, and locally caches this sample data.
Optionally, before the sample data is sent, the training execution device may perform a round of testing to determine the available cache space for storing sample data, and send information indicating the available cache space to the central device. The information indicating the available cache space instructs the central device to send the sample data corresponding to the vertices whose out-degree satisfies the second preset condition.
Thus, when the sample data is called in step 104, the sample data corresponding to the vertices whose out-degree satisfies the second preset condition may be scheduled from the local cache, and the sample data corresponding to the vertices whose out-degree does not satisfy the second preset condition may be scheduled from the central device.
Considering that the storage space of the training execution device is limited, when the number of vertices in the second relational graph is large, the sample data of the vertices with a larger out-degree (satisfying the second preset condition), that is, the vertices that will be used frequently during training, may be preferentially sent to the training execution device. The sample data of the vertices with a smaller out-degree (not satisfying the second preset condition), that is, vertices that are used infrequently, may be kept in the central device, and when such a vertex is used, the corresponding sample data may be called from the central device. The second preset condition may be obtained by sorting the vertices in the second relational graph by out-degree and, in combination with the storage space of the training execution device, preferentially sending the sample data of the top-ranked vertices to the training execution device until the available storage space of the training execution device reaches its upper limit. The second preset condition may also be a preset threshold; there are many ways to set the second preset condition, which is not specifically limited in this application. The second preset condition may be the same as or different from the first preset condition.
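As a rough sketch of this selection step, the following Python function ranks the vertices of a second relational graph by out-degree and keeps as many as the reported cache space allows; the byte-based capacity accounting and the function name are assumptions:

```python
from typing import Dict, List

def select_vertices_to_cache(adj: Dict[int, List[int]],
                             available_bytes: int,
                             bytes_per_vertex: int) -> List[int]:
    """Choose which vertices' sample data the central device pushes into the device cache."""
    capacity = available_bytes // bytes_per_vertex                  # vertices that fit in the reported space
    ranked = sorted(adj, key=lambda v: len(adj[v]), reverse=True)   # out-degree-first ordering
    return ranked[:capacity]
```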
This process can be understood with reference to fig. 9. As shown in fig. 9, suppose the vertices included in the second relational graph corresponding to a certain training execution device are numbered from vertex 3 to vertex 408. According to the out-degree-first principle, the sample data corresponding to the vertices whose out-degree satisfies the preset condition is cached in the memory of the training execution device; for example, F-3 and S-3 are the sample data corresponding to vertex 3. Similarly, the sample data for vertex 4, vertex 8, vertex 102, and vertex 408 is cached in the memory of the training execution device. In training with a graph neural network, vertices are typically selected in batches (batch). As shown in fig. 9, vertex 3, vertex 5, vertex 8, …, vertex 102, and vertex 421 are selected; the sample data corresponding to vertex 5 and vertex 421 is not cached in the training execution device and needs to be obtained from the central device before GNN training is performed.
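And a minimal sketch of the device-side lookup for one training batch, assuming the cached sample data sits in a local dictionary and everything else is fetched from the central device through a caller-supplied function; all names here are illustrative:

```python
from typing import Callable, Dict, List

def fetch_batch(batch: List[int],
                local_cache: Dict[int, list],
                fetch_remote: Callable[[List[int]], Dict[int, list]]) -> Dict[int, list]:
    """Serve one batch of vertices: local cache hits first, remote fetches for the misses."""
    features = {v: local_cache[v] for v in batch if v in local_cache}   # cache hits
    misses = [v for v in batch if v not in local_cache]
    if misses:
        features.update(fetch_remote(misses))   # e.g. a request to the central device
    return features
```

For the batch in fig. 9, vertex 5 and vertex 421 would land in `misses`, while vertex 3, vertex 8, and vertex 102 would be served from the local cache.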
In this application, a cache mechanism is added on the training execution device on top of the graph partitioning, and an out-degree-first cache policy is adopted, that is, the sample data corresponding to frequently accessed vertices is cached in the GPU memory. This reduces the interaction overhead of loading the sample data of each vertex between the central device and the training execution device, and effectively reduces the time consumed by graph neural network training.
For ease of illustration, fig. 10 shows a set of experimental data in which PaGraph represents the hit rate of the out-degree-first caching strategy of the present application, Optimal represents the hit rate of a theoretically optimal caching strategy decided afterwards by analyzing the access behavior, Random represents the hit rate of a random caching strategy, and AliGraph represents the hit rate of the caching strategy employed by AliGraph. As can be seen from fig. 10, the cache hit rate (Cache Hit ratio) of the caching strategy of the present application is already close to that of the theoretically optimal caching strategy. Compared with the random strategy and the AliGraph caching strategy, when 40% of the vertices are cached (Cached Data), the hit rate is more than twice that of AliGraph, and the training performance is 1.4 times that of AliGraph.
Hit rate in this application refers to the probability that a cached vertex is selected for GNN training.
In addition, as can be seen from fig. 11, at the same cache percentage (Cached Percentage), the caching strategy of the present application effectively reduces the time overhead of one iteration period (Epoch Time, in seconds) during GNN training compared with the caching strategy of AliGraph.
The distributed system or parallel system of the present application and the method for graph neural network training have been described above; the apparatus for graph neural network training of the present application is described below with reference to the accompanying drawings.
As shown in fig. 12, an embodiment of the apparatus 30 for neural network training provided in the embodiment of the present application includes:
an obtaining unit 301, configured to obtain a first relation graph for graph neural network training, where the first relation graph includes multiple vertices and multiple edges, each edge is used to connect two vertices, and the multiple vertices include a training vertex for training the graph neural network.
A processing unit 302, configured to determine N different second relationship graphs according to the first relationship graph acquired by the acquiring unit 301, where the second relationship graph is a sub-graph of the first relationship graph, N is the number of training execution devices, and N is an integer greater than 1; the difference value of the number of training vertexes included in any two second relational graphs is smaller than a preset threshold, and the second relational graphs include neighbor vertexes of the training vertexes.
A sending unit 303, configured to send the N second relational graphs determined by the processing unit 302 to the N training execution devices, where the N training execution devices are in one-to-one correspondence with the N second relational graphs, and the N second relational graphs are respectively used by the corresponding training execution devices for graph neural network training.
In the embodiment of the application, when the first relational graph is divided, the number of vertices in each second relational graph is kept as equal as possible and the neighbor vertices of a training vertex are placed in the same second relational graph as far as possible. This balances the computation across the training execution devices, reduces the need for a training execution device to frequently fetch the sample data of related neighbor vertices from other training execution devices during graph neural network training, reduces the network overhead across training execution devices, and improves the training efficiency of the graph neural network.
Optionally, the processing unit 302 is configured to divide the target vertex and a plurality of neighbor vertices of the target vertex into a partition with a highest evaluation score of the target vertex according to the evaluation score of the target vertex corresponding to each partition of the N partitions, where the target vertex is a training vertex in a first relation graph, and the evaluation score is used to indicate a degree of correlation between the target vertex and an assigned vertex in each partition before the target vertex is assigned, where each partition of the N partitions corresponds to one training execution device, and after each training vertex in the first relation graph is assigned, a vertex in each partition is included in a second relation graph of the training execution device of the corresponding partition.
Optionally, the processing unit 302 is configured to obtain a plurality of neighbor vertices of the target vertex according to the hop count information, where the hop count information indicates a maximum number of edges in a path from the target vertex to each of the plurality of neighbor vertices.
Optionally, the evaluation score of the target vertex in the first partition is positively correlated with the coincidence number of the first partition, the coincidence number of the first partition is used to indicate the number of the multiple neighbor vertices coinciding with the assigned vertex in the first partition, and the first partition is any one of the N partitions.
Optionally, the evaluation score of the first partition is the product of the coincidence number of the first partition and the balance ratio of the first partition, where the balance ratio indicates the probability that the target vertex is divided into the first partition, the balance ratio is the ratio of a first difference to the number of vertices in the first partition after the plurality of neighbor vertices are added, and the first difference is the difference between a preconfigured upper limit on the number of vertices of the first partition and the number of vertices already assigned in the first partition.
Optionally, the out-degree of the vertex in the N second relational graphs satisfies a first preset condition, and the out-degree represents the number of edges connected by one vertex.
Optionally, the sending unit 303 is further configured to send, to the training execution apparatus, sample data corresponding to a vertex whose out-degree satisfies a second preset condition in the second relation graph, where the out-degree represents the number of edges connected to one vertex.
Optionally, the obtaining unit 301 is further configured to receive information, sent by the training performing apparatus, for indicating an available buffer space.
The processing unit 302 is configured to determine, according to the information indicating the available cache space, a vertex whose degree meets a second preset condition.
The apparatus 30 for graph neural network training described above can be understood with reference to the corresponding description in the foregoing method embodiment, and is not described repeatedly here.
Fig. 13 is a schematic diagram of an embodiment of an apparatus for neural network training provided in an embodiment of the present application.
As shown in fig. 13, an embodiment of the apparatus 40 for neural network training provided in the embodiment of the present application includes:
a receiving unit 401, configured to receive information of a second relationship diagram obtained from a first relationship diagram, where the first relationship diagram includes a plurality of vertices and a plurality of edges, where each edge is used to connect two vertices having a direct association relationship, the plurality of vertices includes a training vertex used for training a neural network of the graph, and the second relationship diagram includes a neighbor vertex having a target association relationship with the training vertex.
A first processing unit 402, configured to invoke sample data corresponding to a vertex in the second relational graph according to the information of the second relational graph received by the receiving unit 401;
and a second processing unit 403, configured to train the graph neural network according to the sample data called by the first processing unit 402.
In the embodiment of the application, the vertices in the second relational graph are vertices having the target association relation, so that during graph neural network training there is no need to frequently fetch the sample data of related neighbor vertices from other training execution devices, which reduces the network overhead across training execution devices and improves the training efficiency of the graph neural network.
Optionally, the receiving unit 401 is further configured to receive sample data corresponding to a vertex whose out-degree satisfies a second preset condition in the second relation graph.
The storage unit 404 is configured to locally cache sample data corresponding to a vertex whose degree meets a second preset condition.
A first processing unit 402, configured to schedule sample data corresponding to a vertex whose degree meets a second preset condition from a local cache; and dispatching sample data corresponding to the vertex with the degree not meeting the second preset condition from the central device.
Optionally, the second processing unit 403 is further configured to perform a round of test on the graph neural network to determine an available cache space for storing the sample data.
The apparatus 40 may further include a sending unit, configured to send, to the center apparatus, information indicating an available buffer space, where the information indicating the available buffer space is used to indicate that the center apparatus sends sample data corresponding to a vertex whose degree of departure meets a second preset condition.
The apparatus 40 for graph neural network training described above can be understood with reference to the corresponding description in the foregoing method embodiment, and is not described repeatedly here.
Fig. 14 is a schematic diagram illustrating a possible logical structure of a computer device 50 according to an embodiment of the present application. The computer device 50 may be a central device or a training performing device. Or may be a distributed system including a central apparatus and a training performing apparatus. The computer device 50 includes: a processor 501, a communication interface 502, a memory 503, and a bus 504. The processor 501, the communication interface 502, and the memory 503 are connected to each other by a bus 504. In an embodiment of the application, the processor 501 is configured to control and manage the actions of the computer device 50, for example, the processor 501 is configured to perform steps 101, 102, 104 and 105 in the method embodiment of fig. 3, and the communication interface 502 is configured to support the computer device 50 for communication. A memory 503 for storing program codes and data of the computer device 50.
The processor 501 may include a central processing unit (CPU) and a graphics processing unit (GPU). The processor 501 may also be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field programmable gate array or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 501 may also be a combination that implements a computing function, for example, a combination of one or more microprocessors, or a combination of a digital signal processor and a microprocessor. The bus 504 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in fig. 14, but this does not mean that there is only one bus or one type of bus.
In another embodiment of the present application, a computer-readable storage medium is further provided, in which computer-executable instructions are stored; when the processor of a device executes the computer-executable instructions, the device performs the method for graph neural network training of fig. 3 to fig. 11 described above.
In another embodiment of the present application, a computer program product is further provided, which includes computer-executable instructions stored in a computer-readable storage medium; when a processor of the device executes these instructions, the device performs the method for graph neural network training of fig. 3 to fig. 11 described above.
In another embodiment of the present application, a chip system is further provided, which includes a processor configured to implement the method for graph neural network training of fig. 3 to fig. 11 described above. In one possible design, the chip system may further include a memory for storing the program instructions and data necessary for the apparatus. The chip system may consist of a chip, or may include a chip and other discrete devices.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the embodiments of the present application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of a unit is merely a logical division, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, which essentially or partly contribute to the prior art, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only a specific implementation of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application. Therefore, the protection scope of the embodiments of the present application shall be subject to the protection scope of the claims.

Claims (13)

1. A method of graph neural network training, comprising:
acquiring a first relational graph for graph neural network training, wherein the first relational graph comprises a plurality of vertexes and a plurality of edges, each edge is used for connecting two vertexes, and the plurality of vertexes comprise training vertexes for training the graph neural network;
determining N different second relational graphs according to the first relational graph, wherein the second relational graphs are subgraphs of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1; the difference value of the number of the training vertexes included in each of any two second relational graphs is smaller than a preset threshold value, and the second relational graphs include neighbor vertexes of the training vertexes;
and sending information of N second relational graphs to the N training execution devices, wherein the N training execution devices correspond to the N second relational graphs one by one, and the N second relational graphs are respectively used for the corresponding training execution devices to train the graph neural network.
2. The method of claim 1, wherein determining N different second relational graphs from the first relational graph comprises:
and dividing the target vertex and a plurality of neighbor vertices of the target vertex into a partition with the highest evaluation score of the target vertex according to the evaluation score of the target vertex corresponding to each partition in the N partitions, wherein the target vertex is a training vertex in the first relation graph, the evaluation score is used for indicating the correlation degree of the target vertex and a vertex which is allocated in each partition before the target vertex is allocated, each partition in the N partitions corresponds to one training execution device, and after each training vertex in the first relation graph is allocated, the vertex in each partition is included in a second relation graph of the training execution device corresponding to the partition.
3. The method of claim 2, further comprising:
obtaining the plurality of neighbor vertices of the target vertex according to hop count information indicating a maximum number of edges in a path from the target vertex to each of the plurality of neighbor vertices.
4. The method according to claim 2 or 3,
the evaluation score of the target vertex in a first partition is positively correlated with the coincidence number of the first partition, the coincidence number of the first partition is used for indicating the coincidence quantity of the plurality of neighbor vertices and the distributed vertices in the first partition, and the first partition is any one of the N partitions.
5. The method of claim 4,
the evaluation score of the first partition is a product of a coincidence number of the first partition and an equilibrium ratio of the first partition, wherein the equilibrium ratio is used for indicating the probability that the target vertex is partitioned into the first partition, the equilibrium ratio is a ratio of a first difference value and the number of vertexes of the first partition after a plurality of neighbor vertexes are added, and the first difference value is a difference value between a preconfigured upper limit value of the number of vertexes of the first partition and the number of vertexes allocated in the first partition.
6. The method according to any one of claims 2 to 5, wherein the out-degrees of the vertices in the N second relational graphs satisfy a first preset condition, and the out-degrees represent the number of edges connected by one vertex.
7. The method according to any one of claims 1-6, further comprising:
and sending sample data corresponding to the vertex with the out-degree meeting a second preset condition in the second relation graph to the training executing device, wherein the out-degree represents the number of edges connected with one vertex.
8. The method of claim 7, further comprising:
receiving information which is sent by the training execution device and used for indicating available cache space;
and determining the vertex with the degree meeting the second preset condition according to the information for indicating the available cache space.
9. An apparatus for neural network training, comprising:
the device comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring a first relational graph used for graph neural network training, the first relational graph comprises a plurality of vertexes and a plurality of edges, each edge is used for connecting two vertexes, and the plurality of vertexes comprise training vertexes used for training the graph neural network;
the processing unit is used for determining N different second relational graphs according to the first relational graph acquired by the acquisition unit, wherein the second relational graph is a subgraph of the first relational graph, N is the number of training execution devices, and N is an integer greater than 1; the difference value of the number of the training vertexes included in each of any two second relational graphs is smaller than a preset threshold value, and the second relational graphs include neighbor vertexes of the training vertexes;
and the sending unit is used for sending the N second relational graphs determined by the processing unit to the N training execution devices, wherein the N training execution devices are in one-to-one correspondence with the N second relational graphs, and the N second relational graphs are respectively used for the corresponding training execution devices to train the graph neural network.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-8.
11. A computing device comprising a processor and a computer readable storage medium storing a computer program;
the processor is coupled with the computer-readable storage medium, the computer program realizing the method of any of claims 1-8 when executed by the processor.
12. A chip system, comprising a processor, wherein the processor is configured to perform the method of any one of claims 1-8.
13. A graph neural network training system, comprising: the training device comprises a central device and a plurality of training executing devices;
the central device is configured to perform the method of any one of claims 1-8;
each of the plurality of training execution devices is used for training a neural network of a graph.
CN202010970736.7A 2020-09-15 2020-09-15 Method, device and system for training graph neural network Pending CN114266281A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010970736.7A CN114266281A (en) 2020-09-15 2020-09-15 Method, device and system for training graph neural network
PCT/CN2021/096588 WO2022057310A1 (en) 2020-09-15 2021-05-28 Method, apparatus and system for training graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010970736.7A CN114266281A (en) 2020-09-15 2020-09-15 Method, device and system for training graph neural network

Publications (1)

Publication Number Publication Date
CN114266281A true CN114266281A (en) 2022-04-01

Family

ID=80777613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010970736.7A Pending CN114266281A (en) 2020-09-15 2020-09-15 Method, device and system for training graph neural network

Country Status (2)

Country Link
CN (1) CN114266281A (en)
WO (1) WO2022057310A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818483A (en) * 2022-04-14 2022-07-29 东南大学溧阳研究院 Electromechanical disturbance positioning and propagation prediction method based on graph neural network

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817411B (en) * 2022-06-23 2022-11-01 支付宝(杭州)信息技术有限公司 Distributed graph learning method and device
CN117290560B (en) * 2023-11-23 2024-02-23 支付宝(杭州)信息技术有限公司 Method and device for acquiring graph data in graph calculation task

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11610114B2 (en) * 2018-11-08 2023-03-21 Nec Corporation Method for supervised graph sparsification
CN110705709B (en) * 2019-10-14 2021-03-23 支付宝(杭州)信息技术有限公司 Method and device for training neural network model of graph
CN111652346A (en) * 2020-04-21 2020-09-11 厦门渊亭信息科技有限公司 Large-scale map deep learning calculation framework based on hierarchical optimization paradigm
CN111581983B (en) * 2020-05-13 2023-08-08 中国人民解放军国防科技大学 Method for predicting social concern hotspots in online public opinion event based on group analysis

Also Published As

Publication number Publication date
WO2022057310A1 (en) 2022-03-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination