CN117218459B - Distributed node classification method and device - Google Patents

Distributed node classification method and device

Info

Publication number
CN117218459B
CN117218459B
Authority
CN
China
Prior art keywords
node
nodes
order vector
graph
file system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311483982.XA
Other languages
Chinese (zh)
Other versions
CN117218459A (en)
Inventor
朱仲书
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202311483982.XA priority Critical patent/CN117218459B/en
Publication of CN117218459A publication Critical patent/CN117218459A/en
Application granted granted Critical
Publication of CN117218459B publication Critical patent/CN117218459B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of this specification relate to a distributed node classification method and device. The method is applied to any first working device among multiple working devices of a distributed system and comprises the following steps: obtaining first sub-graph data of the full graph data, where the data in the first sub-graph may be privacy data; performing K rounds of model processing on all nodes in the first sub-graph data using a K-layer graph neural network to obtain K-order vector representations of the nodes, where the i-th round of model processing includes obtaining the (i-1)-order vector representations of each node and its several neighbor nodes from a distributed file system, inputting them into the i-th layer of the graph neural network to obtain i-order vector representations of each node, and storing the i-order vector representations of each node in the distributed file system; and obtaining the K-order vector representation of a target node to be classified from the distributed file system and inputting it into a node classification model to obtain a classification result for the target node.

Description

Distributed node classification method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of graph processing, and in particular, to a distributed node classification method and apparatus.
Background
In recent years, as a tool for expressing complex relationships between data in the real world, graph data has received increasing attention, and one important application is to model nodes in a graph using a graph neural network (Graph Neural Networks, GNNs), and then predict whether the nodes have a certain attribute, i.e., node classification, by using a trained model. The graph data may be private data, such as data generated by a user during a transaction.
As the scale of graph data continues to expand and graph models grow more complex, performing node classification tasks on graph data at the scale of billions of nodes or more requires significant resources. Since GNNs essentially compute layer by layer following a message-passing paradigm, the traditional sample-by-sample computing mode introduces a large amount of repeated computation during the model prediction phase, which limits scalability.
Disclosure of Invention
One or more embodiments of the present disclosure describe a distributed node classification method and apparatus which, in view of the computing characteristics of graph neural networks, store intermediate results generated during computation in a distributed file system for reuse, thereby reducing data redundancy and improving operation efficiency.
In a first aspect, a distributed node classification method is provided, which is applied to any first working device in a plurality of working devices in a distributed system, and includes:
acquiring first sub-graph data of the full graph data;
performing K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of all nodes, wherein the ith-round model processing comprises the steps of acquiring i-1-order vector representations of all nodes and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the ith-layer graph neural network to obtain i-order vector representations of all nodes, and storing the i-order vector representations of all nodes into the distributed file system; the distributed file system is shared by the plurality of working devices;
and obtaining the K-order vector representation of the target node to be classified from the distributed file system, and inputting the K-order vector representation into a node classification model to obtain a classification result of the target node.
In one possible embodiment, the method further comprises:
for any node in the first subgraph, a graph sampling algorithm is used to determine N neighbor nodes from all the one-hop neighbor nodes.
In one possible implementation, obtaining an i-1 order vector representation of each node and its multiple neighbor nodes from a distributed file system includes:
the i-1 order vector representations of each node and N neighbor nodes thereof are obtained from the distributed file system.
In one possible implementation, the graph sampling algorithm includes: random sampling, uniform sampling, weighted sampling, and type sampling.
In one possible implementation, the first sub-graph data is obtained by the plurality of working devices in the distributed system dividing the full graph data using a graph segmentation algorithm.
In one possible implementation, the graph cut algorithm includes: METIS and a distributed neighbor expansion algorithm.
In one possible implementation, the node classification model is a multi-layer perceptron (MLP).
In a second aspect, there is provided a distributed node classification apparatus deployed on any first working device of a plurality of working devices of a distributed system, including:
an acquisition unit configured to acquire first sub-graph data of the full graph data;
the vector calculation unit is configured to perform K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of all the nodes, wherein the ith-round model processing comprises the steps of acquiring i-1-order vector representations of all the nodes and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the ith-layer graph neural network to obtain i-order vector representations of all the nodes, and storing the i-order vector representations of all the nodes into the distributed file system; the distributed file system is shared by the plurality of working devices;
the node classification unit is configured to acquire a K-order vector representation of a target node to be classified from the distributed file system, and input the K-order vector representation into a node classification model to obtain a classification result of the target node.
In one possible embodiment, the apparatus further comprises:
and the neighbor sampling unit is configured to determine N neighbor nodes from all the one-hop neighbor nodes by using a graph sampling algorithm for any node in the first subgraph.
In a third aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
In a fourth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has executable code stored therein, and wherein the processor, when executing the executable code, implements the method of the first aspect.
According to the distributed node classification method and device provided by the embodiments of this specification, the method leverages the computing characteristics of the graph neural network and stores intermediate results generated during computation in a distributed file system for reuse, thereby reducing data redundancy, improving operation efficiency and scalability, and making it possible to complete node classification tasks on ultra-large-scale graph data.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments disclosed in the present specification, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only examples of the embodiments disclosed in the present specification, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a schematic diagram of computing node vector representations with a conventional GNN, according to one embodiment;
FIG. 2 illustrates an implementation scenario diagram of a distributed node classification method according to one embodiment;
FIG. 3 illustrates a flow diagram of a distributed node classification method according to one embodiment;
FIG. 4 shows a schematic block diagram of a distributed node classification apparatus according to an embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
In graph inference, a node classification task refers to predicting whether a node to be predicted is of a certain specific type according to the known graph structure and node attributes. For example, given a natural person node, it is predicted whether there is a house registered under that natural person's name.
The node classification task relies on the encoded representations of the nodes in the graph. Specifically, the nodes in the graph are encoded by a graph neural network (GNN) to obtain their encoded representations; at prediction time, a prediction network derives the classification result for a given node from its encoded representation.
When a conventional graph neural network (GNN) model encodes nodes, the M-hop neighbor nodes of each node in the graph are sampled in batches, and an M-layer GNN then performs aggregation computation for each node to obtain its M-order vector representation. In this process, the intermediate results computed for any node u are discarded directly, and only the final M-order vector representation of node u is kept. As a result, when the M-order vector representation of a node v adjacent to node u is computed, the intermediate results from computing node u's M-order vector representation cannot be reused and must be recomputed from scratch, which reduces the efficiency of the overall node classification task. Meanwhile, batch-sampling the M-hop neighbor nodes of every node in the graph also causes a great deal of data redundancy.
For example, FIG. 1 shows a schematic diagram of computing node vector representations with a conventional GNN, according to one embodiment. As shown in FIG. 1, the 2-order vector representations of node 1 and node 2 are computed using a 2-layer GNN model. When the 2-hop neighbor subgraphs of node 1 and node 2 are sampled, node 3 is sampled into both subgraphs. When the 2-order vector representation of node 1 is computed by the 2-layer GNN model, an intermediate result, namely the 1-order vector representation of node 3, is produced, but it is discarded as soon as the computation for node 1 finishes. When the 2-order vector representation of node 2 is subsequently computed, the 1-order vector representation of node 3 has to be recomputed. This repeated computation makes the process inefficient overall and thereby affects the efficiency of subsequent node classification tasks.
To solve the above problem, FIG. 2 shows a schematic diagram of an implementation scenario of a distributed node classification method according to one embodiment. In the example of FIG. 2, the full graph data on which the node classification task is performed is partitioned by a graph splitting algorithm into multiple sub-graphs, which are sent to the multiple working devices in the distributed system, respectively. A K-layer graph neural network (GNN) runs on each working device. The layer-1 GNN calculates a 1-order vector representation of each node from the initial vector representations (0-order vector representations) of that node and its several neighbor nodes in the subgraph, and stores the 1-order vector representations in a distributed file system, whose data may be shared by the multiple working devices in the distributed system. Then, when the layer-2 GNN calculates the 2-order vector representations from the 1-order vector representations of each node and its several neighbor nodes in the subgraph, it does not need to recompute them from scratch; it simply reads them from the distributed file system, and then stores the calculated 2-order vector representations back into the distributed file system, and so on. The i-th layer of the graph neural network acquires the (i-1)-order vector representations of each node and its several neighbor nodes from the distributed file system, calculates the i-order vector representation of each node, and stores it in the distributed file system. Finally, the K-th layer GNN stores the calculated K-order vector representation of each node into the distributed file system for use by the subsequent node classification model. These steps are executed by the multiple working devices in the distributed system respectively, so that the K-order vector representations of all nodes in the whole graph are obtained and stored in the distributed file system.
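As a minimal sketch of the per-device loop described above (assuming hypothetical helpers dfs_read, dfs_write, and sample_neighbors for the shared distributed file system and one-hop sampling, and gnn_layers for the K graph neural network layers; none of these names are defined by this specification), each working device might proceed roughly as follows:

```python
# Illustrative sketch only: one working device runs K rounds over its sub-graph,
# reading (i-1)-order vectors from the shared distributed file system and
# writing i-order vectors back so that any device can reuse them.

def compute_k_order_vectors(subgraph_nodes, K, gnn_layers,
                            sample_neighbors, dfs_read, dfs_write):
    for i in range(1, K + 1):
        for v in subgraph_nodes:
            neighbors = sample_neighbors(v)                     # one-hop neighbors only
            h_self = dfs_read(v, order=i - 1)                   # (i-1)-order vector of v
            h_nbrs = [dfs_read(u, order=i - 1) for u in neighbors]
            h_v = gnn_layers[i - 1](h_self, h_nbrs)             # i-order vector of v
            dfs_write(v, order=i, vector=h_v)                   # cached for reuse by any device
```

After the K-th round, the K-order vector representation of every node handled by this device is available in the distributed file system for the node classification model.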
The node classification model also runs on any of the working devices. For any target node to be classified, the K-order vector representation of the target node is obtained from the distributed file system and input into the node classification model to obtain a classification result for the target node, which may specifically include whether the target node has a target attribute.
The specific implementation steps of the above distributed node classification method are described below with reference to specific embodiments. FIG. 3 illustrates a flow chart of a distributed node classification method according to an embodiment; the method may be executed by any platform, server, or device cluster with computing and processing capabilities. It should be noted that the distributed system includes a plurality of working devices, and FIG. 3 only shows the steps performed on any first working device. The steps performed on the other working devices in the distributed system can be deduced by reference to the steps in FIG. 3.
As shown in FIG. 3, the distributed node classification method according to an embodiment is applied to any first working device among the multiple working devices of a distributed system and includes at least: step 302, obtaining first sub-graph data of the full graph data; step 306, performing K rounds of model processing on all nodes in the first sub-graph data using a K-layer graph neural network to obtain K-order vector representations of the nodes, where the i-th round of model processing includes acquiring the (i-1)-order vector representations of each node and its several neighbor nodes from a distributed file system, inputting them into the i-th layer of the graph neural network to obtain the i-order vector representations of each node, and storing the i-order vector representations of each node in the distributed file system, the distributed file system being shared by the multiple working devices; and step 308, obtaining the K-order vector representation of the target node to be classified from the distributed file system and inputting it into a node classification model to obtain a classification result for the target node.
First, in step 302, first sub-graph data of full graph data is acquired.
The first sub-graph data may be graph structure data indicating only the connection relationships of the nodes in the sub-graph, and not including vector representations of the nodes, to conserve storage resources. The vector representation of the node may be obtained from a distributed file system.
In one embodiment, the first sub-graph data is obtained by the multiple working devices in the distributed system dividing the full graph data using a graph segmentation algorithm. The full graph may be partitioned using a variety of graph cut algorithms, such as METIS or the distributed neighbor expansion algorithm DistributedNE (Distributed Neighbor Expansion). By using a graph splitting algorithm, nodes adjacent to each other in the whole graph can be divided into the same sub-graph, so that when the neighbor nodes of each node are sampled in the subsequent step 306, the sampling can be performed directly from the sub-graph on a single working device, reducing communication between working devices and further improving operation efficiency.
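For illustration only, the sketch below shows the general shape of such a partitioning step. The hash-based assignment is a simplified stand-in for METIS or DistributedNE, and full_graph and num_workers are assumed names, so this is not the partitioning actually mandated by the specification:

```python
from collections import defaultdict

def partition_graph(full_graph, num_workers):
    """Simplified stand-in for METIS / DistributedNE: assign each node to a worker,
    then keep, per worker, the edges (and neighbor node ids) that worker needs."""
    assignment = {v: hash(v) % num_workers for v in full_graph.nodes}
    subgraphs = defaultdict(lambda: {"nodes": set(), "edges": []})
    for u, v in full_graph.edges:
        w = assignment[u]                      # the edge follows its source node's worker
        subgraphs[w]["nodes"].update((u, v))   # neighbor ids are kept as structure only
        subgraphs[w]["edges"].append((u, v))
    return subgraphs  # node vectors are not stored here; they live in the distributed file system
```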
Then, in step 306, performing K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of each node, where the i-th round model processing includes obtaining i-1-order vector representations of each node and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the i-th layer graph neural network to obtain i-order vector representations of each node, and storing the i-order vector representations of each node in the distributed file system; the distributed file system is shared by the plurality of work devices.
The several neighbor nodes of any node are its direct neighbors, i.e., one-hop neighbor nodes. Specifically, the initial vector representations (0-order vector representations) of each node and its several neighbor nodes in the first sub-graph are first obtained, then input into the layer-1 graph neural network to calculate the 1-order vector representation of each node, and the 1-order vector representations of each node are stored in the distributed file system. The initial vector representation may be a one-hot encoded feature, or an embedding vector obtained by encoding text or pictures with a corresponding encoder, which is not limited here.
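As a hedged example of how the 0-order representations might be produced and cached before the first round (the one-hot scheme over a categorical node attribute and the dfs_write helper are illustrative assumptions):

```python
import numpy as np

def init_order_zero(subgraph_nodes, node_type, num_types, dfs_write):
    """Write a one-hot 0-order vector for each node into the shared distributed file system."""
    for v in subgraph_nodes:
        h0 = np.zeros(num_types, dtype=np.float32)
        h0[node_type[v]] = 1.0          # one-hot encoding of the node's categorical attribute
        dfs_write(v, order=0, vector=h0)
```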
And then, acquiring 1-order vector representations of each node and a plurality of neighbor nodes in the first sub-graph from the distributed file system, inputting the 1-order vector representations into the layer 2 graph neural network, calculating 2-order vector representations of each node, and storing the 2-order vector representations of each node into the distributed file system.
And by analogy, the ith round of model processing comprises the steps of obtaining i-1 order vector representations of each node and a plurality of neighbor nodes thereof in a first sub-graph from a distributed file system, inputting the i-1 order vector representations into an ith layer of graph neural network, calculating the i-order vector representations of each node, and storing the i-order vector representations of each node into the distributed file system.
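The specification does not fix the internal form of each layer; as one common possibility, the i-th layer could be a GraphSAGE-style mean aggregation followed by a linear transform, sketched below under that assumption:

```python
import numpy as np

class MeanAggregatorLayer:
    """One possible i-th layer: mean-aggregate the neighbors' (i-1)-order vectors,
    concatenate with the node's own (i-1)-order vector, then apply a linear map and ReLU."""
    def __init__(self, in_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.1, size=(2 * in_dim, out_dim)).astype(np.float32)

    def __call__(self, h_self, h_nbrs):
        h_agg = np.mean(h_nbrs, axis=0) if h_nbrs else np.zeros_like(h_self)
        z = np.concatenate([h_self, h_agg]) @ self.W
        return np.maximum(z, 0.0)  # ReLU
```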
The distributed file system is shared by a plurality of working devices. The steps 302 and 306 are performed by a plurality of working devices in the distributed system, so that the K-order vector representation of all nodes in the whole graph can be obtained and stored in the distributed file system.
By caching, in step 302 and step 306, the intermediate data generated during the computation of each GNN layer in the distributed file system, the intermediate results for each node can be reused when computing different nodes and shared among multiple working devices, eliminating the large amount of repeated computation in conventional GNNs. Taking FIG. 1 as an example, with the scheme of the above embodiment the 1-order vector representation of node 3 is stored in the distributed file system and read when computing the 2-order vectors of node 1 and node 2. For another example, assume that in the full graph data node A and node B are second-order neighbors but are assigned to a first working device and a second working device, respectively. The intermediate-order vector representation of node A calculated by the first working device can be stored in the distributed file system and read by the second working device to calculate the higher-order vector representation of node B.
Furthermore, according to the above embodiment, the working device only needs to sample the one-hop neighbor nodes of any node in the first subgraph, rather than the N-hop neighbor nodes as in a conventional GNN, which further reduces the amount of computation during sampling.
In some possible implementations, before step 306, the method further includes step 304 of determining, for any node in the first subgraph, N neighbor nodes from all its one-hop neighbor nodes using a graph sampling algorithm.
Wherein the graph sampling algorithm may include: random sampling, uniform sampling, weighted sampling, and type sampling.
In this case, acquiring the (i-1)-order vector representations of each node and its several neighbor nodes from the distributed file system in step 306 specifically includes: acquiring the (i-1)-order vector representations of each node and its N neighbor nodes from the distributed file system.
By using the graph sampling algorithm, data expansion caused when the graph data size is excessively large can be prevented.
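A minimal sketch of step 304 is given below, covering the random/uniform and weighted cases; the function name and arguments are assumptions for illustration (type-based sampling would additionally filter the candidate list by node type):

```python
import numpy as np

def sample_one_hop_neighbors(neighbors, N, weights=None, seed=None):
    """Pick N neighbors from a node's one-hop neighbor list.
    weights=None gives uniform sampling; otherwise sampling is weighted."""
    rng = np.random.default_rng(seed)
    neighbors = list(neighbors)
    if len(neighbors) <= N:
        return neighbors
    if weights is None:
        idx = rng.choice(len(neighbors), size=N, replace=False)
    else:
        p = np.asarray(weights, dtype=np.float64)
        idx = rng.choice(len(neighbors), size=N, replace=False, p=p / p.sum())
    return [neighbors[i] for i in idx]
```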
Finally, in step 308, a K-order vector representation of the target node to be classified is obtained from the distributed file system, and is input into a node classification model to obtain a classification result for the target node.
The output of the node classification model may be a probability value indicating the probability that the target node has a certain target attribute, and when the probability value is greater than a preset first threshold, the classification result indicates that the target node has the target attribute.
In one embodiment, any node may have multiple attributes at the same time. In this case, the output of the node classification model may be multiple probability values corresponding to the multiple attributes; based on the comparison between the probability value of each attribute and its corresponding preset threshold, the attributes whose probability values are greater than their thresholds are determined as the classification result of the node.
In one embodiment, the node classification model may be a multi-layer perceptron (MLP) model.
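As a minimal sketch of such a classification head (the two-layer structure, random weights, and per-attribute thresholds are assumptions; in practice the weights would come from training):

```python
import numpy as np

class MLPClassifier:
    """Two-layer MLP head: one sigmoid output per candidate attribute of the target node."""
    def __init__(self, in_dim, hidden_dim, num_attrs, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(in_dim, hidden_dim)).astype(np.float32)
        self.W2 = rng.normal(scale=0.1, size=(hidden_dim, num_attrs)).astype(np.float32)

    def predict(self, h_k, thresholds):
        hidden = np.maximum(h_k @ self.W1, 0.0)                 # hidden layer with ReLU
        probs = 1.0 / (1.0 + np.exp(-(hidden @ self.W2)))       # per-attribute probabilities
        # An attribute is part of the classification result when its probability
        # exceeds its preset threshold, as described above.
        return probs, probs > np.asarray(thresholds)
```

For a single target node, h_k would be the K-order vector representation read from the distributed file system.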
In the embodiments of this specification, subgraph sampling and model inference can be performed in a pipelined manner, so the N-hop neighbor subgraph data of each node does not need to be produced in advance, saving the time otherwise consumed by subgraph sampling. In addition, one-hop neighbor subgraphs are produced and consumed in real time, requiring no extra storage resources. Meanwhile, because the scheme caches intermediate computation results in the distributed file system and does not depend on the MapReduce framework of distributed computing, it can be seamlessly adapted to various graph learning frameworks.
Meanwhile, by caching intermediate results, the scheme avoids the large amount of repeated computation in conventional GNN models, thereby improving performance and scalability. In addition, since the information of multi-hop neighbors is captured by the cached intermediate results, only one-hop neighbors need to be sampled during subgraph sampling, which further reduces the amount of computation.
According to an embodiment of another aspect, a distributed node classification apparatus is also provided. Fig. 4 illustrates a schematic block diagram of a distributed node classification apparatus according to an embodiment, which may be deployed in any device, platform or cluster of devices having computing, processing capabilities. As shown in fig. 4, the apparatus 400 is deployed on any first working device of the plurality of working devices of the distributed system, and includes:
an acquisition unit 401 configured to acquire first sub-graph data of the full graph data;
the vector calculation unit 403 is configured to perform K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of each node, where the i-th round model processing includes obtaining i-1-order vector representations of each node and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the i-th layer graph neural network to obtain i-order vector representations of each node, and storing the i-order vector representations of each node in the distributed file system; the distributed file system is shared by the plurality of working devices;
the node classification unit 404 is configured to obtain a K-order vector representation of a target node to be classified from the distributed file system, and input the K-order vector representation into a node classification model to obtain a classification result for the target node.
In some possible embodiments, the apparatus further comprises:
the neighbor sampling unit 402 is configured to determine, for any node in the first sub-graph, N neighbor nodes from all its one-hop neighbor nodes using a graph sampling algorithm.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in any of the above embodiments.
According to an embodiment of yet another aspect, there is also provided a computing device including a memory and a processor, wherein the memory has executable code stored therein, and the processor, when executing the executable code, implements the method described in any of the above embodiments.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the invention to the particular embodiments disclosed; any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the invention shall be included within the scope of the invention.

Claims (11)

1. A distributed node classification method, applied to any first working device among a plurality of working devices of a distributed system, and comprising the following steps:
acquiring first sub-graph data of the full graph data;
performing K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of all nodes, wherein the ith-round model processing comprises the steps of acquiring i-1-order vector representations of all nodes and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the ith-layer graph neural network to obtain i-order vector representations of all nodes, and storing the i-order vector representations of all nodes into the distributed file system; the distributed file system is shared by the plurality of working devices; the vector representations of each order of each node can be multiplexed when different nodes are calculated;
and obtaining the K-order vector representation of the target node to be classified from the distributed file system, and inputting the K-order vector representation into a node classification model to obtain a classification result of the target node.
2. The method of claim 1, further comprising:
for any node in the first subgraph, a graph sampling algorithm is used to determine N neighbor nodes from all the one-hop neighbor nodes.
3. The method of claim 2, obtaining an i-1 order vector representation of each node and its plurality of neighbor nodes from a distributed file system, comprising:
the i-1 order vector representations of each node and N neighbor nodes thereof are obtained from the distributed file system.
4. The method of claim 2, wherein the graph sampling algorithm comprises: random sampling, uniform sampling, weighted sampling, and type sampling.
5. The method of claim 1, wherein the first sub-graph data is partitioned by a plurality of working devices in a distributed system performing a graph cut algorithm on the full graph data.
6. The method of claim 5, wherein the graph cut algorithm comprises: METIS, distributed neighbor extension algorithm.
7. The method of claim 1, wherein the node classification model is a multi-layer perceptron (MLP).
8. A distributed node classification apparatus deployed on any first working device of a plurality of working devices of a distributed system, comprising:
an acquisition unit configured to acquire first sub-graph data of the full graph data;
the vector calculation unit is configured to perform K-round model processing on all nodes in the first sub-graph data by using a K-layer graph neural network to obtain K-order vector representations of all the nodes, wherein the ith-round model processing comprises the steps of acquiring i-1-order vector representations of all the nodes and a plurality of neighbor nodes thereof from a distributed file system, inputting the i-1-order vector representations into the ith-layer graph neural network to obtain i-order vector representations of all the nodes, and storing the i-order vector representations of all the nodes into the distributed file system; the distributed file system is shared by the plurality of working devices; the vector representations of each order of each node can be multiplexed when different nodes are calculated;
the node classification unit is configured to acquire a K-order vector representation of a target node to be classified from the distributed file system, and input the K-order vector representation into a node classification model to obtain a classification result of the target node.
9. The apparatus of claim 8, further comprising:
and the neighbor sampling unit is configured to determine N neighbor nodes from all the one-hop neighbor nodes by using a graph sampling algorithm for any node in the first subgraph.
10. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-7.
11. A computing device comprising a memory and a processor, wherein the memory has executable code stored therein, which when executed by the processor, implements the method of any of claims 1-7.
CN202311483982.XA 2023-11-08 2023-11-08 Distributed node classification method and device Active CN117218459B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311483982.XA CN117218459B (en) 2023-11-08 2023-11-08 Distributed node classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311483982.XA CN117218459B (en) 2023-11-08 2023-11-08 Distributed node classification method and device

Publications (2)

Publication Number Publication Date
CN117218459A CN117218459A (en) 2023-12-12
CN117218459B true CN117218459B (en) 2024-01-26

Family

ID=89051501

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311483982.XA Active CN117218459B (en) 2023-11-08 2023-11-08 Distributed node classification method and device

Country Status (1)

Country Link
CN (1) CN117218459B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021082681A1 (en) * 2019-10-29 2021-05-06 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of graph neural network
WO2021179838A1 (en) * 2020-03-10 2021-09-16 支付宝(杭州)信息技术有限公司 Prediction method and system based on heterogeneous graph neural network model
CN113867983A (en) * 2021-09-14 2021-12-31 杭州海康威视数字技术股份有限公司 Graph data mining method and device, electronic equipment and machine-readable storage medium
CN116681104A (en) * 2023-05-11 2023-09-01 中国地质大学(武汉) Model building and realizing method of distributed space diagram neural network
CN117235032A (en) * 2023-11-08 2023-12-15 支付宝(杭州)信息技术有限公司 Distributed link prediction method and device
CN117241215A (en) * 2023-06-13 2023-12-15 南京蜘蛛网络科技有限公司 Wireless sensor network distributed node cooperative positioning method based on graph neural network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021082681A1 (en) * 2019-10-29 2021-05-06 支付宝(杭州)信息技术有限公司 Method and device for multi-party joint training of graph neural network
WO2021179838A1 (en) * 2020-03-10 2021-09-16 支付宝(杭州)信息技术有限公司 Prediction method and system based on heterogeneous graph neural network model
CN113867983A (en) * 2021-09-14 2021-12-31 杭州海康威视数字技术股份有限公司 Graph data mining method and device, electronic equipment and machine-readable storage medium
CN116681104A (en) * 2023-05-11 2023-09-01 中国地质大学(武汉) Model building and realizing method of distributed space diagram neural network
CN117241215A (en) * 2023-06-13 2023-12-15 南京蜘蛛网络科技有限公司 Wireless sensor network distributed node cooperative positioning method based on graph neural network
CN117235032A (en) * 2023-11-08 2023-12-15 支付宝(杭州)信息技术有限公司 Distributed link prediction method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Cambricon-G: A Polyvalent Energy-Efficient Accelerator for Dynamic Graph Neural Networks; Xinkai Song et al.; IEEE; entire document *
Dynamic depth-width optimization for capsule graph convolutional network; Wu, SW et al.; Frontiers of Computer Science; entire document *
Optimizing Task Placement and Online Scheduling for Distributed GNN Training Acceleration; Luo, ZY et al.; IEEE; entire document *
图神经网络加速结构综述 (A survey of graph neural network accelerator architectures); Li Han et al.; Journal of Computer Research and Development, Vol. 58, No. 06; entire document *
大规模图神经网络研究综述 (A survey of large-scale graph neural networks); Xiao Guoqing et al.; http://kns.cnki.net/kcms/detail/11.1826.tp.20230817.0856.002.html; entire document *

Also Published As

Publication number Publication date
CN117218459A (en) 2023-12-12

Similar Documents

Publication Publication Date Title
CN111382868B (en) Neural network structure searching method and device
CN113128678A (en) Self-adaptive searching method and device for neural network
CN114915630B (en) Task allocation method, network training method and device based on Internet of Things equipment
CN117235032B (en) Distributed link prediction method and device
CN110287820B (en) Behavior recognition method, device, equipment and medium based on LRCN network
CN111813539B (en) Priority and collaboration-based edge computing resource allocation method
CN115066694A (en) Computation graph optimization
CN113705811A (en) Model training method, device, computer program product and equipment
CN112580789B (en) Training graph coding network, and method and device for predicting interaction event
CN113988464A (en) Network link attribute relation prediction method and equipment based on graph neural network
US20230289618A1 (en) Performing knowledge graph embedding using a prediction model
KR102189811B1 (en) Method and Apparatus for Completing Knowledge Graph Based on Convolutional Learning Using Multi-Hop Neighborhoods
CN109471971B (en) Semantic prefetching method and system for resource cloud storage in education field
CN117218459B (en) Distributed node classification method and device
CN113674152A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111258968B (en) Enterprise redundant data cleaning method and device and big data platform
WO2023179609A1 (en) Data processing method and apparatus
WO2023143570A1 (en) Connection relationship prediction method and related device
Kaushik et al. Traffic prediction in telecom systems using deep learning
CN117223005A (en) Accelerator, computer system and method
CN115048425A (en) Data screening method and device based on reinforcement learning
CN110188219A (en) Deeply de-redundancy hash algorithm towards image retrieval
CN115809372B (en) Click rate prediction model training method and device based on decoupling invariant learning
CN117875520B (en) Public safety event prediction method and system based on dynamic graph space-time evolution mining
CN117409209B (en) Multi-task perception three-dimensional scene graph element segmentation and relationship reasoning method

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant