CN115879507A

CN115879507A - Large-scale graph generation method based on deep confrontation learning

Info

Publication number: CN115879507A
Application number: CN202211167773.XA
Authority: CN
Inventors: 程大伟; 许辰昊; 蒋昌俊
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2022-09-23
Filing date: 2022-09-23
Publication date: 2023-03-31

Abstract

The invention discloses a large-scale graph generation method based on deep confrontation learning, which comprises the steps of giving an adjacent matrix A and a characteristic matrix X of a graph G, sampling the adjacent matrix A and the characteristic matrix X, inputting a graph attention encoder to obtain structural information of the graph, and obtaining a true value of a community label by applying a community detection algorithm; feeding the community information and the graph representation output by the graph attention encoder to a community decoder to generate community labels corresponding to the nodes; adjusting parameters of the graph attention encoder and the community decoder by utilizing back propagation to guide the graph attention encoder and the community decoder to a potential space kept by a community; feeding the community information and graph representation output by the graph attention encoder to a graph encoder, generating edge probabilities; utilizing the edge probability simulation graph fractional matrix to finally sample to obtain a new graph generated by a model; the model provided by the invention can realize good balance between the quality and the efficiency (expandability) of the graph simulation.

Description

Large-scale graph generation method based on deep confrontation learning

Technical Field

The invention relates to the technical field of large-scale graph generation models, in particular to a large-scale graph generation method based on deep confrontation learning.

Background

The research on the graph generation model has been long, and the traditional methods such as ase:Sub>A B-A model, ase:Sub>A Chung-Lu model, ase:Sub>A Kronecker graph model, ase:Sub>A BTER model, an exponential random graph, ase:Sub>A random block model and the like are well designed to simulate ase:Sub>A specific graph family; for example, the exponential stochastic graph model (ERGM) relies on an expressive probabilistic model that learns the weights on node features to model the likelihood of edges in a graph; in practice, however, this approach is limited to a certain extent because it can only capture graph structures with sufficient statistical information, and the Kronecker graph model relies on the Kronecker matrix product to efficiently generate large adjacency matrices, although this approach is scalable and can learn some graph properties (e.g., degree distributions) from data, it is still limited to a large extent in terms of the graph structures it can represent; the BTER model is used for correcting the average clustering coefficient in each community and correcting degree distribution through a two-stage edge sampling process, and the BTER considers the community structure of the image display type by modeling the image display type into a two-stage E-R image; notably, SBM and its variants DCSBM and MMSB also consider community structure of graphs, but they suffer from the limitation of stochastic model simplicity, resulting in underperforming community structure preservation of graph generation tasks in real life, specifically, they have only one parameter for capturing each community (i.e., an edge within the community), one parameter for representing the probability of connectivity for each pair of communities (i.e., an edge between the two communities).

In recent years, some deep neural network-based techniques (e.g., VGAE, deepGMG, graphnn, graphite, GRAN, condGen) have been proposed to solve the map generation problem, which significantly improve the quality of map generation compared to conventional methods; for example, graph and VGAE use a variational auto-encoder (VAE) technique, where graph neural networks are used for inference (encoding) and generation (decoding), since graph and VGAE assume a fixed set of vertices so they can only learn from a single graph, netGAN performs more efficiently than VGAE by learning random walks on the graph, but it is not scalable since fixed-size graphs are generated; in DeepGMG, graph neural networks are used to represent probabilistic dependencies between nodes and edges of a graph, which can be correctly learnedWhile we have tried distributions over graph, generating a graph with m edges, n vertices, and a diameter D (G) requires a complexity of O (mn) ² D (G)), which also has the problem of poor scalability.

Currently, graphnn generates graphs through a Recurrent Neural Network (RNN) sequence, but it is not permutation-invariant, since computing likelihood requires possible permutations of the node order of the marginalized adjacency matrix, GRAN improves the scalability of graphnn by generating one node block and associated edges at each step in the autoregressive approach, which is still not permutation-invariant, condGen overcomes this permutation-invariance challenge by using GCN as an encoder and dealing with graph generation problems in the embedding space, graphnu-Nets select specific nodes to implement upsampling and downsampling of the graph to obtain a graph representation, however, they do not consider the observed community structure of the graph in the learning process, SBMGNN is a variant of SBM, equipped with deep learning techniques, but its graph neural network is used to infer parameters of the overlapping random block model, which are not directly related to the retention property, and thus there is no improvement in community performance in terms of retention compared to other deep learning based graph generation models.

The generation of a countermeasure network (GAN) shows remarkable effects in various tasks such as image generation, image translation, super-resolution imaging and multimedia synthesis, and GAN is recently used for network scientific tasks such as network embedding, semi-supervised learning and graph generation, for the graph generation task, prior structure knowledge specified by a sample data set is crucial to graph generation, especially under the condition of community structure maintenance; for the community structure of the graph, some models using the pooling strategy can be trained to represent communities (clusters), but it is still a challenge to represent and generate these community structures simultaneously, for example, netGAN generates graphs by random walk, which is very important for maintaining the community structure.

Disclosure of Invention

This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.

The present invention has been made in view of the above-mentioned conventional problems.

Therefore, the technical problem solved by the invention is as follows: limited by complexity, the scalability of the deep learning model is often poor, and the community retention characteristic of the graph is not concerned by the traditional method, so the correlation performance of the graph generation is also poor.

In order to solve the technical problems, the invention provides the following technical scheme: comprises the steps of (a) preparing a substrate,

for the graph G, an adjacent matrix A and a characteristic matrix X are given, after sampling is carried out on the graph G, the graph G is input into a graph attention encoder to obtain structural information of the graph, and a real value of a community label is obtained by applying a community detection algorithm;

feeding the community information and the graph representation output by the graph attention encoder to a community decoder to generate community labels corresponding to the nodes;

adjusting parameters of the graph attention encoder and the community decoder by utilizing back propagation to guide the graph attention encoder and the community decoder to a potential space kept by a community;

feeding the community information and graph representation output by the graph attention encoder to a graph encoder, generating edge probabilities;

and finally sampling to obtain a new graph generated by the model by utilizing the edge probability simulation graph fractional matrix.

As a preferable aspect of the large-scale map generation method based on deep confrontation learning according to the present invention, wherein: there is also a need to enhance the reconstruction of the graph structure prior to sampling, including,

for the graph G, the adjacency matrix A and the feature matrix X are given, the adjacency matrix A and the feature matrix X are input into a ladder encoder to obtain structure information of the graph, and the real value of the community label is obtained by applying the community detection algorithm;

feeding the community information and graph representation output by the ladder encoder to a discriminator to determine whether the input graph is a false graph distinct from the true graph;

meanwhile, the coarsening graph of each level distributes the community structure characteristics to the original nodes through a differentiable layer message transmission process;

and decoding a series of community information of each original node to enhance the reconstruction of the graph structure.

As a preferable aspect of the large-scale map generation method based on deep confrontation learning according to the present invention, wherein: the ladder encoder includes graph convolution, graph pooling, graph readout, and graph transpose pooling.

As a preferable scheme of the large-scale map generation method based on deep confrontation learning of the present invention, wherein: comprises the steps of (a) preparing a mixture of a plurality of raw materials,

generating a new graph with observed hierarchical community structure distribution by using variational reasoning before decoding node characteristics;

and selecting a multilayer perceptron as a reasoning model for reasoning, and finishing the mapping from the reconstruction characteristics to the prior distribution.

As a preferable aspect of the large-scale map generation method based on deep confrontation learning according to the present invention, wherein: the discriminator comprises a plurality of discriminators which are connected in series,

the identification task needs the graph characteristics obtained by the ladder encoder, and a layer output matrix is read according to the graph;

and optimizing the discriminator, performing form calculation through game, combining training and updating parameters through increasing gradient.

As a preferable aspect of the large-scale map generation method based on deep confrontation learning according to the present invention, wherein: comprises the steps of (a) preparing a substrate,

by passing from A _out The parameterized class distribution sampling of the ith row generates an edge for the node i;

selection A _out Until the number of edges reaches a predefined number;

the total temporal complexity of generating the new graph is O (n) ² )。

As a preferable scheme of the large-scale map generation method based on deep confrontation learning of the present invention, wherein: the image decoder includes a decoder for decoding a plurality of encoded data streams,

decoding the hierarchical representation sequence;

and predicting node linkage.

given input node characteristics

And a self map;

aggregating messages from graph structures using a multi-headed attention mechanism to obtain hidden variables h for central nodes u of a coherent autograph _u 。

As a preferable scheme of the large-scale map generation method based on deep confrontation learning of the present invention, wherein: for each of the self graphs, message aggregation includes,

wherein the content of the first and second substances,

a row of hidden variables representing the graph attention coding layer, namely hidden variables on the node u;

representing an output projection matrix; h is _attn Indicating the number of heads of attention; d _att Representing the dimension of the attention vector a.

As a preferable aspect of the large-scale map generation method based on deep confrontation learning according to the present invention, wherein: probability distribution based on node degree is taken as an initial node sampling strategy, comprising,

wherein deg is _u Representing the degree of node u, assuming sampling in each traversaln _s Each node is used as an initial time sequence node, and n is sampled _s The self graph is used as the input of the coding process, and the initial node set is expressed as

The invention has the beneficial effects that: the invention provides a new graph generation model, namely a large-scale graph generation method (LSGEN), which not only can keep community structure and other important attributes in a real graph, but also can reduce graph simulation time and improve scalability compared with other graph generation models based on learning, wherein a generator and a discriminator are carefully designed in a unified GAN framework, wherein the generator is positioned in a hierarchical graph variation automatic encoder, can learn the replacement-invariant representation of an input graph and can generate a new graph from node representation, the discriminator is used for judging whether embedding is from a real graph or a simulated graph, and a micro-ladder network is introduced to realize graph pooling and message transmission of different community structure levels, which is more effective than simply stacking deeper graph volume layers; meanwhile, the invention also introduces an extensible version SLSGEN which has a shorter production line and a faster training speed, and in the SLSGEN, an efficient graph attention automatic encoder framework is designed for generating a community maintenance graph; the self-graph (EgoGraph) sampling and two-part computation graph assembling strategy are adopted, and a small-batch-based method is realized to train a graph generator; the data parallel and model parallel architecture for training and reasoning of the scalable graph generation model is provided, extensive experiments are carried out on the composite graph and the real graph, and results show that compared with a baseline method, the proposed model can achieve good balance between the quality and the efficiency (expandability) of graph simulation.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:

FIG. 1 is a block diagram of a LSGEN of a large-scale graph generation method based on deep confrontation learning according to an embodiment of the present invention;

FIG. 2 is a flow diagram of an SLSGEN of a large-scale graph generation method based on deep confrontation learning according to an embodiment of the present invention;

fig. 3 is a schematic diagram of two computation graphs of a large-scale graph generation method based on deep confrontation learning according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.

Furthermore, the references herein to "one embodiment" or "an embodiment" refer to a particular feature, structure, or characteristic that may be included in at least one implementation of the present invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not necessarily enlarged to scale, and are merely exemplary, which should not limit the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.

Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

Example 1

For ase:Sub>A long time, the traditional methods (such as B-A model, chung-Lu model, etc.) are well designed to simulate ase:Sub>A specific graph family, but they often rely on ase:Sub>A priori distribution of graphs and are limited by model simplicity and do not perform well in terms of generation quality.

In recent years, some technologies based on deep neural networks (such as VGAE, deep gmg models, etc.) have been proposed to solve the problem of graph generation, which significantly improve the quality of graph generation compared with the conventional methods, but are limited by complexity, the scalability of deep learning models is often poor, and the community retention characteristics of graphs are not focused by the past methods, so the related performance of graph generation is also poor.

Referring to fig. 1, 2 and 3, a large-scale graph generation method based on deep confrontation learning is provided for an embodiment of the present invention, and includes:

s1: for the graph G, an adjacent matrix A and a characteristic matrix X are given and input into a ladder encoder to obtain structural information of the graph, and a real value of the community label is obtained by applying a community detection algorithm.

S2: the community information and graph representation output by the ladder encoder are fed to a discriminator to determine if the input graph is a false graph distinct from the real graph, while the coarsened graph of each level will distribute its community structure characteristics to the original nodes through a differentiable layer messaging process.

S3: and decoding a series of community information of each original node to enhance the reconstruction of the graph structure.

S4: and for the graph G, the adjacent matrix A and the characteristic matrix X are given, after sampling, the graph is input into a graph attention encoder to obtain the structure information of the graph, and the real value of the community label is obtained by applying a community detection algorithm.

S5: community information and graph representations output by the graph attention encoder are fed to a community decoder, and community labels corresponding to the nodes are generated.

S6: parameters of the attention encoder and community decoder are adjusted by back propagation to guide the parameters to potential space maintained by the community.

S7: the community information and graph representation output by the graph attention encoder are fed to a graph encoder, generating edge probabilities.

S8: and (4) simulating the graph fractional matrix by using the edge probability, and finally sampling to obtain a new graph generated by the model.

Referring to fig. 1, in order to better explain the implementation principle of the LSGEN algorithm provided by the present invention, the following detailed description is made in this embodiment:

(1) Ladder messaging encoder

Introducing a ladder encoder, wherein the model can adaptively adjust the pooling strategy and extract the community structure information of the nodes, and the node characteristic X and the adjacency matrix A are belonged to {0,1} ^n×n As input to the proposed encoder, for each graph G, the identity matrix is used as its default node feature X ∈ R ^n×d Each node has d-dimensional features and an adjacency matrix a, and the input graph G will be coarsened using the superimposed convolution and pooling layers.

Graph convolution: the classical messaging model is represented by the Graph Convolution Network (GCN).

Propagated message Z ∈ R ^n×d' The calculation is as follows:

wherein the content of the first and second substances,

represents->

Based on the degree matrix, is greater than or equal to>

A contiguous matrix containing self-loops is represented,

W∈R ^d×d' is a trainable parameter in the graph volume layer, with a kernel size of d'; σ denotes the activation function (ReLU by default); x denotes a node feature derived from spectral embedding of the adjacency matrix a of X = X (a). />

If used, the

In a cell line (e.g. < u >)>

) To improve the connectivity of the graph, the information can flow faster between nodes, the time complexity of graph convolution is O (m + n), where m represents the number of edges.

Pooling of the pictures: simply passing messages through the GCN may result in a very deep network being able to capture structural information, and particularly when large sparse graphs of low connectivity are encountered, there is a need for an efficient way to obtain a hierarchical representation of the graph, with graphs being paired in a hierarchyCoarsening and learning through a series of distribution matrices

Strategy for roughing a set of drawings, where n _l 、n _l+1 And k represent the number of input nodes, output nodes, and layers, respectively.

The assignment matrix is calculated as follows:

Z ^(l) ＝σ(GCN _l，embed (X ^(l) ，A ^(l) ))

S ^(l) ＝softmax(GCN _l，pool (Z ^(l) ，A ^(l) ))

where, σ is the ReLU activation function,

and &>

Respectively represent n _l Characteristic matrix and adjacency matrix of a cluster node->

Feature matrices representing the l-th layer with structural information, two GCNs used to collect structural information and infer pooling strategies for the l-layer, respectively, pairNorm is used after each GCN due to the multiple operations of graph convolution and pooling of one graph to allow stacking of deep GCNs without excessive smoothing, the assignment matrix can be viewed as a predictive node community assignment and will be constrained by the true label, given the assignment matrix S ^(l) Coarsening the adjacency matrix A ^(l+1) And a new insertion X ^(l+1) Can be generated as follows:

the stack graph volume and pooling layer may obtain a series of node representations at different levels, and if layer k has only one node after pooling, the corresponding allocation matrix will be

So that pooling of the map corresponds to the sum of the map readouts, the total temporal complexity of the map pooling is O (m + n).

Reading the graph: the node representation of each graph is folded into a graph representation by graph readout, and therefore, the output characteristic s of the ith-level coarsened graph _i The read out of (c) is calculated as follows:

where k is the number of layers per graph, x _ij A representation of a jth node representing an ith level graph, an

Representation combining all representations in a new dimension, graph read time complexity O (n), final graph representation s ∈ R ^k×d Is the input to the graph discriminator.

And (3) image transposition pooling: unlike upsampling of coarsened graphs, which requires proper depolarization of the graph to reconstruct the node representation, the present embodiment introduces a differentiable method to distribute information from a coarsened graph to a detailed graph, the proposed distribution method uses a transposed version of a similar distribution matrix, the transposed distribution matrix

Is calculated as follows:

thus, reconstructing the node representation Z _rec ∈R ^n×k×d Is calculated as follows:

wherein the content of the first and second substances,

representation all node representations are combined in a new dimension, after which Z _rec As an input to the decoder D proposed in this embodiment, the total time complexity of transpose pooling is O (m + n), and it is noted that in this work, this embodiment adds a variational inference module to the potential distribution of conjugate nodes to control the output of the encoder.

(2) Variational reasoning

The embodiment utilizes variational reasoning before decoding node characteristics to generate a new graph with observed hierarchical community structure distribution, and uses

Achieving a priori distribution N (mu, diag (sigma)) from reconstructed features ² ) ) that selects a multi-layer perceptron (MLP) as the inference model. The reasoning process formula is as follows:

g(Z _rec ，φ)＝σ(Z _rec φ ₀ )φ ₁

where φ represents a set of parameters in MLP, g (-) _i Line i, Z, representing g (-) _vae ∈R ^n×k×d The time complexity of the reasoning module is O (kn), and the probabilistic variational reasoning can enable the node representation to be far away from the zero center, so that the node representation is intuitively made to be more sparse, and the community structure of the node can be kept.

After the variation reasoning module, a new node feature is selected from the prior distribution to generate a new graph, but the fully connected network itself is found to be unable to handle the task of generating a graph with a complex and hierarchical community structure, so this embodiment proposes a graph decoder to solve this problem.

(3) Graphic decoder

The decoder proposed in this embodiment comprises two steps: decoding the hierarchical representation sequence first, then predicting the node link, embedding the hierarchical community structure with Gating Recursive Unit (GRU), and obtaining the node characteristic h _k Where k represents the number of community structures, the decoding characteristics are obtained by the following formula:

wherein h is _l Represents the hidden state of the coarsened graph, h ₀ Is a matrix of zeros, which is,

represents the node characteristics of the first coarsened graph, h _k And representing the characteristics of the decoding nodes with the hierarchical community information, and after the node representation is obtained, giving the following link prediction:

g _θ (h _k )＝σ(h _k θ ₀ )θ ₁

p _θ (A _ij |h _k，i ，h _k，j )＝σ(g _θ (h _k，i ) ^T g _θ (h _k，j ))

wherein, g _θ (h _k，i ) Is a two-layer MLP to extract community information to help generate edges, h _k，i Features representing the ith node, A _rec ∈R _n×n Representing the probability matrix of the link prediction, n is sampled in order to speed up this process when the decoder is trained on a large graph _s (n _s N) nodes to obtain A _rec ∈R ^ns×ns 。

Specifically, the nodes are sampled without replacement to assemble the subgraph according to the node degree policy, as follows:

wherein, P _i Is the probability, deg, of selecting node i _i Represents the degree of node i, thus illustrating the temporal complexity of the decoder as

(4) Discriminator and optimization

The graph discriminator: the discrimination task requires the features of the graph obtained by the encoder, i.e. the output matrix s e R of the graph readout layer ^k×d And S = E (a), using a two-layer MLP classifier as discriminator D, which is defined as:

D _φ (A)＝σ(MLP(s，φ))

where φ represents the parameters of the MLP and σ represents the sigmoid activation function.

Optimizing the discriminator: formally, G and D play the minimax game using the value function V (G, D) as follows:

wherein, Z _vae And Z _s Sampling from the approximate distribution and gaussian prior distribution, respectively, and furthermore, in order to improve the level of discriminators with the clustering results, a clustering consistency is introduced:

wherein S is ^l The distribution matrix to be introduced is represented,

representing real community division of the observation graph, under the default condition, obtaining a layered community detection result by utilizing a louvain community detection algorithm to serve as the real community division, and updating phi through increasing gradient in the training process _D ：

When judging the graph in the real data set, using the upper half of the above formula to update the parameters, in order to ensure that the community structure maintenance and other optimization objectives are fully considered, the training process is only L _clus And log (D (A)) converge; when the generated graph is judged, the parameters are updated using the lower half of the above equation.

Generator optimization: the generator aims at minimizing the logarithmic probability of the discriminator correctly assigning to the graph reconstructed by G, introducing from CycleGAN a mapping consistency L in order to improve the performance of the decoder D and guarantee the invariance of permutations _rec Such asThe following:

wherein, A' _i Denotes from A _i The reconstructed pseudo-adjacency matrix, in practice, the collapse of the encoder E can be controlled by mapping consistency, by computing the decoder gradient with respect to φ _D Gradient (2):

where A' represents the reconstructed adjacency matrix, and the encoder is computed with respect to φ by reducing the gradient after updating the decoder _E Gradient (2):

wherein, the Gaussian prior p (Z) is set as

And L is _prior (. Is) represents the calculation of the Kullback-Leibler (KL) divergence between the two distributions, with this improved encoder and decoder, the generation process can generate new graphs of arbitrary size and similar community structure.

(5) Generating new graphs

After training, the present embodiment samples n _s (n _s N) nodes to obtain A _sub ∈R ^ns×ns And combining the output matrix A obtained from the generator and verified by the discriminator to generate the adjacency matrix _out ∈R ^n×n Specifically, a null A is initialized _out And filling in the adjacency matrix A of each subgraph _sub Until the number of generated edges meets the requirement, selecting a threshold value to determine the binarization strategy of each edge and passing A _out Parameterized Bernoulli-distributed sampling strategies may each lead to overlookingSlightly lower-order nodes and high variance output; to solve these problems, the present embodiment uses the following strategy: by passing from A _out The parameterized class distribution sampling of the ith row generates an edge for the node i; selection A _out Until the number of edges reaches a predefined number; the total temporal complexity of generating the new graph is O (n) ² )。

Referring to fig. 2, in order to reduce the parameter and computation costs, the hierarchical ladder encoder and discriminator are abandoned, and the core codec part is retained: the graph attention encoder, the community encoder and the graph decoder together form an automatic coding-based framework of SLSGEN, and the embodiment is described in detail below in order to better explain a large-scale graph generation method based on SLSGEN.

(1) Simplified autoencoder architecture

To avoid the entire graph computation in GCN, graph attention networks are used to measure edge importance in sampled local structures, specifically given input node characteristics

And an Ego Graph (Ego Graph) that uses a multi-head attention mechanism to aggregate messages from Graph structures to obtain hidden variables h for a central node u of the associated Ego Graph _u Wherein d is _enc Representing the dimensions of the hidden variables after the encoding process, for each autograph, the message aggregation formula is as follows:

/>

wherein the content of the first and second substances,

representing an output projection matrix; h is _attn Indicating the number of heads of attention; d _att Indication noteThe dimension of the gravity vector a; each head of the chart attention tier>

The calculation formula of (c) is as follows:

where σ denotes the activation function, N (u) denotes the neighbor of node u,

expressing the importance of the ith head middle edge (u, v), the calculation formula is as follows:

wherein the content of the first and second substances,

denotes the attention vector of the ith attention head and the leakyrev denotes a non-linear activation function with negative input slope α = 0.2.

The present embodiment uses community tags to direct latent variables to a latent space maintained by the community, and furthermore, to avoid complexity of O (n) in the decoder ² ) The inner product decoder is abandoned, a linear decoder is used to obtain the fraction of each edge, and the formula of the two decoders is as follows:

wherein the content of the first and second substances,

representing a predicted community allocation probability matrix; />

Representing a predicted edge probability matrix; in practice, the real community label of a node is calculated through a lovain community detection algorithm, after the community label of a sampling node is generated, cross entropy loss is calculated according to the real community label, parameters of a community decoder and an encoder are updated, an adjacent vector of the sampling node is generated, and approximate loss is calculated according to an adjacent matrix of a real graph, wherein an approximate loss function calculation formula is as follows:

wherein the content of the first and second substances,

representing a set of sampled central nodes, n _s Represents->

The size of (d); />

Representing a fractional matrix from a graph decoder; y is _c Representing real community tags by adjusting n _s Can be balanced between generating a high quality temporal map and fast model training, assuming that n is sampled for model training _s The spatial complexity of the proposed autoencoder architecture is O (n × (n) for each node _s +d _in ) In a time complexity of @) for each training iteration>

(2) Scalable graph sampling

In order to realize the generation of the scalable graph and maintain the generation performance of the model, the complete graph is decomposed into a plurality of self graphs, and the approximation of the task of generating the complete graph is realized by running an SLSGEN core architecture on the self graphs for a plurality of times; specifically, a representative node in a training phase is selected to model a complete graph structure, and nodes with higher degrees are associated with outliers with lower probability, so in order to pay attention to the representative nodes and edges and generate a high-quality graph, a probability distribution based on the node degrees is used as an initial node sampling strategy, and the formula is as follows:

wherein deg is _u Representing the degree of node u, assuming that n is sampled in each traversal _s Each node is used as an initial time sequence node, and n is sampled _s The self graph is used as the input of the coding process, and the initial node set is expressed as

Referring to fig. 3, to reduce the time consumption of the encoding process, a node representation of parallel node encoding is computed over multiple autographs, as shown in fig. 3 (a), to reduce the computational complexity from O (n) to

Wherein b represents the parallel number of the time self graph, namely the batch size; for efficient training, the batch size is set to the size of the initial sampling center node set, where

Thus, the computational complexity is parallelized into ≦ ≦ the ≦ computation complexity>

In order to further reduce space consumption, the present embodiment uses a truncation mechanism to control space usage, ignores duplicate nodes when sampling replacement nodes, uses th as a threshold to control worst-case space requirements, and once the total number of neighbors of a node exceeds th, the algorithm will convert from a full neighbor sampling strategy to a th neighbor sampling strategy.

In order to avoid repeated computation on high-order nodes, after sampling the self graphs, all the self graphs are merged into k-dichotomy computation graphs as shown in fig. 3 (b), and message passing and edge significance computation are performed on the computation graphs at the same time.

(3) Data parallel and model parallel

When training is performed on a large graph with more than one million nodes, GPU memory is easily exhausted, and therefore, it is necessary to perform data parallel training on multiple GPU machines, and assuming that there is one dual GPU machine, according to fig. 3 (c), model parameters and sampled self graphs are respectively placed into two GPUs, and in the two GPUs, the respective self graphs are assembled into two computation graphs, in this embodiment, representations of central nodes, prediction score matrices, and gradient values of parameters to be updated are computed, and self graph merging and encoder-decoder modules can be completed completely in parallel; only when there is a significant difference in training time between the two GPUs will serial dependency be generated on the update embedding of certain nodes.

For example, in FIG. 3 (c), three node embeddings require synchronization between two GPUs, limiting the number of embeddings to O (t) according to a self graph sampling policy ^k ) Hereinafter, where k is the self-graph depth of the sample and t is the size of the cutoff value, so that the communication cost is controllable, the upper limit of SLSGEN is 20 hundred million nodes (i.e. 2) by this data parallel strategy ³¹ 1 node), which is far more than other learning-based approaches.

If necessary, have more than 2 ³¹ Large graph of nodes using model parallelism, in particular, dividing the model into c copies and dividing the training data into blocks, each block containing less than 2 ³¹ And a node, training the model on a plurality of machine clusters, in this case, assuming that the depth of the sampled self-graph is 1, the time complexity of communication between the machines is less than O (t), wherein t is a truncated value when the self-graph is sampled, and the communication cost is manageable.

Preferably, it should be described again in this embodiment that, as with other learning-based graph generators, LSGEN also uses some classical models, such as VAE and CycleGAN, so as to achieve graph generation quality far exceeding that of the conventional graph generation method, and be capable of being competent for various graph generation tasks, but the present embodiment focuses on two new objectives: efficiency (scalability) and community-keeping based on learned models, which requires developing new techniques in the present invention, e.g., inferring hierarchy information separately with VAEs, which facilitates community-keeping for graph generation; furthermore, the consistency of the mappings of CycleGAN and VAE is used for permutation invariant graph generation, which is crucial for scalable implementation of sampling strategies.

Preferably, the embodiment further needs to explain that, in SLSGEN, a learning-based solution with better expandability is provided by using a CPU memory or even a hard disk, and specifically, an efficient graph attention automatic encoder architecture is designed for generating a community maintenance graph; the self-graph sampling and the second computation graph assembling strategy are adopted, and a small-batch-based method is realized to train the graph generator; a data parallel and model parallel architecture for training and reasoning of a scalable graph generation model is proposed; in summary, SLSGEN reduces the training time of the original LSGEN model by 4 times and reduces the memory usage by 4 times.

It should be recognized that embodiments of the present invention can be realized and implemented in computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.

Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.

Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.

As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).

It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims

1. A large-scale map generation method based on deep confrontation learning is characterized in that: comprises the steps of (a) preparing a substrate,

2. The large-scale map generation method based on deep confrontation learning according to claim 1, characterized in that: there is also a need to enhance the reconstruction of the graph structure prior to sampling, including,

meanwhile, the coarsening graph of each level distributes the community structure characteristics to the original nodes through the differentiable layer message transmission process;

3. The large-scale map generation method based on deep confrontation learning according to claim 2, characterized in that: the ladder encoder includes graph convolution, graph pooling, graph readout, and graph transpose pooling.

4. The large-scale map generation method based on deep confrontation learning according to claim 2 or 3, characterized in that: comprises the steps of (a) preparing a mixture of a plurality of raw materials,

and selecting a multilayer perceptron as a reasoning model for reasoning, and completing the mapping from the reconstruction characteristics to the prior distribution.

5. The large-scale map generation method based on deep confrontation learning according to claim 4, characterized in that: the discriminator comprises a plurality of discriminators which are connected in series,

6. The large-scale map generation method based on deep confrontation learning according to claim 5, characterized in that: comprises the steps of (a) preparing a mixture of a plurality of raw materials,

selection A _out Until the number of edges reaches a predefined number;

the total temporal complexity of generating the new graph is O (n) ² )。

7. The large-scale map generation method based on deep confrontation learning according to claim 1, characterized in that: the image decoder includes a decoder for decoding a plurality of encoded data streams,

decoding the hierarchical representation sequence;

and predicting node linkage.

8. The large-scale map generation method based on deep confrontation learning according to claim 1 or 7, characterized in that: comprises the steps of (a) preparing a mixture of a plurality of raw materials,

given input node characteristics

And a self map;

using a multi-head attention mechanism to gather data from graph structuresMessage, hidden variable h of central node u of obtaining relevant self-graph _u 。

9. The large-scale map generation method based on deep confrontation learning according to claim 8, characterized in that: for each of the self graphs, message aggregation includes,

wherein the content of the first and second substances,

representing an output projection matrix; h is _attn Indicating the number of attention heads; d _att Representing the dimension of the attention vector a.

10. The large-scale map generation method based on deep confrontation learning according to claim 9, characterized in that: the probability distribution based on the node degree is used as an initial node sampling strategy, including,

wherein deg. de _u Representing the degree of node u, assuming that n is sampled in each traversal _s Each node is used as an initial time sequence node, and n is sampled _s The self graph is used as the input of the coding process, and the initial node set is expressed as

/>