CN112633499A - Unsupervised graph topology transformation covariant representation learning method and unsupervised graph topology transformation covariant representation learning device - Google Patents


Info

Publication number
CN112633499A
Authority
CN
China
Prior art keywords: graph, transformation, node, original, topology
Prior art date
Legal status
Pending
Application number
CN202110035423.7A
Other languages
Chinese (zh)
Inventor
胡玮 (Wei Hu)
高翔 (Xiang Gao)
郭宗明 (Zongming Guo)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202110035423.7A
Publication of CN112633499A


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods
    • G06N3/088: Non-supervised learning, e.g. competitive learning


Abstract

The invention discloses a method and a device for unsupervised graph topology transformation covariant representation learning, and relates to the field of unsupervised learning. The invention discloses a general framework, applicable to GCNNs, for learning graph node feature representations, which formalizes the graph topology transformation covariant representation by maximizing the mutual information between the graph topology transformation and the node representations of the graph before and after the transformation. At the same time, the invention shows that maximizing this mutual information can be approximated as minimizing the cross entropy between the graph topology transformation and the transformation estimated from the node representations of the graph before and after the transformation. Specifically, the invention samples some node pairs from an original graph and flips the connectivity of the edges between these node pairs to realize the graph topology transformation, and then self-trains a representation encoder to learn node feature representations by reconstructing the topology transformation from the feature representations of the original graph and the transformed graph. Applied to node classification and graph classification tasks, the method outperforms state-of-the-art unsupervised methods.

Description

Unsupervised graph topology transformation covariant representation learning method and unsupervised graph topology transformation covariant representation learning device
Technical Field
The invention relates to the field of unsupervised learning, in particular to a method and a device for unsupervised graph topology transformation covariant representation learning.
Background
A graph is a natural and efficient representation for irregular/non-Euclidean data (e.g., 3D point clouds, social networks, citation networks, brain networks). Owing to the expressive power of graphs, machine learning on graph data is becoming increasingly important, as exemplified by the Graph Convolutional Neural Networks (GCNNs) proposed in recent years. However, most existing GCNN models are trained in a supervised or semi-supervised manner, which requires a large number of labeled samples to learn effective feature representations. Because of the high cost of labeling, especially on large-scale graphs, such methods are difficult to apply widely. Therefore, we need to learn graph feature representations in an unsupervised manner in order to accommodate a wider range of graph learning tasks.
Representative unsupervised learning methods include Auto-Encoders (AEs) and Generative Adversarial Networks (GANs). An auto-encoder aims to learn a feature representation of the data by reconstructing the input data through an encoder-decoder network. In contrast, a GAN uses generator and discriminator networks to generate data from input noise and thereby learns feature representations in an unsupervised manner, where the "noise" can be regarded as a representation of the data.
Building on AEs and GANs, many methods further improve the quality of unsupervised feature learning by learning Transformation Equivariant Representations (TERs, also rendered as "transformation covariant representations"). TER learning generally assumes that applying a transformation to the data causes a corresponding variation in the feature space, so the transformation applied to the data can be reconstructed from the feature representations of the data before and after the transformation, and the feature representation of the data can thereby be learned. The idea of transformation covariant representation learning first appeared in Hinton's transforming auto-encoders (capsule networks). Since then, many approaches to transformation covariant representation learning have been proposed. However, these models are limited to discrete transformations and need to be trained in a fully supervised fashion, which limits their ability to learn covariant representations for continuous transformations. To generalize to more general transformations, Zhang et al. proposed learning unsupervised feature representations via Auto-Encoding Transformations (AET). AET first applies random transformations to images, and then trains the auto-encoder network by reconstructing the transformations from the feature representations of the original and transformed images. However, AET focuses on learning transformation covariant representations for images and is difficult to extend directly to graph data in non-Euclidean spaces. GraphTER later extended AET to non-Euclidean data, learning graph transformation covariant representations by auto-encoding node-wise transformations in an unsupervised manner. However, GraphTER only explores transformations of node features, while the topology of the graph remains under-explored, even though it is crucial in unsupervised graph representation learning.
Disclosure of Invention
In response to the above-described problems and deficiencies of the prior art, the present invention provides a method and apparatus for graph topology transformation covariant representation (TopoTER) learning, which learns feature representations of a graph in an unsupervised manner by estimating graph topology transformations.
Unlike GraphTER, which transforms node features, TopoTER learns the transformation covariant representation by transforming the graph topology. Taking the graph signal as input, graph convolution operations are then performed on the graph before and after the transformation respectively, yielding different graph node feature representations. Formally, we propose TopoTER from an information-theoretic perspective, aiming to maximize the mutual information between the graph topology transformation and the node representations of the graph before and after the transformation. At the same time, we show that maximizing this mutual information can be approximated as minimizing the cross entropy between the graph topology transformation and the transformation estimated from the node representations of the graph before and after the transformation.
The technical scheme adopted by the invention is as follows:
a method for unsupervised graph topology transformation covariant characterization learning comprises the following steps:
establishing a graph convolution automatic encoder network comprising an encoder and a decoder;
the encoder learns the node characteristic representations of the original graph and the graph after the topological transformation respectively, and the decoder estimates the topological transformation applied to the original graph from the node characteristic representations of the original graph and the graph after the topological transformation;
the entire graph-rolled autoencoder network is trained by minimizing the cross entropy between the original topological transform and the estimated topological transform.
Further, partial node pairs are sampled from the original graph, and the connectivity of the edges between these node pairs is then flipped with a certain probability to realize the topology transformation.
Further, given a graph signal and its corresponding adjacency matrix (X, A), and the graph signal and adjacency matrix after a topology transformation t, (X, t(A)), the function E(·) is said to be transformation covariant if it satisfies the following equation:
E(X, t(A)) = ρ(t)(E(X, A))
where ρ(t) represents the homomorphic transformation of t in the feature space.
Further, the transformation covariance of the graph topology transformation is ensured by maximizing the mutual information I(H, ΔA; H̃) between (H, ΔA) and H̃, where ΔA is the graph topology transformation matrix, and H and H̃ are the feature representations of the graph signal before and after the graph topology transformation.
Further, maximizing the mutual information I(H, ΔA; H̃) is approximated as minimizing the cross entropy H(p‖q) between the probability distribution p(ΔA | H, H̃) and the probability distribution q(Δ̂A | H, H̃):
min_θ H(p‖q) = 𝔼_{p(ΔA, H, H̃)} [ −log q(Δ̂A = ΔA | H, H̃) ]
where 𝔼_{p(·)} denotes the expectation under the probability distribution p(ΔA, H, H̃), and Δ̂A represents the graph topology transformation matrix estimated by the decoder.
Further, the graph topology transformation parameters are divided into four types, so that the problem of estimating the topology transformation parameters in ΔA from (H, H̃) is converted into a classification problem over the parameter types, the four types including:
(a) adding an edge to a disconnected vertex pair: a_{i,j} = 0 → ã_{i,j} = 1;
(b) deleting the edge of a connected vertex pair: a_{i,j} = 1 → ã_{i,j} = 0;
(c) keeping the original disconnection: a_{i,j} = 0 → ã_{i,j} = 0;
(d) keeping the original connection: a_{i,j} = 1 → ã_{i,j} = 1.
further, the estimating of the topology transformation applied on the original graph from the node feature representation of the original graph and the topology transformed graph comprises:
the difference between the feature representations before and after the transformation is first calculated:
Figure BDA00028940837800000310
wherein, δ hiA difference value representing the feature representation before and after the transformation of the vertex i; then, the topological transformation between the node i and the node j is predicted through the difference delta H of the node characteristics, and the representation of the edge is firstly constructed:
Figure BDA00028940837800000311
wherein | · | | | non-conducting phosphor |, which represents the Hadamard product of two vectors1Representing vector l1A norm; edge representation ei,jThen, the graph topology transformation parameters are predicted by being sent into a plurality of linear layers:
Figure BDA00028940837800000312
wherein softmax (·) represents an activation function;
the entire auto-encoder network is trained by minimizing the following cross-entropy:
Figure BDA00028940837800000313
wherein f represents the type of graph topology transformation and y represents the real type of graph topology transformation.
The device for unsupervised graph topology transformation covariant representation learning by adopting the method comprises a graph convolution auto-encoder network composed of an encoder and a decoder; the encoder learns the node feature representations of the original graph and the topology-transformed graph respectively, and the decoder estimates the topology transformation applied to the original graph from these node feature representations; the entire graph convolution auto-encoder network is trained by minimizing the cross entropy between the original topology transformation and the estimated topology transformation.
The invention has the following beneficial effects: experimental results show that TopoTER outperforms existing unsupervised models and even achieves results comparable to (semi-)supervised methods on node classification and graph classification tasks. Meanwhile, in terms of model complexity, the number of parameters of the TopoTER model is far smaller than that of the latest unsupervised models based on contrastive learning.
Drawings
FIG. 1: graph topology transformation example graphs.
FIG. 2: topoter network model schematic.
Detailed Description
The present invention will be described in further detail below with reference to specific examples and the accompanying drawings. Before introducing the main steps of the method of the present invention, the basic concepts of the graph and the graph topology transformation are first introduced.
(1) Graph and graph signals:
We define an undirected graph G = (V, ε), where V = {v_1, v_2, …, v_N} is the set of vertices on the graph and N is the number of vertices; ε is the set of edges. Graph signals refer to data residing on the vertices of a graph, arising in, e.g., social networks, traffic networks, sensor networks, and neuronal networks, and are represented as a matrix X ∈ ℝ^{N×C}, where the i-th row of the matrix represents the C-dimensional signal at vertex i. To represent the connectivity between nodes, we define the adjacency matrix A ∈ ℝ^{N×N}, which is a real symmetric matrix. a_{i,j} = 1 means that vertices i and j are connected; a_{i,j} = 0 means that vertices i and j are not connected.
(2) Graph topology transformation:
We define a topology transformation t as adding edges to or deleting edges from the original edge set ε of the graph G. Such an operation can be done by sampling, i.e., using a set of independent identically distributed "switch parameters" σ_{i,j}, each of which determines whether the edge (i, j) in the adjacency matrix is modified. Suppose we have a Bernoulli distribution B(p), where p represents the probability that each edge is modified; we sample a random matrix Σ = {σ_{i,j}}_{N×N} from B(p), i.e., σ_{i,j} ~ B(p). We can then obtain the perturbed adjacency matrix:
Ã = A ⊕ Σ   (1)
where ⊕ is the exclusive-or (XOR) operation. The method thus generates the transformed graph adjacency matrix through the graph topology transformation t, i.e., Ã = t(A). The transformed adjacency matrix Ã can also be written as the sum of the original adjacency matrix A and the graph topology transformation matrix ΔA:
Ã = A + ΔA   (2)
where ΔA = {δa_{i,j}}_{N×N} contains the perturbation of each edge, δa_{i,j} ∈ {−1, 0, 1}. As shown in FIG. 1, when δa_{i,j} = 0, the edge between vertex i and vertex j remains unchanged (black solid lines in the figure); when δa_{i,j} = −1 or 1, an edge is deleted from or added to the original graph (gray solid lines indicate added edges, and dotted lines indicate deleted edges).
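To make the sampling-based transformation concrete, here is a minimal NumPy sketch of equations (1) and (2) (our own illustration under the definitions above; the function name and sampling details are assumptions, not the patent's reference implementation):

```python
import numpy as np

def topology_transform(A, p, rng=None):
    """Flip each off-diagonal edge slot of A independently with probability p.

    Returns the transformed adjacency A_t = A XOR Sigma (eq. (1)) and the
    transformation matrix Delta_A = A_t - A, entries in {-1, 0, 1} (eq. (2)).
    """
    rng = rng or np.random.default_rng(0)
    N = A.shape[0]
    upper = rng.random((N, N)) < p           # Bernoulli(p) switch parameters
    Sigma = np.triu(upper, k=1)              # keep the strict upper triangle
    Sigma = (Sigma | Sigma.T).astype(int)    # symmetrize; diagonal stays zero
    A_t = A ^ Sigma                          # XOR flips the selected edges
    Delta_A = A_t - A                        # -1: deleted, +1: added, 0: unchanged
    return A_t, Delta_A
```

Because Σ is symmetric with a zero diagonal, Ã remains a valid undirected adjacency matrix, and ΔA recovers exactly the δa_{i,j} ∈ {−1, 0, 1} perturbations described above.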
The process of the present invention is described below. Given a graph and its associated graph signal, the present invention samples some node pairs from the original graph and then flips the connectivity of the edges between these node pairs with a certain probability to implement the graph topology transformation. A graph convolution auto-encoder network is then designed: the encoder learns the node representations of the original graph and the transformed graph respectively, and the decoder estimates the topology transformation applied to the original graph from the feature representations of the graphs before and after the transformation. Finally, the entire auto-encoder network is trained by minimizing the cross entropy between the graph topology transformation and the estimated topology transformation.
Algorithm framework
Given a graph signal and its corresponding adjacency matrix (X, A), and the graph signal and adjacency matrix after the topology transformation t, (X, t(A)), we say that a function E(·) satisfies "transformation covariance" if:
E(X, t(A)) = ρ(t)(E(X, A))   (3)
where ρ(t) represents the homomorphic transformation of t in the feature space.
Our goal is to learn a function E(·) that extracts a covariant representation of the graph signal X. To this end, we design an encoder-decoder network. We train a graph encoder E: (X, A) ↦ H, which encodes the feature representation of each node in the graph, where H represents the feature matrix of the nodes in the original graph and H̃ represents the feature matrix of the nodes after the graph topology transformation. To ensure that the resulting node feature representations are transformation covariant, we train a decoder D: (H, H̃) ↦ ΔA to estimate the graph topology transformation ΔA from the representations of the original and transformed graph signals. From an information-theoretic point of view, this requires that (H, ΔA) jointly contain all the information necessary to characterize H̃.
We can then ensure the transformation covariance of the graph topology transformation by maximizing the mutual information I(H, ΔA; H̃) between (H, ΔA) and H̃. The larger the mutual information, the more information about ΔA can be inferred from the representations H and H̃. Therefore, we propose to learn the graph topology transformation covariant representation by maximizing the mutual information as follows:
max_θ I(H, ΔA; H̃)   (4)
where θ represents the parameters of the auto-encoder network.
However, it is difficult to compute the mutual information directly. Instead, we show that maximizing the mutual information can be approximated as minimizing a cross entropy, as stated in the following theorem.
Theorem: maximizing the mutual information I(H, ΔA; H̃) can be approximated as minimizing the cross entropy H(p‖q) between the probability distribution p(ΔA | H, H̃) and the probability distribution q(Δ̂A | H, H̃):
min_θ H(p‖q) = 𝔼_{p(ΔA, H, H̃)} [ −log q(Δ̂A = ΔA | H, H̃) ]   (5)
where 𝔼_{p(·)} denotes the expectation under the probability distribution p(ΔA, H, H̃), and Δ̂A represents the graph topology transformation matrix estimated by the decoder.
Proof: by the chain rule of mutual information, we have:
I(H, ΔA; H̃) = I(H; H̃) + I(ΔA; H̃ | H)
Thus, the conditional mutual information I(ΔA; H̃ | H) is a lower bound of I(H, ΔA; H̃), attained when I(H; H̃) reaches its minimum of zero. We can therefore relax the objective to maximizing this lower bound, the mutual information between the representation H̃ and the graph topology transformation ΔA given H:
I(ΔA; H̃ | H) = H(ΔA | H) − H(ΔA | H, H̃)
where H(· | ·) represents the conditional entropy. Since ΔA and H are independent, H(ΔA | H) = H(ΔA), which does not depend on the network parameters. Maximizing the mutual information I(ΔA; H̃ | H) thus becomes:
min_θ H(ΔA | H, H̃)   (6)
By the definition of conditional entropy, we have
H(ΔA | H, H̃) = 𝔼_{p(ΔA, H, H̃)} [ −log p(ΔA | H, H̃) ]
so the minimization problem in equation (6) can be rewritten as:
min_θ 𝔼_{p(ΔA, H, H̃)} [ −log p(ΔA | H, H̃) ]   (7)
Solving equation (7) requires the posterior probability distribution p(ΔA | H, H̃), which is intractable. We therefore introduce a conditional probability distribution q(Δ̂A | H, H̃) to approximate the posterior p(ΔA | H, H̃). By the definition of the Kullback-Leibler divergence, we have
H(p‖q) = H(ΔA | H, H̃) + D_KL(p‖q)
where D_KL(p‖q) represents the non-negative Kullback-Leibler divergence of p and q, and H(p‖q) represents the cross entropy of p and q. The cross entropy is therefore an upper bound of the conditional entropy, and equation (6) can be relaxed to minimizing this upper bound:
min_θ H(p‖q) = 𝔼_{p(ΔA, H, H̃)} [ −log q(Δ̂A = ΔA | H, H̃) ]
In this way, the maximization problem in equation (4) is approximated by the optimization problem in equation (5). ∎
Based on the above theorem, we train the decoder D to learn the probability distribution q(Δ̂A | H, H̃), in which the feature representations H and H̃ are extracted by the encoder E and the graph topology transformation ΔA can be sampled as described above. This allows us to minimize the cross entropy in equation (5) between p(ΔA | H, H̃) and q(Δ̂A | H, H̃). We therefore formulate TopoTER as a joint optimization problem over the representation encoder E and the transformation decoder D.
Network model
We design a graph convolution auto-encoder network for TopoTER, as shown in FIG. 2. Given a graph signal X and a graph G, the TopoTER unsupervised learning algorithm includes three steps: 1) graph topology transformation: sample and perturb some edges in ε to obtain the transformed adjacency matrix Ã and the graph topology transformation matrix ΔA; 2) representation encoding; 3) transformation decoding: estimate the graph topology transformation parameters from the learned feature representations. We describe these three steps in detail below.
(1) Graph topology transformation
A subset is randomly selected from all vertex pairs for topology perturbation, i.e., adding or deleting edges, so that the topological structure of the graph can be characterized at different scales and the number of transformation parameters can be reduced to improve computational efficiency. In each training iteration, we take all pairs of connected vertices in ε, denoted by S1, and randomly sample pairs of unconnected vertices, denoted by S0:
S0 = {(i, j) | a_{i,j} = 0},  S1 = {(i, j) | a_{i,j} = 1}   (8)
where |S0| = |S1| = M. Next, we randomly divide each of S0 and S1 into two disjoint subsets, a perturbed subset and a kept subset:
S0 = S0^t ∪ S0^k,  S1 = S1^t ∪ S1^k,  |S0^t| = |S1^t| = rM   (9)
where r represents the perturbation rate of the edges. Then, for each vertex pair (i, j) in S0^t and S1^t, we flip the element of the original graph adjacency matrix A at the corresponding position: that is, if a_{i,j} = 1, we set the element of the transformed adjacency matrix to ã_{i,j} = 0; otherwise, we set ã_{i,j} = 1. For each vertex pair (i, j) in S0^k and S1^k, we keep the original connectivity unchanged, i.e., ã_{i,j} = a_{i,j}.
By reading ΔA at the sampled positions (i, j) in S0 ∪ S1, we obtain the sampled graph topology transformation parameters. Furthermore, we can classify the transformation parameters into four types:
(a) adding an edge to a disconnected vertex pair: a_{i,j} = 0 → ã_{i,j} = 1;
(b) deleting the edge of a connected vertex pair: a_{i,j} = 1 → ã_{i,j} = 0;
(c) keeping the original disconnection: a_{i,j} = 0 → ã_{i,j} = 0;
(d) keeping the original connection: a_{i,j} = 1 → ã_{i,j} = 1.
Therefore, the problem of estimating the transformation parameters in ΔA from (H, H̃) is converted into a classification problem over the parameter types. The ratio of the four types is r : r : (1−r) : (1−r).
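The following NumPy sketch (our own reading of equations (8) and (9); names such as sample_and_transform are hypothetical) implements the sampling, flipping, and four-type labeling described above:

```python
import numpy as np

def sample_and_transform(A, r, M=None, rng=None):
    """Sample M connected (S1) and M disconnected (S0) vertex pairs, flip a
    fraction r of each, and return the transformed adjacency matrix together
    with per-pair transformation-type labels:
      0: edge added, 1: edge deleted, 2: kept disconnected, 3: kept connected.
    """
    rng = rng or np.random.default_rng(0)
    iu = list(zip(*np.triu_indices_from(A, k=1)))
    S1 = [(i, j) for i, j in iu if A[i, j] == 1]     # connected pairs
    S0 = [(i, j) for i, j in iu if A[i, j] == 0]     # disconnected pairs
    M = M or min(len(S0), len(S1))
    S1 = [S1[k] for k in rng.choice(len(S1), size=M, replace=False)]
    S0 = [S0[k] for k in rng.choice(len(S0), size=M, replace=False)]

    n_flip = int(r * M)                              # size of the perturbed subsets
    A_t = A.copy()
    pairs, labels = [], []
    for S, flip_label, keep_label in ((S0, 0, 2), (S1, 1, 3)):
        for k, (i, j) in enumerate(S):
            if k < n_flip:                           # perturbed subset (S^t)
                A_t[i, j] = A_t[j, i] = 1 - A[i, j]  # flip connectivity
                labels.append(flip_label)
            else:                                    # kept subset (S^k)
                labels.append(keep_label)
            pairs.append((i, j))
    return A_t, np.asarray(pairs), np.asarray(labels)
```

The label ratio produced this way is r : r : (1−r) : (1−r), matching the four types above.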
(2) Representation encoder
An encoder E(·) is trained to encode the feature representation of each node in the graph. We extract the feature representation of each node in the graph signal using a GCNN with shared weights. Taking GCN (Graph Convolutional Network) as an example, the graph convolution in GCN is defined as:
H = D^{−1/2}(A + I)D^{−1/2} X W   (10)
where D represents the degree matrix of A + I, W ∈ ℝ^{C×F} is a learnable parameter matrix, and H ∈ ℝ^{N×F} represents the node feature matrix with F output channels. Similarly, the node features of the transformed graph, computed with the shared weights W, are:
H̃ = D̃^{−1/2}(Ã + I)D̃^{−1/2} X W = D̃^{−1/2}(A + I)D̃^{−1/2} X W + D̃^{−1/2} ΔA D̃^{−1/2} X W   (11)
where D̃ represents the degree matrix of Ã + I. In this way, the feature representations H and H̃ of the graph signal before and after the graph topology transformation are obtained.
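A minimal NumPy rendering of the propagation rule in equations (10) and (11) might look as follows (our own sketch; gcn_layer is a hypothetical name, and a single layer is shown for brevity):

```python
import numpy as np

def gcn_layer(A, X, W):
    """One GCN propagation step: H = D^{-1/2} (A + I) D^{-1/2} X W (eq. (10))."""
    A_hat = A + np.eye(A.shape[0])                   # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))    # degrees of A + I are >= 1
    D_inv_sqrt = np.diag(d_inv_sqrt)
    return D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W

# The same weight matrix W encodes both graphs (shared-weight encoder):
#   H   = gcn_layer(A,   X, W)    # representation of the original graph
#   H_t = gcn_layer(A_t, X, W)    # representation of the transformed graph
```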
(3) Transform decoder
Comparing equations (10) and (11), the difference between H and H̃ lies in the second term of equation (11), which involves ΔA. This enables us to train a decoder D(·) to estimate the graph topology transformation from the joint representations before and after the transformation. We first compute the difference between the feature representations before and after the transformation:
ΔH = H̃ − H   (12)
where the i-th row δh_i of ΔH represents the difference of the feature representation of vertex i before and after the transformation. We can therefore predict the topology transformation between node i and node j from the node feature difference ΔH. We first construct a representation of each edge:
e_{i,j} = (δh_i ⊙ δh_j) / ‖δh_i ⊙ δh_j‖₁   (13)
where ⊙ represents the Hadamard product of two vectors and ‖·‖₁ represents the vector ℓ1 norm. The edge representation e_{i,j} is then fed into several linear layers to predict the graph topology transformation parameters:
f_{i,j} = softmax(MLP(e_{i,j}))   (14)
where f_{i,j} represents the predicted graph topology transformation parameters and softmax(·) represents the activation function.
According to equation (5), the entire auto-encoder network is trained by minimizing the following cross entropy:
min_θ − Σ_{(i,j)∈S0∪S1} Σ_{c=1}^{4} y_{i,j}^{c} log f_{i,j}^{c}   (15)
where f represents the predicted type of the graph topology transformation and y represents the ground-truth type of the graph topology transformation.
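Putting equations (12) to (15) together, a compact sketch of the transformation decoder and its cross-entropy loss could look like this (our own illustration: a single linear layer W_dec stands in for the patent's several linear layers, and the ℓ1-normalized Hadamard product in equation (13) is our reconstruction of the garbled original formula):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)            # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def decode_and_loss(H, H_t, pairs, labels, W_dec):
    """Predict the transformation type of each sampled vertex pair and return
    the cross-entropy loss of eq. (15). W_dec has shape (F, 4)."""
    dH = H_t - H                                     # eq. (12): per-node difference
    # Eq. (13) (as reconstructed): l1-normalized Hadamard product of the row
    # differences for each sampled pair (i, j).
    e = dH[pairs[:, 0]] * dH[pairs[:, 1]]
    e = e / (np.abs(e).sum(axis=1, keepdims=True) + 1e-12)
    f = softmax(e @ W_dec)                           # eq. (14): type probabilities
    # Eq. (15): cross entropy against the ground-truth type labels y.
    return -np.log(f[np.arange(len(labels)), labels] + 1e-12).mean()
```

In a full pipeline, H and H_t would come from the shared-weight encoder above, and this loss would be back-propagated through both the decoder and the encoder to train the whole network.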
The method can be used for social networks, citation networks, and other networks that can be represented by graphs, enabling the analysis of large-scale complex graph structures such as social networks, citation networks, and brain networks. For example, the invention can classify communities formed among people in a social network, or analyze the connections between the brain functional regions that form a brain network. Meanwhile, the proposed method is unsupervised: compared with existing methods, TopoTER saves labeling cost, which is of great significance for the large-scale graph data found in real life.
Table 1 and Table 2 show the experimental results on the graph node classification task and the graph classification task, respectively. The results show that TopoTER outperforms existing unsupervised models and achieves results comparable to (semi-)supervised methods on the node classification and graph classification tasks.
Table 1: graph node classification task results [table provided as an image in the original publication]
Table 2: graph classification task results [table provided as an image in the original publication]
In Table 2, ">1 day" indicates that the algorithm did not produce a result after running for more than 24 hours, and "OOM" indicates that the method ran out of memory.
Based on the same inventive concept, another embodiment of the present invention provides an unsupervised graph topology transformation covariant representation learning apparatus using the method of the present invention, which comprises a graph convolution auto-encoder network composed of an encoder and a decoder; the encoder learns the node feature representations of the original graph and the topology-transformed graph respectively, and the decoder estimates the topology transformation applied to the original graph from these node feature representations; the entire graph convolution auto-encoder network is trained by minimizing the cross entropy between the original topology transformation and the estimated topology transformation.
Based on the same inventive concept, another embodiment of the present invention provides an electronic device (computer, server, smartphone, etc.) comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the steps of the method of the invention.
Based on the same inventive concept, another embodiment of the present invention provides a computer-readable storage medium (e.g., ROM/RAM, magnetic disk, optical disk) storing a computer program, which when executed by a computer, performs the steps of the inventive method.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for unsupervised graph topology transformation covariant representation learning, characterized by comprising the following steps:
establishing a graph convolution auto-encoder network comprising an encoder and a decoder;
the encoder learns the node feature representations of the original graph and the topology-transformed graph respectively, and the decoder estimates the topology transformation applied to the original graph from the node feature representations of the original graph and the topology-transformed graph;
the entire graph convolution auto-encoder network is trained by minimizing the cross entropy between the original topology transformation and the estimated topology transformation.
2. The method of claim 1, wherein the topology transformation is implemented by sampling partial node pairs from the original graph and then flipping the connectivity of the edges between the node pairs with a certain probability.
3. The method according to claim 1, characterized in that, given a graph signal and its corresponding adjacency matrix (X, A), and the graph signal and adjacency matrix after a topology transformation t, (X, t(A)), the function E(·) is said to be transformation covariant if it satisfies the following equation:
E(X, t(A)) = ρ(t)(E(X, A))
where ρ(t) represents the homomorphic transformation of t in the feature space.
4. The method according to claim 3, characterized in that the transformation covariance of the graph topology transformation is ensured by maximizing the mutual information I(H, ΔA; H̃) between (H, ΔA) and H̃, where ΔA is the graph topology transformation matrix, and H and H̃ are the feature representations of the graph signal before and after the graph topology transformation.
5. The method according to claim 4, characterized in that maximizing the mutual information I(H, ΔA; H̃) is approximated as minimizing the cross entropy H(p‖q) between the probability distribution p(ΔA | H, H̃) and the probability distribution q(Δ̂A | H, H̃):
min_θ H(p‖q) = 𝔼_{p(ΔA, H, H̃)} [ −log q(Δ̂A = ΔA | H, H̃) ]
where 𝔼_{p(·)} denotes the expectation under the probability distribution p(ΔA, H, H̃), and Δ̂A represents the graph topology transformation matrix estimated by the decoder.
6. The method according to claim 1, characterized in that the graph topology transformation parameters are divided into four types, so that the problem of estimating the topology transformation parameters in ΔA from (H, H̃) is converted into a classification problem over the parameter types, the four types including:
(a) adding an edge to a disconnected vertex pair: a_{i,j} = 0 → ã_{i,j} = 1;
(b) deleting the edge of a connected vertex pair: a_{i,j} = 1 → ã_{i,j} = 0;
(c) keeping the original disconnection: a_{i,j} = 0 → ã_{i,j} = 0;
(d) keeping the original connection: a_{i,j} = 1 → ã_{i,j} = 1.
7. the method of claim 6, wherein estimating the topology transformation applied to the original graph from the node feature representation of the original graph and the topology transformed graph comprises:
the difference between the feature representations before and after the transformation is first calculated:
Figure FDA0002894083770000021
wherein, δ hiA difference value representing the feature representation before and after the transformation of the vertex i; then, the topological transformation between the node i and the node j is predicted through the difference delta H of the node characteristics, and the representation of the edge is firstly constructed:
Figure FDA0002894083770000022
wherein | · | | | non-conducting phosphor |, which represents the Hadamard product of two vectors1Representing vector l1A norm; edge representation ei,jThen, the graph topology transformation parameters are predicted by being sent into a plurality of linear layers:
Figure FDA0002894083770000023
wherein softmax (·) represents an activation function;
the entire auto-encoder network is trained by minimizing the following cross-entropy:
Figure FDA0002894083770000024
wherein f represents the type of graph topology transformation and y represents the real type of graph topology transformation.
8. An unsupervised graph topology transformation covariant representation learning apparatus employing the method of any one of claims 1 to 7, characterized by comprising a graph convolution auto-encoder network composed of an encoder and a decoder; the encoder learns the node feature representations of the original graph and the topology-transformed graph respectively, and the decoder estimates the topology transformation applied to the original graph from these node feature representations; the entire graph convolution auto-encoder network is trained by minimizing the cross entropy between the original topology transformation and the estimated topology transformation.
9. An electronic apparatus, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a computer, implements the method of any one of claims 1 to 7.
CN202110035423.7A 2021-01-12 2021-01-12 Unsupervised graph topology transformation covariant representation learning method and unsupervised graph topology transformation covariant representation learning device Pending CN112633499A (en)


Publications (1)

Publication Number: CN112633499A; Publication Date: 2021-04-09

Family

ID=75294397



Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105069173A (en) * 2015-09-10 2015-11-18 天津中科智能识别产业技术研究院有限公司 Rapid image retrieval method based on supervised topology keeping hash
CN111950594A (en) * 2020-07-14 2020-11-17 北京大学 Unsupervised graph representation learning method and unsupervised graph representation learning device on large-scale attribute graph based on sub-graph sampling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANG GAO et al.: "TopoTER: Unsupervised Learning of Topology Transformation Equivariant Representations", ICLR 2021 *


Legal Events

Date Code Title Description
PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication

Application publication date: 2021-04-09