CN114880538A - Attribute graph community detection method based on self-supervision - Google Patents

Info

Publication number
CN114880538A
Authority
CN
China
Prior art keywords: matrix, self, clustering, initial, graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210639903.9A
Other languages
Chinese (zh)
Inventor
刘磊
苏伟
刘永升
马永强
原彦平
史春燕
张久文
张烜
Current Assignee
Reader Publishing Group Co ltd
Lanzhou University
Original Assignee
Reader Publishing Group Co ltd
Lanzhou University
Priority date
Filing date
Publication date
Application filed by Reader Publishing Group Co ltd, Lanzhou University
Priority to CN202210639903.9A
Publication of CN114880538A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/906: Clustering; Classification
    • G06F17/00: Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10: Complex mathematical operations
    • G06F17/16: Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G06F18/232: Non-hierarchical techniques
    • G06F18/2321: Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213: Non-hierarchical techniques using statistics or function optimisation with fixed number of clusters, e.g. K-means clustering
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to an attribute graph community detection method based on self-supervision, which comprises the following steps: extracting an initial adjacency matrix and an initial feature matrix from the graph attention network, and preprocessing each to obtain a converted adjacency matrix and a filtered feature matrix; reconstructing, through a self-encoding unit, based on the converted adjacency matrix and the filtered feature matrix, to obtain a reconstructed adjacency matrix and a reconstructed feature matrix; inputting the reconstructed feature matrix into a clustering unit for calculation to obtain a clustering result; and merging the loss functions of the self-encoding unit and the clustering unit into a final overall loss function, performing clustering calculation with the final overall loss function to obtain a final clustering result, and obtaining a division result based on the final clustering result. Compared with a graph convolution network, the method has the advantage of added attention weights and performs better in networks with dense node connections.

Description

Attribute graph community detection method based on self-supervision
Technical Field
The invention relates to the technical field of computer networks, in particular to an attribute graph community detection method based on self-supervision.
Background
Graph neural networks are applied to many graph-related tasks, including node classification, graph clustering, and so on. In node classification, an auto-encoder is commonly applied, and back propagation is performed with the error between the real labels and the predicted labels as the loss function. In the graph classification task, pooling is commonly used: nodes in a complex network are aggregated and treated as a single point, the whole network is gradually summarized into a representation, and the downstream network-representation classification task is then performed. In the graph clustering task there are no real labels, so a loss function is constructed in various ways to obtain a division into a specified number of categories.
Deep learning tasks can in most cases be divided into two types, supervised learning and unsupervised learning, the difference being whether manually labeled values are used. The goal of self-supervised learning is to mine supervision information from large-scale unlabeled data; the constructed supervision information helps extract data representations suitable for downstream tasks. That is, self-supervised learning does not use manual labels and is often grouped under unsupervised learning, though the more precise term is self-supervised learning. In practical deep learning tasks, manually labeled data is unavailable in most cases and manual labeling is expensive, so the importance of self-supervised learning is self-evident.
Among graph-related tasks, completing community division without labels is called the graph clustering task. Its input data comprises two types: the topological structure information of the graph, usually represented by an adjacency matrix that encodes the edges between nodes; and the feature information of the graph, usually represented by a feature matrix that encodes the features of each node. The adjacency matrix and the feature matrix are each preprocessed to facilitate the downstream tasks. Most graph clustering methods can be divided into two parts: the former learns the graph information, and the latter obtains clustering results through a clustering algorithm. Dividing a single task into upstream and downstream two-stage tasks inevitably causes precision loss.
Disclosure of Invention
The invention aims to provide an attribute graph community detection method based on self-supervision, which solves the problems in the prior art.
In order to achieve the purpose, the invention provides the following scheme:
the attribute graph community detection method based on self-supervision comprises the following steps:
extracting an initial adjacency matrix and an initial feature matrix in the graph attention network, and respectively preprocessing the initial adjacency matrix and the initial feature matrix to obtain a converted adjacency matrix and a filtered feature matrix;
reconstructing through a self-coding unit based on the converted adjacency matrix and the filtered feature matrix to obtain a reconstructed adjacency matrix and a reconstructed feature matrix; inputting the reconstructed feature matrix into a clustering unit for calculation to obtain a clustering result;
and merging the loss functions of the self-coding unit and the clustering unit into a final overall loss function, performing clustering calculation with the final overall loss function to obtain a final clustering result, and obtaining a division result based on the final clustering result.
Preferably, the process of preprocessing the initial adjacency matrix and the initial feature matrix respectively comprises:
and adding a diffusion function into the initial adjacency matrix, smoothing the initial characteristic matrix, then putting the initial characteristic matrix into the self-encoding unit and the clustering unit, obtaining low-dimensional representation of network data through the self-encoding unit, and performing clustering tasks through the clustering unit to optimize a target function.
Preferably, adding the diffusion function to the initial adjacency matrix includes:
based on the sum of the multi-order adjacency matrices, modeling the relations between adjacency matrices of different orders through the diffusion function, and transforming with the degree matrix to obtain a transformed matrix; processing the transformed matrix through a symmetric transformation to obtain the initial adjacency matrix with the diffusion function added, wherein the initial adjacency matrix is the multi-order adjacency matrix.
Preferably, the diffusion function is the personalized PageRank function, with the heat kernel as an alternative:

$$\theta_k^{PPR} = \alpha(1-\alpha)^k, \qquad \theta_k^{HK} = e^{-t}\,\frac{t^k}{k!} \qquad (1)$$

where k is the neighbor order, and α and t are hyper-parameters adjusted according to the data set.
Preferably, the process of smoothing the initial feature matrix includes:
and performing low-pass filtering processing on the initial feature matrix in a Laplacian Smoothing Filter mode, wherein the low-pass filtering processing is used for enabling the filtered feature matrix to embody the overall node features of the graph, and obtaining the filtered feature matrix.
Preferably, a filter H is used for performing the low-pass filtering, where the filter H is:
H=I-kL (2)
where I is the identity matrix, L is the Laplacian matrix, and k is the Laplacian filter coefficient.
Preferably, the self-coding unit includes a self-encoding layer, which is a graph self-encoding layer using two GAT layers as the basic encoding-layer unit: the first GAT layer takes the initial feature matrix as input, and the input of the second GAT layer is the output of the first. The topological information of the network and the feature information of the nodes are both fed into the two GAT layers, so that the encoder output captures the complete information of the network.
Preferably, the self-coding unit further includes a graph decoding layer symmetric to the graph self-encoding layer, configured to reconstruct the different kinds of graph information, obtain the reconstructed adjacency matrix and the reconstructed feature matrix, and thereby obtain the reconstruction error of the adjacency matrix and the reconstruction error of the feature matrix.
Preferably, in the clustering unit, the KL divergence is selected as the objective function for measuring the clustering result:

$$L_{KL} = \mathrm{KL}(P\,\|\,Q) = \sum_i \sum_u p_{iu} \log\frac{p_{iu}}{q_{iu}} \qquad (3)$$

where $p_{iu}$ is the target distribution and $q_{iu}$ measures the similarity between the coding-layer embedding $h_i$ and the cluster center $\mu_u$, using the t-distribution as the standard of measurement; the t-distribution yields a distribution function for the different node clusters. Q refers to the distribution formed by the similarities between the embeddings $h_i$ output by the coding layer and the cluster centers $\mu_u$, and P is the distribution obtained by squaring Q, used as a soft label with increased discrimination.
Preferably, the overall loss function includes the reconstruction error of the adjacency matrix, the reconstruction error of the feature matrix, and the error function measuring the clustering result, combined into the final loss function L given by formula (4):

$$L = L_F + \beta L_A + \gamma L_{KL} \qquad (4)$$

where $L_F$ is the reconstruction error of the feature matrix, $L_A$ is the reconstruction error of the adjacency matrix, and $L_{KL}$ is the error function measuring the clustering effect.
The beneficial effects of the invention are as follows:
compared with the traditional automatic encoder, the automatic encoder of the invention adds two reconstruction errors, not only reconstructs the structure information adjacent matrix of the network, but also reconstructs the node characteristic information characteristic matrix in the network, and can obtain high-quality node representation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a flow chart of a method of an embodiment of the present invention;
FIG. 2 is a diagram of an algorithm model architecture according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the present invention provides an attribute graph community detection method based on self-supervision, which includes:
extracting an initial adjacency matrix and an initial feature matrix in the graph attention network, and respectively carrying out preprocessing operation on the initial adjacency matrix and the initial feature matrix to obtain a converted adjacency matrix and a filtered feature matrix;
reconstructing through a self-coding unit based on the converted adjacency matrix and the filtered feature matrix to obtain a reconstructed adjacency matrix and a reconstructed feature matrix; inputting the reconstructed feature matrix into a clustering unit for calculation to obtain a clustering result;
and merging the loss functions of the self-coding unit and the clustering unit into a final overall loss function, performing clustering calculation with the final overall loss function to obtain a final clustering result, and obtaining a division result based on the final clustering result.
As shown in fig. 2, A represents the structure information of the network, X represents the node information in the network, and Z represents the hidden-layer representation of the network node feature information; back propagation is then computed according to this hidden-layer representation of the features. First the data is preprocessed: a diffusion function is added to the adjacency matrix and the feature matrix is smoothed. The results are then fed into the downstream automatic encoder and clustering module; the automatic encoder yields a low-dimensional representation of the network data, the clustering task is performed, the two are combined into the same task, and the objective function is optimized to obtain the final clustering result.
Further optimizing the scheme, the process of performing the preprocessing operation comprises:
and adding a diffusion function into the initial adjacency matrix, smoothing the initial characteristic matrix, then putting the initial characteristic matrix into the self-encoding unit and the clustering unit, obtaining low-dimensional representation of network data through the self-encoding unit, and performing clustering tasks through the clustering unit to optimize a target function.
The adjacency matrix is a matrix reflecting the adjacency relations between nodes, i.e., the structure information of the graph; in other words, it represents the information of the edges in the graph, and the two endpoints of an edge, its direction, and its weight can all be expressed in the adjacency matrix. The adjacency matrix thus represents the structural information of the graph well, and it can directly participate in the operations of the graph neural network.
In this embodiment, the sum of the multi-order adjacency matrices is selected:

$$\bar{A} = \sum_{i=1}^{n} A^{i} \qquad (1)$$

where n is a specified positive integer for constituting the multi-order adjacency matrix.
An approach based on a diffusion function is selected, the diffusion function modeling the relations between neighbors of different orders. The diffusion function is the personalized PageRank function, with the heat kernel as an alternative:

$$\theta_k^{PPR} = \alpha(1-\alpha)^k \qquad (2)$$

$$\theta_k^{HK} = e^{-t}\,\frac{t^k}{k!} \qquad (3)$$

where k is the neighbor order, and α and t are hyper-parameters adjusted according to the data set.
The personalized PageRank function decreases as the neighbor order increases, matching the fact that a node's influence on its neighbor nodes in the same graph weakens with distance, so the diffusion function can be applied to improve the adjacency matrix:

$$S = \sum_{k=0}^{\infty} \theta_k T^{k} \qquad (4)$$

Transforming with the degree matrix yields a transition matrix that better expresses the topological structure information of the network:

$$T = D^{-1/2} A D^{-1/2}$$
in this embodiment, a PPR diffusion function and a symmetric transformation matrix are selected, and the transformation matrix is added to the diffusion function, so as to obtain:
Figure BDA0003683411530000083
the resulting transformed adjacency matrix may be processed by symmetric transformation as input to the model:
Figure BDA0003683411530000084
wherein the content of the first and second substances,
Figure BDA0003683411530000085
is shown in the formula (5) in the specification,
Figure BDA0003683411530000086
is made of
Figure BDA0003683411530000087
The corresponding power of the diagonal matrix generated.
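As a sketch of the diffusion preprocessing of formulas (4)-(6), the PPR closed form can be evaluated directly in NumPy; the function name `ppr_diffusion`, the self-loop addition, and the default α are illustrative assumptions rather than details fixed by the patent:

```python
import numpy as np

def ppr_diffusion(A, alpha=0.15):
    """PPR diffusion over a symmetrically normalized adjacency (Eqs. 4-6).

    Uses the closed form S = alpha * (I - (1 - alpha) * T)^-1 with
    T = D^{-1/2} A D^{-1/2}, then symmetrically renormalizes S.
    """
    N = A.shape[0]
    A_hat = A + np.eye(N)                        # assumed self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    T = D_inv_sqrt @ A_hat @ D_inv_sqrt          # symmetric transition matrix
    S = alpha * np.linalg.inv(np.eye(N) - (1.0 - alpha) * T)
    d_s = S.sum(axis=1)                          # degrees of the diffused matrix
    D_s_inv_sqrt = np.diag(1.0 / np.sqrt(d_s))
    return D_s_inv_sqrt @ S @ D_s_inv_sqrt       # Eq. (6): symmetric transform

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
S_hat = ppr_diffusion(A)
```

Because the PPR coefficients decay geometrically, the closed form converges and preserves the symmetry of T, so the result can replace the raw adjacency as model input.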
Different nodes in the feature matrix have different correlations, and the correlation between adjacent nodes should be higher; that is, the features of adjacent nodes should be smoother.
In this embodiment, the initial feature matrix is low-pass filtered with a Laplacian Smoothing Filter, so that the filtered feature matrix reflects the overall node features of the graph, yielding the filtered feature matrix.
Preferably, the filter H used when performing the low-pass filtering process is:
H=I-kL (7)
in the formula, I is an identity matrix and L is a laplace matrix.
The filters can be stacked to realize stronger filtering capability:

$$\bar{X} = H^{t} X = (I - kL)^{t} X \qquad (8)$$

where t is a hyper-parameter that can be selected according to the specific situation of the graph. Practice shows that the filter achieves its best effect with the symmetric normalized Laplacian $L_{sym}$, whose eigenvalues $\lambda$ lie in $[0, \lambda_{max}]$, giving the filter the frequency response $p(\lambda) = 1 - k\lambda$. When $k > 1/\lambda_{max}$, the response near $\lambda_{max}$ becomes negative and the filter loses its low-pass property at the corresponding frequencies, while when $k < 1/\lambda_{max}$, the high-frequency part cannot be completely removed; therefore selecting $k = 1/\lambda_{max}$ is the best choice.
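The stacked Laplacian smoothing filter with $k = 1/\lambda_{max}$ can be sketched as follows; the helper name `smooth_features`, the self-loop addition, and the default stack depth are assumptions for illustration:

```python
import numpy as np

def smooth_features(A, X, t=2):
    """Stacked low-pass filter X_bar = (I - L_sym / lambda_max)^t X (Eq. 8)."""
    N = A.shape[0]
    A_hat = A + np.eye(N)                        # assumed self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_sym = np.eye(N) - D_inv_sqrt @ A_hat @ D_inv_sqrt
    lam_max = np.linalg.eigvalsh(L_sym).max()    # largest eigenvalue of L_sym
    H = np.eye(N) - L_sym / lam_max              # filter with k = 1 / lambda_max
    X_bar = X.copy()
    for _ in range(t):                           # stack the filter t times
        X_bar = H @ X_bar
    return X_bar

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.0], [-1.0]])
X_bar = smooth_features(A, X)
```

Since the eigenvalues of H lie in [0, 1], each application attenuates the high-frequency components of X without amplifying any signal.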
An Auto-encoder is a method that can learn a satisfactory and effective representation from the data itself, and therefore requires no preset labels; this way of learning from the data itself is called Self-supervised Learning. For the graph clustering task, learning an effective representation of the data is necessary; on this basis, a self-encoding layer is designed, and its encoding result serves as the input data of the next layer.
In the present embodiment, the graph task uses the information extracted by a graph encoder, and a graph automatic encoder is selected. The graph automatic encoder uses a graph neural network as the basic encoding and decoding layer unit. GAT improves on the way GCN treats all the neighbor nodes of a node identically: it calculates the correlation between a node and its neighbor nodes and uses it as a computation parameter, improving on the performance of GCN, so GAT is used as the basic encoding and decoding layer unit in order to obtain better performance. The propagation between GAT layers is as follows:
$$h_i^{(n+1)} = \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W h_j^{(n)}\Big) \qquad (9)$$

The state of each node in the next iteration is based on the states of its neighbor nodes, where $\sigma$ is the activation function applied at each layer, $h_i^{(n)}$ is the data representation of node $i$ at the $n$-th layer, and $\alpha_{ij}$ is a representation of the correlation of node $i$ with its neighbor node $j$, expressed as:

$$\alpha_{ij} = \mathrm{softmax}(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})} \qquad (10)$$

where $e_{ij}$ means the importance of the neighbor node $j$ of node $i$ to node $i$, normalized over the neighborhood with the Softmax activation function; $e_{ij}$ is defined as:

$$e_{ij} = \mathrm{LeakyReLU}\big(a(W h_i, W h_j)\big) \qquad (11)$$

where $W$ is a shared transformation weight matrix and $a$ is a single-layer feed-forward network that yields the correlation of the two nodes.
Two GAT layers are used as the encoder part of the model:

$$H^{(1)} = \mathrm{GAT}(X, \hat{A}) \qquad (12)$$

$$H^{(2)} = \mathrm{GAT}(H^{(1)}, \hat{A}) \qquad (13)$$

The first layer uses the initial feature matrix X as input, while the input of the second layer is the output of the first. The topological structure information of the network and the feature information of the nodes are both fed in, so that the output of the encoder captures the complete information of the network.
In this embodiment, two decoders are used to reconstruct two different types of graph information, so that the obtained data representation is more reasonable.
The first reconstruction method uses a symmetric architecture; the decoding layer also uses two GAT layers:

$$\hat{H} = \mathrm{GAT}(H^{(2)}, \hat{A}) \qquad (14)$$

$$\hat{X} = \mathrm{GAT}(\hat{H}, \hat{A}) \qquad (15)$$
with a symmetric decoder, a data representation is obtained with the same dimensions as the input information, where the difference from the input feature matrix can be computed as a loss function:
$$L_F = \frac{1}{N}\sum_{i=1}^{N} \big\| x_i - \hat{x}_i \big\|^2 \qquad (16)$$
the meaning of the equation is that the mean square error of the value of each corresponding position is calculated as the error of the feature matrix, and the more similar the reconstruction result is, the smaller the error is. In addition, another reconstruction method obtains a reconstructed adjacency matrix in the form of an inner product:
$$\hat{A} = \mathrm{Sigmoid}\big(H H^{T}\big) \qquad (17)$$

where H is the result of the coding layer, an $N \times D_H$ matrix in which $D_H$ is the dimension of the hidden layer. The Sigmoid function is the activation function, and the reconstructed adjacency matrix $\hat{A}$ is obtained in the form of an inner product.
Based on this, the loss function can be written as:

$$L_A = -\frac{1}{N^2}\sum_{i=1}^{N}\sum_{j=1}^{N}\Big[a_{ij}\log \hat{a}_{ij} + (1-a_{ij})\log\big(1-\hat{a}_{ij}\big)\Big] \qquad (18)$$
the difference between the two is calculated using cross entropy to measure the performance of the adjacency matrix reconstruction.
The encoding layer outputs a data representation of a specified dimension, and this output serves as the input of the clustering layer. Clustering is an unsupervised algorithm; used on its own, without a given optimization target, its effect is not necessarily appropriate.
In this embodiment, KL divergence is selected as an objective function for measuring a clustering result:
$$L_{KL} = \mathrm{KL}(P\,\|\,Q) = \sum_i \sum_u p_{iu} \log\frac{p_{iu}}{q_{iu}} \qquad (19)$$
The KL divergence is used to measure the degree of fit between probability distributions, where $p_{iu}$ is the target distribution and $q_{iu}$ measures the similarity between the coding-layer embedding $h_i$ and the cluster center $\mu_u$, using the t-distribution as the standard of measurement; the t-distribution yields a distribution function for the different node clusters, giving the current distribution Q:
$$q_{iu} = \frac{\big(1 + \|h_i - \mu_u\|^2 / v\big)^{-\frac{v+1}{2}}}{\sum_{u'} \big(1 + \|h_i - \mu_{u'}\|^2 / v\big)^{-\frac{v+1}{2}}} \qquad (20)$$
where v is the degree of freedom of the t-distribution, which affects its shape. The t-distribution is symmetric about the y-axis like the more familiar normal distribution, and its shape gradually approaches the normal distribution as the degree of freedom increases; the adjustable degree of freedom changes the distribution shape, making the formula suitable for more kinds of node clusters. $q_{iu}$ can serve as a soft label for each node, referring to the probabilities of belonging to the various classes. Based on the soft label, the target distribution P can be obtained:
$$p_{iu} = \frac{q_{iu}^{2} \big/ \sum_i q_{iu}}{\sum_{u'} \Big( q_{iu'}^{2} \big/ \sum_i q_{iu'} \Big)} \qquad (21)$$
if the numerical value in the soft label is larger, the node is more likely to belong to the designated category, so that the discrimination of the probability distribution is increased by using a square mode, the probability distribution is used as a target distribution, and the clustering effect can be well judged by judging the fitting degree of the two distributions.
The clustering method adopted in this embodiment is the K-Means algorithm; this clustering approach is simple in form yet achieves a good clustering effect. The basic principle of the K-Means algorithm is to minimize the squared error E:

$$E = \sum_{i=1}^{k} \sum_{x \in C_i} \| x - \mu_i \|^2 \qquad (22)$$

where $C_i$ is each divided cluster and $\mu_i$ is its cluster center. The squared error between each node and the cluster center to which it belongs is minimized, and the cluster centers are computed iteratively:

$$\mu_i^{(t)} = \frac{1}{|C_i^{(t-1)}|} \sum_{x \in C_i^{(t-1)}} x \qquad (23)$$

That is, the mean of the nodes in each cluster of round t-1 is taken as that cluster's center in round t.
Based on the self-encoding layer and the clustering layer, each part has an independent loss function; the three loss functions are combined herein as the final loss function:

$$L = L_F + \beta L_A + \gamma L_{KL} \qquad (24)$$

As the final overall loss function, $L_F$ is the reconstruction error of the feature matrix, $L_A$ is the reconstruction error of the adjacency matrix, and $L_{KL}$ is the error function measuring the clustering effect, with $\beta \geq 0$ and $\gamma \geq 0$ controlling the balance among the three errors. The reconstruction loss and the clustering loss, originally two separate tasks, are combined and the whole is optimized as one task, so that the data representation learned by the self-encoder is better suited to clustering and a better clustering result can be obtained.
The final clustering result can be obtained directly from the current probability distribution Q:
$$S_i = \arg\max_u \, q_{iu} \qquad (25)$$
The maximum over the clusters is computed for each node, and the index corresponding to that maximum is the node's division label.
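Extracting the hard division labels of Eq. (25) from the soft assignment is a single argmax; the helper name is illustrative:

```python
import numpy as np

def cluster_labels(q):
    """Eq. (25): each node's label is the index of its largest soft assignment."""
    return q.argmax(axis=1)

q = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.6, 0.4]])
labels = cluster_labels(q)
```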
The AEAGC algorithm uses an automatic encoder to obtain the graph data representation and then obtains the clustering result of the nodes through the K-Means clustering algorithm. The diffusion function and the low-pass filter are used to improve the adjacency matrix and the feature matrix respectively, and the clustering process is quantified through the KL divergence. First the data is preprocessed: a diffusion function is added to the adjacency matrix and the feature matrix is smoothed. The results are then fed into the downstream automatic encoder and clustering module; the automatic encoder yields a low-dimensional representation of the network data, the clustering task is performed, the two are combined into the same task, and the objective function is optimized to obtain the final clustering result. The process of applying the AEAGC algorithm in friend recommendation is as follows:
The input data of the AEAGC algorithm comprises an adjacency matrix and a feature matrix, corresponding respectively to the structural information and the node feature information of the complex network. When gathering information, therefore, the friend information of each user and the feature information of the user must be collected at the same time: the users' friend information is quantified to obtain the adjacency matrix of the complex network, while the feature information may include several parts, for example gender, age, date of birth, place of residence, and so on, which are quantified into the feature matrix.
And executing an AEAGC algorithm according to the obtained complex network. And processing the adjacency matrix by using a diffusion function to simulate a diffusion effect, wherein the influence of the nodes is weakened along with the distance. The feature matrix is processed using a low pass filter to make the feature information smoother. And taking the processed matrix as the input of an automatic encoder, reconstructing a structural information adjacent matrix of the network and a node characteristic information characteristic matrix in the network by the automatic encoder, measuring a clustering result by using KL divergence, and integrating the three as a target function. And training to obtain a community division result.
According to the community division result, each user belongs to a division set. This set is compared with the set of the user's current friends, and the difference of the two sets is taken as the content recommended to the user as potential friends.
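The difference-set recommendation can be sketched in a few lines (the community memberships, user IDs and friend sets below are hypothetical illustrative values):

```python
# Hypothetical division result and current friend sets (illustrative only).
community = {0: {0, 1, 2, 5}, 1: {3, 4}}        # community id -> member user ids
current_friends = {0: {1}, 5: {2}}              # user id -> current friend set

def recommend(user, user_community, friends):
    members = community[user_community] - {user}   # same community, excluding self
    return members - friends.get(user, set())      # difference set = recommendations

print(sorted(recommend(0, 0, current_friends)))    # -> [2, 5]
```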
The invention provides a self-supervised attribute graph community detection method that takes a graph attention network as its basic network layer. Compared with a graph convolution network, the graph attention network has the advantage of learned attention weights and performs better on networks with denser node connections; this network layer is used to form an autoencoder that obtains low-dimensional representations of the network nodes. For the downstream clustering task, the K-Means algorithm is selected; to measure the clustering result, KL divergence is used, its value being back-propagated as part of the objective function. The distribution of the cluster centers and the distribution of the nodes are selected, and the fitting of the two distributions indicates a more reasonable clustering effect. Meanwhile, a diffusion function is selected to process the adjacency matrix, improving the generalization ability of the model under a limited number of network layers, and low-pass filtering is applied to the feature matrix, so that the filtered representation better reflects the node characteristics and the feature information is smoother. The AEAGC algorithm is applied to labeled real network data sets, and the results show that its accuracy improves on the best comparison algorithm by up to 20%, and that, after DGI is introduced as a data-enhancement method, the four evaluation indexes improve by up to 15%.
The above-described embodiments merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from its spirit shall fall within the protection scope defined by the claims.

Claims (10)

1. An attribute graph community detection method based on self-supervision, characterized by comprising the following steps:
extracting an initial adjacency matrix and an initial feature matrix of the graph attention network, and preprocessing them respectively to obtain a converted adjacency matrix and a filtered feature matrix;
reconstructing, through a self-encoding unit, based on the converted adjacency matrix and the filtered feature matrix, to obtain a reconstructed adjacency matrix and a reconstructed feature matrix, and inputting the reconstructed feature matrix into a clustering unit for calculation to obtain a clustering result; and
combining the loss functions of the self-encoding unit and the clustering unit into a final overall loss function, performing clustering calculation with the final overall loss function to obtain a final clustering result, and obtaining a partitioning result based on the final clustering result.
2. The method according to claim 1, wherein preprocessing the initial adjacency matrix and the initial feature matrix respectively comprises:
adding a diffusion function to the initial adjacency matrix and smoothing the initial feature matrix, then feeding them into the self-encoding unit and the clustering unit, obtaining a low-dimensional representation of the network data through the self-encoding unit, and performing the clustering task through the clustering unit to optimize the objective function.
3. The attribute graph community detection method based on self-supervision of claim 2, wherein adding the diffusion function to the initial adjacency matrix comprises:
based on the sum of the multi-order adjacency matrices, simulating the relations between adjacency matrices of different orders through the diffusion function and converting the matrices to obtain converted matrices; and processing the converted matrix through a symmetric transformation to obtain the initial adjacency matrix with the diffusion function added; wherein the initial adjacency matrix is the multi-order adjacency matrix.
4. The method of claim 3, wherein the diffusion function is a personalized PageRank function:

S = Σ_{k=0}^{∞} θ_k T^k    (1)

θ_k^{PPR} = α(1 − α)^k,  θ_k^{Heat} = e^{−t} · t^k / k!

wherein k is the neighbor order, T is the transition matrix derived from the initial adjacency matrix, and α and t are hyper-parameters adjusted according to the data set.
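The role of the hyper-parameters can be illustrated numerically; the sketch below assumes the standard graph-diffusion coefficients θ_k = α(1−α)^k for personalized PageRank and θ_k = e^(−t)·t^k/k! for the heat-kernel variant that uses t (both sequences sum to 1, so the diffusion is a convex combination of neighbor orders); the truncation depth is an illustrative choice:

```python
import math

alpha, t, K = 0.15, 3.0, 200   # illustrative hyper-parameter values

# Personalized-PageRank coefficients theta_k = alpha * (1 - alpha)^k
ppr = [alpha * (1 - alpha) ** k for k in range(K)]

# Heat-kernel coefficients theta_k = exp(-t) * t^k / k!, built iteratively
heat, term = [], math.exp(-t)
for k in range(K):
    heat.append(term)
    term *= t / (k + 1)

# Both sequences are (to numerical precision) normalized to 1.
print(round(sum(ppr), 6), round(sum(heat), 6))   # -> 1.0 1.0
```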
5. The method according to claim 2, wherein smoothing the initial feature matrix comprises:
performing low-pass filtering on the initial feature matrix by means of a Laplacian smoothing filter, so that the filtered feature matrix reflects the overall node features of the graph, thereby obtaining the filtered feature matrix.
6. The method according to claim 5, wherein a filter H is adopted for the low-pass filtering, the filter H being:

H = I − kL    (2)

wherein I is the identity matrix, L is the Laplacian matrix, and k is the Laplacian coefficient.
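Why H = I − kL acts as a low-pass filter can be checked numerically: each Laplacian eigenvalue λ (a graph frequency) is mapped to the response 1 − kλ, so higher frequencies are attenuated more. A minimal sketch on a small illustrative graph, assuming the symmetric normalized Laplacian:

```python
import numpy as np

# Small illustrative graph (4 nodes)
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(1)
D_inv_sqrt = np.diag(d ** -0.5)
L = np.eye(4) - D_inv_sqrt @ A @ D_inv_sqrt   # symmetric normalized Laplacian

k = 0.5
lam = np.linalg.eigvalsh(L)   # graph frequencies, ascending, in [0, 2]
resp = 1 - k * lam            # filter response per frequency

# The response decreases as the frequency increases: a low-pass filter.
assert all(resp[i] >= resp[i + 1] - 1e-12 for i in range(len(resp) - 1))
```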
7. The method as claimed in claim 1, wherein the self-encoding unit comprises a self-encoding layer, the self-encoding layer being a graph self-encoding layer that uses two GAT layers as its basic encoding units; the first GAT layer takes the initial feature matrix as input, the second GAT layer takes the output of the first layer as input, and the topological information of the network and the feature information of the nodes are fed into both GAT layers simultaneously, so that the output of the encoder captures the complete information of the network.
8. The method as claimed in claim 7, wherein the self-encoding unit further comprises a graph decoding layer symmetric to the graph self-encoding layer, the graph decoding layer being configured to reconstruct the different graph information, obtaining the reconstructed adjacency matrix and the reconstructed feature matrix, and thereby the reconstruction error of the adjacency matrix and the reconstruction error of the feature matrix.
9. The attribute graph community detection method based on self-supervision of claim 1, wherein, in the clustering unit, KL divergence is selected as the objective function for measuring the clustering result:

L_KL = KL(P‖Q) = Σ_i Σ_u p_iu · log(p_iu / q_iu)    (3)

wherein p_iu is the target distribution, and q_iu measures the degree of similarity between the embedding h_i output by the coding layer and the cluster center μ_u, with the Student t-distribution as the standard of measurement, the t-distribution giving a distribution function for each node cluster; Q is the distribution formed by the t-distributions of the embeddings h_i and the cluster centers μ_u, and P is the target distribution obtained from Q, used as a soft label, whose resolution is increased by squaring.
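A minimal sketch of this construction, following the standard deep-embedded-clustering formulation that the claim describes (the embeddings and cluster centers below are random illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 4))    # embeddings h_i output by the coding layer (illustrative)
mu = rng.normal(size=(2, 4))   # cluster centers mu_u (illustrative)

# q_iu: Student-t similarity between embedding h_i and cluster center mu_u
dist2 = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(-1)
q = 1.0 / (1.0 + dist2)
q = q / q.sum(1, keepdims=True)

# p_iu: square q and renormalize -- the sharpened target used as a soft label
f = q.sum(0)                   # soft cluster frequencies
p = q ** 2 / f
p = p / p.sum(1, keepdims=True)

kl = (p * np.log(p / q)).sum() # L_KL = KL(P || Q), always non-negative
```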
10. The method for detecting community of attribute graphs based on self-supervision as claimed in claim 9, wherein the overall loss function comprises the reconstruction error of the adjacency matrix, the reconstruction error of the feature matrix and the error function measuring the clustering result, combined into the final loss function L as shown in the following formula (4):

L = L_F + β·L_A + γ·L_KL    (4)

wherein L_F is the reconstruction error of the feature matrix, L_A is the reconstruction error of the adjacency matrix, and L_KL is the error function measuring the clustering effect.
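The combination in formula (4) is a simple weighted sum; a trivial sketch with illustrative placeholder values for the component losses and the weights β and γ:

```python
def total_loss(L_F, L_A, L_KL, beta=0.1, gamma=1.0):
    """Overall loss L = L_F + beta * L_A + gamma * L_KL (formula (4));
    beta and gamma values here are illustrative, not from the patent."""
    return L_F + beta * L_A + gamma * L_KL

print(round(total_loss(0.5, 0.3, 0.2), 6))   # -> 0.73
```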
CN202210639903.9A 2022-06-08 2022-06-08 Attribute graph community detection method based on self-supervision Pending CN114880538A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210639903.9A CN114880538A (en) 2022-06-08 2022-06-08 Attribute graph community detection method based on self-supervision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210639903.9A CN114880538A (en) 2022-06-08 2022-06-08 Attribute graph community detection method based on self-supervision

Publications (1)

Publication Number Publication Date
CN114880538A true CN114880538A (en) 2022-08-09

Family

ID=82678945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210639903.9A Pending CN114880538A (en) 2022-06-08 2022-06-08 Attribute graph community detection method based on self-supervision

Country Status (1)

Country Link
CN (1) CN114880538A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964626A (en) * 2022-10-27 2023-04-14 河南大学 Community detection method based on dynamic multi-scale feature fusion network
CN116564534A (en) * 2023-04-03 2023-08-08 北京林业大学 Multi-view clustering method and device for clinical data of traditional Chinese medicine and electronic equipment


Similar Documents

Publication Publication Date Title
CN111898689B (en) Image classification method based on neural network architecture search
CN113157957A (en) Attribute graph document clustering method based on graph convolution neural network
CN114880538A (en) Attribute graph community detection method based on self-supervision
CN111127146B (en) Information recommendation method and system based on convolutional neural network and noise reduction self-encoder
CN110941734B (en) Depth unsupervised image retrieval method based on sparse graph structure
CN109389171B (en) Medical image classification method based on multi-granularity convolution noise reduction automatic encoder technology
CN109492748B (en) Method for establishing medium-and-long-term load prediction model of power system based on convolutional neural network
CN109214503B (en) Power transmission and transformation project cost prediction method based on KPCA-LA-RBM
CN101782743A (en) Neural network modeling method and system
CN112836604A (en) Rolling bearing fault diagnosis and classification method, system and equipment based on VMD-SSAE and storage medium thereof
CN111047078A (en) Traffic characteristic prediction method, system and storage medium
CN111985152B (en) Event classification method based on dichotomy hypersphere prototype network
CN113051440A (en) Link prediction method and system based on hypergraph structure
CN114925767A (en) Scene generation method and device based on variational self-encoder
CN117036760A (en) Multi-view clustering model implementation method based on graph comparison learning
CN114661544A (en) Big data platform log anomaly detection method based on attention mechanism layer
CN114037014A (en) Reference network clustering method based on graph self-encoder
CN112712855B (en) Joint training-based clustering method for gene microarray containing deletion value
CN117056763A (en) Community discovery method based on variogram embedding
CN116885697A (en) Load prediction method based on combination of cluster analysis and intelligent algorithm
CN115762183A (en) Traffic speed prediction method based on geometric algebra and hypergraph
Bi et al. K-means clustering optimizing deep stacked sparse autoencoder
CN114861863A (en) Heterogeneous graph representation learning method based on meta-path multi-level graph attention network
CN114943016A (en) Cross-granularity joint training-based graph comparison representation learning method and system
CN114970684A (en) Community detection method for extracting network core structure by combining VAE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination