CN113254717A - Multidimensional graph network node clustering processing method, apparatus and device - Google Patents

Multidimensional graph network node clustering processing method, apparatus and device

Info

Publication number
CN113254717A
Authority
CN
China
Prior art keywords
node
nodes
graph network
dimensional
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110645181.3A
Other languages
Chinese (zh)
Inventor
魏迎梅
韩贝贝
康来
冯素茹
蒋杰
谢毓湘
万珊珊
杨雨璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110645181.3A priority Critical patent/CN113254717A/en
Publication of CN113254717A publication Critical patent/CN113254717A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G06F 16/906 Clustering; Classification
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a multidimensional graph network node clustering processing method, apparatus and device, wherein the method comprises the following steps: converting an original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer multilayer network random walks on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain the clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique. The method significantly improves the node clustering effect.

Description

Multidimensional graph network node clustering processing method, apparatus and device
Technical Field
The present application relates to the field of network data processing technologies, and in particular, to a multidimensional graph network node clustering method, apparatus, and device.
Background
Network theory can be used to model the complex relationships among various entities in real life. Traditional approaches to modeling the relationships among individuals in a system mostly adopt a simple single network, or single-layer network, that is, a network containing only one node type and only one interaction type; the nodes in the network represent the individuals in a complex system, and the edges represent the interaction relationships existing between individuals. A multilayer network can model the different interaction relationships existing among individuals; in other words, a multilayer network is a network comprising a plurality of layers, and each layer is an independent single-layer network (i.e., a traditional network). The edges within each layer of a multilayer network are of the same type, but the edge types in different layers may differ; the node types of different layers in a multilayer network may also differ.
The networks at different levels of a multidimensional graph network are composed of the same entities, while the connection relationships between nodes at each level have different properties; a multidimensional graph network is thus a special type of multilayer network. The objective of attribute single-layer network node clustering is to satisfy the following requirements: 1) structural compactness, i.e., nodes in the same cluster are closely connected, while nodes in different clusters are far apart; 2) attribute homogeneity, i.e., nodes in the same cluster have similar attribute values, while nodes in different clusters have significantly different attribute values. In practice, node clustering for an attribute multidimensional graph network must not only satisfy the above structural compactness and attribute homogeneity, but also take into account the association relationships between different dimensions and the information content of the graph network of each dimension, that is, the graph networks of different dimensions exert different degrees of importance on the node clustering of the whole system. However, in the process of implementing the present invention, the inventors found that current graph network node clustering techniques suffer from the technical problem of a poor clustering effect.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a multidimensional graph network node clustering processing method with a better clustering effect, as well as a multidimensional graph network node clustering processing apparatus, a computer device, and a computer-readable storage medium.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in one aspect, an embodiment of the present invention provides a multidimensional graph network node clustering method, including:
converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
according to the built in-layer transition probability and cross-layer random walk transition probability, carrying out in-layer and cross-layer multilayer network random walk processing on the weighted multi-dimensional graph network to obtain a sampling sequence of each node of the weighted multi-dimensional graph network;
converting the sampling sequence of each node into low-dimensional embedding based on the SkipGram model;
clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multidimensional graph network;
and projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
In another aspect, a multidimensional graph network node clustering processing apparatus is also provided, including:
the network conversion module is used for converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
the walk processing module is used for performing intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network;
the embedding processing module is used for converting the sampling sequence of each node into low-dimensional embedding based on the SkipGram model;
the clustering processing module is used for clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multidimensional graph network;
and the visualization module is used for projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
In still another aspect, a computer device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above-mentioned multidimensional graph network node clustering processing methods when executing the computer program.
In still another aspect, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned methods for processing clusters of nodes in a multidimensional graph network.
One of the above technical solutions has the following advantages and beneficial effects:
according to the method, the device and the equipment for processing the clustering of the nodes of the multidimensional graph network, an original unweighted attribute multidimensional graph network (namely the unweighted multidimensional graph network) is converted into a weighted multidimensional graph network, and comprehensive similarity characteristics of attribute similarity and structural similarity between nodes with connected edges are coded in the conversion, so that the clustering performance of the nodes can be enhanced. Secondly, based on different importance differences exerted by different dimension graph networks on node clustering, namely different information quantities of different dimension graph networks, cross-layer random walk transfer probability is established according to the information quantity difference in a heterogeneous mode, transfer probability in a combination layer is combined, a sampling sequence of each node can be obtained, and the sampling sequence captures neighbor node information of each node. The resulting sample sequence is then converted to low-dimensional embedding using network embedding techniques. And based on the low-dimensional embedding of all the nodes, clustering the nodes by adopting a K-means clustering algorithm to obtain a clustering result of the nodes. Finally, the low-dimensional embedding is projected into the two-dimensional space by adopting a dimension reduction technology, the coordinate value of each node in the two-dimensional space is obtained, the label information of the node is used as color mapping, the node clustering effect is displayed from the visual angle by adopting a graph visualization technology, the purpose of remarkably improving the clustering effect is achieved, the clustering effect is excellent, and the application range of the network embedding technology is expanded.
Drawings
FIG. 1 is a schematic diagram of a conventional single-layer network and a multi-dimensional network;
FIG. 2 is a flowchart illustrating a method for clustering nodes in a multidimensional graph network according to an embodiment;
FIG. 3 is a flow diagram that illustrates the conversion process of the multidimensional graph network in one embodiment;
FIG. 4 is a diagram illustrating conversion of an unweighted multidimensional graph network to a weighted multidimensional graph network, under an embodiment;
fig. 5 is a schematic block diagram of a multidimensional graph network node cluster processing apparatus in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be considered not to exist and not to fall within the protection scope of the present invention.
As shown in fig. 1, which is a schematic diagram of a traditional single-layer network and a multidimensional network, the different layers of a multidimensional network can be regarded as the interaction relationships of the same node set viewed from different angles. If a network with multiple interaction relationships is represented as the multi-relationship fusion network shown in fig. 1(a), the structural features of each dimension graph network and the coupling and interaction information between different dimensions cannot be clearly expressed. Compared with fig. 1(b), fig. 1(c) can more clearly represent the interaction relationships of the same node set in three different dimensions, namely dimension A, dimension B and dimension C, as well as the correlation information between layers. Compared with a traditional single-layer network, a multidimensional graph network describes different characteristics of a complex system from different angles and compensates for the deviation brought by a single view, so that the results obtained by analyzing a complex system based on a multidimensional graph network are more accurate. When the nodes in the network contain attribute features, the multidimensional graph network is called an attribute multidimensional graph network.
In addition, visualization technology expresses information in the form of visual images and provides powerful support for discovering and understanding scientific laws. Graph visualization has become an important method for analyzing graph network data, and mainly includes force-directed methods and data-dimension-reduction-based methods. Compared with force-directed methods, graph visualization techniques based on data dimension reduction strive, by optimizing an objective function, to preserve the similarity between the node distribution in the original graph space and that in the two-dimensional layout space, so that the node distribution in the two-dimensional layout space reflects the node information in the original graph space. Visualization techniques based on nonlinear dimension reduction can reflect structural data with nonlinear relationships and are more widely applicable than those based on linear dimension reduction.
The objective of attribute single-layer network node clustering is to satisfy the following requirements: 1) structural compactness, i.e., nodes in the same cluster are closely connected, while nodes in different clusters are far apart; 2) attribute homogeneity, i.e., nodes in the same cluster have similar attribute values, while nodes in different clusters have significantly different attribute values. In practice, node clustering for an attribute multidimensional graph network must not only satisfy the above structural compactness and attribute homogeneity, but also take into account the association relationships between different dimensions and the information content of the graph network of each dimension when clustering nodes; that is, the graph networks of different dimensions exert different degrees of importance on the node clustering of the whole system.
The inventors found that current graph network node clustering techniques suffer from the technical problem of a poor clustering effect, which is specifically reflected as follows: 1) the classical random-walk-based network embedding technique (node2vec) focuses on analyzing a single-layer network, and a single-layer network usually contains only topology information without considering the attribute features of the nodes; 2) attribute graph network node clustering methods that do consider attribute features are generally directed only at single-layer graph networks; 3) traditional node clustering methods for multidimensional graph networks usually extend single-layer node clustering methods, such as modularity optimization or matrix factorization, to the multidimensional graph network, but these methods are not suitable for large-scale graph networks.
The present invention provides an effective solution to the technical problem of the poor clustering effect of current graph network node clustering techniques, and can remarkably enhance the node clustering effect.
For convenience of illustration and understanding, the structure shown in FIG. 1(c) is taken as an example. A multidimensional graph network (or multi-relationship network) composed of $L$ graph networks of different dimensions is represented as $G=\{G^{[1]},G^{[2]},\ldots,G^{[L]}\}$, where $V=\{v_1,v_2,\ldots,v_N\}$ denotes the node set consisting of the $N$ nodes of the multidimensional network; $E$ denotes the edge set of the multidimensional network; $N$ and $M$ respectively denote the sizes of the node set and the edge set of the multidimensional network; $X\in\mathbb{R}^{N\times F}$ denotes the feature matrix formed by the $F$-dimensional feature values of the $N$ nodes; $A$ denotes the adjacency matrix of the multidimensional network; $G^{[d]}$ denotes the graph network of dimension $d$; and $A^{[d]}$ indicates that this graph network is unweighted (with size $N\times N$), where $A^{[d]}_{ij}=1$ indicates that node $v_i$ and node $v_j$ have a connecting edge in the graph network of dimension $d$ (i.e., $(v_i,v_j)\in E^{[d]}$), and otherwise $A^{[d]}_{ij}=0$.
Referring to fig. 2, in one aspect, the present invention provides a method for processing node clusters of a multidimensional graph network, including the following processing steps S12 to S20:
and S12, converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes.
And S14, carrying out multilayer network random walk processing of the weighted multidimensional graph network in and across layers according to the built in-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence of each node of the weighted multidimensional graph network.
It is understood that, in a multidimensional graph network, the graph networks in different dimensions represent different relationships of the same group of nodes under different views, and the attribute information $X$ of the nodes is shared by all dimensions. The purpose of multidimensional graph network node clustering is to detect the clusters shared by all dimensions of the multidimensional graph network, while also considering the correlation information among the graph networks of different dimensions. From the perspective of the whole system, different layers generally play different roles in the behavior of the whole system; that is, the graph networks of different dimensions can be sorted according to the degree of importance they exert on the node clustering performance of the multidimensional graph network, giving a corresponding ranking in descending order.
Based on the above setting, this embodiment designs a multilayer random walk method for the multidimensional graph network, which includes two walk processing cases: one is the intra-layer random walk; the other is the cross-layer random walk.
S16, converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; it can be understood that the SkipGram model is the skip-gram neural network model known in the art.
S18, clustering the low-dimensional embedding of each node by adopting the K-means algorithm to obtain the clustering result of each node of the weighted multidimensional graph network; it can be understood that the K-means algorithm is the K-means clustering algorithm known in the art, an iterative clustering analysis algorithm. Clustering the low-dimensional embeddings with the K-means algorithm divides the nodes into K different clusters such that the within-cluster sum of squares is minimized, yielding the clustering result of the nodes, namely K different clusters.
S20, projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
According to the above multidimensional graph network node clustering processing method, the original unweighted attribute multidimensional graph network (i.e., the unweighted multidimensional graph network) is first converted into a weighted multidimensional graph network, and the conversion encodes a comprehensive similarity feature combining the attribute similarity and the structural similarity between nodes connected by an edge, which can enhance the node clustering performance. Secondly, based on the different degrees of importance that graph networks of different dimensions exert on node clustering, i.e., the different information content of the graph networks of different dimensions, the cross-layer random walk transition probability is constructed in a heterogeneous manner according to these differences in information content; combined with the intra-layer transition probability, a sampling sequence is obtained for each node, and this sequence captures the neighbor node information of the node. The obtained sampling sequences are then converted into low-dimensional embeddings using a network embedding technique. Based on the low-dimensional embeddings of all nodes, the nodes are clustered with the K-means clustering algorithm to obtain the node clustering result. Finally, the low-dimensional embeddings are projected into a two-dimensional space with a dimension reduction technique to obtain the coordinate value of each node in the two-dimensional space, the label information of the nodes is used as the color mapping, and the node clustering effect is displayed from a visual perspective with a graph visualization technique. The clustering effect is thereby remarkably improved, and the application range of network embedding techniques is expanded.
Referring to fig. 3 and fig. 4, in an embodiment, the step S12 may include the following steps:
S122, for the graph network of each dimension, determining the attribute similarity of nodes connected by an edge according to the number of matching entries among the F-dimensional attribute vectors of the two nodes;
S124, determining the structural similarity between the nodes by adopting a structural similarity measurement method;
S126, adding a weight to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, thereby converting the unweighted multidimensional graph network into a weighted multidimensional graph network.
It can be understood that the clustering objective for an attribute multidimensional graph network must satisfy not only structural compactness but also attribute homogeneity. Therefore, based on this idea, for the graph network of each dimension, the attribute similarity between nodes connected by an edge is measured first, the structural similarity between the nodes is then calculated based on a structural similarity measurement method, and the attribute similarity and structural similarity between the nodes are then fused to obtain the edge weight between the nodes, that is, the unweighted graph network of each dimension is converted into a weighted graph network.
Specifically, for processing attribute similarity:
the most intuitive measure has nodes with edges: (
Figure 786785DEST_PATH_IMAGE023
) The method for attribute similarity between nodes is to compare the nodes one by one
Figure 378304DEST_PATH_IMAGE024
And node
Figure 869328DEST_PATH_IMAGE025
Attribute vector of
Figure 176681DEST_PATH_IMAGE026
And
Figure 753156DEST_PATH_IMAGE027
in (1),
Figure 882786DEST_PATH_IMAGE028
number of similarities in dimension attributes:
Figure 838104DEST_PATH_IMAGE029
(1)
based on the formula (1), the node
Figure 457304DEST_PATH_IMAGE030
And node
Figure 380129DEST_PATH_IMAGE031
The attribute similarity between them can be expressed as:
Figure 844609DEST_PATH_IMAGE032
(2)
for the treatment of structural similarity:
there are many Common methods for measuring structural similarity between nodes, such as Common neighbor algorithm (CN), Jaccard Coefficient, Resource Allocation Index (RA), adaptive Adar Index (AA Index), Preferred Attachment (PA), and Community Resource Allocation Index (Community Resource Allocation Index). The structural similarity between the nodes can be determined by adopting any one of the structural similarity measurement methods.
In one embodiment, the structural similarity measure is preferably the RA index. Since the RA index is among the best-performing methods for graph analysis tasks such as community detection and link prediction, this embodiment measures the structural similarity between nodes based on the RA index; the specific calculation method is as follows:
$S_R(v_i,v_j)=\sum_{z\in N(v_i)\cap N(v_j)}\dfrac{1}{k_z}$   (3)

where $N(v_i)\cap N(v_j)$ denotes the set of common neighbors of node $v_i$ and node $v_j$, and $k_z$ denotes the degree of each node $z$ in the common neighborhood; taking the reciprocal of each degree and summing gives the RA value between node $v_i$ and node $v_j$, i.e., the structural similarity of node $v_i$ and node $v_j$. In this way, optimal processing performance can be achieved in the process of determining the structural similarity between nodes.
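As a concrete illustration of formula (3), the RA index can be computed directly from an adjacency list. The following Python sketch is a minimal, hypothetical implementation; the function name and data layout are illustrative and not part of the patent.

from typing import Dict, Hashable, Set

def ra_index(adj: Dict[Hashable, Set[Hashable]], u: Hashable, v: Hashable) -> float:
    """RA index of formula (3): sum of 1/degree over the common neighbors of u and v."""
    common = adj[u] & adj[v]                       # common neighbors of u and v
    return sum(1.0 / len(adj[z]) for z in common if adj[z])

# toy example: a small unweighted graph given as an adjacency list
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(ra_index(adj, 2, 4))   # common neighbors {1, 3}, degrees 3 and 3 -> 1/3 + 1/3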
Constructing a weighted multidimensional graph network based on the attribute similarity and the structural similarity: specifically, for the graph network $G^{[d]}$ of dimension $d$, a weight $w_{ij}^{[d]}$ can be added to each unweighted connecting edge $(v_i,v_j)\in E^{[d]}$; the weight encodes the structural similarity and the attribute similarity between node $v_i$ and node $v_j$ and is calculated according to formula (4):

$w_{ij}^{[d]}=\alpha\cdot S_R(v_i,v_j)+\beta\cdot S_A(v_i,v_j)$   (4)

where the parameters $\alpha$ and $\beta$ are respectively used to weigh the relative magnitudes of the structural similarity and the attribute similarity between node $v_i$ and node $v_j$, $S_R(v_i,v_j)$ denotes the calculation result of formula (3), and $S_A(v_i,v_j)$ denotes the calculation result of formula (2). Based on formula (4), the unweighted attribute multidimensional graph network $G$ can be converted into a weighted multidimensional graph network $G_w$, as shown in fig. 4, wherein A, B and C respectively represent three different dimensions.
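Under the reconstruction of formulas (2) and (4) given above (a normalized attribute-match count combined linearly with the RA index, which is an assumption about the exact fusion), the per-dimension edge weighting could be sketched as follows, reusing the ra_index helper from the previous example; the names attr_sim, edge_weight and weight_dimension are hypothetical.

import numpy as np

def attr_sim(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """Attribute similarity per formulas (1)-(2): fraction of matching attribute entries."""
    return float(np.sum(x_i == x_j)) / len(x_i)

def edge_weight(adj, X, i, j, alpha=0.5, beta=0.5):
    """Edge weight per formula (4), assumed here to be alpha * S_R + beta * S_A."""
    return alpha * ra_index(adj, i, j) + beta * attr_sim(X[i], X[j])

def weight_dimension(adj, X, alpha=0.5, beta=0.5):
    """Convert the unweighted edges of one dimension into weighted edges; X maps node id -> attribute vector."""
    weighted = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            if (j, i) not in weighted:             # undirected graph: store each edge once
                weighted[(i, j)] = edge_weight(adj, X, i, j, alpha, beta)
    return weighted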
In an embodiment, regarding step S14, the process of performing intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability may specifically include the following processing procedures:
carrying out intra-layer biased random walk processing on the weighted multidimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability $\pi_{vx}^{[d]}$ from node $v$ to node $x$ is $\pi_{vx}^{[d]}=\alpha_{pq}(t,x)\cdot w_{vx}^{[d]}$, where $\alpha_{pq}(t,x)$ is calculated as follows:

$\alpha_{pq}(t,x)=\begin{cases}1/p, & d_{tx}=0\\ 1, & d_{tx}=1\\ 1/q, & d_{tx}=2\end{cases}$   (5)

where $d_{tx}$ denotes the distance between node $t$ and node $x$, node $t$ is the previous node of node $v$, node $x$ is the next-hop node of node $v$, $w_{vx}^{[d]}$ denotes the weight of the connecting edge between node $v$ and node $x$, and the parameters $p$ and $q$ are the parameters used to guide the random walker to perform a biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the cross-layer random walk transition probability is given by formula (6), which combines, for each candidate layer, the weight $w_{vx}^{[d]}$ of the connecting edge between node $v$ and node $x$, the factor $\alpha_{pq}(t,x)$ that measures, according to the distance between node $t$ and node $x$, the probability of node $v$ selecting node $x$ as the next-hop node, and a layer jump probability $r_d$, where $r=(r_1,r_2,\ldots,r_L)$ and $r_d\in[0,1]$; when the modularity of the graph network of dimension $d$ is high, $r_d$ is set to a higher value; conversely, when the modularity of the graph network of dimension $d$ is low, $r_d$ is set to a lower value; $r_d$ denotes the probability value set for the graph network of dimension $d$ according to the modularity of the graph network of dimension $d$, and similarly $r_{d+1}$ denotes the probability value set for the graph network of dimension $d+1$ according to its modularity, $r_{d-1}$ denotes the probability value set for the graph network of dimension $d-1$ according to its modularity, and $r_1$ denotes the probability value set for the graph network of dimension 1 according to its modularity; when the latter is higher, $r_1$ is set to a higher value, and the random walker then jumps to the graph network of dimension 1 with probability $r_1$;

and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layer of the weighted multidimensional graph network to traverse and the node to move to, and to perform the cross-layer random walk.
Specifically, for the intra-layer random walk, the node2vec method (i.e., an embedding method of graph data) is adopted. Given a source node $u$, $c_l$ denotes the $l$-th sampled node in a walk of length $len$ starting from $u$, with $c_0=u$. Assuming that the current sampled node is $c_{l-1}=v$, the random walker walks to the next neighbor node $x$ according to the following probability distribution:

$P(c_l=x\mid c_{l-1}=v)=\begin{cases}\dfrac{\pi_{vx}^{[d]}}{Z}, & (v,x)\in E^{[d]}\\ 0, & \text{otherwise}\end{cases}$   (7)

where $(v,x)\in E^{[d]}$ indicates that node $v$ and node $x$ have a connecting edge in the graph network of relationship type $d$, $\pi_{vx}^{[d]}$ denotes the intra-layer transition probability between node $v$ and node $x$, $Z$ denotes a normalization constant, and $\pi_{vx}^{[d]}=\alpha_{pq}(t,x)\cdot w_{vx}^{[d]}$ with $\alpha_{pq}(t,x)$ given by the above formula (5).
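For illustration only, the biased intra-layer step of formulas (5) and (7) can be sketched in Python as follows; the data layout (a weighted adjacency dict per dimension, node -> {neighbor: weight}) and the function names are assumptions rather than part of the patent.

import random

def alpha_pq(prev, prev_nbrs, x, p, q):
    """Search bias alpha_pq(t, x) of formula (5): 1/p, 1 or 1/q depending on d_tx."""
    if prev is None:
        return 1.0               # first step of a walk: no previous node, so no bias
    if x == prev:
        return 1.0 / p           # d_tx = 0: stepping back to the previous node
    if x in prev_nbrs:
        return 1.0               # d_tx = 1: x is also a neighbor of the previous node
    return 1.0 / q               # d_tx = 2: moving further away from the previous node

def step_in_layer(w_adj, prev, cur, p, q):
    """One intra-layer move per formula (7): sample x with probability proportional to alpha_pq * w_vx."""
    nbrs = list(w_adj[cur])
    weights = [alpha_pq(prev, w_adj.get(prev, {}), x, p, q) * w_adj[cur][x] for x in nbrs]
    return random.choices(nbrs, weights=weights, k=1)[0]   # random.choices normalizes the weights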
For the cross-layer random walk, assume that the random walker initially stands at node $v$; in the first step, the random walker decides which layer to traverse and performs the biased random walk initialization $c_0=v$. The random walker then decides the next node $c_1$ to move to. In each subsequent step of the traversal, the random walker determines the layer to traverse and the next node to walk to.

The cross-layer random walk transition probability is set according to the degree of importance of each dimension graph network $G^{[d]}$ to the node clustering performance of the multidimensional graph network $G$. Therefore, the information content of each dimension graph network can be measured by a heuristic modularity method, and the modularity values $Q_1,Q_2,\ldots,Q_L$ of the $L$ dimension graph networks are obtained from the modularity calculation. Secondly, the layer jump probabilities $r=(r_1,r_2,\ldots,r_L)$ are defined, and the cross-layer random walk transition probability can then be expressed as the above formula (6).
In some embodiments, the probability value set for the graph network with lower modularity is smaller than the probability value set for the graph network with higher modularity; wherein $r_d$ denotes the probability value set for the graph network of dimension $d$ according to the modularity of the graph network of dimension $d$, and $r_{d'}$ denotes the probability value set for the graph network of dimension $d'$ according to the modularity of the graph network of dimension $d'$.
It is to be understood that if the modularity $Q_d$ of the graph network of dimension $d$ is very low (i.e., $Q_d$ is very small), a lower value will be set for $r_d$; conversely, a somewhat higher value will be set. The idea behind this setting is as follows: for a graph network layer with low modularity (small information content), the random walker traverses that layer with relatively low probability; conversely, for a graph network layer with high modularity (large information content), the random walker traverses that layer with relatively high probability. The reason is that a graph network layer with high modularity plays a more important role in the node clustering of the whole system.
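The cross-layer behaviour described above can be summarized in a hedged sketch: layer-jump probabilities are derived from the per-dimension modularity values (here by simple normalization, which is one possible choice and not necessarily the patent's exact scheme), and each step first selects a layer and then moves inside it with the step_in_layer helper from the previous sketch. The names layer_probs and multilayer_walk are hypothetical.

import random

def layer_probs(modularities):
    """Map per-dimension modularity values Q_d to layer-jump probabilities r_d (assumed: normalization)."""
    total = sum(modularities)
    return [q / total for q in modularities]

def multilayer_walk(layers, start, walk_len, p, q, modularities):
    """layers: one weighted adjacency dict per dimension; returns a node sampling sequence."""
    r = layer_probs(modularities)
    walk, prev, cur = [start], None, start
    for _ in range(walk_len - 1):
        d = random.choices(range(len(layers)), weights=r, k=1)[0]   # pick the layer to traverse
        if not layers[d].get(cur):
            continue                                                # cur has no edges in this layer
        nxt = step_in_layer(layers[d], prev, cur, p, q)
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk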
In an embodiment, the step S16 may specifically include the following processing steps:
S162, dividing the sampling sequence through a window to obtain training sample sequences of node information;
S164, inputting the training sample sequences into the SkipGram model and optimizing the objective function by a stochastic gradient descent method to obtain the low-dimensional dense embedding of each node;
the process of optimizing the objective function by the stochastic gradient descent method includes refining the conditional probability in turn through the conditional independence assumption and the symmetry assumption, and obtaining the objective function from the refined conditional probability.
It can be understood that the processing of the previous step S14 obtains, through the intra-layer and cross-layer random walk strategies, the sampling sequence $W_{v_i}$ of each of the $N$ nodes of the multidimensional graph network $G$; the sequence contains the context information (i.e., the neighbor nodes) of node $v_i$. The sequence is then divided through a window to obtain training sample sequences of the node information; the training sample sequences are input into the Skip-Gram model, and the objective function is optimized by a stochastic gradient descent method to obtain the low-dimensional dense embedding $z_{v_i}$ of node $v_i$. Specifically, the objective function is:

$\max_{f}\sum_{v\in V}\log \Pr\big(N_S(v)\mid f(v)\big)$   (8)

where $f$ is the embedding matrix of size $N\times k$ that maps the nodes into a $k$-dimensional space, $k$ is a preset parameter, $N_S(v)$ denotes the neighbor nodes of node $v$, and $\Pr(N_S(v)\mid f(v))$ is the conditional probability expressing that, given each node, the probability of its neighbor nodes appearing is maximized.

Under the conditional independence assumption (i.e., given a source node, the probability of one of its neighbor nodes appearing is independent of the remaining nodes in the neighbor set), the conditional probability can be further expressed as:

$\Pr\big(N_S(v)\mid f(v)\big)=\prod_{n_i\in N_S(v)}\Pr\big(n_i\mid f(v)\big)$   (9)

Under the symmetry assumption (the influence between two nodes in the $k$-dimensional feature space is symmetric, that is, a node shares the same low-dimensional embedding whether it acts as a source node or as a neighbor node), the conditional probability can be further expressed as:

$\Pr\big(n_i\mid f(v)\big)=\dfrac{\exp\big(f(n_i)\cdot f(v)\big)}{\sum_{u\in V}\exp\big(f(u)\cdot f(v)\big)}$   (10)

Based on the above assumptions, the optimized objective function can be expressed as:

$\max_{f}\sum_{v\in V}\Big[-\log Z_v+\sum_{n_i\in N_S(v)}f(n_i)\cdot f(v)\Big],\qquad Z_v=\sum_{u\in V}\exp\big(f(u)\cdot f(v)\big)$   (11)
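Training the Skip-Gram objective of formulas (8)-(11) on the sampled walks is commonly delegated to an off-the-shelf implementation. The sketch below uses gensim (version 4 or later, where the embedding size parameter is named vector_size) purely as an illustration; the patent does not prescribe a specific library, and embed_walks is a hypothetical helper name.

from gensim.models import Word2Vec

def embed_walks(walks, dim=128, window=5, epochs=5):
    """Train Skip-Gram (sg=1) on the node walks; node ids are treated as words."""
    sentences = [[str(n) for n in walk] for walk in walks]
    model = Word2Vec(sentences, vector_size=dim, window=window,
                     sg=1, min_count=1, epochs=epochs, workers=4)
    # one k-dimensional real vector per node, i.e. the embedding set Z
    return {node: model.wv[str(node)] for walk in walks for node in walk}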
in one embodiment, regarding step S18, specifically, the step S16 results in a multidimensional mapping network
Figure 383725DEST_PATH_IMAGE156
In
Figure 459128DEST_PATH_IMAGE157
Low dimensional embedding of individual nodes (nodes)
Figure 186913DEST_PATH_IMAGE158
Is represented as
Figure 472400DEST_PATH_IMAGE159
) That is, the obtained low-dimensional embedded set is
Figure 130784DEST_PATH_IMAGE160
Wherein each token vector is one
Figure 388590DEST_PATH_IMAGE161
The real vector of dimensions. Clustering these low-dimensional embeddings by using K-means algorithm, i.e. clustering
Figure 615172DEST_PATH_IMAGE162
A node is divided into
Figure 591218DEST_PATH_IMAGE163
In a different cluster
Figure 272866DEST_PATH_IMAGE164
So that the sum of squares within each cluster is minimized, i.e., the clustering goal of K-means is to find a cluster that satisfies the following equation
Figure 713075DEST_PATH_IMAGE165
Figure 782662DEST_PATH_IMAGE166
(12)
Wherein the content of the first and second substances,
Figure 432955DEST_PATH_IMAGE167
representing clusters
Figure 42928DEST_PATH_IMAGE168
Average of all points in (1) such that each point belongs to a point away from the center
Figure 275326DEST_PATH_IMAGE169
The closest mean (cluster center) corresponds to the cluster. Finally, the original multidimensional graph network can be obtained
Figure 719077DEST_PATH_IMAGE170
In
Figure 404136DEST_PATH_IMAGE171
After the nodes are clustered
Figure 817800DEST_PATH_IMAGE172
A different cluster.
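A minimal sketch of the K-means step of formula (12) using scikit-learn; the choice of scikit-learn and of the cluster number K are assumptions for illustration, and cluster_embeddings is a hypothetical helper name.

import numpy as np
from sklearn.cluster import KMeans

def cluster_embeddings(emb, k, seed=0):
    """Cluster the node embeddings Z into K clusters, minimizing the within-cluster sum of squares."""
    nodes = list(emb)
    Z = np.stack([emb[n] for n in nodes])
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(Z)
    return dict(zip(nodes, labels))   # node id -> cluster index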
In an embodiment, the step S20 may specifically include the following processing steps S202 to S208:
S202, calculating the similarity between the nodes according to the low-dimensional embeddings to obtain the $P$ distribution; the $P$ distribution is:

$p_{j\mid i}=\dfrac{\exp\big(-\lVert z_i-z_j\rVert^{2}/2\sigma_i^{2}\big)}{\sum_{k\neq i}\exp\big(-\lVert z_i-z_k\rVert^{2}/2\sigma_i^{2}\big)}$   (13)

$p_{ij}=\dfrac{p_{j\mid i}+p_{i\mid j}}{2N}$   (14)

where $p_{j\mid i}$ denotes the conditional probability that node $v_i$ selects node $v_j$ as its close point, $\lVert z_i-z_j\rVert$ denotes the distance between node $v_i$ and node $v_j$, $\sigma_i$ denotes the variance of the Gaussian distribution centered on node $v_i$, $p_{i\mid j}$ denotes the conditional probability that node $v_j$ selects node $v_i$ as its close point, $p_{ij}$ denotes the $P$-distribution similarity between node $v_i$ and node $v_j$, and $N$ denotes the number of nodes.
It can be understood that, because the low-dimensional embeddings of the nodes have already captured the attribute and structural similarity between nodes in the multidimensional graph network, as well as the correlation between the graph networks of different dimensions and the differences in their importance to the clustering performance, the $P$ distribution calculated from $z_i$ and $z_j$ according to formulas (13) and (14) can describe the clustering characteristics in the multidimensional graph network more accurately and comprehensively.
S204, measuring the proximity between node $v_i$ and node $v_j$ in the two-dimensional layout space based on the Student-t distribution, and calculating the $Q$ distribution; the $Q$ distribution is:

$q_{ij}=\dfrac{\big(1+\lVert y_i-y_j\rVert^{2}\big)^{-1}}{\sum_{k\neq l}\big(1+\lVert y_k-y_l\rVert^{2}\big)^{-1}}$

where $\lVert y_i-y_j\rVert$ denotes the distance between node $v_i$ and node $v_j$ in the two-dimensional layout space, $y_i$ and $y_j$ respectively denote the coordinate values of node $v_i$ and node $v_j$ in the two-dimensional layout space, $q_{ij}$ denotes the $Q$-distribution similarity between node $v_i$ and node $v_j$, and $y_k$ and $y_l$ respectively denote the coordinate values of node $v_k$ and node $v_l$ in the two-dimensional layout space.
It is understood that, similar to the $P$ distribution, the $Q$ distribution expresses that in the two-dimensional layout space similar nodes are closer together while dissimilar nodes are relatively farther apart.
S206, calculating the KL divergence between the $P$ distribution and the $Q$ distribution; the calculation formula is:

$KL(P\parallel Q)=\sum_{i}\sum_{j}p_{ij}\log\dfrac{p_{ij}}{q_{ij}}$

It can be appreciated that, in the model optimization process, continuously reducing the KL divergence makes the $Q$ distribution reflect the $P$ distribution as much as possible, that is, the coordinate positions of the nodes in the two-dimensional layout space reflect the feature information in the original graph space as much as possible.
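For clarity, the quantities of formulas (13)-(14), the Student-t based Q distribution, and the KL divergence can be written out directly in numpy. This is a didactic sketch only: it uses a fixed Gaussian bandwidth sigma instead of the perplexity-based bandwidth search performed by full t-SNE implementations.

import numpy as np

def p_matrix(Z, sigma=1.0):
    """Symmetrized P distribution of formulas (13)-(14) with a fixed Gaussian bandwidth."""
    d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)   # pairwise squared distances
    aff = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(aff, 0.0)
    p_cond = aff / aff.sum(axis=1, keepdims=True)            # p_{j|i}
    n = Z.shape[0]
    return (p_cond + p_cond.T) / (2.0 * n)                    # p_{ij}

def q_matrix(Y):
    """Student-t based Q distribution over the two-dimensional layout coordinates Y."""
    d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij)."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))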
S208, when the iterative optimization of the KL divergence stops, obtaining the two-dimensional coordinate value of each node, drawing the nodes with the same label in the same color and the nodes with different labels in different colors, and displaying the clustering result.
It will be appreciated that when the model stops the iterative optimization, $y_1,y_2,\ldots,y_N$ are the two-dimensional coordinate values of the $N$ nodes. In this embodiment, the process of projecting the low-dimensional embeddings into the two-dimensional space and visually displaying the clustering result can be briefly summarized as follows:

(1) based on the low-dimensional embeddings $Z$, calculating the similarity between the low-dimensional embeddings of the $N$ nodes, i.e., calculating the $P$ distribution;

(2) in the two-dimensional layout space, calculating the layout proximity between the $N$ nodes, i.e., calculating the $Q$ distribution;

(3) calculating the KL divergence between the $P$ distribution and the $Q$ distribution, continuously and iteratively optimizing the objective function to reduce the difference between the $P$ distribution and the $Q$ distribution, and obtaining the two-dimensional coordinate values of the $N$ nodes in the two-dimensional layout space;

(4) performing graph visualization mapping according to the two-dimensional coordinate values and the labels of the nodes.
In this embodiment, nodes with the same label are drawn in the same color and nodes with different labels are drawn in different colors; the visualization result is the visual effect of the multidimensional graph network node clustering, in which nodes in the same cluster are close to each other and nodes in different clusters are far away from each other.
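In practice, steps (1) to (4) above can be carried out with an off-the-shelf nonlinear dimension-reduction implementation. The sketch below uses scikit-learn's TSNE and matplotlib as one possible choice, not one mandated by the patent, with the K-means cluster labels as the color mapping; visualize_clusters is a hypothetical helper name.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_clusters(emb, labels, seed=0):
    """Project the node embeddings to 2-D and draw the nodes colored by their cluster label."""
    nodes = list(emb)
    Z = np.stack([emb[n] for n in nodes])
    Y = TSNE(n_components=2, random_state=seed, init="pca").fit_transform(Z)
    colors = [labels[n] for n in nodes]
    plt.scatter(Y[:, 0], Y[:, 1], c=colors, cmap="tab10", s=12)
    plt.title("Multidimensional graph network node clustering")
    plt.show()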
Compared with the prior art, the present application converts the original unweighted network into a weighted network based on the attribute similarity and structural similarity of the nodes, and the weights encode the comprehensive similarity between nodes, which can enhance the node clustering effect; the present application sets the cross-layer random walk transition probability according to the information content of the different layers, which can further enhance the node clustering effect; and the present application applies the classical random-walk-based network embedding technique for single-layer networks to attribute multidimensional graph network node clustering, expanding the application range of network embedding techniques.
It should be understood that although the steps in the flowcharts of fig. 2 and fig. 3 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 3 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Referring to fig. 5, a multidimensional graph network node clustering processing apparatus 100 is further provided, which includes a network conversion module 13, a walk processing module 15, an embedding processing module 17, a clustering processing module 19, and a visualization module 21. The network conversion module 13 is configured to convert the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes. The walk processing module 15 is configured to perform intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network. The embedding processing module 17 is configured to convert the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model. The clustering processing module 19 is configured to cluster the low-dimensional embeddings of the nodes with the K-means algorithm to obtain the clustering result of each node of the weighted multidimensional graph network. The visualization module 21 is configured to project each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and display the clustering result with a graph visualization technique.
Through the cooperation of its modules, the multidimensional graph network node clustering processing apparatus 100 first converts the original unweighted attribute multidimensional graph network (i.e., the unweighted multidimensional graph network) into a weighted multidimensional graph network, and the conversion encodes a comprehensive similarity feature combining the attribute similarity and the structural similarity between nodes connected by an edge, which can enhance the node clustering performance. Secondly, based on the different degrees of importance that graph networks of different dimensions exert on node clustering, i.e., the different information content of the graph networks of different dimensions, the cross-layer random walk transition probability is constructed in a heterogeneous manner according to these differences in information content; combined with the intra-layer transition probability, a sampling sequence is obtained for each node, and this sequence captures the neighbor node information of the node. The obtained sampling sequences are then converted into low-dimensional embeddings using a network embedding technique. Based on the low-dimensional embeddings of all nodes, the nodes are clustered with the K-means clustering algorithm to obtain the node clustering result. Finally, the low-dimensional embeddings are projected into a two-dimensional space with a dimension reduction technique to obtain the coordinate value of each node in the two-dimensional space, the label information of the nodes is used as the color mapping, and the node clustering effect is displayed from a visual perspective with a graph visualization technique. The clustering effect is thereby remarkably improved, and the application range of network embedding techniques is expanded.
In one embodiment, the network conversion module 13 includes an attribute sub-module, a structure sub-module, and a conversion sub-module. The attribute sub-module is configured, for the graph network of each dimension, to determine the attribute similarity of nodes connected by an edge according to the number of matching entries among the F-dimensional attribute vectors of the two nodes. The structure sub-module is configured to determine the structural similarity between nodes with a structural similarity measurement method. The conversion sub-module is configured to add a weight to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, thereby converting the unweighted multidimensional graph network into a weighted multidimensional graph network.
In one embodiment, the structural similarity measure is an RA index measure.
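As an illustration of this embodiment, the sketch below computes both similarities and uses them to weight the edges of one layer. It assumes the attribute vectors are stored on the nodes under a hypothetical key "attr", that the attribute similarity is the count of matching entries between the two F-dimensional vectors normalized by F, that the RA index is the standard resource-allocation index over common neighbors, and that the two similarities are simply summed; the patent does not fix the combination rule, so the sum is only a placeholder.

```python
import networkx as nx

def attribute_similarity(g: nx.Graph, u, v) -> float:
    # Fraction of attribute dimensions on which the two connected nodes agree.
    a, b = g.nodes[u]["attr"], g.nodes[v]["attr"]
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def ra_index(g: nx.Graph, u, v) -> float:
    # Resource Allocation index: common neighbors weighted by 1 / degree.
    common = set(g[u]) & set(g[v])
    return sum(1.0 / g.degree(z) for z in common if g.degree(z) > 0)

def add_similarity_weights(g: nx.Graph) -> nx.Graph:
    # Turn the unweighted layer into a weighted layer (combination rule assumed).
    weighted = g.copy()
    for u, v in weighted.edges():
        weighted[u][v]["weight"] = attribute_similarity(g, u, v) + ra_index(g, u, v)
    return weighted
```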
In an embodiment, the migration processing module 15 is configured to perform intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability; specifically, it may be configured to implement the following processing procedure:
carrying out intra-layer biased random walk processing on the weighted multidimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability π_vx from node v to node x is

π_vx = α_pq(t, x) · w_vx

wherein α_pq(t, x) is calculated as follows:

α_pq(t, x) = 1/p if d_tx = 0; α_pq(t, x) = 1 if d_tx = 1; α_pq(t, x) = 1/q if d_tx = 2

wherein d_tx denotes the distance between node t and node x, node t is the previous node of node v, node x is the next-hop node of node v, w_vx denotes the weight of the connecting edge between node v and node x, and the parameters p and q are parameters for guiding the random walker to perform the biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the formula for the cross-layer random walk transition probability is given as an image in the original publication, and the quantities involved are: w_vx, the weight of the connecting edge between node v and node x; the probability, measured according to the distance between node v and node x, that node v selects node x as the next-hop node; and a layer jump probability; when the modularity of the graph network of a given dimension is high, its layer jump probability is set to a higher value, and conversely, when the modularity of the graph network of that dimension is low, its layer jump probability is set to a lower value; for each dimension from 1 to M, a probability value is set according to the modularity of the graph network of that dimension; for example, when the modularity of the dimension-1 graph network is higher, its probability value is set higher, and the random walker then jumps to the dimension-1 graph network with correspondingly higher probability;
and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layers of the weighted multidimensional graph network to traverse and the nodes to move to, thereby performing the cross-layer random walk.
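A compact sketch of one step of this walk is given below. The intra-layer bias follows the node2vec-style α_pq(t, x) suggested by the symbols above, and the cross-layer move is modeled as: with probability r the walker jumps to a layer drawn from modularity-based layer probabilities, otherwise it takes a biased step inside the current layer. Because the exact cross-layer formula appears only as an image in the source, this is an assumed instantiation rather than the patent's formula; layer_probs is a hypothetical argument holding the per-dimension probability values.

```python
import random

def alpha(prev, nxt, g, p, q):
    # node2vec-style search bias alpha_pq(t, x), keyed on the relation between
    # the previous node t (= prev) and the candidate next node x (= nxt).
    if prev is None:
        return 1.0              # first step of a walk: no bias yet
    if nxt == prev:
        return 1.0 / p          # d_tx = 0: return to the previous node
    if g.has_edge(prev, nxt):
        return 1.0              # d_tx = 1: candidate is also a neighbor of t
    return 1.0 / q              # d_tx = 2: move further away from t

def walk_step(layers, layer_idx, prev, cur, p, q, r, layer_probs):
    # With probability r, jump to a layer drawn from the modularity-based
    # layer probabilities and continue the walk from the same node there.
    if random.random() < r:
        layer_idx = random.choices(range(len(layers)), weights=layer_probs)[0]
    g = layers[layer_idx]
    neighbors = list(g[cur])
    if not neighbors:
        return layer_idx, cur   # dead end: stay put
    weights = [alpha(prev, x, g, p, q) * g[cur][x].get("weight", 1.0)
               for x in neighbors]
    return layer_idx, random.choices(neighbors, weights=weights)[0]
```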
In an embodiment, the migration processing module 15 is configured to perform intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability; specifically, it may further be configured to implement the following processing procedure:
the difference between the probability values set for graph networks with lower modularity is smaller than the difference between the probability values set for graph networks with higher modularity, wherein each probability value is the value set for the graph network of a given dimension according to the modularity of the graph network of that dimension.
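One plausible way to derive such per-dimension probability values is sketched below: the modularity of each layer is estimated with networkx's greedy modularity communities and the values are passed through a softmax so that more modular layers receive larger probabilities. The softmax normalization and the temperature parameter are assumptions made only for illustration; the text itself only requires that higher modularity maps to a higher value.

```python
import math
from networkx.algorithms.community import greedy_modularity_communities, modularity

def layer_jump_probs(layers, temperature=1.0):
    # Estimate each layer's modularity from a greedy community partition.
    mods = [modularity(g, greedy_modularity_communities(g)) for g in layers]
    # Softmax-style normalization: more modular (more informative) layers
    # receive larger probability values.
    exps = [math.exp(m / temperature) for m in mods]
    total = sum(exps)
    return [e / total for e in exps]
```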
In an embodiment, the embedding processing module 17 may be specifically configured to implement the following processing steps: dividing the sampling sequences with a window to obtain training sample sequences of node information; and inputting the training sample sequences into the SkipGram model and optimizing the objective function with a stochastic gradient descent method to obtain a low-dimensional dense embedding for each node. Optimizing the objective function with the stochastic gradient descent method includes determining the conditional probability by simplifying it successively under the conditional independence assumption and the symmetry assumption, and then obtaining the objective function from the simplified conditional probability.
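For reference, the following sketch trains such an embedding with gensim's skip-gram implementation, assuming the sampling sequences are lists of node-id strings. gensim handles the window division and the stochastic-gradient-descent optimization internally; negative sampling is used here as a convenient stand-in for the objective described above, which is an assumption rather than the patent's exact formulation.

```python
from gensim.models import Word2Vec

def embed_walks(walks, dimensions=64, window=5, epochs=5):
    # walks: e.g. [["12", "7", "33", ...], ["5", "12", ...], ...]
    model = Word2Vec(
        sentences=walks,
        vector_size=dimensions,  # size of the low-dimensional embedding
        window=window,           # context window used to cut training samples
        sg=1,                    # skip-gram
        negative=5,              # negative sampling as the SGD-friendly objective
        min_count=0,
        epochs=epochs,
    )
    return {node: model.wv[node] for node in model.wv.index_to_key}
```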
In one embodiment, the visualization module 21 may include a first distribution calculation sub-module, a second distribution calculation sub-module, a divergence calculation sub-module, and a display sub-module. The first distribution calculation sub-module is configured to calculate, from the low-dimensional embeddings, the similarity P distribution between nodes; the P distribution is:

p_{j|i} = exp(−‖y_i − y_j‖² / (2σ_i²)) / Σ_{k≠i} exp(−‖y_i − y_k‖² / (2σ_i²))

p_{ij} = (p_{j|i} + p_{i|j}) / (2N)

where y_i denotes the low-dimensional embedding of node i, p_{j|i} denotes the conditional probability that node i selects node j as its close point, ‖y_i − y_j‖ denotes the distance between node i and node j, σ_i denotes the variance of the Gaussian distribution centered at node i, ‖y_i − y_k‖ denotes the distance between node i and node k, p_{i|j} denotes the conditional probability that node j selects node i as its close point, p_{ij} and p_{ji} denote the P-distribution similarity between node i and node j and between node j and node i, respectively, p_{ii} denotes the P-distribution similarity between node i and itself, p_{i|i} denotes the conditional probability that node i selects itself as its close point, and N denotes the number of nodes.
The second distribution calculation sub-module is configured to measure, based on the Student-t distribution, the proximity between node i and node j in the two-dimensional layout space and to calculate the Q distribution; the Q distribution is:

q_{ij} = (1 + ‖z_i − z_j‖²)^(−1) / Σ_{g≠h} (1 + ‖z_g − z_h‖²)^(−1)

where ‖z_i − z_j‖ denotes the distance between node i and node j in the two-dimensional layout space, z_i and z_j respectively denote the coordinate values of node i and node j in the two-dimensional layout space, q_{ij} denotes the Q-distribution similarity between node i and node j, q_{gh} denotes the Q-distribution similarity between node g and node h, and z_g and z_h respectively denote the coordinate values of node g and node h in the two-dimensional layout space.
The divergence calculation sub-module is configured to calculate the KL divergence between the P distribution and the Q distribution; the calculation formula is:

KL(P‖Q) = Σ_i Σ_j p_{ij} · log(p_{ij} / q_{ij})
the display sub-module is used inKLAnd when the divergence stops iterative optimization, obtaining two-dimensional coordinate values of each node, drawing the nodes with the same label by adopting the same color, drawing the nodes with different labels by adopting different colors, and performing cluster display.
For specific limitations of the multidimensional graph network node clustering processing apparatus 100, reference may be made to the corresponding limitations of the multidimensional graph network node clustering processing method described above, which are not repeated here. The modules in the multidimensional graph network node clustering processing apparatus 100 can be implemented wholly or partially by software, by hardware, or by a combination of the two. The modules may be embedded, in hardware form, in or be independent of a device having the data processing function, or may be stored, in software form, in the memory of the device so that the processor can call and execute the operations corresponding to each module; the computing device may be, but is not limited to, any of the various computers available in the field.
In still another aspect, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing the following steps: converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain a clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique.
In one embodiment, the processor, when executing the computer program, may further implement the additional steps or sub-steps in the embodiments of the multidimensional graph network node clustering processing method.
In yet another aspect, there is also provided a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the following steps: converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain a clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique.
In one embodiment, the computer program, when executed by the processor, may further implement the additional steps or sub-steps in the embodiments of the multidimensional graph network node clustering processing method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make various changes and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A multidimensional graph network node clustering processing method is characterized by comprising the following steps:
converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
according to the built in-layer transition probability and cross-layer random walk transition probability, carrying out in-layer and cross-layer multilayer network random walk processing on the weighted multi-dimensional graph network to obtain a sampling sequence of each node of the weighted multi-dimensional graph network;
converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model;
clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multi-dimensional graph network;
and projecting each low-dimensional embedding into a two-dimensional space by using a dimension reduction technique and displaying the clustering result by using a graph visualization technique.
2. The method for processing node clusters in a multidimensional graph network according to claim 1, wherein the step of converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structural similarity of the nodes comprises:
for each dimension of the graph network, determining the attribute similarity of the nodes connected by an edge according to the number of matching attributes among the F-dimensional attribute vectors of those nodes;
determining the structural similarity between nodes by adopting a structural similarity measurement method;
and adding weights to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, and converting the unweighted multidimensional graph network into the weighted multidimensional graph network.
3. The method of claim 2, wherein the structural similarity measurement method is an RA index measurement method.
4. The multi-dimensional graph network node clustering processing method according to claim 1, wherein a process of performing intra-layer and inter-layer multi-layer network random walk processing on the weighted multi-dimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability includes:
carrying out intra-layer biased random walk processing on the weighted multi-dimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability π_vx from node v to node x is

π_vx = α_pq(t, x) · w_vx

wherein α_pq(t, x) is calculated as follows:

α_pq(t, x) = 1/p if d_tx = 0; α_pq(t, x) = 1 if d_tx = 1; α_pq(t, x) = 1/q if d_tx = 2

wherein d_tx denotes the distance between node t and node x, node t is the previous node of node v, node x is the next-hop node of node v, w_vx denotes the weight of the connecting edge between node v and node x, and the parameters p and q are parameters for guiding the random walker to perform the biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the formula for the cross-layer random walk transition probability is given as an image in the original publication, and the quantities involved are: w_vx, the weight of the connecting edge between node v and node x; the probability, measured according to the distance between node v and node x, that node v selects node x as the next-hop node; and, for each dimension from 1 to M, where M is an integer greater than 1, a probability value set according to the modularity of the graph network of that dimension;
and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layers of the weighted multidimensional graph network to traverse and the nodes to move to, thereby performing the cross-layer random walk.
5. The method according to claim 4, wherein the weighted multidimensional graph network is subjected to intra-layer and inter-layer multi-layer network random walk processing according to the built intra-layer transition probability and cross-layer random walk transition probability, and further comprising:
the difference between the probability values set for graph networks with lower modularity is smaller than the difference between the probability values set for graph networks with higher modularity, wherein each probability value is the value set for the graph network of a given dimension according to the modularity of the graph network of that dimension.
6. The multi-dimensional graph network node clustering processing method according to claim 1, wherein the step of converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model includes:
dividing the sampling sequence through a window to obtain a training sample sequence of the node information;
inputting the training sample sequence into the SkipGram model and optimizing an objective function by adopting a stochastic gradient descent method to obtain a low-dimensional dense embedding of each node;
wherein optimizing the objective function by the stochastic gradient descent method comprises determining the conditional probability by simplifying it successively under a conditional independence assumption and a symmetry assumption, and obtaining the objective function according to the simplified conditional probability.
7. The method for processing the clustering of the nodes in the multidimensional graph network according to any one of claims 1 to 6, wherein the step of projecting each low-dimensional embedding into a two-dimensional space by using a dimension reduction technique and displaying the clustering result by using a graph visualization technique comprises:
obtaining, by calculation from the low-dimensional embeddings, the similarity P distribution between nodes; the P distribution is:

p_{j|i} = exp(−‖y_i − y_j‖² / (2σ_i²)) / Σ_{k≠i} exp(−‖y_i − y_k‖² / (2σ_i²))

p_{ij} = (p_{j|i} + p_{i|j}) / (2N)

wherein y_i denotes the low-dimensional embedding of the i-th node i, p_{j|i} denotes the conditional probability that the i-th node i selects the j-th node j as its close point, ‖y_i − y_j‖ denotes the distance between node i and node j, σ_i denotes the variance of the Gaussian distribution centered at node i, ‖y_i − y_k‖ denotes the distance between node i and the k-th node k, p_{i|j} denotes the conditional probability that node j selects node i as its close point, p_{ij} and p_{ji} denote the P-distribution similarity between node i and node j and between node j and node i, respectively, p_{ii} denotes the P-distribution similarity between node i and node i, p_{i|i} denotes the conditional probability that node i selects node i as its close point, and N denotes the number of nodes;
measuring, based on the Student-t distribution, the proximity between node i and node j in the two-dimensional layout space, and calculating the Q distribution; the Q distribution is:

q_{ij} = (1 + ‖z_i − z_j‖²)^(−1) / Σ_{g≠h} (1 + ‖z_g − z_h‖²)^(−1)

wherein ‖z_i − z_j‖ denotes the distance between node i and node j in the two-dimensional layout space, z_i and z_j respectively denote the coordinate values of node i and node j in the two-dimensional layout space, q_{ij} denotes the Q-distribution similarity between node i and node j, q_{gh} denotes the Q-distribution similarity between node g and the h-th node h, and z_g and z_h respectively denote the coordinate values of node g and node h in the two-dimensional layout space;
calculating the KL divergence between the P distribution and the Q distribution; the calculation formula is:

KL(P‖Q) = Σ_i Σ_j p_{ij} · log(p_{ij} / q_{ij})
when the iterative optimization of the KL divergence stops, obtaining the two-dimensional coordinate value of each node, drawing nodes with the same label in the same color and nodes with different labels in different colors, and performing cluster display.
8. A multidimensional graph network node clustering processing device is characterized by comprising:
the network conversion module is used for converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
the migration processing module is used for performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network;
the embedding processing module is used for converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model;
the clustering processing module is used for clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multi-dimensional graph network;
and the visualization module is used for projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program performs the steps of the method for cluster processing of nodes of a multidimensional graph network as claimed in any one of claims 1 to 7.
CN202110645181.3A 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device Pending CN113254717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645181.3A CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645181.3A CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Publications (1)

Publication Number Publication Date
CN113254717A true CN113254717A (en) 2021-08-13

Family

ID=77187250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645181.3A Pending CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Country Status (1)

Country Link
CN (1) CN113254717A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113729686A (en) * 2021-09-23 2021-12-03 南京航空航天大学 Brain local function dynamic real-time measurement system
CN113729686B (en) * 2021-09-23 2023-12-01 南京航空航天大学 Brain local function dynamic real-time measurement system
CN114819971A (en) * 2022-04-22 2022-07-29 支付宝(杭州)信息技术有限公司 Wind control method based on multi-dimensional relational data, graph clustering method and device
CN114826921A (en) * 2022-05-05 2022-07-29 苏州大学应用技术学院 Network resource dynamic allocation method, system and medium based on sampling subgraph
CN114826921B (en) * 2022-05-05 2024-05-17 苏州大学应用技术学院 Dynamic network resource allocation method, system and medium based on sampling subgraph
CN115631799A (en) * 2022-12-20 2023-01-20 深圳先进技术研究院 Sample phenotype prediction method and device, electronic equipment and storage medium
CN115631799B (en) * 2022-12-20 2023-03-28 深圳先进技术研究院 Sample phenotype prediction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113254717A (en) Multidimensional graph network node clustering processing method, apparatus and device
Refenes et al. Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world
Nepusz et al. Fuzzy communities and the concept of bridgeness in complex networks
Fischer et al. Bagging for path-based clustering
CN109389151B (en) Knowledge graph processing method and device based on semi-supervised embedded representation model
Gorban et al. Principal manifolds and graphs in practice: from molecular biology to dynamical systems
CN108171010B (en) Protein complex detection method and device based on semi-supervised network embedded model
CN108764726B (en) Method and device for making decision on request according to rules
KR101866522B1 (en) Object clustering method for image segmentation
Astudillo et al. Imposing tree-based topologies onto self organizing maps
CN110379521A (en) Medical data collection feature selection approach based on information theory
Meng et al. A new quality assessment criterion for nonlinear dimensionality reduction
Tripathy et al. A Study of Algorithm Selection in Data Mining using Meta-Learning.
Kumar et al. Comparative analysis of SOM neural network with K-means clustering algorithm
Basto-Fernandes et al. A survey of diversity oriented optimization: Problems, indicators, and algorithms
Gao et al. A soft-sensor model of VCM rectification concentration based on an improved WOA-RBFNN
Király et al. Geodesic distance based fuzzy c-medoid clustering–searching for central points in graphs and high dimensional data
CN110232151A (en) A kind of construction method of the QoS prediction model of mixing probability distribution detection
Li et al. Unsupervised domain adaptation via discriminative feature learning and classifier adaptation from center-based distances
Sakri et al. Analysis of the dimensionality issues in house price forecasting modeling
CN109409415A (en) A kind of LLE algorithm kept based on global information
Rafi et al. Optimal fuzzy min-max neural network (fmmnn) for medical data classification using modified group search optimizer algorithm
Xun et al. Sparse estimation of historical functional linear models with a nested group bridge approach
Mohammadi et al. An enhanced noise resilient K-associated graph classifier
Meng et al. Passage method for nonlinear dimensionality reduction of data on multi-cluster manifolds

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813

RJ01 Rejection of invention patent application after publication