CN113254717A - Multidimensional graph network node clustering processing method, apparatus and device - Google Patents

Multidimensional graph network node clustering processing method, apparatus and device

Info

Publication number
CN113254717A
Authority
CN
China
Prior art keywords
node
nodes
graph network
dimensional
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110645181.3A
Other languages
Chinese (zh)
Inventor
魏迎梅
韩贝贝
康来
冯素茹
蒋杰
谢毓湘
万珊珊
杨雨璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by National University of Defense Technology filed Critical National University of Defense Technology
Priority to CN202110645181.3A priority Critical patent/CN113254717A/en
Publication of CN113254717A publication Critical patent/CN113254717A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/901 Indexing; Data structures therefor; Storage structures
    • G06F 16/9024 Graphs; Linked lists
    • G06F 16/906 Clustering; Classification
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F 18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a multidimensional graph network node clustering processing method, apparatus and device, wherein the method comprises the following steps: converting an original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer multilayer network random walks on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain the clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique. The method significantly improves the node clustering effect.

Description

Multidimensional graph network node clustering processing method, apparatus and device
Technical Field
The present application relates to the field of network data processing technologies, and in particular, to a multidimensional graph network node clustering method, apparatus, and device.
Background
Network theory can be used to model the complex relationships among various entities in real life. Traditional approaches to modeling the relationships among individuals in a system mostly adopt a simple single network, or single-layer network, that is, a network containing only one node type and only one interaction type; the nodes in the network represent the individuals in a complex system, and the edges represent the interaction relationships existing between individuals. A multilayer network can model the different interaction relationships existing among individuals; in other words, a multilayer network is a network comprising a plurality of layers, and each layer is an independent single-layer network (i.e., a traditional network). The edges within each layer of a multilayer network are of the same type, but the edge types in different layers may differ; the node types of different layers in a multilayer network may also differ.
The networks at different levels of a multidimensional graph network are composed of the same entities, while the connection relationships between nodes at each level have different properties; a multidimensional graph network is thus a special type of multilayer network. The objective of attribute single-layer network node clustering is to satisfy the following requirements: 1) structural compactness, i.e., nodes in the same cluster are closely connected, while nodes in different clusters are far apart; 2) attribute homogeneity, i.e., nodes in the same cluster have similar attribute values, while nodes in different clusters have significantly different attribute values. In practice, node clustering for an attribute multidimensional graph network must not only satisfy the above structural compactness and attribute homogeneity, but also take into account the association relationships between different dimensions and the information content of the graph network of each dimension, that is, the graph networks of different dimensions exert different degrees of importance on the node clustering of the whole system. However, in the process of implementing the present invention, the inventors found that current graph network node clustering techniques suffer from the technical problem of a poor clustering effect.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a multidimensional graph network node clustering processing method with a better clustering effect, as well as a multidimensional graph network node clustering processing apparatus, a computer device, and a computer-readable storage medium.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in one aspect, an embodiment of the present invention provides a multidimensional graph network node clustering method, including:
converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
according to the built in-layer transition probability and cross-layer random walk transition probability, carrying out in-layer and cross-layer multilayer network random walk processing on the weighted multi-dimensional graph network to obtain a sampling sequence of each node of the weighted multi-dimensional graph network;
converting the sampling sequence of each node into low-dimensional embedding based on the SkipGram model;
clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multidimensional graph network;
and projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
In another aspect, a multidimensional graph network node clustering processing apparatus is also provided, including:
the network conversion module is used for converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
the walk processing module is used for performing intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network;
the embedding processing module is used for converting the sampling sequence of each node into low-dimensional embedding based on the SkipGram model;
the clustering processing module is used for clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multidimensional graph network;
and the visualization module is used for projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
In still another aspect, a computer device is further provided, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of any one of the above-mentioned multidimensional graph network node clustering processing methods when executing the computer program.
In still another aspect, a computer-readable storage medium is provided, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any one of the above-mentioned methods for processing clusters of nodes in a multidimensional graph network.
One of the above technical solutions has the following advantages and beneficial effects:
according to the method, the device and the equipment for processing the clustering of the nodes of the multidimensional graph network, an original unweighted attribute multidimensional graph network (namely the unweighted multidimensional graph network) is converted into a weighted multidimensional graph network, and comprehensive similarity characteristics of attribute similarity and structural similarity between nodes with connected edges are coded in the conversion, so that the clustering performance of the nodes can be enhanced. Secondly, based on different importance differences exerted by different dimension graph networks on node clustering, namely different information quantities of different dimension graph networks, cross-layer random walk transfer probability is established according to the information quantity difference in a heterogeneous mode, transfer probability in a combination layer is combined, a sampling sequence of each node can be obtained, and the sampling sequence captures neighbor node information of each node. The resulting sample sequence is then converted to low-dimensional embedding using network embedding techniques. And based on the low-dimensional embedding of all the nodes, clustering the nodes by adopting a K-means clustering algorithm to obtain a clustering result of the nodes. Finally, the low-dimensional embedding is projected into the two-dimensional space by adopting a dimension reduction technology, the coordinate value of each node in the two-dimensional space is obtained, the label information of the node is used as color mapping, the node clustering effect is displayed from the visual angle by adopting a graph visualization technology, the purpose of remarkably improving the clustering effect is achieved, the clustering effect is excellent, and the application range of the network embedding technology is expanded.
Drawings
FIG. 1 is a schematic diagram of a conventional single-layer network and a multi-dimensional network;
FIG. 2 is a flowchart illustrating a method for clustering nodes in a multidimensional graph network according to an embodiment;
FIG. 3 is a flow diagram that illustrates the conversion process of the multidimensional graph network in one embodiment;
FIG. 4 is a diagram illustrating conversion of an unweighted multidimensional graph network to a weighted multidimensional graph network, under an embodiment;
fig. 5 is a schematic block diagram of a multidimensional graph network node cluster processing apparatus in an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In addition, the technical solutions in the embodiments of the present invention may be combined with each other, provided that the combination can be realized by those skilled in the art; when a combination of technical solutions is contradictory or cannot be realized, such a combination should be considered not to exist and not to fall within the protection scope of the present invention.
As shown in fig. 1, which is a schematic diagram of a traditional single-layer network and a multidimensional network, the different layers of a multidimensional network can be regarded as the interaction relationships of the same node set viewed from different angles. If a network with multiple interaction relationships is represented as the multi-relationship fusion network shown in fig. 1(a), the structural features of each dimension graph network and the coupling and interaction information between different dimensions cannot be clearly expressed. Compared with fig. 1(b), fig. 1(c) can more clearly represent the interaction relationships of the same node set in three different dimensions, namely dimension A, dimension B and dimension C, as well as the correlation information between layers. Compared with a traditional single-layer network, a multidimensional graph network describes different characteristics of a complex system from different angles and compensates for the deviation brought by a single view, so that the results obtained by analyzing a complex system based on a multidimensional graph network are more accurate. When the nodes in the network contain attribute features, the multidimensional graph network is called an attribute multidimensional graph network.
In addition, visualization technology expresses information in the form of visual images and provides powerful support for discovering and understanding scientific laws. Graph visualization has become an important method for analyzing graph network data, and mainly includes force-directed methods and data-dimension-reduction-based methods. Compared with force-directed methods, graph visualization techniques based on data dimension reduction strive, by optimizing an objective function, to preserve the similarity between the node distribution in the original graph space and that in the two-dimensional layout space, so that the node distribution in the two-dimensional layout space reflects the node information in the original graph space. Visualization techniques based on nonlinear dimension reduction can reflect structural data with nonlinear relationships and are more widely applicable than those based on linear dimension reduction.
The objective of attribute single-layer network node clustering is to satisfy the following requirements: 1) structural compactness, i.e., nodes in the same cluster are closely connected, while nodes in different clusters are far apart; 2) attribute homogeneity, i.e., nodes in the same cluster have similar attribute values, while nodes in different clusters have significantly different attribute values. In practice, node clustering for an attribute multidimensional graph network must not only satisfy the above structural compactness and attribute homogeneity, but also take into account the association relationships between different dimensions and the information content of the graph network of each dimension when clustering nodes; that is, the graph networks of different dimensions exert different degrees of importance on the node clustering of the whole system.
The inventors found that current graph network node clustering techniques suffer from the technical problem of a poor clustering effect, which is specifically reflected as follows: 1) the classical random-walk-based network embedding technique (node2vec) focuses on analyzing a single-layer network, and a single-layer network usually contains only topology information without considering the attribute features of the nodes; 2) attribute graph network node clustering methods that do consider attribute features are generally directed only at single-layer graph networks; 3) traditional node clustering methods for multidimensional graph networks usually extend single-layer node clustering methods, such as modularity optimization or matrix factorization, to the multidimensional graph network, but these methods are not suitable for large-scale graph networks.
The present invention provides an effective solution to the technical problem of the poor clustering effect of current graph network node clustering techniques, and can remarkably enhance the node clustering effect.
For convenience of illustration and understanding, the structure shown in FIG. 1(c) is taken as an example. A multidimensional graph network (or multi-relationship network) composed of $L$ graph networks of different dimensions is represented as $G=\{G^{[1]},G^{[2]},\ldots,G^{[L]}\}$, where $V=\{v_1,v_2,\ldots,v_N\}$ denotes the node set consisting of the $N$ nodes of the multidimensional network; $E$ denotes the edge set of the multidimensional network; $N$ and $M$ respectively denote the sizes of the node set and the edge set of the multidimensional network; $X\in\mathbb{R}^{N\times F}$ denotes the feature matrix formed by the $F$-dimensional feature values of the $N$ nodes; $A$ denotes the adjacency matrix of the multidimensional network; $G^{[d]}$ denotes the graph network of dimension $d$; and $A^{[d]}$ indicates that this graph network is unweighted (with size $N\times N$), where $A^{[d]}_{ij}=1$ indicates that node $v_i$ and node $v_j$ have a connecting edge in the graph network of dimension $d$ (i.e., $(v_i,v_j)\in E^{[d]}$), and otherwise $A^{[d]}_{ij}=0$.
Referring to fig. 2, in one aspect, the present invention provides a method for processing node clusters of a multidimensional graph network, including the following processing steps S12 to S20:
and S12, converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes.
And S14, carrying out multilayer network random walk processing of the weighted multidimensional graph network in and across layers according to the built in-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence of each node of the weighted multidimensional graph network.
It is understood that, in a multidimensional graph network, the graph networks in different dimensions represent different relationships of the same group of nodes under different views, and the attribute information $X$ of the nodes is shared by all dimensions. The purpose of multidimensional graph network node clustering is to detect the clusters shared by all dimensions of the multidimensional graph network, while also considering the correlation information among the graph networks of different dimensions. From the perspective of the whole system, different layers generally play different roles in the behavior of the whole system; that is, the graph networks of different dimensions can be sorted according to the degree of importance they exert on the node clustering performance of the multidimensional graph network, giving a corresponding ranking in descending order.
Based on the above setting, this embodiment designs a multilayer random walk method for the multidimensional graph network, which includes two walk processing cases: one is the intra-layer random walk; the other is the cross-layer random walk.
S16, converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; it can be understood that the SkipGram model is the skip-gram neural network model known in the art.
S18, clustering the low-dimensional embedding of each node by adopting the K-means algorithm to obtain the clustering result of each node of the weighted multidimensional graph network; it can be understood that the K-means algorithm is the K-means clustering algorithm known in the art, an iterative clustering analysis algorithm. Clustering the low-dimensional embeddings with the K-means algorithm divides the nodes into K different clusters such that the within-cluster sum of squares is minimized, yielding the clustering result of the nodes, namely K different clusters.
S20, projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
According to the above multidimensional graph network node clustering processing method, the original unweighted attribute multidimensional graph network (i.e., the unweighted multidimensional graph network) is first converted into a weighted multidimensional graph network, and the conversion encodes a comprehensive similarity feature combining the attribute similarity and the structural similarity between nodes connected by an edge, which can enhance the node clustering performance. Secondly, based on the different degrees of importance that graph networks of different dimensions exert on node clustering, i.e., the different information content of the graph networks of different dimensions, the cross-layer random walk transition probability is constructed in a heterogeneous manner according to these differences in information content; combined with the intra-layer transition probability, a sampling sequence is obtained for each node, and this sequence captures the neighbor node information of the node. The obtained sampling sequences are then converted into low-dimensional embeddings using a network embedding technique. Based on the low-dimensional embeddings of all nodes, the nodes are clustered with the K-means clustering algorithm to obtain the node clustering result. Finally, the low-dimensional embeddings are projected into a two-dimensional space with a dimension reduction technique to obtain the coordinate value of each node in the two-dimensional space, the label information of the nodes is used as the color mapping, and the node clustering effect is displayed from a visual perspective with a graph visualization technique. The clustering effect is thereby remarkably improved, and the application range of network embedding techniques is expanded.
Referring to fig. 3 and fig. 4, in an embodiment, the step S12 may include the following steps:
S122, for the graph network of each dimension, determining the attribute similarity of nodes connected by an edge according to the number of matching entries among the F-dimensional attribute vectors of the two nodes;
S124, determining the structural similarity between the nodes by adopting a structural similarity measurement method;
S126, adding a weight to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, thereby converting the unweighted multidimensional graph network into a weighted multidimensional graph network.
It can be understood that the clustering objective for an attribute multidimensional graph network must satisfy not only structural compactness but also attribute homogeneity. Therefore, based on this idea, for the graph network of each dimension, the attribute similarity between nodes connected by an edge is measured first, the structural similarity between the nodes is then calculated based on a structural similarity measurement method, and the attribute similarity and structural similarity between the nodes are then fused to obtain the edge weight between the nodes, that is, the unweighted graph network of each dimension is converted into a weighted graph network.
Specifically, for processing attribute similarity:
the most intuitive measure has nodes with edges: (
Figure 786785DEST_PATH_IMAGE023
) The method for attribute similarity between nodes is to compare the nodes one by one
Figure 378304DEST_PATH_IMAGE024
And node
Figure 869328DEST_PATH_IMAGE025
Attribute vector of
Figure 176681DEST_PATH_IMAGE026
And
Figure 753156DEST_PATH_IMAGE027
in (1),
Figure 882786DEST_PATH_IMAGE028
number of similarities in dimension attributes:
Figure 838104DEST_PATH_IMAGE029
(1)
based on the formula (1), the node
Figure 457304DEST_PATH_IMAGE030
And node
Figure 380129DEST_PATH_IMAGE031
The attribute similarity between them can be expressed as:
Figure 844609DEST_PATH_IMAGE032
(2)
for the treatment of structural similarity:
there are many Common methods for measuring structural similarity between nodes, such as Common neighbor algorithm (CN), Jaccard Coefficient, Resource Allocation Index (RA), adaptive Adar Index (AA Index), Preferred Attachment (PA), and Community Resource Allocation Index (Community Resource Allocation Index). The structural similarity between the nodes can be determined by adopting any one of the structural similarity measurement methods.
In one embodiment, the structural similarity measure is preferably the RA index. Since the RA index is among the best-performing methods for graph analysis tasks such as community detection and link prediction, this embodiment measures the structural similarity between nodes based on the RA index; the specific calculation method is as follows:
$S_R(v_i,v_j)=\sum_{z\in N(v_i)\cap N(v_j)}\dfrac{1}{k_z}$   (3)

where $N(v_i)\cap N(v_j)$ denotes the set of common neighbors of node $v_i$ and node $v_j$, and $k_z$ denotes the degree of each node $z$ in the common neighborhood; taking the reciprocal of each degree and summing gives the RA value between node $v_i$ and node $v_j$, i.e., the structural similarity of node $v_i$ and node $v_j$. In this way, optimal processing performance can be achieved in the process of determining the structural similarity between nodes.
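As a concrete illustration of formula (3), the RA index can be computed directly from an adjacency list. The following Python sketch is a minimal, hypothetical implementation; the function name and data layout are illustrative and not part of the patent.

from typing import Dict, Hashable, Set

def ra_index(adj: Dict[Hashable, Set[Hashable]], u: Hashable, v: Hashable) -> float:
    """RA index of formula (3): sum of 1/degree over the common neighbors of u and v."""
    common = adj[u] & adj[v]                       # common neighbors of u and v
    return sum(1.0 / len(adj[z]) for z in common if adj[z])

# toy example: a small unweighted graph given as an adjacency list
adj = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(ra_index(adj, 2, 4))   # common neighbors {1, 3}, degrees 3 and 3 -> 1/3 + 1/3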
Constructing a weighted multidimensional graph network based on the attribute similarity and the structural similarity: specifically, for the graph network $G^{[d]}$ of dimension $d$, a weight $w_{ij}^{[d]}$ can be added to each unweighted connecting edge $(v_i,v_j)\in E^{[d]}$; the weight encodes the structural similarity and the attribute similarity between node $v_i$ and node $v_j$ and is calculated according to formula (4):

$w_{ij}^{[d]}=\alpha\cdot S_R(v_i,v_j)+\beta\cdot S_A(v_i,v_j)$   (4)

where the parameters $\alpha$ and $\beta$ are respectively used to weigh the relative magnitudes of the structural similarity and the attribute similarity between node $v_i$ and node $v_j$, $S_R(v_i,v_j)$ denotes the calculation result of formula (3), and $S_A(v_i,v_j)$ denotes the calculation result of formula (2). Based on formula (4), the unweighted attribute multidimensional graph network $G$ can be converted into a weighted multidimensional graph network $G_w$, as shown in fig. 4, wherein A, B and C respectively represent three different dimensions.
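Under the reconstruction of formulas (2) and (4) given above (a normalized attribute-match count combined linearly with the RA index, which is an assumption about the exact fusion), the per-dimension edge weighting could be sketched as follows, reusing the ra_index helper from the previous example; the names attr_sim, edge_weight and weight_dimension are hypothetical.

import numpy as np

def attr_sim(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """Attribute similarity per formulas (1)-(2): fraction of matching attribute entries."""
    return float(np.sum(x_i == x_j)) / len(x_i)

def edge_weight(adj, X, i, j, alpha=0.5, beta=0.5):
    """Edge weight per formula (4), assumed here to be alpha * S_R + beta * S_A."""
    return alpha * ra_index(adj, i, j) + beta * attr_sim(X[i], X[j])

def weight_dimension(adj, X, alpha=0.5, beta=0.5):
    """Convert the unweighted edges of one dimension into weighted edges; X maps node id -> attribute vector."""
    weighted = {}
    for i, nbrs in adj.items():
        for j in nbrs:
            if (j, i) not in weighted:             # undirected graph: store each edge once
                weighted[(i, j)] = edge_weight(adj, X, i, j, alpha, beta)
    return weighted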
In an embodiment, regarding step S14, the process of performing intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability may specifically include the following processing procedures:
carrying out intra-layer biased random walk processing on the weighted multidimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability $\pi_{vx}^{[d]}$ from node $v$ to node $x$ is $\pi_{vx}^{[d]}=\alpha_{pq}(t,x)\cdot w_{vx}^{[d]}$, where $\alpha_{pq}(t,x)$ is calculated as follows:

$\alpha_{pq}(t,x)=\begin{cases}1/p, & d_{tx}=0\\ 1, & d_{tx}=1\\ 1/q, & d_{tx}=2\end{cases}$   (5)

where $d_{tx}$ denotes the distance between node $t$ and node $x$, node $t$ is the previous node of node $v$, node $x$ is the next-hop node of node $v$, $w_{vx}^{[d]}$ denotes the weight of the connecting edge between node $v$ and node $x$, and the parameters $p$ and $q$ are the parameters used to guide the random walker to perform a biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the cross-layer random walk transition probability is given by formula (6), which combines, for each candidate layer, the weight $w_{vx}^{[d]}$ of the connecting edge between node $v$ and node $x$, the factor $\alpha_{pq}(t,x)$ that measures, according to the distance between node $t$ and node $x$, the probability of node $v$ selecting node $x$ as the next-hop node, and a layer jump probability $r_d$, where $r=(r_1,r_2,\ldots,r_L)$ and $r_d\in[0,1]$; when the modularity of the graph network of dimension $d$ is high, $r_d$ is set to a higher value; conversely, when the modularity of the graph network of dimension $d$ is low, $r_d$ is set to a lower value; $r_d$ denotes the probability value set for the graph network of dimension $d$ according to the modularity of the graph network of dimension $d$, and similarly $r_{d+1}$ denotes the probability value set for the graph network of dimension $d+1$ according to its modularity, $r_{d-1}$ denotes the probability value set for the graph network of dimension $d-1$ according to its modularity, and $r_1$ denotes the probability value set for the graph network of dimension 1 according to its modularity; when the latter is higher, $r_1$ is set to a higher value, and the random walker then jumps to the graph network of dimension 1 with probability $r_1$;

and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layer of the weighted multidimensional graph network to traverse and the node to move to, and to perform the cross-layer random walk.
Specifically, for the intra-layer random walk, the node2vec method (i.e., an embedding method of graph data) is adopted. Given a source node $u$, $c_l$ denotes the $l$-th sampled node in a walk of length $len$ starting from $u$, with $c_0=u$. Assuming that the current sampled node is $c_{l-1}=v$, the random walker walks to the next neighbor node $x$ according to the following probability distribution:

$P(c_l=x\mid c_{l-1}=v)=\begin{cases}\dfrac{\pi_{vx}^{[d]}}{Z}, & (v,x)\in E^{[d]}\\ 0, & \text{otherwise}\end{cases}$   (7)

where $(v,x)\in E^{[d]}$ indicates that node $v$ and node $x$ have a connecting edge in the graph network of relationship type $d$, $\pi_{vx}^{[d]}$ denotes the intra-layer transition probability between node $v$ and node $x$, $Z$ denotes a normalization constant, and $\pi_{vx}^{[d]}=\alpha_{pq}(t,x)\cdot w_{vx}^{[d]}$ with $\alpha_{pq}(t,x)$ given by the above formula (5).
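For illustration only, the biased intra-layer step of formulas (5) and (7) can be sketched in Python as follows; the data layout (a weighted adjacency dict per dimension, node -> {neighbor: weight}) and the function names are assumptions rather than part of the patent.

import random

def alpha_pq(prev, prev_nbrs, x, p, q):
    """Search bias alpha_pq(t, x) of formula (5): 1/p, 1 or 1/q depending on d_tx."""
    if prev is None:
        return 1.0               # first step of a walk: no previous node, so no bias
    if x == prev:
        return 1.0 / p           # d_tx = 0: stepping back to the previous node
    if x in prev_nbrs:
        return 1.0               # d_tx = 1: x is also a neighbor of the previous node
    return 1.0 / q               # d_tx = 2: moving further away from the previous node

def step_in_layer(w_adj, prev, cur, p, q):
    """One intra-layer move per formula (7): sample x with probability proportional to alpha_pq * w_vx."""
    nbrs = list(w_adj[cur])
    weights = [alpha_pq(prev, w_adj.get(prev, {}), x, p, q) * w_adj[cur][x] for x in nbrs]
    return random.choices(nbrs, weights=weights, k=1)[0]   # random.choices normalizes the weights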
For the cross-layer random walk, assume that the random walker initially stands at node $v$; in the first step, the random walker decides which layer to traverse and performs the biased random walk initialization $c_0=v$. The random walker then decides the next node $c_1$ to move to. In each subsequent step of the traversal, the random walker determines the layer to traverse and the next node to walk to.

The cross-layer random walk transition probability is set according to the degree of importance of each dimension graph network $G^{[d]}$ to the node clustering performance of the multidimensional graph network $G$. Therefore, the information content of each dimension graph network can be measured by a heuristic modularity method, and the modularity values $Q_1,Q_2,\ldots,Q_L$ of the $L$ dimension graph networks are obtained from the modularity calculation. Secondly, the layer jump probabilities $r=(r_1,r_2,\ldots,r_L)$ are defined, and the cross-layer random walk transition probability can then be expressed as the above formula (6).
In some embodiments, the probability value set for the graph network with lower modularity is smaller than the probability value set for the graph network with higher modularity; wherein $r_d$ denotes the probability value set for the graph network of dimension $d$ according to the modularity of the graph network of dimension $d$, and $r_{d'}$ denotes the probability value set for the graph network of dimension $d'$ according to the modularity of the graph network of dimension $d'$.
It is to be understood that if the modularity $Q_d$ of the graph network of dimension $d$ is very low (i.e., $Q_d$ is very small), a lower value will be set for $r_d$; conversely, a somewhat higher value will be set. The idea behind this setting is as follows: for a graph network layer with low modularity (small information content), the random walker traverses that layer with relatively low probability; conversely, for a graph network layer with high modularity (large information content), the random walker traverses that layer with relatively high probability. The reason is that a graph network layer with high modularity plays a more important role in the node clustering of the whole system.
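The cross-layer behaviour described above can be summarized in a hedged sketch: layer-jump probabilities are derived from the per-dimension modularity values (here by simple normalization, which is one possible choice and not necessarily the patent's exact scheme), and each step first selects a layer and then moves inside it with the step_in_layer helper from the previous sketch. The names layer_probs and multilayer_walk are hypothetical.

import random

def layer_probs(modularities):
    """Map per-dimension modularity values Q_d to layer-jump probabilities r_d (assumed: normalization)."""
    total = sum(modularities)
    return [q / total for q in modularities]

def multilayer_walk(layers, start, walk_len, p, q, modularities):
    """layers: one weighted adjacency dict per dimension; returns a node sampling sequence."""
    r = layer_probs(modularities)
    walk, prev, cur = [start], None, start
    for _ in range(walk_len - 1):
        d = random.choices(range(len(layers)), weights=r, k=1)[0]   # pick the layer to traverse
        if not layers[d].get(cur):
            continue                                                # cur has no edges in this layer
        nxt = step_in_layer(layers[d], prev, cur, p, q)
        walk.append(nxt)
        prev, cur = cur, nxt
    return walk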
In an embodiment, the step S16 may specifically include the following processing steps:
S162, dividing the sampling sequence through a window to obtain training sample sequences of node information;
S164, inputting the training sample sequences into the SkipGram model and optimizing the objective function by a stochastic gradient descent method to obtain the low-dimensional dense embedding of each node;
the process of optimizing the objective function by the stochastic gradient descent method includes refining the conditional probability in turn through the conditional independence assumption and the symmetry assumption, and obtaining the objective function from the refined conditional probability.
It can be understood that the processing of the previous step S14 obtains, through the intra-layer and cross-layer random walk strategies, the sampling sequence $W_{v_i}$ of each of the $N$ nodes of the multidimensional graph network $G$; the sequence contains the context information (i.e., the neighbor nodes) of node $v_i$. The sequence is then divided through a window to obtain training sample sequences of the node information; the training sample sequences are input into the Skip-Gram model, and the objective function is optimized by a stochastic gradient descent method to obtain the low-dimensional dense embedding $z_{v_i}$ of node $v_i$. Specifically, the objective function is:

$\max_{f}\sum_{v\in V}\log \Pr\big(N_S(v)\mid f(v)\big)$   (8)

where $f$ is the embedding matrix of size $N\times k$ that maps the nodes into a $k$-dimensional space, $k$ is a preset parameter, $N_S(v)$ denotes the neighbor nodes of node $v$, and $\Pr(N_S(v)\mid f(v))$ is the conditional probability expressing that, given each node, the probability of its neighbor nodes appearing is maximized.

Under the conditional independence assumption (i.e., given a source node, the probability of one of its neighbor nodes appearing is independent of the remaining nodes in the neighbor set), the conditional probability can be further expressed as:

$\Pr\big(N_S(v)\mid f(v)\big)=\prod_{n_i\in N_S(v)}\Pr\big(n_i\mid f(v)\big)$   (9)

Under the symmetry assumption (the influence between two nodes in the $k$-dimensional feature space is symmetric, that is, a node shares the same low-dimensional embedding whether it acts as a source node or as a neighbor node), the conditional probability can be further expressed as:

$\Pr\big(n_i\mid f(v)\big)=\dfrac{\exp\big(f(n_i)\cdot f(v)\big)}{\sum_{u\in V}\exp\big(f(u)\cdot f(v)\big)}$   (10)

Based on the above assumptions, the optimized objective function can be expressed as:

$\max_{f}\sum_{v\in V}\Big[-\log Z_v+\sum_{n_i\in N_S(v)}f(n_i)\cdot f(v)\Big],\qquad Z_v=\sum_{u\in V}\exp\big(f(u)\cdot f(v)\big)$   (11)
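Training the Skip-Gram objective of formulas (8)-(11) on the sampled walks is commonly delegated to an off-the-shelf implementation. The sketch below uses gensim (version 4 or later, where the embedding size parameter is named vector_size) purely as an illustration; the patent does not prescribe a specific library, and embed_walks is a hypothetical helper name.

from gensim.models import Word2Vec

def embed_walks(walks, dim=128, window=5, epochs=5):
    """Train Skip-Gram (sg=1) on the node walks; node ids are treated as words."""
    sentences = [[str(n) for n in walk] for walk in walks]
    model = Word2Vec(sentences, vector_size=dim, window=window,
                     sg=1, min_count=1, epochs=epochs, workers=4)
    # one k-dimensional real vector per node, i.e. the embedding set Z
    return {node: model.wv[str(node)] for walk in walks for node in walk}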
in one embodiment, regarding step S18, specifically, the step S16 results in a multidimensional mapping network
Figure 383725DEST_PATH_IMAGE156
In
Figure 459128DEST_PATH_IMAGE157
Low dimensional embedding of individual nodes (nodes)
Figure 186913DEST_PATH_IMAGE158
Is represented as
Figure 472400DEST_PATH_IMAGE159
) That is, the obtained low-dimensional embedded set is
Figure 130784DEST_PATH_IMAGE160
Wherein each token vector is one
Figure 388590DEST_PATH_IMAGE161
The real vector of dimensions. Clustering these low-dimensional embeddings by using K-means algorithm, i.e. clustering
Figure 615172DEST_PATH_IMAGE162
A node is divided into
Figure 591218DEST_PATH_IMAGE163
In a different cluster
Figure 272866DEST_PATH_IMAGE164
So that the sum of squares within each cluster is minimized, i.e., the clustering goal of K-means is to find a cluster that satisfies the following equation
Figure 713075DEST_PATH_IMAGE165
Figure 782662DEST_PATH_IMAGE166
(12)
Wherein the content of the first and second substances,
Figure 432955DEST_PATH_IMAGE167
representing clusters
Figure 42928DEST_PATH_IMAGE168
Average of all points in (1) such that each point belongs to a point away from the center
Figure 275326DEST_PATH_IMAGE169
The closest mean (cluster center) corresponds to the cluster. Finally, the original multidimensional graph network can be obtained
Figure 719077DEST_PATH_IMAGE170
In
Figure 404136DEST_PATH_IMAGE171
After the nodes are clustered
Figure 817800DEST_PATH_IMAGE172
A different cluster.
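A minimal sketch of the K-means step of formula (12) using scikit-learn; the choice of scikit-learn and of the cluster number K are assumptions for illustration, and cluster_embeddings is a hypothetical helper name.

import numpy as np
from sklearn.cluster import KMeans

def cluster_embeddings(emb, k, seed=0):
    """Cluster the node embeddings Z into K clusters, minimizing the within-cluster sum of squares."""
    nodes = list(emb)
    Z = np.stack([emb[n] for n in nodes])
    labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(Z)
    return dict(zip(nodes, labels))   # node id -> cluster index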
In an embodiment, the step S20 may specifically include the following processing steps S202 to S208:
S202, calculating the similarity between the nodes according to the low-dimensional embeddings to obtain the $P$ distribution; the $P$ distribution is:

$p_{j\mid i}=\dfrac{\exp\big(-\lVert z_i-z_j\rVert^{2}/2\sigma_i^{2}\big)}{\sum_{k\neq i}\exp\big(-\lVert z_i-z_k\rVert^{2}/2\sigma_i^{2}\big)}$   (13)

$p_{ij}=\dfrac{p_{j\mid i}+p_{i\mid j}}{2N}$   (14)

where $p_{j\mid i}$ denotes the conditional probability that node $v_i$ selects node $v_j$ as its close point, $\lVert z_i-z_j\rVert$ denotes the distance between node $v_i$ and node $v_j$, $\sigma_i$ denotes the variance of the Gaussian distribution centered on node $v_i$, $p_{i\mid j}$ denotes the conditional probability that node $v_j$ selects node $v_i$ as its close point, $p_{ij}$ denotes the $P$-distribution similarity between node $v_i$ and node $v_j$, and $N$ denotes the number of nodes.
It can be understood that, because the low-dimensional embeddings of the nodes have already captured the attribute and structural similarity between nodes in the multidimensional graph network, as well as the correlation between the graph networks of different dimensions and the differences in their importance to the clustering performance, the $P$ distribution calculated from $z_i$ and $z_j$ according to formulas (13) and (14) can describe the clustering characteristics in the multidimensional graph network more accurately and comprehensively.
S204, measuring the proximity between node $v_i$ and node $v_j$ in the two-dimensional layout space based on the Student-t distribution, and calculating the $Q$ distribution; the $Q$ distribution is:

$q_{ij}=\dfrac{\big(1+\lVert y_i-y_j\rVert^{2}\big)^{-1}}{\sum_{k\neq l}\big(1+\lVert y_k-y_l\rVert^{2}\big)^{-1}}$

where $\lVert y_i-y_j\rVert$ denotes the distance between node $v_i$ and node $v_j$ in the two-dimensional layout space, $y_i$ and $y_j$ respectively denote the coordinate values of node $v_i$ and node $v_j$ in the two-dimensional layout space, $q_{ij}$ denotes the $Q$-distribution similarity between node $v_i$ and node $v_j$, and $y_k$ and $y_l$ respectively denote the coordinate values of node $v_k$ and node $v_l$ in the two-dimensional layout space.
It is understood that, similar to the $P$ distribution, the $Q$ distribution expresses that in the two-dimensional layout space similar nodes are closer together while dissimilar nodes are relatively farther apart.
S206, calculating the KL divergence between the $P$ distribution and the $Q$ distribution; the calculation formula is:

$KL(P\parallel Q)=\sum_{i}\sum_{j}p_{ij}\log\dfrac{p_{ij}}{q_{ij}}$

It can be appreciated that, in the model optimization process, continuously reducing the KL divergence makes the $Q$ distribution reflect the $P$ distribution as much as possible, that is, the coordinate positions of the nodes in the two-dimensional layout space reflect the feature information in the original graph space as much as possible.
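For clarity, the quantities of formulas (13)-(14), the Student-t based Q distribution, and the KL divergence can be written out directly in numpy. This is a didactic sketch only: it uses a fixed Gaussian bandwidth sigma instead of the perplexity-based bandwidth search performed by full t-SNE implementations.

import numpy as np

def p_matrix(Z, sigma=1.0):
    """Symmetrized P distribution of formulas (13)-(14) with a fixed Gaussian bandwidth."""
    d2 = np.square(Z[:, None, :] - Z[None, :, :]).sum(-1)   # pairwise squared distances
    aff = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(aff, 0.0)
    p_cond = aff / aff.sum(axis=1, keepdims=True)            # p_{j|i}
    n = Z.shape[0]
    return (p_cond + p_cond.T) / (2.0 * n)                    # p_{ij}

def q_matrix(Y):
    """Student-t based Q distribution over the two-dimensional layout coordinates Y."""
    d2 = np.square(Y[:, None, :] - Y[None, :, :]).sum(-1)
    num = 1.0 / (1.0 + d2)
    np.fill_diagonal(num, 0.0)
    return num / num.sum()

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) = sum_ij p_ij * log(p_ij / q_ij)."""
    return float(np.sum(P * np.log((P + eps) / (Q + eps))))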
S208, when the iterative optimization of the KL divergence stops, obtaining the two-dimensional coordinate value of each node, drawing the nodes with the same label in the same color and the nodes with different labels in different colors, and displaying the clustering result.
It will be appreciated that when the model stops the iterative optimization, $y_1,y_2,\ldots,y_N$ are the two-dimensional coordinate values of the $N$ nodes. In this embodiment, the process of projecting the low-dimensional embeddings into the two-dimensional space and visually displaying the clustering result can be briefly summarized as follows:

(1) based on the low-dimensional embeddings $Z$, calculating the similarity between the low-dimensional embeddings of the $N$ nodes, i.e., calculating the $P$ distribution;

(2) in the two-dimensional layout space, calculating the layout proximity between the $N$ nodes, i.e., calculating the $Q$ distribution;

(3) calculating the KL divergence between the $P$ distribution and the $Q$ distribution, continuously and iteratively optimizing the objective function to reduce the difference between the $P$ distribution and the $Q$ distribution, and obtaining the two-dimensional coordinate values of the $N$ nodes in the two-dimensional layout space;

(4) performing graph visualization mapping according to the two-dimensional coordinate values and the labels of the nodes.
In this embodiment, nodes with the same label are drawn in the same color and nodes with different labels are drawn in different colors; the visualization result is the visual effect of the multidimensional graph network node clustering, in which nodes in the same cluster are close to each other and nodes in different clusters are far away from each other.
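In practice, steps (1) to (4) above can be carried out with an off-the-shelf nonlinear dimension-reduction implementation. The sketch below uses scikit-learn's TSNE and matplotlib as one possible choice, not one mandated by the patent, with the K-means cluster labels as the color mapping; visualize_clusters is a hypothetical helper name.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def visualize_clusters(emb, labels, seed=0):
    """Project the node embeddings to 2-D and draw the nodes colored by their cluster label."""
    nodes = list(emb)
    Z = np.stack([emb[n] for n in nodes])
    Y = TSNE(n_components=2, random_state=seed, init="pca").fit_transform(Z)
    colors = [labels[n] for n in nodes]
    plt.scatter(Y[:, 0], Y[:, 1], c=colors, cmap="tab10", s=12)
    plt.title("Multidimensional graph network node clustering")
    plt.show()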
Compared with the prior art, the present application converts the original unweighted network into a weighted network based on the attribute similarity and structural similarity of the nodes, and the weights encode the comprehensive similarity between nodes, which can enhance the node clustering effect; the present application sets the cross-layer random walk transition probability according to the information content of the different layers, which can further enhance the node clustering effect; and the present application applies the classical random-walk-based network embedding technique for single-layer networks to attribute multidimensional graph network node clustering, expanding the application range of network embedding techniques.
It should be understood that although the steps in the flowcharts of fig. 2 and fig. 3 are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in fig. 2 and fig. 3 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and their order of execution is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Referring to fig. 5, a multidimensional graph network node clustering processing apparatus 100 is further provided, which includes a network conversion module 13, a walk processing module 15, an embedding processing module 17, a clustering processing module 19, and a visualization module 21. The network conversion module 13 is configured to convert the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes. The walk processing module 15 is configured to perform intra-layer and cross-layer multilayer network random walk processing on the weighted multidimensional graph network according to the constructed intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network. The embedding processing module 17 is configured to convert the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model. The clustering processing module 19 is configured to cluster the low-dimensional embeddings of the nodes with the K-means algorithm to obtain the clustering result of each node of the weighted multidimensional graph network. The visualization module 21 is configured to project each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and display the clustering result with a graph visualization technique.
Through the cooperation of its modules, the multidimensional graph network node clustering processing apparatus 100 first converts the original unweighted attribute multidimensional graph network (i.e., the unweighted multidimensional graph network) into a weighted multidimensional graph network, and the conversion encodes a comprehensive similarity feature combining the attribute similarity and the structural similarity between nodes connected by an edge, which can enhance the node clustering performance. Secondly, based on the different degrees of importance that graph networks of different dimensions exert on node clustering, i.e., the different information content of the graph networks of different dimensions, the cross-layer random walk transition probability is constructed in a heterogeneous manner according to these differences in information content; combined with the intra-layer transition probability, a sampling sequence is obtained for each node, and this sequence captures the neighbor node information of the node. The obtained sampling sequences are then converted into low-dimensional embeddings using a network embedding technique. Based on the low-dimensional embeddings of all nodes, the nodes are clustered with the K-means clustering algorithm to obtain the node clustering result. Finally, the low-dimensional embeddings are projected into a two-dimensional space with a dimension reduction technique to obtain the coordinate value of each node in the two-dimensional space, the label information of the nodes is used as the color mapping, and the node clustering effect is displayed from a visual perspective with a graph visualization technique. The clustering effect is thereby remarkably improved, and the application range of network embedding techniques is expanded.
In one embodiment, the network conversion module 13 includes an attribute sub-module, a structure sub-module, and a conversion sub-module. The attribute sub-module is configured, for the graph network of each dimension, to determine the attribute similarity of nodes connected by an edge according to the number of matching entries among the F-dimensional attribute vectors of the two nodes. The structure sub-module is configured to determine the structural similarity between nodes with a structural similarity measurement method. The conversion sub-module is configured to add a weight to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, thereby converting the unweighted multidimensional graph network into a weighted multidimensional graph network.
In one embodiment, the structural similarity measure is an RA index measure.
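As an illustration of this embodiment, the sketch below computes both similarities and uses them to weight the edges of one layer. It assumes the attribute vectors are stored on the nodes under a hypothetical key "attr", that the attribute similarity is the count of matching entries between the two F-dimensional vectors normalized by F, that the RA index is the standard resource-allocation index over common neighbors, and that the two similarities are simply summed; the patent does not fix the combination rule, so the sum is only a placeholder.

```python
import networkx as nx

def attribute_similarity(g: nx.Graph, u, v) -> float:
    # Fraction of attribute dimensions on which the two connected nodes agree.
    a, b = g.nodes[u]["attr"], g.nodes[v]["attr"]
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def ra_index(g: nx.Graph, u, v) -> float:
    # Resource Allocation index: common neighbors weighted by 1 / degree.
    common = set(g[u]) & set(g[v])
    return sum(1.0 / g.degree(z) for z in common if g.degree(z) > 0)

def add_similarity_weights(g: nx.Graph) -> nx.Graph:
    # Turn the unweighted layer into a weighted layer (combination rule assumed).
    weighted = g.copy()
    for u, v in weighted.edges():
        weighted[u][v]["weight"] = attribute_similarity(g, u, v) + ra_index(g, u, v)
    return weighted
```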
In an embodiment, the migration processing module 15 is configured to perform intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability; specifically, it may be configured to implement the following processing procedure:
carrying out intra-layer biased random walk processing on the weighted multidimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability π_vx from node v to node x is

π_vx = α_pq(t, x) · w_vx

wherein α_pq(t, x) is calculated as follows:

α_pq(t, x) = 1/p if d_tx = 0; α_pq(t, x) = 1 if d_tx = 1; α_pq(t, x) = 1/q if d_tx = 2

wherein d_tx denotes the distance between node t and node x, node t is the previous node of node v, node x is the next-hop node of node v, w_vx denotes the weight of the connecting edge between node v and node x, and the parameters p and q are parameters for guiding the random walker to perform the biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the formula for the cross-layer random walk transition probability is given as an image in the original publication, and the quantities involved are: w_vx, the weight of the connecting edge between node v and node x; the probability, measured according to the distance between node v and node x, that node v selects node x as the next-hop node; and a layer jump probability; when the modularity of the graph network of a given dimension is high, its layer jump probability is set to a higher value, and conversely, when the modularity of the graph network of that dimension is low, its layer jump probability is set to a lower value; for each dimension from 1 to M, a probability value is set according to the modularity of the graph network of that dimension; for example, when the modularity of the dimension-1 graph network is higher, its probability value is set higher, and the random walker then jumps to the dimension-1 graph network with correspondingly higher probability;
and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layers of the weighted multidimensional graph network to traverse and the nodes to move to, thereby performing the cross-layer random walk.
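A compact sketch of one step of this walk is given below. The intra-layer bias follows the node2vec-style α_pq(t, x) suggested by the symbols above, and the cross-layer move is modeled as: with probability r the walker jumps to a layer drawn from modularity-based layer probabilities, otherwise it takes a biased step inside the current layer. Because the exact cross-layer formula appears only as an image in the source, this is an assumed instantiation rather than the patent's formula; layer_probs is a hypothetical argument holding the per-dimension probability values.

```python
import random

def alpha(prev, nxt, g, p, q):
    # node2vec-style search bias alpha_pq(t, x), keyed on the relation between
    # the previous node t (= prev) and the candidate next node x (= nxt).
    if prev is None:
        return 1.0              # first step of a walk: no bias yet
    if nxt == prev:
        return 1.0 / p          # d_tx = 0: return to the previous node
    if g.has_edge(prev, nxt):
        return 1.0              # d_tx = 1: candidate is also a neighbor of t
    return 1.0 / q              # d_tx = 2: move further away from t

def walk_step(layers, layer_idx, prev, cur, p, q, r, layer_probs):
    # With probability r, jump to a layer drawn from the modularity-based
    # layer probabilities and continue the walk from the same node there.
    if random.random() < r:
        layer_idx = random.choices(range(len(layers)), weights=layer_probs)[0]
    g = layers[layer_idx]
    neighbors = list(g[cur])
    if not neighbors:
        return layer_idx, cur   # dead end: stay put
    weights = [alpha(prev, x, g, p, q) * g[cur][x].get("weight", 1.0)
               for x in neighbors]
    return layer_idx, random.choices(neighbors, weights=weights)[0]
```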
In an embodiment, the migration processing module 15 is configured to perform intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability; specifically, it may further be configured to implement the following processing procedure:
the difference between the probability values set for graph networks with lower modularity is smaller than the difference between the probability values set for graph networks with higher modularity, wherein each probability value is the value set for the graph network of a given dimension according to the modularity of the graph network of that dimension.
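One plausible way to derive such per-dimension probability values is sketched below: the modularity of each layer is estimated with networkx's greedy modularity communities and the values are passed through a softmax so that more modular layers receive larger probabilities. The softmax normalization and the temperature parameter are assumptions made only for illustration; the text itself only requires that higher modularity maps to a higher value.

```python
import math
from networkx.algorithms.community import greedy_modularity_communities, modularity

def layer_jump_probs(layers, temperature=1.0):
    # Estimate each layer's modularity from a greedy community partition.
    mods = [modularity(g, greedy_modularity_communities(g)) for g in layers]
    # Softmax-style normalization: more modular (more informative) layers
    # receive larger probability values.
    exps = [math.exp(m / temperature) for m in mods]
    total = sum(exps)
    return [e / total for e in exps]
```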
In an embodiment, the embedding processing module 17 may be specifically configured to implement the following processing steps: dividing the sampling sequences with a window to obtain training sample sequences of node information; and inputting the training sample sequences into the SkipGram model and optimizing the objective function with a stochastic gradient descent method to obtain a low-dimensional dense embedding for each node. Optimizing the objective function with the stochastic gradient descent method includes determining the conditional probability by simplifying it successively under the conditional independence assumption and the symmetry assumption, and then obtaining the objective function from the simplified conditional probability.
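For reference, the following sketch trains such an embedding with gensim's skip-gram implementation, assuming the sampling sequences are lists of node-id strings. gensim handles the window division and the stochastic-gradient-descent optimization internally; negative sampling is used here as a convenient stand-in for the objective described above, which is an assumption rather than the patent's exact formulation.

```python
from gensim.models import Word2Vec

def embed_walks(walks, dimensions=64, window=5, epochs=5):
    # walks: e.g. [["12", "7", "33", ...], ["5", "12", ...], ...]
    model = Word2Vec(
        sentences=walks,
        vector_size=dimensions,  # size of the low-dimensional embedding
        window=window,           # context window used to cut training samples
        sg=1,                    # skip-gram
        negative=5,              # negative sampling as the SGD-friendly objective
        min_count=0,
        epochs=epochs,
    )
    return {node: model.wv[node] for node in model.wv.index_to_key}
```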
In one embodiment, the visualization module 21 may include a first distribution calculation sub-module, a second distribution calculation sub-module, a divergence calculation sub-module, and a display sub-module. The first distribution calculation sub-module is configured to calculate, from the low-dimensional embeddings, the similarity P distribution between nodes; the P distribution is:

p_{j|i} = exp(−‖y_i − y_j‖² / (2σ_i²)) / Σ_{k≠i} exp(−‖y_i − y_k‖² / (2σ_i²))

p_{ij} = (p_{j|i} + p_{i|j}) / (2N)

where y_i denotes the low-dimensional embedding of node i, p_{j|i} denotes the conditional probability that node i selects node j as its close point, ‖y_i − y_j‖ denotes the distance between node i and node j, σ_i denotes the variance of the Gaussian distribution centered at node i, ‖y_i − y_k‖ denotes the distance between node i and node k, p_{i|j} denotes the conditional probability that node j selects node i as its close point, p_{ij} and p_{ji} denote the P-distribution similarity between node i and node j and between node j and node i, respectively, p_{ii} denotes the P-distribution similarity between node i and itself, p_{i|i} denotes the conditional probability that node i selects itself as its close point, and N denotes the number of nodes.
The second distribution calculation sub-module is configured to measure, based on the Student-t distribution, the proximity between node i and node j in the two-dimensional layout space and to calculate the Q distribution; the Q distribution is:

q_{ij} = (1 + ‖z_i − z_j‖²)^(−1) / Σ_{g≠h} (1 + ‖z_g − z_h‖²)^(−1)

where ‖z_i − z_j‖ denotes the distance between node i and node j in the two-dimensional layout space, z_i and z_j respectively denote the coordinate values of node i and node j in the two-dimensional layout space, q_{ij} denotes the Q-distribution similarity between node i and node j, q_{gh} denotes the Q-distribution similarity between node g and node h, and z_g and z_h respectively denote the coordinate values of node g and node h in the two-dimensional layout space.
The divergence calculation sub-module is configured to calculate the KL divergence between the P distribution and the Q distribution; the calculation formula is:

KL(P‖Q) = Σ_i Σ_j p_{ij} · log(p_{ij} / q_{ij})
the display sub-module is used inKLAnd when the divergence stops iterative optimization, obtaining two-dimensional coordinate values of each node, drawing the nodes with the same label by adopting the same color, drawing the nodes with different labels by adopting different colors, and performing cluster display.
For specific limitations of the multidimensional graph network node clustering processing apparatus 100, reference may be made to the corresponding limitations of the multidimensional graph network node clustering processing method described above, which are not repeated here. The modules in the multidimensional graph network node clustering processing apparatus 100 can be implemented wholly or partially by software, by hardware, or by a combination of the two. The modules may be embedded, in hardware form, in or be independent of a device having the data processing function, or may be stored, in software form, in the memory of the device so that the processor can call and execute the operations corresponding to each module; the computing device may be, but is not limited to, any of the various computers available in the field.
In still another aspect, a computer device is provided, which includes a memory and a processor, the memory storing a computer program, and the processor, when executing the computer program, implementing the following steps: converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain a clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique.
In one embodiment, the processor, when executing the computer program, may further implement the additional steps or sub-steps in the embodiments of the multidimensional graph network node clustering processing method.
In yet another aspect, there is also provided a computer-readable storage medium having a computer program stored thereon, the computer program, when executed by a processor, implementing the following steps: converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and structural similarity of the nodes; performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability to obtain a sampling sequence for each node of the weighted multidimensional graph network; converting the sampling sequence of each node into a low-dimensional embedding based on the SkipGram model; clustering the low-dimensional embeddings of the nodes with the K-means algorithm to obtain a clustering result for each node of the weighted multidimensional graph network; and projecting each low-dimensional embedding into a two-dimensional space with a dimension reduction technique and displaying the clustering result with a graph visualization technique.
In one embodiment, the computer program, when executed by the processor, may further implement the additional steps or sub-steps in the embodiments of the multidimensional graph network node clustering processing method.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus DRAM (RDRAM), and Direct Rambus DRAM (DRDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention patent. It should be noted that those of ordinary skill in the art can make various changes and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (9)

1. A multidimensional graph network node clustering processing method is characterized by comprising the following steps:
converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
according to the built in-layer transition probability and cross-layer random walk transition probability, carrying out in-layer and cross-layer multilayer network random walk processing on the weighted multi-dimensional graph network to obtain a sampling sequence of each node of the weighted multi-dimensional graph network;
converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model;
clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multi-dimensional graph network;
and projecting each low-dimensional embedding into a two-dimensional space by using a dimension reduction technique and displaying the clustering result by using a graph visualization technique.
2. The method for processing node clusters in a multidimensional graph network according to claim 1, wherein the step of converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structural similarity of the nodes comprises:
for each dimension of the graph network, determining the attribute similarity of the nodes connected by an edge according to the number of matching attributes among the F-dimensional attribute vectors of those nodes;
determining the structural similarity between nodes by adopting a structural similarity measurement method;
and adding weights to each unweighted connecting edge in the graph network of each dimension by using the attribute similarity and the structural similarity, and converting the unweighted multidimensional graph network into the weighted multidimensional graph network.
3. The method of claim 2, wherein the structural similarity measurement method is an RA index measurement method.
4. The multi-dimensional graph network node clustering processing method according to claim 1, wherein a process of performing intra-layer and inter-layer multi-layer network random walk processing on the weighted multi-dimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability includes:
carrying out intra-layer biased random walk processing on the weighted multi-dimensional graph network by adopting an embedding method of graph data; the intra-layer transition probability π_vx from node v to node x is

π_vx = α_pq(t, x) · w_vx

wherein α_pq(t, x) is calculated as follows:

α_pq(t, x) = 1/p if d_tx = 0; α_pq(t, x) = 1 if d_tx = 1; α_pq(t, x) = 1/q if d_tx = 2

wherein d_tx denotes the distance between node t and node x, node t is the previous node of node v, node x is the next-hop node of node v, w_vx denotes the weight of the connecting edge between node v and node x, and the parameters p and q are parameters for guiding the random walker to perform the biased random walk;
determining the cross-layer random walk transition probability according to the modularity of the graph network of each dimension; the formula for the cross-layer random walk transition probability is given as an image in the original publication, and the quantities involved are: w_vx, the weight of the connecting edge between node v and node x; the probability, measured according to the distance between node v and node x, that node v selects node x as the next-hop node; and, for each dimension from 1 to M, where M is an integer greater than 1, a probability value set according to the modularity of the graph network of that dimension;
and instructing the random walker to determine, according to the cross-layer random walk transition probability, the layers of the weighted multidimensional graph network to traverse and the nodes to move to, thereby performing the cross-layer random walk.
5. The method according to claim 4, wherein the weighted multidimensional graph network is subjected to intra-layer and inter-layer multi-layer network random walk processing according to the built intra-layer transition probability and cross-layer random walk transition probability, and further comprising:
the difference between the probability values set for graph networks with lower modularity is smaller than the difference between the probability values set for graph networks with higher modularity, wherein each probability value is the value set for the graph network of a given dimension according to the modularity of the graph network of that dimension.
6. The multi-dimensional graph network node clustering processing method according to claim 1, wherein the step of converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model includes:
dividing the sampling sequence through a window to obtain a training sample sequence of the node information;
inputting the training sample sequence into the SkipGram model and optimizing an objective function by adopting a stochastic gradient descent method to obtain a low-dimensional dense embedding of each node;
wherein optimizing the objective function by the stochastic gradient descent method comprises determining the conditional probability by simplifying it successively under a conditional independence assumption and a symmetry assumption, and obtaining the objective function according to the simplified conditional probability.
7. The method for processing the clustering of the nodes in the multidimensional graph network according to any one of claims 1 to 6, wherein the step of projecting each low-dimensional embedding into a two-dimensional space by using a dimension reduction technique and displaying the clustering result by using a graph visualization technique comprises:
obtaining, by calculation from the low-dimensional embeddings, the similarity P distribution between nodes; the P distribution is:

p_{j|i} = exp(−‖y_i − y_j‖² / (2σ_i²)) / Σ_{k≠i} exp(−‖y_i − y_k‖² / (2σ_i²))

p_{ij} = (p_{j|i} + p_{i|j}) / (2N)

wherein y_i denotes the low-dimensional embedding of the i-th node i, p_{j|i} denotes the conditional probability that the i-th node i selects the j-th node j as its close point, ‖y_i − y_j‖ denotes the distance between node i and node j, σ_i denotes the variance of the Gaussian distribution centered at node i, ‖y_i − y_k‖ denotes the distance between node i and the k-th node k, p_{i|j} denotes the conditional probability that node j selects node i as its close point, p_{ij} and p_{ji} denote the P-distribution similarity between node i and node j and between node j and node i, respectively, p_{ii} denotes the P-distribution similarity between node i and node i, p_{i|i} denotes the conditional probability that node i selects node i as its close point, and N denotes the number of nodes;
measuring, based on the Student-t distribution, the proximity between node i and node j in the two-dimensional layout space, and calculating the Q distribution; the Q distribution is:

q_{ij} = (1 + ‖z_i − z_j‖²)^(−1) / Σ_{g≠h} (1 + ‖z_g − z_h‖²)^(−1)

wherein ‖z_i − z_j‖ denotes the distance between node i and node j in the two-dimensional layout space, z_i and z_j respectively denote the coordinate values of node i and node j in the two-dimensional layout space, q_{ij} denotes the Q-distribution similarity between node i and node j, q_{gh} denotes the Q-distribution similarity between node g and the h-th node h, and z_g and z_h respectively denote the coordinate values of node g and node h in the two-dimensional layout space;
calculating the KL divergence between the P distribution and the Q distribution; the calculation formula is:

KL(P‖Q) = Σ_i Σ_j p_{ij} · log(p_{ij} / q_{ij})
when the iterative optimization of the KL divergence stops, obtaining the two-dimensional coordinate value of each node, drawing nodes with the same label in the same color and nodes with different labels in different colors, and performing cluster display.
8. A multidimensional graph network node clustering processing device is characterized by comprising:
the network conversion module is used for converting the original unweighted multidimensional graph network into a weighted multidimensional graph network according to the attribute similarity and the structure similarity of the nodes;
the migration processing module is used for performing intra-layer and cross-layer random walk processing on the weighted multidimensional graph network according to the built intra-layer transition probability and cross-layer random walk transition probability, so as to obtain a sampling sequence of each node of the weighted multidimensional graph network;
the embedding processing module is used for converting the sampling sequence of each node into low-dimensional embedding based on a SkipGram model;
the clustering processing module is used for clustering the low-dimensional embedding of each node by adopting a K-means algorithm to obtain a clustering result of each node of the weighted multi-dimensional graph network;
and the visualization module is used for projecting each low-dimensional embedding into a two-dimensional space by adopting a dimension reduction technique and displaying the clustering result by adopting a graph visualization technique.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor when executing the computer program performs the steps of the method for cluster processing of nodes of a multidimensional graph network as claimed in any one of claims 1 to 7.
CN202110645181.3A 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device Pending CN113254717A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110645181.3A CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110645181.3A CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Publications (1)

Publication Number Publication Date
CN113254717A true CN113254717A (en) 2021-08-13

Family

ID=77187250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110645181.3A Pending CN113254717A (en) 2021-06-10 2021-06-10 Multidimensional graph network node clustering processing method, apparatus and device

Country Status (1)

Country Link
CN (1) CN113254717A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113729686A (en) * 2021-09-23 2021-12-03 南京航空航天大学 Brain local function dynamic real-time measurement system
CN113729686B (en) * 2021-09-23 2023-12-01 南京航空航天大学 Brain local function dynamic real-time measurement system
CN114819971A (en) * 2022-04-22 2022-07-29 支付宝(杭州)信息技术有限公司 Wind control method based on multi-dimensional relational data, graph clustering method and device
CN114826921A (en) * 2022-05-05 2022-07-29 苏州大学应用技术学院 Network resource dynamic allocation method, system and medium based on sampling subgraph
CN114826921B (en) * 2022-05-05 2024-05-17 苏州大学应用技术学院 Dynamic network resource allocation method, system and medium based on sampling subgraph
CN115631799A (en) * 2022-12-20 2023-01-20 深圳先进技术研究院 Sample phenotype prediction method and device, electronic equipment and storage medium
CN115631799B (en) * 2022-12-20 2023-03-28 深圳先进技术研究院 Sample phenotype prediction method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN113254717A (en) Multidimensional graph network node clustering processing method, apparatus and device
Refenes et al. Exploratory data analysis by the self-organizing map: Structures of welfare and poverty in the world
Nepusz et al. Fuzzy communities and the concept of bridgeness in complex networks
Fischer et al. Bagging for path-based clustering
CN109389151B (en) Knowledge graph processing method and device based on semi-supervised embedded representation model
Gorban et al. Principal manifolds and graphs in practice: from molecular biology to dynamical systems
CN108171010B (en) Protein complex detection method and device based on semi-supervised network embedded model
CN108764726B (en) Method and device for making decision on request according to rules
KR101866522B1 (en) Object clustering method for image segmentation
Astudillo et al. Imposing tree-based topologies onto self organizing maps
CN110379521A (en) Medical data collection feature selection approach based on information theory
Meng et al. A new quality assessment criterion for nonlinear dimensionality reduction
Tripathy et al. A Study of Algorithm Selection in Data Mining using Meta-Learning.
Kumar et al. Comparative analysis of SOM neural network with K-means clustering algorithm
Basto-Fernandes et al. A survey of diversity oriented optimization: Problems, indicators, and algorithms
Gao et al. A soft-sensor model of VCM rectification concentration based on an improved WOA-RBFNN
Király et al. Geodesic distance based fuzzy c-medoid clustering–searching for central points in graphs and high dimensional data
CN110232151A (en) A kind of construction method of the QoS prediction model of mixing probability distribution detection
Li et al. Unsupervised domain adaptation via discriminative feature learning and classifier adaptation from center-based distances
Sakri et al. Analysis of the dimensionality issues in house price forecasting modeling
CN109409415A (en) A kind of LLE algorithm kept based on global information
Rafi et al. Optimal fuzzy min-max neural network (fmmnn) for medical data classification using modified group search optimizer algorithm
Xun et al. Sparse estimation of historical functional linear models with a nested group bridge approach
Mohammadi et al. An enhanced noise resilient K-associated graph classifier
Meng et al. Passage method for nonlinear dimensionality reduction of data on multi-cluster manifolds

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210813

RJ01 Rejection of invention patent application after publication