CN112925989B

CN112925989B - Group discovery method and system of attribute network

Info

Publication number: CN112925989B
Application number: CN202110127755.8A
Authority: CN
Inventors: 汪晓锋; 王栽胜; 刘伟; 赵本香; 刘睿敏
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2022-04-26
Anticipated expiration: 2041-01-29
Also published as: CN115438272A; CN112925989A

Abstract

The invention belongs to the field of network data mining and discloses a group discovery method and system for an attribute network, used to accurately identify potential group structures in the attribute network. The method comprises the following steps: acquiring user interaction behavior data of the attribute network; preprocessing the attribute network data to model the network topology and the node attribute set; locating potential cluster center nodes based on the degree centrality of the nodes and the relative distance between nodes; converting the network adjacency matrix into a similarity matrix according to the topological structure information, and synthesizing a node attribute matrix; applying a multilayer graph convolution model to deeply fuse the structure information and the node attributes and automatically identify the complete group structure; and finally evaluating the group discovery result. The invention is suited to large-scale attribute network data, reveals the group structure with lower time complexity, is broadly applicable to complex networks, and has high application value.

Description

Group discovery method and system of attribute network

Technical Field

The invention belongs to the field of graph data mining, and in particular relates to a group discovery method and a group discovery system for an attribute network.

Background

With the continuous development of information technology and internet technology, contact and interaction between people and their environment have become common and complicated, forming various complex systems. These complex systems can generally be described abstractly as complex networks, such as online social networks and mobile communication networks. Complex networks span physics, biology, social science, system science, network science and other fields, have gradually become a powerful tool for solving complex problems, and are widely applied in social network analysis, bioengineering, electric power and traffic, human behavior analysis, big data analysis and other fields. The network topology formed by the interrelated individuals in these complex network systems exhibits randomness and self-organization and shows obvious group aggregation characteristics. Recent research shows that the group structure is an important mesoscopic structural feature that is ubiquitous in complex networks and is generally closely related to the corresponding functional modules and group attributes in the network. From this perspective, group discovery reveals the group aggregation characteristics and functional structure characteristics of a complex network, plays a key role in analyzing node characteristics, structure attributes, group interaction modes and similar problems, and provides important support for studying the structural evolution mechanism, the information propagation rules, and the group behaviors of complex networks.

The group structure corresponds to different functional modules and structural units in a complex network system, and the nodes inside a group are more closely connected to one another than to nodes outside the group. For example, in a social network, as social interaction continues to intensify, a large number of compact groups form around different interests, topics, professions, regions and the like, and the community structure is particularly obvious; a group in a communication network represents a communication group or a personal relationship network. Therefore, mining the closely connected group structures in a network has important application value for understanding and analyzing network structure attributes, information propagation rules, human social organization structures and the like. A common approach at present is to construct the topology of a complex relational network and divide it into different tightly connected groups or modules. Typical approaches divide the complex network into different group structures, for example by maximizing the modularity. However, complex network structures are sparse as a whole, which makes the optimization problem challenging for such approaches. Experiments show that these methods perform well on small-scale relational networks but cannot obtain the optimal group discovery result on large-scale complex networks. Meanwhile, with the development of big data technology, a large amount of multi-source attribute information accumulates in a complex network in addition to the topological structure, and this information has an important influence on the formation and evolution of the group structure. For example, in a financial transaction network, potential abnormal behaviors such as fraud and money laundering can be mined based on the interaction information and attribute characteristics of users. The above methods typically do not make use of this information, resulting in lower group discovery accuracy and precision. Therefore, group discovery in attribute networks is a problem that urgently needs to be solved and has important application value.

Disclosure of Invention

In view of the foregoing, there is a need to provide a group discovery method for an attribute network, which effectively utilizes the attribute characteristics of a network or a node to compensate for the sparsity problem caused by a network topology structure, and sufficiently fuses the two pieces of information in a group discovery process in an unsupervised learning manner, so as to reduce the computational complexity and improve the accuracy of group discovery.

In order to overcome the defects of the prior art, the invention provides a group discovery method of an attribute network. In the method, connection relations and user attribute characteristics among all node users are determined based on acquired attribute network data, a complex network reflecting user relations and a user attribute set are constructed, clustering center nodes are positioned and labels are distributed according to node degree centrality and distances among the nodes, then a multilayer graph convolution model is constructed for the attribute network, deep fusion of user node structures and node attribute information is achieved, and group discovery is conducted simultaneously. The invention can obtain better group discovery effect in large-scale attribute network.

In order to achieve this purpose, the invention adopts the following technical scheme:

In a first aspect of the present invention, a group discovery method for an attribute network is provided, which includes the following steps:

S1: acquiring interactive behavior data among all users in the attribute network;

S2: preprocessing the acquired data, modeling a complex network structure according to the interactive relations among users, and extracting the attribute information of each node;

S3: locating structural center nodes in the topological network according to the degree centrality and the relative distance of the network nodes, and assigning group labels;

S4: converting the network adjacency matrix into a similarity matrix based on the network topological structure and the attribute information, and synthesizing the node attributes into an attribute matrix;

S5: constructing a multilayer graph convolution model for group discovery based on the similarity matrix, the node attribute matrix and the initial group labels, to obtain a group discovery result of the attribute network;

S6: evaluating the obtained group discovery result using the clustering accuracy and normalized mutual information (NMI) metrics.

In a possible implementation, step S3 locates the cluster center nodes in the network topology and further includes:

S31: based on the constructed attribute network, calculating the degree centrality D of each node:

D(i)＝Degree(v_i), i∈[1,N]

wherein v_i represents the i-th node in the network; Degree(v_i) represents the degree centrality of node v_i, i.e. the number of neighbor nodes of v_i; and N is the total number of nodes in the network;

S32: calculating the average degree centrality of the nodes in the attribute network, and adding the nodes whose degree centrality is greater than the average into the queue C as candidate cluster center nodes;

S33: arranging the candidate center nodes in C in descending order of degree centrality;

S34: selecting the first candidate node as the first cluster center node;

S35: setting a truncation distance parameter d_c and calculating the distance (shortest path length) d_sp between each candidate node in the current candidate queue C and the first cluster center node; if a candidate node satisfies d_sp≤d_c, deleting it from the candidate queue; otherwise, taking it as a candidate for the second cluster center and keeping it in the candidate queue;

S36: repeating step S35 until all cluster center nodes are identified and group labels are assigned.

In a possible implementation, step S4 converts the adjacency matrix into the similarity matrix according to the local similarity of the nodes and synthesizes the attribute matrix, and further includes:

S41: calculating the local similarity between the network user nodes based on the network topological structure. The local similarity s_ij between nodes v_i and v_j is calculated by the following formula:

s_ij＝||N(v_i)∩N(v_j)|| / ||N(v_i)∪N(v_j)||

wherein N(v_i) represents the set of neighbor nodes connected to node v_i, and ||·|| denotes the number of elements in a set; by definition, if i＝j, then s_ij＝1;

S42: the network topology is represented by the similarity matrix S＝{s_ij}, and all node attributes are synthesized into a matrix X＝{x_i}, wherein x_i is the attribute vector corresponding to node v_i.

In a possible embodiment, step S5 constructs a multilayer graph convolution model for group discovery from the similarity matrix, the attribute matrix, and the initial labels, and further includes:

S51: constructing a three-layer graph convolution network model, whose output Z can be expressed as:

Z＝softmax(S ReLU(S ReLU(SXW⁽⁰⁾)W⁽¹⁾)W⁽²⁾)

where ReLU and softmax are two activation functions. Specifically, the activation function ReLU is defined as ReLU(z_i)＝max(0,z_i) and extracts the non-linear characteristics of the output z_i corresponding to node v_i; the activation function softmax is defined as

softmax(z_i)＝exp(z_i) / Σ_{j＝1}^{|C|} exp(z_j)

(|C| represents the length of the cluster center node queue, i.e. the number of group structures); W⁽⁰⁾, W⁽¹⁾ and W⁽²⁾ respectively represent the weight matrices of the layers of the model, which are randomly initialized and then automatically updated through the training process;

S52: based on the label set of the cluster center nodes obtained in step S3, inputting the initial label set and the attribute matrix into the model together for training;

S53: finishing training once the model parameters are no longer updated, dividing the nodes with the same label into the same group according to the output of softmax, and finally obtaining the group discovery result of the attribute network.
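For readability, the nested expression for Z in the three-layer model can be unrolled layer by layer. The intermediate symbols H^(1) and H^(2), the attribute dimension F, and the hidden widths h_1 and h_2 are notation introduced here purely for illustration; they do not appear in the original text, and the sizes of the hidden layers are not specified by it.

```latex
\begin{aligned}
H^{(1)} &= \mathrm{ReLU}\left(S X W^{(0)}\right), & W^{(0)} &\in \mathbb{R}^{F \times h_1},\\
H^{(2)} &= \mathrm{ReLU}\left(S H^{(1)} W^{(1)}\right), & W^{(1)} &\in \mathbb{R}^{h_1 \times h_2},\\
Z &= \mathrm{softmax}\left(S H^{(2)} W^{(2)}\right), & W^{(2)} &\in \mathbb{R}^{h_2 \times |C|},
\end{aligned}
\qquad S \in \mathbb{R}^{N \times N},\quad X \in \mathbb{R}^{N \times F},\quad Z \in \mathbb{R}^{N \times |C|}.
```

Under these assumed shapes, each row of Z is a probability distribution over the |C| group labels for one node, and multiplying by S at every layer mixes each node's features with those of the nodes it is similar to.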

In one possible embodiment, the attribute network type includes at least one of: a user relationship network in a mobile communication system; a social network in the social media field; a transaction network in the financial risk control field.

In one possible embodiment, the group type includes at least one of: a user group in a mobile communication network; an interest group in a social network; a fraud group in the risk control field.

In a second aspect of the present invention, a group discovery system for an attribute network is provided, which includes the following modules:

an attribute network data acquisition module, used for acquiring interactive behavior data among different users in the attribute network;

a network modeling and attribute feature extraction module, used for determining, based on the attribute network data, all user nodes and the connections among them to obtain the network topological structure, and for selecting user attribute features to obtain a node attribute set;

a cluster center positioning module, used for determining the cluster centers in the network and assigning group labels according to the degree centrality of the nodes and the relative distance between nodes;

a network matrix conversion module, used for converting the adjacency matrix into a similarity matrix based on the network topology information and constructing an attribute matrix based on the node attribute set;

a graph convolution model creating module, used for building a multilayer graph convolution model based on the network topological structure and the node attributes;

a group discovery module, used for training the constructed multilayer graph convolution model to realize group discovery;

and an evaluation analysis module, used for evaluating the obtained group discovery result.

Compared with the prior art, the invention has the following beneficial effects:

Efficiency: the group discovery method provided by the invention uses a deep learning graph convolution network to deeply fuse and train on the network topology and the attribute information, and establishes an efficient classification model. On the one hand, the topological structure of an attribute network is large-scale and sparse as a whole; adding the attribute information effectively relieves the time complexity problem caused by network sparsity and improves group discovery efficiency. On the other hand, the topology information and the node attributes are effectively fused through the multilayer graph convolution model, and a potentially more meaningful group structure can be found from a small amount of node label information, which improves group discovery performance.

Accuracy: the invention models the inherent information of the attribute network, requires no prior knowledge, and mines the group structure in an unsupervised manner using only the network topology and the node attributes. It establishes an accurate group discovery model that can process large-scale attribute networks and is highly practical for real complex networks. Compared with current mainstream methods such as DeepWalk, MGAE and GCN, the accuracy is greatly improved.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flowchart of a group discovery method for an attribute network according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the present invention analyzing a real network;

FIG. 3 is a schematic diagram of cluster center node location according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of population discovery results according to an embodiment of the present invention;

FIG. 5 is a block diagram of a group discovery system for an attribute network according to the present invention;

The following specific embodiments further illustrate the invention in conjunction with the above figures.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art to further understand the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.

The invention is described in further detail below with reference to the accompanying drawings:

Referring to FIG. 1, an embodiment of the present invention provides a group discovery method for an attribute network. The method effectively combines network topology information and node attribute information to reveal potential group structures in the attribute network. Based on the acquired attribute network data, it constructs the interaction network among users and the node attribute set, locates potential cluster center nodes in the network through the local degree centrality measure and the relative distance, then converts the network into a similarity matrix and synthesizes the node attribute matrix, and finally performs efficient group discovery using a multilayer graph convolution model. To address the drawbacks of existing methods, namely the high cost caused by network sparsity and the insufficient use of attribute information, the method uses a nonlinear deep learning model to fuse network topology and node attributes and mine a more reasonable group structure. It achieves better results on large-scale attribute networks and has broad applicability and high application value.

The group discovery method of the attribute network provided by the invention comprises the following steps:

S1: Acquiring interactive behavior data among all users in the attribute network. Specifically, all user information is extracted from the attribute network, and statistical analysis is performed on the user behavior data to obtain the interactive relations between users.
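By way of non-limiting illustration, the statistical analysis of step S1 could be sketched as counting how often each pair of users interacts and keeping the pairs that interact often enough. The record layout `(user_a, user_b)`, the function name `extract_interactions`, and the `min_count` threshold are assumptions made for this example; the original text does not specify them.

```python
from collections import Counter

def extract_interactions(records, min_count=1):
    """Count interactions per unordered user pair and keep frequent pairs.

    records: iterable of (user_a, user_b) tuples (hypothetical log layout).
    min_count: hypothetical threshold on the number of interactions.
    """
    counts = Counter()
    for a, b in records:
        if a == b:
            continue  # ignore self-interactions
        counts[tuple(sorted((a, b)))] += 1  # (a, b) and (b, a) are the same pair
    return {pair: n for pair, n in counts.items() if n >= min_count}

log = [("u1", "u2"), ("u2", "u1"), ("u2", "u3"), ("u3", "u3")]
relations = extract_interactions(log, min_count=2)
```

With `min_count=2`, only the pair that interacted twice survives; lowering the threshold to 1 keeps every observed pair.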

S2: Preprocessing the acquired data, modeling a complex network structure according to the interactive relations among users, and extracting the attribute information of each node. Specifically, each user (ID) in the attribute network is abstracted as a node, and each interaction between users is abstracted as a connecting edge, so as to construct a complex attribute network G = (V, E, X), where V represents the set of all user nodes, E represents the set of connecting edges between user nodes, and X represents the set formed by the attributes of each node. Meanwhile, operations such as deduplication, feature selection, numerical encoding and normalization are performed on the node attributes to obtain a structured node attribute set.
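As a minimal sketch of step S2, the following builds an adjacency matrix and a normalized attribute matrix for G = (V, E, X). It assumes an undirected network, attributes that are already numerically encoded, and min-max normalization; the function name `build_attribute_network` and these choices are illustrative assumptions, since the original text only names the preprocessing operations.

```python
import numpy as np

def build_attribute_network(user_ids, edges, raw_attributes):
    """Return (A, X): adjacency matrix and min-max normalized attribute matrix.

    user_ids: list of user IDs, one node per user.
    edges: iterable of (user_a, user_b) interaction pairs.
    raw_attributes: dict user_id -> list of numeric attribute values
                    (assumed already numerically encoded).
    """
    index = {uid: k for k, uid in enumerate(user_ids)}
    n = len(user_ids)
    A = np.zeros((n, n))
    for a, b in set(edges):                # set() removes duplicate records
        if a != b:
            A[index[a], index[b]] = 1.0
            A[index[b], index[a]] = 1.0    # undirected connecting edge
    X = np.array([raw_attributes[uid] for uid in user_ids], dtype=float)
    low = X.min(axis=0)
    span = X.max(axis=0) - low
    span[span == 0] = 1.0                  # constant columns map to 0, no division by zero
    return A, (X - low) / span

users = ["u1", "u2", "u3"]
A, X = build_attribute_network(
    users,
    [("u1", "u2"), ("u2", "u1"), ("u2", "u3")],
    {"u1": [20, 1], "u2": [30, 1], "u3": [40, 0]},
)
```

Each column of X is scaled to the range 0 to 1 independently, so attributes measured on different scales contribute comparably.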

S3: Locating the structural center nodes in the topological network according to the degree centrality and the relative distance of the network nodes, and assigning group labels. In the present invention, this is specifically done as follows:

S301: calculating the degree centrality D(i) of each node based on the connection relations among the user nodes in the constructed network topology G = (V, E, X). The specific formula is D(i) = Degree(v_i), i∈[1,N], wherein v_i represents the i-th node in the network; Degree(v_i) represents the degree of node v_i, i.e. the number of neighbors of v_i; and N is the number of network nodes;

S302: according to the degree centrality distribution of the nodes, computing the average degree centrality of the nodes

D_avg＝(1/N) Σ_{i＝1}^{N} D(i)

and comparing the degree centrality of each node with it; if

D(i)＞D_avg

then node v_i is taken as a candidate cluster center node and added to the queue C;

S303: arranging the candidate center nodes in C in descending order of degree centrality;

S304: selecting the first candidate cluster center node as the first cluster center node;

S305: setting a truncation distance parameter d_c and calculating the relative distance, namely the shortest path length d_sp, between each candidate node in the current candidate queue C and the first cluster center node. If a candidate node satisfies d_sp≤d_c, it is deleted from the candidate queue; otherwise, it is taken as a candidate for the second cluster center node and kept in the candidate queue. The distance parameter d_c is generally determined empirically. In the present embodiment, a value of d_c is set, and experiments show that this setting has no influence on the final group discovery result;

S306: repeating step S305 until all cluster center nodes are identified and group labels are assigned. Specifically, each cluster center node is assigned a separate label, and these labels determine the subsequent group structure categories.
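The cluster center locating procedure of steps S301 to S306 can be sketched as follows. This is an illustrative reading rather than the definitive implementation: the adjacency-dictionary input, the function names, the tie-breaking by input order among nodes of equal degree, and the example value `d_c=2` are all assumptions, and unreachable candidates are treated as being infinitely far from the current center.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def locate_cluster_centers(adj, d_c):
    """adj: dict node -> set of neighbor nodes; d_c: truncation distance."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}        # S301
    mean_degree = sum(degree.values()) / len(degree)          # S302
    candidates = [v for v in adj if degree[v] > mean_degree]
    candidates.sort(key=lambda v: degree[v], reverse=True)    # S303
    centers = []
    while candidates:                                         # S304 to S306
        center = candidates.pop(0)
        centers.append(center)
        dist = shortest_path_lengths(adj, center)
        # Drop candidates within d_c hops of the new center; the rest stay queued.
        candidates = [v for v in candidates if dist.get(v, float("inf")) > d_c]
    return {center: label for label, center in enumerate(centers)}

# Two hubs (0 and 4) joined through the path 0 - 3 - 7 - 4.
adj = {
    0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0, 7},
    4: {5, 6, 7}, 5: {4}, 6: {4}, 7: {3, 4},
}
centers = locate_cluster_centers(adj, d_c=2)
```

In this toy graph the two hubs are three hops apart, so with `d_c=2` both are kept as cluster centers and receive separate labels, while the intermediate nodes 3 and 7 are discarded as candidates.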

S4: Converting the network adjacency matrix into a similarity matrix based on the network topological structure and the attribute information, and synthesizing the node attributes into an attribute matrix. Specifically, the present embodiment proceeds as follows:

S401: according to the network topological structure information, calculating the local similarity between the network user nodes by the following formula:

s_ij＝||N(v_i)∩N(v_j)|| / ||N(v_i)∪N(v_j)||

S402: the network topology is represented abstractly by the similarity matrix S = {s_ij}; at the same time, the attribute features of each node are expressed as feature vectors of the same length and synthesized into the node attribute matrix X = {x_i}, wherein x_i is the attribute vector corresponding to node v_i.

In the present embodiment, the similarity matrix constructed in step S402 reflects richer network connection relationships than a simple adjacency matrix. The adjacency matrix only describes the connections between adjacent nodes: if a connecting edge exists between two nodes, the corresponding value in the adjacency matrix is 1, otherwise it is 0, and the matrix as a whole is sparse. The similarity matrix not only describes the fuzzy relationship between adjacent nodes but also reflects the interaction between non-adjacent nodes, with matrix elements ranging from 0 to 1, which provides an effective guarantee for subsequently revealing high-quality group discovery results.
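The difference between the adjacency matrix and the similarity matrix can be illustrated with a small sketch. It assumes that the local similarity is the ratio of shared neighbors to the union of neighbors, that N(v) excludes v itself, and that a pair with no neighbors at all gets similarity 0; the function name `similarity_matrix` and these conventions are assumptions for illustration.

```python
def similarity_matrix(adj, nodes):
    """Neighbor-overlap similarity between all node pairs; s_ii = 1 by definition."""
    S = []
    for i in nodes:
        row = []
        for j in nodes:
            if i == j:
                row.append(1.0)
                continue
            common = adj[i] & adj[j]
            union = adj[i] | adj[j]
            row.append(len(common) / len(union) if union else 0.0)
        S.append(row)
    return S

# Triangle 0-1-2 with a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
S = similarity_matrix(adj, [0, 1, 2, 3])
```

Here nodes 0 and 3 are not adjacent, so their adjacency entry is 0, yet `S[0][3]` is 0.5 because they share neighbor 2 out of the two distinct neighbors they have between them. The adjacent pair 0 and 1 gets 1/3 instead of a hard 1.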

S5: Constructing a multilayer graph convolution model for group discovery based on the similarity matrix, the node attribute matrix and the initial group labels, to obtain a group discovery result of the attribute network.

Specifically, the group discovery method provided by the invention adopts a deep learning technology, namely a graph convolution network model. The model jointly learns the node topological structure and the node attributes and infers the group labels of the unmarked nodes based on the initial clustering label information. The method realizes effective fusion of network topology and attribute characteristics, and is beneficial to revealing the group structure in a natural sense.

Specifically, constructing the graph convolution model for group discovery and learning the labels comprises the following steps:

S501: constructing a three-layer graph convolution network model, whose output Z can be expressed as:

Z＝softmax(S ReLU(S ReLU(SXW⁽⁰⁾)W⁽¹⁾)W⁽²⁾)

where ReLU and softmax are the activation functions, softmax normalizes over |C| classes, and |C| represents the length of the cluster center node queue, i.e. the number of group structures; W⁽⁰⁾, W⁽¹⁾ and W⁽²⁾ respectively represent the weight matrices of the layers of the model, which are randomly initialized and then automatically updated through the training process;

S502: based on the structural center nodes and their label set obtained in step S3, training the model with the label information as input. Specifically, in each iteration of training, the attribute information of each node is updated based on the similarity between nodes and the node attribute vectors, and nodes with similar attributes are given the same label; the model parameters are then updated according to the error between the predicted labels and the true labels of the nodes, until the node labels no longer change or the specified number of iterations is reached. In this embodiment, the number of iterations of the model is 200. In addition, the similarity between nodes effectively acts as an edge weight adjustment;

S503: after training is finished, the nodes with the same label are divided into the same group according to the output of softmax, finally giving the group discovery result of the attribute network. In the final prediction of the model, each classified node is given a label that corresponds to a cluster center node; the nodes with the same label are then classified into the same group, giving the group division of the whole attribute network.
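The forward computation of the three-layer model and the label readout of step S503 can be sketched with NumPy as below. The sketch does not perform training: the weights are random, the matrix sizes are arbitrary, and the identity matrix stands in for a real similarity matrix. The cross-entropy loss restricted to the labeled cluster center nodes is an assumption borrowed from common semi-supervised graph convolution practice; the original text only speaks of the error between predicted and true labels.

```python
import numpy as np

def relu(Z):
    return np.maximum(0.0, Z)

def softmax(Z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    E = np.exp(Z - Z.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def gcn_forward(S, X, W0, W1, W2):
    """Z = softmax(S ReLU(S ReLU(S X W0) W1) W2)."""
    H1 = relu(S @ X @ W0)
    H2 = relu(S @ H1 @ W1)
    return softmax(S @ H2 @ W2)

def masked_cross_entropy(Z, labels, labeled_nodes):
    """Mean negative log-likelihood over the labeled nodes only (assumed loss)."""
    picked = Z[labeled_nodes, [labels[v] for v in labeled_nodes]]
    return float(-np.log(picked).mean())

rng = np.random.default_rng(0)
n, f, hidden, n_groups = 6, 4, 8, 2      # illustrative sizes; n_groups plays the role of |C|
S = np.eye(n)                            # placeholder for the similarity matrix
X = rng.random((n, f))
W0 = rng.normal(scale=0.1, size=(f, hidden))
W1 = rng.normal(scale=0.1, size=(hidden, hidden))
W2 = rng.normal(scale=0.1, size=(hidden, n_groups))

Z = gcn_forward(S, X, W0, W1, W2)
groups = Z.argmax(axis=1)                # one group label per node
loss = masked_cross_entropy(Z, {0: 0, 5: 1}, [0, 5])
```

In a full implementation the weights would be updated by back propagation on this loss for the specified number of iterations, for example with an automatic differentiation framework, before the final `argmax` readout.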

S6: Evaluating the obtained group discovery result using the clustering accuracy and normalized mutual information (NMI) metrics. The clustering accuracy measures the proportion of correctly labeled nodes in the group discovery result among all nodes. The normalized mutual information measures the similarity between the prediction result and the true labels from the perspective of information entropy; the larger its value, the closer the result is to the true group structure division. In this embodiment, extensive comparative experiments show that the group discovery method provided by the invention has clear advantages and noticeably improved algorithm performance.
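The two evaluation metrics of step S6 can be sketched as follows. Two assumptions are made: the accuracy compares labels directly, which presumes the predicted labels are already aligned with the true ones, and the mutual information is normalized by the geometric mean of the two entropies, which is one of several common conventions and is not stated in the original text.

```python
from collections import Counter
from math import log, sqrt

def clustering_accuracy(predicted, truth):
    """Fraction of nodes whose predicted label equals the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())

def normalized_mutual_information(predicted, truth):
    """NMI = I(P; T) / sqrt(H(P) * H(T))."""
    n = len(truth)
    joint = Counter(zip(predicted, truth))
    count_p, count_t = Counter(predicted), Counter(truth)
    mutual = sum(
        c / n * log(c * n / (count_p[p] * count_t[t]))
        for (p, t), c in joint.items()
    )
    h_p, h_t = entropy(predicted), entropy(truth)
    if h_p == 0 or h_t == 0:
        # Degenerate single-group partitions: identical only if both are trivial.
        return 1.0 if h_p == h_t else 0.0
    return mutual / sqrt(h_p * h_t)

truth = [0, 0, 0, 1, 1, 1]
close = [0, 0, 1, 1, 1, 1]       # one node misplaced
swapped = [1, 1, 1, 0, 0, 0]     # same partition, label names exchanged

acc_close = clustering_accuracy(close, truth)
nmi_close = normalized_mutual_information(close, truth)
nmi_swapped = normalized_mutual_information(swapped, truth)
```

The `swapped` case shows why both metrics are useful: its direct accuracy is 0 because the label names differ, while its NMI is 1 because the partition itself is identical.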

The attribute network type of the present invention includes, but is not limited to, a user relationship network in a mobile communication system, a social network in the social media field, and a transaction network in the financial risk control field. The group types include, but are not limited to, a user group in a mobile communication network, an interest group in a social network, and a fraud group in the risk control field.

The invention also discloses a group discovery system of the attribute network, whose structural block diagram is shown in FIG. 5 and which specifically comprises the following modules:

As a further improvement, the attribute network data acquisition module further comprises extracting relevant information of all users from the attribute network, including data reflecting interactive behaviors among the users and attribute data of the users;

as a further improvement scheme, the network modeling and attribute feature extraction module specifically comprises two sub-modules, namely a network topological structure modeling module and a node attribute feature extraction module. The network topological structure modeling module identifies each user in the attribute network as a node and establishes a link according to an interactive relation among the users, and finally obtains the topological structure of the whole attribute network; the node attribute feature extraction module acquires attribute sets of all user nodes through processes of data integration, cleaning, feature selection, numerical value coding, normalization and the like;

as a further improvement, the cluster center positioning module implements the cluster center node locating step of the method provided by the present invention: it calculates the degree centrality of each node (D(i) = Degree(v_i), where v_i represents the i-th node in the network) according to the topological structure of the attribute network to select candidate cluster center nodes, and then compares the relative distances between the candidate cluster centers to determine the final cluster center nodes and assign group labels;

as a further improvement, the network matrix conversion module implements step S4 of the method provided by the present invention, which includes two parts: similarity matrix conversion and attribute matrix synthesis. The similarity matrix conversion converts the adjacency matrix into the similarity matrix S by calculating the local similarity s_ij = ||N(v_i)∩N(v_j)|| / ||N(v_i)∪N(v_j)|| between nodes (wherein N(v_i) represents the set of neighbor nodes of node v_i); the attribute matrix synthesis constructs the attribute matrix X of the whole attribute network from the attribute feature set of each node;

as a further improvement scheme, the graph convolution model creating module constructs a multilayer graph convolution model according to the similarity matrix, the attribute matrix and the clustering center node. The dimensions of the similarity matrix and the attribute matrix determine the number of neurons in an input layer of the model, and the number of nodes in the center of the cluster determines the output of the model. The model utilizes a deep neural network to carry out joint learning on a node topological structure and node attributes, and simultaneously carries out local aggregation on node characteristics;

as a further improvement, the group discovery module implements step S5 of the method provided by the present invention. Based on the constructed multilayer graph convolution model, the module takes the label information of the cluster center nodes as input, trains the graph convolution model using the graph convolution operation and the back propagation algorithm of the neural network, propagates the label information of the cluster center nodes to the whole network, infers the labels of the unlabeled nodes, and outputs their group labels through the trained softmax classifier, thereby realizing group discovery. The module output is specifically expressed as Z = softmax(S ReLU(S ReLU(SXW⁽⁰⁾)W⁽¹⁾)W⁽²⁾), wherein ReLU is the non-linear activation function used in the present invention, and W⁽⁰⁾, W⁽¹⁾ and W⁽²⁾ are the weight matrices of the convolution layers, which are automatically updated during training;

as a further improvement, the evaluation and analysis module implements step S6 of the method of the present invention. The module includes an evaluation index, the clustering accuracy, for evaluating the group discovery result. The accuracy measures the proportion of correctly divided user nodes in the group discovery result among all nodes, directly reflects the effectiveness of the provided method, and is fed back to the group discovery module to further improve system performance.
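One way to picture how the seven modules hand data to one another is the skeleton below. It is only a structural sketch: the class name `GroupDiscoverySystem`, the field names, and the call signatures are hypothetical, each module is supplied by the caller as a plain callable, and the feedback from the evaluation module to the group discovery module is not modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class GroupDiscoverySystem:
    """Chains the seven modules; every field is a caller-supplied callable."""
    acquire_data: Callable[[], Any]
    model_network: Callable[[Any], tuple]            # -> (topology, attribute_set)
    locate_centers: Callable[[Any], Any]             # -> initial group labels
    convert_matrices: Callable[[Any, Any], tuple]    # -> (S, X)
    create_model: Callable[[Any, Any, Any], Any]
    discover_groups: Callable[[Any, Any, Any, Any], Any]
    evaluate: Callable[[Any], Any]

    def run(self):
        data = self.acquire_data()
        topology, attributes = self.model_network(data)
        labels = self.locate_centers(topology)
        S, X = self.convert_matrices(topology, attributes)
        model = self.create_model(S, X, labels)
        groups = self.discover_groups(model, S, X, labels)
        return groups, self.evaluate(groups)

# Stub modules, used only to show the data flow.
system = GroupDiscoverySystem(
    acquire_data=lambda: "raw",
    model_network=lambda data: ("topology", "attributes"),
    locate_centers=lambda topology: {0: 0},
    convert_matrices=lambda topology, attributes: ("S", "X"),
    create_model=lambda S, X, labels: "model",
    discover_groups=lambda model, S, X, labels: [0, 0, 1],
    evaluate=lambda groups: {"n_groups": len(set(groups))},
)
groups, report = system.run()
```

Keeping each module behind a callable makes it straightforward to swap in a different similarity measure or evaluation index without touching the rest of the chain.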

To further illustrate the effectiveness and scalability of the provided methods, the present invention experimentally conducted the following examples.

One embodiment of the present invention selects a small-scale real network for the experiment and further illustrates the process and effect of the provided method visually. The experiment is implemented in Python with the TensorFlow deep learning framework, on a Windows 7 machine with an Intel Pentium Dual-Core 2.0 GHz CPU and 8.00 GB of RAM. To improve computational efficiency and accuracy, the scientific computing packages NumPy and SciPy are used in the experiment.

The real network used for the experiment is the Zachary karate club social network. The network is based on W. W. Zachary's long-term investigation and observation of the interactions among the members of a karate club, and is an interaction network with 34 members and 78 edges, as shown in FIG. 2. During the observation, the network eventually split into two smaller community structures because of a disagreement between the club manager and the coach. The network is widely used to test the effectiveness and reliability of group discovery methods.

According to the group discovery method provided by the invention, on the basis of the constructed network topology and member attribute information, the clustering center nodes in the network are first located according to the degree centrality and the relative distance of the nodes (the two nodes in the dashed circles indicated by arrows in FIG. 3 correspond exactly to the manager and the coach in the network), and labels are assigned to them; the similarity among the member nodes is then calculated to obtain the similarity matrix, and the attribute matrix is synthesized; a multilayer graph convolution model is then constructed and trained; finally, the group discovery result is obtained, as shown in FIG. 4.
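The clustering center positioning step can be sketched as below. This is an illustration under stated assumptions rather than the patented procedure itself: the candidate queue is assumed to hold all nodes sorted by degree in descending order (ties broken by node identifier), the relative distance is taken to be the unweighted shortest path length, and the toy graph and the function names are hypothetical.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    # Breadth-first search: hop distance from source to every reachable node.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def locate_centers(adj, d_c):
    # Assumed candidate queue: all nodes, highest degree first.
    candidates = sorted(adj, key=lambda v: (-len(adj[v]), v))
    centers = []
    while candidates:
        center = candidates.pop(0)      # first candidate becomes a center
        centers.append(center)
        dist = shortest_path_lengths(adj, center)
        # Drop candidates within the truncation distance d_c of this center;
        # unreachable nodes count as infinitely far and stay in the queue.
        candidates = [v for v in candidates
                      if dist.get(v, float("inf")) > d_c]
    # Assign one group label per clustering center.
    return {c: label for label, c in enumerate(centers)}

# Toy graph: two hubs (0 and 4) linked through node 3.
adj = {
    0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4],
    4: [3, 5, 6, 7], 5: [4], 6: [4], 7: [4],
}
centers = locate_centers(adj, d_c=1)
```

With a truncation distance of 1, the two hubs of the toy graph are selected as centers and receive labels 0 and 1; a larger truncation distance removes more candidates per round and therefore yields fewer, more widely separated centers.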

In the experiment, the method provided by the invention effectively identifies the real split in the network, that is, it finds two groups with clear group structures. As shown in FIG. 4, the two group structures are represented by circles and squares, respectively. Compared with the true labels of the member nodes, the result is fully consistent with the real group division.

To further verify the technical effect of the present invention, this embodiment performs comparative experiments on real data sets:

TABLE 1 Comparison of the group discovery accuracy of the method of the invention and 3 representative algorithms

The method collects a large amount of user interaction data from actual attribute networks. For comparison with other algorithms, 5 data sets are selected in the experiment; for each data set, 30% is used as the training set and 70% as the test set. The number of training samples required by the method of the invention is determined by the number of clustering center nodes and their adjacent nodes, and is in practice far lower than this reference training proportion. The invention is compared with 3 representative methods: Infomap, a traditional group discovery method that uses only network topology information to identify group structure and achieves good results among algorithms of its kind; MGAE, a graph embedding method that learns representations of node structure and attribute features through an autoencoder and performs group identification with a traditional clustering method, performing well on small data sets; and GCN, a semi-supervised graph neural network method that considers network topology and node attribute information simultaneously, aggregates node features and infers node labels through convolution operations, and performs well when training labels are sufficient. Table 1 compares the group discovery accuracy of the present invention with that of the above methods.

Table 1 above gives the accuracy of the method of the invention and of the three representative methods in the group discovery task. Compared with Infomap, MGAE and GCN, the method provided by the invention improves identification accuracy by 22.9%, 9.58% and 7.68% on average, respectively. The method achieves better performance for two reasons. On the one hand, it fully considers both network topology and node attribute information in the group discovery process, which effectively ensures the accuracy of group discovery; that adding attribute information improves performance is also reflected in the experimental results of MGAE and GCN. On the other hand, it adopts a clustering center node positioning strategy, so that the graph convolution model converges faster during training and obtains a better result. GCN also identifies the group labels of nodes based on graph convolution operations, but in practice it needs a large number of prior labels for training, and it is difficult to ensure that those labels are evenly distributed across the network so that label information is effectively propagated to the whole network.

The following are embodiments of systems of the present invention, which may be used to perform the method embodiments of the present invention. For details not described in the system embodiments, refer to the method embodiments of the present invention.

In yet another embodiment of the present invention, a group discovery system for an attribute network is provided. The group discovery system is used to implement the group discovery method of the attribute network, and specifically comprises an attribute network data acquisition module, a network structure modeling module, a clustering center positioning module, a matrix conversion module, a graph convolution model creation module, a group discovery module and an evaluation analysis module.

The clustering center positioning module determines the clustering centers in the network and assigns group labels according to the degree centrality of the nodes and the relative distance between the nodes; the matrix conversion module converts the adjacency matrix into a similarity matrix based on the network topology information and constructs an attribute matrix based on the node attribute set; the graph convolution model creation module builds a multilayer graph convolution model based on the network topology and the node attributes; the group discovery module trains the constructed graph convolution model to realize group discovery; and the evaluation analysis module evaluates the obtained group discovery result.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A group discovery method of an attribute network is characterized by comprising the following steps:

s3: positioning clustering center nodes in the topological network according to the degree centrality measurement and the relative distance of the network nodes, and distributing group labels;

s6: evaluating the obtained group discovery result by using clustering accuracy and standard mutual information measurement;

wherein the step S3 further includes:

D(i)＝Degree(v_i),i∈[1,N]

s34: selecting a first candidate node as a first clustering center node;

s35: setting a truncation distance parameter d_c, and calculating the shortest path length d_sp between each candidate node in the current candidate queue C and the first structure center node; if a candidate node satisfies d_sp≤d_c, deleting that candidate node from the candidate queue; otherwise, taking that candidate node as a second candidate clustering center and keeping it in the candidate queue;

s36: repeatedly executing the step S35 until all the structure center nodes are identified and distributing the group labels;

wherein the step S4 further includes:

s41: based on the network topological structure, calculating the local similarity between the network user nodes; the local similarity s between nodes is calculated by the following formula:

wherein the step S5 further includes:

Z＝softmax(SReLU(SReLU(SXW⁽⁰⁾)W⁽¹⁾)W⁽²⁾)

wherein ReLU and softmax represent two activation functions; specifically, the activation function ReLU is defined as ReLU(z_i) = max(0, z_i), and is used for extracting the non-linear characteristics of the output z_i corresponding to node v_i; the activation function softmax is defined as

softmax(z_i) = exp(z_i) / Σ_j exp(z_j), j∈[1,|C|]

where |C| represents the length of the clustering center node queue, namely the number of group structures; W⁽⁰⁾, W⁽¹⁾ and W⁽²⁾ respectively represent the weight matrix of each layer of the model, and are updated automatically through the training process after random initialization;

2. The group discovery method of the attribute network of claim 1, wherein the attribute network type comprises at least one of:

a user relationship network in a mobile communication system;

social networks in the social media domain;

a transaction network in the field of financial risk control.

3. The group discovery method of the attribute network of claim 1, wherein the group type comprises at least one of:

consumer groups in a mobile communication network;

interest groups in a social network;

fraudulent groups in the field of risk control.