CN112925989B - Group discovery method and system of attribute network - Google Patents


Info

Publication number
CN112925989B
Authority
CN
China
Prior art keywords
network
node
attribute
group
nodes
Prior art date
Legal status
Active
Application number
CN202110127755.8A
Other languages
Chinese (zh)
Other versions
CN112925989A
Inventor
汪晓锋
王栽胜
刘伟
赵本香
刘睿敏
Current Assignee
China Jiliang University
Original Assignee
China Jiliang University
Priority date
Filing date
Publication date
Application filed by China Jiliang University
Priority to CN202210313002.0A (CN115438272A)
Priority to CN202110127755.8A (CN112925989B)
Publication of CN112925989A
Application granted
Publication of CN112925989B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Business, Economics & Management (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention belongs to the field of network data mining and discloses a group discovery method and system for an attribute network, used to accurately identify potential group structures in the attribute network. The method comprises the following steps: acquiring user interaction behavior data from the attribute network; preprocessing the attribute network data to model the network topology and the node attribute set; locating potential cluster center nodes based on node degree centrality and the relative distances between nodes; converting the network adjacency matrix into a similarity matrix according to the topological structure information, while synthesizing a node attribute matrix; applying a multilayer graph convolution model to deeply fuse the structural information and the node attributes and automatically identify the complete group structure; and finally evaluating the group discovery result. The invention is suitable for large-scale attribute network data, reveals group structures with low time complexity, is broadly applicable to complex networks, and has high application value.

Description

Group discovery method and system of attribute network
Technical Field
The invention belongs to the field of graph data mining, and in particular relates to a group discovery method and system for an attribute network.
Background
With the continuous development of information and internet technology, contact and interaction between people and their environment have become widespread and complicated, forming various complex systems. These systems can generally be described abstractly as complex networks, such as online social networks and mobile communication networks. Complex networks span physics, biology, social science, systems science, network science and other fields; they have gradually become a powerful tool for solving complex problems and are widely applied in social network analysis, bioengineering, electric power and transportation, human behavior analysis, big data analysis and many other areas. The network topology formed by interrelated individuals in such systems exhibits randomness and self-organization and shows pronounced group aggregation. Recent research shows that group structure is an important mesoscopic structural feature that is ubiquitous in complex networks and is usually closely related to the functional modules and group attributes of the network. Group discovery reveals the aggregation and functional structure characteristics of complex networks from this mesoscopic perspective. It plays a key role in analyzing node characteristics, structural attributes and group interaction patterns, and provides important support for research on the structural evolution mechanisms of complex networks, information propagation rules, group behavior and related problems.
Group structures correspond to different functional modules and structural units in a complex network system: nodes within a group are more densely connected to one another than to nodes in other groups. For example, in a social network, as social interaction intensifies, many compact groups form around different interests, topics, professions, regions and other characteristics, so the community structure is especially pronounced; in a communication network, a group represents a communication group or a network of personal relationships. Mining tightly connected group structures therefore has important application value for understanding and analyzing network structural attributes, information propagation rules, human social organization and related topics. A common current approach is to construct the topology of a complex relational network and divide it into tightly connected groups or modules, for example by maximizing modularity. However, complex networks are sparse overall, which makes the underlying optimization problem difficult for such approaches. Experiments show that these methods perform well on small relational networks but cannot obtain optimal group discovery results on large-scale complex networks. Meanwhile, with the development of big data technology, complex networks accumulate large amounts of multi-source attribute information in addition to their topology, and this information strongly influences the formation and evolution of group structures. For example, in a financial transaction network, potential abnormal behaviors such as fraud and money laundering can be mined from the interaction information and attribute characteristics of users.
The above methods typically do not use this attribute information, which lowers the accuracy and precision of group discovery. Group discovery in attribute networks is therefore an urgent problem with important application value.
Disclosure of Invention
In view of the foregoing, there is a need for a group discovery method for an attribute network that uses the attribute characteristics of the network or its nodes to compensate for the sparsity of the network topology, and that fully fuses both kinds of information during group discovery in an unsupervised manner, thereby reducing computational complexity and improving group discovery accuracy.
To overcome the defects of the prior art, the invention provides a group discovery method for an attribute network. In the method, the connection relations and attribute characteristics of all user nodes are determined from the acquired attribute network data, and a complex network reflecting user relations, together with a user attribute set, is constructed. Cluster center nodes are located and assigned labels according to node degree centrality and inter-node distances. A multilayer graph convolution model is then constructed for the attribute network to deeply fuse user node structure and node attribute information while performing group discovery. The invention achieves better group discovery results on large-scale attribute networks.
To achieve this purpose, the invention adopts the following technical solution:
In a first aspect of the present invention, a group discovery method for an attribute network is provided, comprising the following steps:
S1: acquire interactive behavior data among all users in the attribute network;
S2: preprocess the acquired data, model the complex network structure according to the interaction relations among users, and extract the attribute information of each node;
S3: locate the cluster center nodes in the topological network according to the degree centrality and relative distances of the network nodes, and assign group labels;
S4: convert the network adjacency matrix into a similarity matrix based on the network topology and attribute information, and synthesize the node attributes into an attribute matrix;
S5: construct a multilayer graph convolution model for group discovery based on the similarity matrix, the node attribute matrix and the initial group labels, to obtain the group discovery result of the attribute network;
S6: evaluate the obtained group discovery result using clustering accuracy and normalized mutual information (NMI).
In a possible implementation, step S3, locating the cluster center nodes in the network topology, further includes:
S31: based on the constructed attribute network, calculate the degree centrality $D$ of each node:
$$D(i) = \mathrm{Degree}(v_i), \quad i \in [1, N]$$
where $v_i$ denotes the $i$-th node in the network; $\mathrm{Degree}(v_i)$ denotes the degree centrality of node $v_i$, i.e., the number of neighbors of $v_i$; and $N$ is the total number of nodes in the network;
S32: calculate the average degree centrality of the nodes in the attribute network, and add the nodes whose degree centrality is greater than the average to queue $C$ as candidate cluster center nodes;
S33: sort the candidate center nodes in $C$ in descending order of degree centrality;
S34: select the first candidate node as the first cluster center node;
S35: set a cutoff distance parameter $d_c$ and calculate the distance $d_{sp}$ (the shortest path length) between every candidate node in the current queue $C$ and the first cluster center node. If a candidate node satisfies $d_{sp} \le d_c$, delete it from the candidate queue; otherwise, keep it in the queue as a candidate for the second cluster center;
S36: repeat step S35, each time taking the next remaining candidate as the next cluster center, until all cluster center nodes are identified and assigned group labels.
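Steps S31 to S36 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the graph is assumed unweighted and undirected and stored as a dict of neighbor sets, $d_{sp}$ is taken as the hop count found by breadth-first search, candidates with equal degree keep their input order, and candidates unreachable from a center are treated as infinitely far away (so they survive pruning). The function names `bfs_distances` and `locate_cluster_centers` are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def locate_cluster_centers(adj, d_c):
    """Pick cluster centers by degree centrality and cutoff distance d_c.

    adj: dict mapping each node to the set of its neighbors (undirected graph).
    Returns {center_node: group_label}.
    """
    # S31: degree centrality D(i) = number of neighbors of v_i
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    # S32: nodes whose degree exceeds the average degree become candidates
    mean_degree = sum(degree.values()) / len(degree)
    candidates = [v for v in adj if degree[v] > mean_degree]
    # S33: sort by degree, descending (stable sort keeps input order on ties)
    candidates.sort(key=lambda v: degree[v], reverse=True)

    centers = []
    # S34-S36: the head of the queue becomes the next center, then every
    # remaining candidate within shortest-path distance d_c of it is dropped
    while candidates:
        center = candidates.pop(0)
        centers.append(center)
        dist = bfs_distances(adj, center)
        candidates = [c for c in candidates
                      if dist.get(c, float("inf")) > d_c]
    # each center receives its own group label 0, 1, 2, ...
    return {c: label for label, c in enumerate(centers)}


# Two dense halves {0, 1, 2, 3} and {4, 5, 6, 7} joined by the edge (3, 7)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5), (4, 6), (4, 7), (5, 6), (3, 7)]
adj = {v: set() for v in range(8)}
for u, w in edges:
    adj[u].add(w)
    adj[w].add(u)
print(locate_cluster_centers(adj, d_c=2))  # {0: 0, 4: 1}
```

Each pass of the `while` loop corresponds to one execution of S35: the head of the queue becomes a center and every candidate within $d_c$ hops of it is discarded, so a larger $d_c$ tends to seed fewer groups.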
In a possible implementation, step S4, converting the adjacency matrix into a similarity matrix according to the local similarity of the nodes and synthesizing the attribute matrix, further includes:
S41: calculate the local similarity between network user nodes based on the network topology. The local similarity $s_{ij}$ between nodes is calculated by the following formula:
[Formula image BDA0002924062320000031: local similarity $s_{ij}$ between nodes $v_i$ and $v_j$, defined from $N(v_i)$ and $N(v_j)$]
where $N(v_i)$ denotes the set of neighbor nodes connected to node $v_i$ and $|\cdot|$ denotes the number of elements in a set; by definition, $s_{ij} = 1$ if $i = j$;
S42: the network topology is represented by the similarity matrix $S = \{s_{ij}\}$, and all node attributes are synthesized into the matrix $X = \{x_i\}$, where $x_i$ is the attribute vector of node $v_i$;
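The exact local-similarity formula survives only as an image, so the NumPy sketch below ASSUMES a Jaccard coefficient over neighbor sets, $s_{ij} = |N(v_i) \cap N(v_j)| / |N(v_i) \cup N(v_j)|$. This is one common local similarity that is consistent with the definitions of $N(v_i)$, $|\cdot|$ and $s_{ii} = 1$ given above, but it may differ from the formula actually used; the function names are hypothetical.

```python
import numpy as np


def local_similarity_matrix(adj):
    """S = {s_ij} from neighbor sets, ASSUMING a Jaccard coefficient.

    adj: dict node -> set of neighbors, with nodes labelled 0..N-1.
    """
    n = len(adj)
    S = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                S[i, j] = 1.0                  # s_ij = 1 when i == j (as defined)
                continue
            union = adj[i] | adj[j]
            if union:                          # two isolated nodes keep s_ij = 0
                S[i, j] = len(adj[i] & adj[j]) / len(union)
    return S


def attribute_matrix(vectors):
    """Stack equal-length attribute vectors x_i row by row into X = {x_i}."""
    return np.vstack([np.asarray(x, dtype=float) for x in vectors])


# Triangle 0-1-2 plus node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
S = local_similarity_matrix(adj)
X = attribute_matrix([[1, 0], [1, 0], [0, 1], [0, 1]])
print(S[0, 3])  # 0.5: nodes 0 and 3 are not adjacent but share neighbor 2
```

The double loop is $O(N^2)$ and only meant for small examples; for large sparse networks one would compute $s_{ij}$ only for node pairs that share at least one neighbor, since every other off-diagonal entry is 0 under this assumed formula.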
In a possible embodiment, step S5, constructing a multilayer graph convolution model for group discovery from inputs such as the similarity matrix, the attribute matrix and the initial labels, further includes:
S51: construct a three-layer graph convolutional network model whose output $Z$ is:
$$Z = \mathrm{softmax}\Big(S\,\mathrm{ReLU}\big(S\,\mathrm{ReLU}(S X W^{(0)})\,W^{(1)}\big)\,W^{(2)}\Big)$$
where ReLU and softmax are activation functions. The ReLU activation, $\mathrm{ReLU}(z_i) = \max(0, z_i)$, extracts nonlinear features from the output $z_i$ corresponding to node $v_i$; the softmax activation is defined as $\mathrm{softmax}(z_i)_c = \frac{\exp(z_{ic})}{\sum_{k=1}^{|C|} \exp(z_{ik})}$, where $|C|$ is the length of the cluster center node queue, i.e., the number of groups. $W^{(0)}$, $W^{(1)}$ and $W^{(2)}$ are the weight matrices of the three layers; they are randomly initialized and then updated automatically during training;
S52: using the label set of the cluster center nodes obtained in step S3, input the initial label set and the attribute matrix into the model together for training;
S53: training ends once the model parameters no longer update; nodes with the same label in the softmax output are assigned to the same group, yielding the group discovery result of the attribute network.
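As a shape check on the S51 formula, assume the attribute matrix has $F$ columns and the two hidden layers have widths $H_1$ and $H_2$ (these dimension names are illustrative; the source does not specify layer widths). The layers then compose as:

$$S \in \mathbb{R}^{N \times N},\quad X \in \mathbb{R}^{N \times F},\quad W^{(0)} \in \mathbb{R}^{F \times H_1},\quad W^{(1)} \in \mathbb{R}^{H_1 \times H_2},\quad W^{(2)} \in \mathbb{R}^{H_2 \times |C|}$$
$$H^{(1)} = \mathrm{ReLU}(S X W^{(0)}) \in \mathbb{R}^{N \times H_1},\qquad H^{(2)} = \mathrm{ReLU}(S H^{(1)} W^{(1)}) \in \mathbb{R}^{N \times H_2}$$
$$Z = \mathrm{softmax}(S H^{(2)} W^{(2)}) \in \mathbb{R}^{N \times |C|},\qquad \sum_{c=1}^{|C|} Z_{ic} = 1 \ \text{for every node } v_i$$

Each row of $Z$ is therefore a probability distribution over the $|C|$ groups seeded by the cluster centers, and taking $\arg\max_c Z_{ic}$ gives the label used in S53. Each multiplication by $S$ mixes information from similar nodes, so with three layers a node's output can depend on nodes up to three steps away in the graph defined by the nonzero entries of $S$.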
In one possible embodiment, the attribute network type includes at least one of: a user relationship network in a mobile communication system; a social network in the social media field; a transaction network in the financial risk control field.
In one possible embodiment, the group type includes at least one of: a user group in a mobile communication network; an interest group in a social network; a fraud group in the risk control field.
In a second aspect of the present invention, a group discovery system for an attribute network is provided, comprising the following modules:
an attribute network data acquisition module, configured to acquire interactive behavior data among different users in the attribute network;
a network modeling and attribute feature extraction module, configured to determine all user nodes and the connections between them from the attribute network data to obtain the network topology, and to select user attribute features to obtain the node attribute set;
a cluster center positioning module, configured to determine the cluster centers in the network and assign group labels according to node degree centrality and the relative distances between nodes;
a network matrix conversion module, configured to convert the adjacency matrix into a similarity matrix based on network topology information and to construct an attribute matrix from the node attribute set;
a graph convolution model creation module, configured to build a multilayer graph convolution model based on the network topology and node attributes;
a group discovery module, configured to perform training with the constructed multilayer graph convolution model to achieve group discovery;
and an evaluation and analysis module, configured to evaluate the obtained group discovery result.
Compared with the prior art, the invention has the following beneficial effects:
High efficiency: the group discovery method of the invention uses a deep-learning graph convolutional network to deeply fuse and train on network topology and attribute information, establishing an efficient classification model. On the one hand, the topology of an attribute network is large-scale and sparse overall; adding attribute information effectively alleviates the time complexity problem caused by network sparsity and improves group discovery efficiency. On the other hand, the multilayer graph convolution model effectively fuses topology information and node attributes and can find potentially more meaningful group structures from a small amount of node label information, improving group discovery performance.
Accuracy: the invention models the intrinsic information of the attribute network and requires no prior knowledge; it mines group structures in an unsupervised manner using only the network topology and node attributes, establishing an accurate group discovery model that can handle large-scale attribute networks and is highly practical for real complex networks. Compared with current mainstream methods such as DeepWalk, MGAE and GCN, accuracy is greatly improved.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flowchart of a group discovery method for an attribute network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the present invention applied to the analysis of a real network;
FIG. 3 is a schematic diagram of cluster center node location according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of population discovery results according to an embodiment of the present invention;
FIG. 5 is a block diagram of a group discovery system for an attribute network according to the present invention;
The invention is further illustrated below by specific embodiments in conjunction with the above figures.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will help those skilled in the art further understand the invention but do not limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
The invention is described in further detail below with reference to the accompanying drawings:
Referring to FIG. 1, an embodiment of the present invention provides a group discovery method for an attribute network. The method effectively combines network topology information and node attribute information to reveal potential group structures in the attribute network. It constructs an interaction network among users and a node attribute set from the acquired attribute network data, locates potential cluster center nodes in the network using local degree centrality and relative distance, then converts the network into a similarity matrix and synthesizes the node attribute matrix, and finally performs efficient group discovery with a multilayer graph convolution model. Existing methods suffer from high cost caused by network sparsity and insufficient use of attribute information; to address these defects, the method uses a nonlinear deep learning model to fuse network topology and node attributes and mine more reasonable group structures. It achieves good results on large-scale attribute networks and offers strong generality and high application value.
The group discovery method of the attribute network provided by the invention comprises the following steps:
S1: acquire interactive behavior data among all users in the attribute network. Specifically, all user information is extracted from the attribute network, and statistical analysis is performed on user behavior data to obtain the interaction relations between users.
S2: preprocess the acquired data, model the complex network structure according to the interaction relations among users, and extract the attribute information of each node. Specifically, each user (ID) in the attribute network is abstracted as a node and each interaction between users as an edge, constructing the complex attribute network $G = (V, E, X)$, where $V$ is the set of all user nodes, $E$ is the set of edges between user nodes, and $X$ is the set formed by the attributes of the nodes. The node attributes must also undergo operations such as deduplication, feature selection, numerical encoding and normalization to obtain a structured node attribute set.
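A minimal Python sketch of this preprocessing follows. It assumes interaction records arrive as `(user_a, user_b)` ID pairs and that each user already has a numeric attribute vector; the function name and both input formats are hypothetical. Duplicate and self-interaction records are dropped, the graph is treated as undirected, and every attribute column is min-max normalized to [0, 1]. Attribute-level deduplication, feature selection and categorical encoding are omitted.

```python
def build_attribute_network(interactions, raw_attributes):
    """Build the node list, neighbor sets (V, E) and normalized attributes X.

    interactions:   iterable of (user_a, user_b) ID pairs (hypothetical format)
    raw_attributes: dict user_id -> list of numeric attribute values
    """
    users = sorted(raw_attributes)                    # V: one node per user ID
    index = {u: i for i, u in enumerate(users)}
    edges = set()
    for a, b in interactions:
        # keep interactions between known users only; drop self-interactions
        if a in index and b in index and a != b:
            edges.add(tuple(sorted((index[a], index[b]))))  # set drops duplicates
    adj = {i: set() for i in range(len(users))}       # E as undirected neighbor sets
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    # X: min-max normalize every attribute column to [0, 1]
    columns = list(zip(*(raw_attributes[u] for u in users)))
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        scaled.append([(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col])
    X = [list(row) for row in zip(*scaled)]
    return users, adj, X


users, adj, X = build_attribute_network(
    [("u1", "u2"), ("u2", "u1"), ("u2", "u3"), ("u3", "u3")],
    {"u1": [20, 1], "u2": [40, 1], "u3": [30, 1]},
)
print(users)  # ['u1', 'u2', 'u3']
print(adj)    # {0: {1}, 1: {0, 2}, 2: {1}}
print(X)      # [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
```

A constant attribute column (like the second one here) carries no information for grouping, so it is mapped to 0 rather than divided by zero.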
S3: locate the cluster center nodes in the topological network according to the degree centrality and relative distances of the network nodes, and assign group labels. Specifically, the following sub-steps are used:
S301: calculate the degree centrality $D(i)$ of each node from the connections among user nodes in the constructed network $G = (V, E, X)$, using $D(i) = \mathrm{Degree}(v_i),\ i \in [1, N]$, where $v_i$ denotes the $i$-th node in the network; $\mathrm{Degree}(v_i)$ denotes the degree of node $v_i$, i.e., its number of neighbors; and $N$ is the number of network nodes;
S302: from the degree centrality distribution of the nodes, compute the average degree centrality
$$\bar{D} = \frac{1}{N} \sum_{i=1}^{N} D(i)$$
and compare each node's degree centrality with it: if $D(i) > \bar{D}$, node $v_i$ is added to queue $C$ as a candidate cluster center node;
S303: sort the candidate center nodes in $C$ in descending order of degree centrality;
S304: select the first candidate cluster center node as the first cluster center node;
S305: set a cutoff distance parameter $d_c$ and calculate the relative distance, i.e., the shortest path length $d_{sp}$, between every candidate node in the current queue $C$ and the first cluster center node. If a candidate node satisfies $d_{sp} \le d_c$, it is deleted from the candidate queue; otherwise, it is kept in the candidate queue as a candidate for the second cluster center node. The distance parameter $d_c$ is generally determined empirically. In the present embodiment, a fixed value of $d_c$ is used, and experiments show that this setting has no influence on the final group discovery result;
S306: repeat step S305 until all cluster center nodes are identified and assigned group labels. Specifically, each cluster center node is assigned its own label, and these labels determine the subsequent group structure categories.
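As a worked illustration (the graph and the values of $d_c$ below are hypothetical, not taken from the embodiment), consider eight nodes with edges $v_1v_2, v_1v_3, v_1v_4, v_2v_3, v_5v_6, v_5v_7, v_5v_8, v_6v_7$ and $v_4v_8$, i.e., two dense clusters joined by the single bridge $v_4v_8$. Then:

$$D(v_1) = D(v_5) = 3,\qquad D(v_i) = 2 \ \text{otherwise},\qquad \bar{D} = \frac{2 \cdot 3 + 6 \cdot 2}{8} = 2.25$$

so S302 gives $C = (v_1, v_5)$ and S304 selects $v_1$ as the first cluster center. With $d_c = 2$, the shortest path $v_1 \to v_4 \to v_8 \to v_5$ gives $d_{sp}(v_1, v_5) = 3 > d_c$, so $v_5$ stays in the queue and becomes the second cluster center; the two centers receive labels 1 and 2. With $d_c = 3$, $v_5$ would instead be deleted and only one group would be seeded, which shows how $d_c$ affects the number of seeded groups on this toy graph.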
S4: convert the network adjacency matrix into a similarity matrix based on the network topology and attribute information, and synthesize the node attributes into an attribute matrix. Specifically, this embodiment proceeds as follows:
S401: calculate the local similarity between network user nodes from the network topology, using the following formula:
[Formula image BDA0002924062320000063: local similarity $s_{ij}$ between nodes $v_i$ and $v_j$, defined from $N(v_i)$ and $N(v_j)$]
where $N(v_i)$ denotes the set of neighbor nodes connected to node $v_i$ and $|\cdot|$ denotes the number of elements in a set; by definition, $s_{ij} = 1$ if $i = j$;
S402: represent the network topology abstractly as the similarity matrix $S = \{s_{ij}\}$; at the same time, express the attribute characteristics of each node as a feature vector of the same length and synthesize the node attribute matrix $X = \{x_i\}$, where $x_i$ is the attribute vector of node $v_i$;
In the present embodiment, the similarity matrix constructed in step S402 reflects richer network connection relations than a plain adjacency matrix. The adjacency matrix only describes connections between adjacent nodes: an entry is 1 if an edge exists between the two nodes and 0 otherwise, so the matrix is sparse overall. The similarity matrix not only captures graded (fuzzy) relations between adjacent nodes but also reflects interactions between non-adjacent nodes, with entries ranging from 0 to 1, which provides an effective basis for subsequently revealing high-quality group discovery results.
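A small worked example of this difference, again ASSUMING a Jaccard form $s_{ij} = |N(v_i) \cap N(v_j)| / |N(v_i) \cup N(v_j)|$ because the source formula is not recoverable from the text: suppose $v_1$ and $v_2$ are not connected, with $N(v_1) = \{v_3, v_4\}$ and $N(v_2) = \{v_3, v_4, v_5\}$. Then

$$A_{12} = 0, \qquad s_{12} = \frac{|\{v_3, v_4\}|}{|\{v_3, v_4, v_5\}|} = \frac{2}{3}$$

so the similarity matrix records a strong relation between two nodes that the adjacency matrix treats as unrelated.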
S5: construct a multilayer graph convolution model for group discovery based on the similarity matrix, the node attribute matrix and the initial group labels, to obtain the group discovery result of the attribute network.
Specifically, the group discovery method of the invention uses a deep learning technique, namely a graph convolutional network model. The model jointly learns the node topology and node attributes and infers the group labels of unlabeled nodes from the initial cluster label information. This effectively fuses network topology and attribute characteristics and helps reveal group structures that are meaningful in a natural sense.
Specifically, constructing the graph convolution model for group discovery and learning the labels comprises the following steps:
S501: construct a three-layer graph convolutional network model whose output $Z$ is:
$$Z = \mathrm{softmax}\Big(S\,\mathrm{ReLU}\big(S\,\mathrm{ReLU}(S X W^{(0)})\,W^{(1)}\big)\,W^{(2)}\Big)$$
where ReLU and softmax are activation functions. The ReLU activation, $\mathrm{ReLU}(z_i) = \max(0, z_i)$, extracts nonlinear features from the output $z_i$ corresponding to node $v_i$; the softmax activation is defined as $\mathrm{softmax}(z_i)_c = \frac{\exp(z_{ic})}{\sum_{k=1}^{|C|} \exp(z_{ik})}$, where $|C|$ is the length of the cluster center node queue, i.e., the number of groups. $W^{(0)}$, $W^{(1)}$ and $W^{(2)}$ are the weight matrices of the three layers; they are randomly initialized and then updated automatically during training;
S502: train the model on the cluster center nodes and their label set obtained in step S3, with the label information of the nodes as input. Specifically, in each training iteration the attribute representation of each node is updated from the inter-node similarities and the node attribute vectors, so that nodes with similar attributes are given the same label; the model parameters are updated from the error between the predicted labels and the true node labels until the node labels no longer change or the specified number of iterations is reached. In this embodiment, the model is trained for 200 iterations. In addition, the inter-node similarity effectively acts as an edge weight;
S503: after training, nodes with the same label in the softmax output are assigned to the same group, yielding the group discovery result of the attribute network. In the model's final prediction, each node is assigned a label corresponding to one cluster center node; nodes with the same label are grouped together, giving the group partition of the whole attribute network;
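The training loop of S501 to S503 can be sketched with NumPy as below. This is a minimal sketch under stated assumptions, not the patented implementation: only the cluster center nodes from S3 carry labels, the loss is the mean cross-entropy on those nodes, weights use Glorot-style random initialization and plain full-batch gradient descent (the source names no optimizer, learning rate or hidden widths, so `hidden`, `lr` and `seed` are hypothetical), and training simply runs for the 200 iterations of this embodiment without the label-stability stopping test. Gradients are derived by hand for the three-layer formula, using $S^\top$ in back-propagation so that a non-symmetric $S$ also works.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax over the |C| group columns."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def forward(S, X, W):
    """Z = softmax(S ReLU(S ReLU(S X W0) W1) W2), keeping intermediates."""
    P0 = S @ X
    A1 = P0 @ W[0]
    P1 = S @ np.maximum(A1, 0.0)
    A2 = P1 @ W[1]
    P2 = S @ np.maximum(A2, 0.0)
    Z = softmax(P2 @ W[2])
    return (P0, A1, P1, A2, P2), Z


def train_gcn(S, X, center_labels, n_groups, hidden=(16, 16), lr=0.1,
              epochs=200, seed=0):
    """Semi-supervised training seeded only by the cluster-center labels."""
    rng = np.random.default_rng(seed)
    dims = [X.shape[1], *hidden, n_groups]
    # random (Glorot-style) initialization of W(0), W(1), W(2)
    W = [rng.normal(0.0, np.sqrt(2.0 / (dims[k] + dims[k + 1])),
                    size=(dims[k], dims[k + 1])) for k in range(3)]
    idx = np.array(list(center_labels.keys()))
    y = np.array(list(center_labels.values()))
    Y = np.eye(n_groups)[y]                       # one-hot labels of the centers
    losses = []
    for _ in range(epochs):
        (P0, A1, P1, A2, P2), Z = forward(S, X, W)
        losses.append(-np.mean(np.log(Z[idx, y] + 1e-12)))
        # gradient of the mean cross-entropy w.r.t. the pre-softmax logits
        dA3 = np.zeros_like(Z)
        dA3[idx] = (Z[idx] - Y) / len(idx)
        dW2 = P2.T @ dA3
        dA2 = (S.T @ dA3 @ W[2].T) * (A2 > 0)     # back through S and ReLU
        dW1 = P1.T @ dA2
        dA1 = (S.T @ dA2 @ W[1].T) * (A1 > 0)
        dW0 = P0.T @ dA1
        for Wk, dWk in zip(W, (dW0, dW1, dW2)):
            Wk -= lr * dWk                        # in-place gradient step
    _, Z = forward(S, X, W)
    return Z, Z.argmax(axis=1), losses            # labels: argmax of softmax


# Usage on a toy 8-node graph with two dense halves joined by edge (3, 7)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (4, 5), (4, 6), (4, 7), (5, 6), (3, 7)]
A = np.eye(8)                                     # self-loops
for u, w in edges:
    A[u, w] = A[w, u] = 1.0
S_demo = A / A.sum(axis=1, keepdims=True)         # stand-in for the similarity matrix
X_demo = np.array([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4)  # hypothetical attributes
Z, labels, losses = train_gcn(S_demo, X_demo, {0: 0, 4: 1}, n_groups=2)
```

In the usage lines, a row-normalized adjacency matrix with self-loops stands in for the similarity matrix $S$, and a two-column attribute matrix is invented for the toy graph; in practice $S$ and $X$ would come from S401 and S402, and the seed labels from S306.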
S6: evaluate the obtained group discovery result using clustering accuracy and normalized mutual information (NMI). Clustering accuracy measures the proportion of nodes whose labels in the group discovery result are correct. NMI measures the similarity between the prediction and the true labels from an information-entropy perspective; the larger its value, the closer the result is to the true group partition. In this embodiment, extensive comparative experiments show that the proposed group discovery method has clear advantages and significantly improves algorithm performance.
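Both metrics can be sketched with the Python standard library. Because group labels produced by the model are arbitrary, this sketch ASSUMES the usual definition of clustering accuracy, where predicted groups are first relabelled by the best one-to-one mapping onto the true groups. For NMI it uses the arithmetic-mean normalization $I(T;P) / \tfrac{1}{2}(H(T) + H(P))$; other normalizations (geometric mean, maximum) exist, and the source does not say which one it uses. Function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(true, pred):
    """Fraction of nodes whose predicted group matches the true group under
    the best one-to-one relabelling of the predicted groups.

    Brute force over label mappings, so only practical for a few groups.
    """
    true_ids = sorted(set(true))
    pred_ids = sorted(set(pred))
    # pad with None so every predicted group can map to a distinct target
    targets = true_ids + [None] * max(0, len(pred_ids) - len(true_ids))
    best = 0
    for perm in permutations(targets, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(true, pred))
        best = max(best, hits)
    return best / len(true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_info(true, pred):
    """NMI = I(true; pred) / ((H(true) + H(pred)) / 2)."""
    n = len(true)
    joint = Counter(zip(true, pred))
    count_t, count_p = Counter(true), Counter(pred)
    mi = sum(c / n * math.log(c * n / (count_t[t] * count_p[p]))
             for (t, p), c in joint.items())
    mean_h = (entropy(true) + entropy(pred)) / 2
    return mi / mean_h if mean_h > 0 else 1.0


true = [0, 0, 0, 1, 1, 1]
print(clustering_accuracy(true, [1, 1, 1, 0, 0, 0]))  # 1.0 (labels swapped)
print(clustering_accuracy(true, [0, 0, 1, 1, 1, 1]))  # 0.8333333333333334
```

For more than a handful of groups, the brute-force search should be replaced by a linear assignment solver (for example SciPy's `linear_sum_assignment`) applied to the contingency table.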
The attribute network types of the invention include, but are not limited to, user relationship networks in mobile communication systems, social networks in the social media field, and transaction networks in the financial risk control field. The group types include, but are not limited to, user groups in mobile communication networks, interest groups in social networks, and fraud groups in the risk control field.
The invention also discloses a group discovery system for an attribute network, shown in the system block diagram of FIG. 5, which comprises the following modules:
the attribute network data acquisition module, configured to acquire interactive behavior data among different users in the attribute network;
the network modeling and attribute feature extraction module, configured to determine all user nodes and the connections between them from the attribute network data to obtain the network topology, and to select user attribute features to obtain the node attribute set;
the cluster center positioning module, configured to determine the cluster centers in the network and assign group labels according to node degree centrality and the relative distances between nodes;
the network matrix conversion module, configured to convert the adjacency matrix into a similarity matrix based on network topology information and to construct an attribute matrix from the node attribute set;
the graph convolution model creation module, configured to build a multilayer graph convolution model based on the network topology and node attributes;
the group discovery module, configured to perform training with the constructed multilayer graph convolution model to achieve group discovery;
and the evaluation and analysis module, configured to evaluate the obtained group discovery result.
As a further improvement, the attribute network data acquisition module extracts the relevant information of all users from the attribute network, including data reflecting the interactions among users and the users' attribute data;
as a further improvement, the network modeling and attribute feature extraction module comprises two sub-modules: a network topology modeling module and a node attribute feature extraction module. The network topology modeling module represents each user in the attribute network as a node and creates links according to the interactions among users, finally obtaining the topology of the whole attribute network. The node attribute feature extraction module obtains the attribute sets of all user nodes through data integration, cleaning, feature selection, numerical encoding, normalization and similar processes;
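The following is a minimal sketch of the two sub-modules. It assumes that interaction records arrive as (user, user) pairs and that user attributes arrive as dictionaries. The function names, the one-hot encoding of categorical fields and the min-max normalization of numeric fields are illustrative assumptions; the text does not specify the exact preprocessing.

```python
from collections import defaultdict


def build_topology(interactions):
    """Network topology modeling: one node per user, one undirected link per
    interacting pair. Returns {user: set of neighbouring users}."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a == b:
            continue  # self-interactions do not create links
        neighbors[a].add(b)
        neighbors[b].add(a)
    return dict(neighbors)


def encode_attributes(records, numeric_keys, categorical_keys):
    """Node attribute extraction (simplified): min-max normalize numeric fields
    and one-hot encode categorical fields, giving one row per user record."""
    ranges = {k: (min(r[k] for r in records), max(r[k] for r in records))
              for k in numeric_keys}
    categories = {k: sorted({r[k] for r in records}) for k in categorical_keys}
    matrix = []
    for r in records:
        row = []
        for k in numeric_keys:
            lo, hi = ranges[k]
            row.append((r[k] - lo) / (hi - lo) if hi > lo else 0.0)
        for k in categorical_keys:
            row.extend(1.0 if r[k] == c else 0.0 for c in categories[k])
        matrix.append(row)
    return matrix


topology = build_topology([("u1", "u2"), ("u2", "u3"), ("u3", "u3")])
X = encode_attributes(
    [{"age": 20, "plan": "A"}, {"age": 40, "plan": "B"}, {"age": 30, "plan": "A"}],
    numeric_keys=["age"],
    categorical_keys=["plan"],
)
```

Each row of `X` is the attribute vector of one user node, in the same order as the input records.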
as a further improvement, the cluster center positioning module implements the cluster center node positioning step of the method provided by the present invention. It calculates the degree centrality of each node from the topology of the attribute network, $D(i) = \mathrm{Degree}(v_i)$, where $v_i$ denotes the $i$-th node in the network, and uses it to select candidate cluster center nodes. It then compares the relative distances between the candidate cluster centers to determine the final cluster center nodes and assign group labels;
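The cluster center positioning of steps S31 to S36 can be sketched as below. The sketch makes three assumptions. The graph is unweighted and undirected and is given as neighbor sets. Relative distance is measured as the breadth-first shortest-path length. S35/S36 are read as follows: each newly selected center removes every remaining candidate within the truncation distance d_c, and the highest-degree surviving candidate becomes the next center. Ties in degree are broken by input order. All names are hypothetical.

```python
from collections import deque


def bfs_distances(neighbors, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbors.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def locate_cluster_centers(neighbors, d_c):
    """Return {center_node: group_label} following one reading of S31-S36."""
    # S31: degree centrality = number of neighbours.
    degree = {v: len(nbrs) for v, nbrs in neighbors.items()}
    # S32: candidates are nodes whose degree exceeds the average degree.
    average = sum(degree.values()) / len(degree)
    # S33: sort candidates by degree, highest first (stable for ties).
    candidates = sorted((v for v in degree if degree[v] > average),
                        key=lambda v: degree[v], reverse=True)
    centers = []
    while candidates:
        # S34 (and later rounds): the top remaining candidate becomes a center.
        center = candidates.pop(0)
        centers.append(center)
        # S35: drop candidates within d_c hops of this center;
        # unreachable candidates count as infinitely far away.
        dist = bfs_distances(neighbors, center)
        candidates = [v for v in candidates if dist.get(v, float("inf")) > d_c]
    # S36: assign one group label per center.
    return {c: label for label, c in enumerate(centers)}


# Two hubs "a" and "b" joined through the bridge node "c".
edges = [("a", "a1"), ("a", "a2"), ("a", "a3"), ("a", "a4"),
         ("b", "b1"), ("b", "b2"), ("b", "b3"), ("a", "c"), ("c", "b")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)
print(locate_cluster_centers(graph, d_c=1))  # -> {'a': 0, 'b': 1}
```

With `d_c=1`, the bridge node `c` is one hop from `a`, so it is discarded as a candidate. The second hub `b` is two hops away, so it survives and becomes the second center. A larger `d_c` merges more candidates into the same center.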
as a further improvement, the network matrix conversion module implements step S4 of the method provided by the present invention. It consists of two parts: similarity matrix conversion and attribute matrix synthesis. The similarity matrix conversion computes the local similarity between nodes, $s_{ij} = |N(v_i) \cap N(v_j)| / |N(v_i) \cup N(v_j)|$, where $N(v_i)$ denotes the set of neighbor nodes of node $v_i$, and thereby converts the adjacency matrix into a similarity matrix $S$. The attribute matrix synthesis uses the attribute feature set of each node to construct the attribute matrix $X$ of the whole attribute network;
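The similarity matrix conversion can be sketched as follows. The sketch reads the local similarity as the Jaccard coefficient of the two neighbor sets. It takes the denominator to be the union, because an intersection divided by the same intersection would always equal 1. This is an illustrative sketch; the function name and the treatment of two isolated nodes (similarity 0) are assumptions.

```python
def similarity_matrix(nodes, neighbors):
    """Convert topology into the local-similarity matrix S of step S41:
    s_ij = len(N_i & N_j) / len(N_i | N_j) (Jaccard), with s_ii = 1."""
    n = len(nodes)
    S = [[0.0] * n for _ in range(n)]
    for i, vi in enumerate(nodes):
        ni = neighbors.get(vi, set())
        for j, vj in enumerate(nodes):
            if i == j:
                S[i][j] = 1.0  # defined as 1 on the diagonal
                continue
            nj = neighbors.get(vj, set())
            union = ni | nj
            # Two isolated nodes have an empty union; treat their similarity as 0.
            S[i][j] = len(ni & nj) / len(union) if union else 0.0
    return S


# Path graph a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
S = similarity_matrix(["a", "b", "c"], path)
```

Neighbor sets here exclude the node itself. As a result, two directly connected nodes with no common neighbor (a and b above) receive a similarity of 0, while a and c, which share neighbor b, receive 1. The text does not say whether self-loops are added to $N(v_i)$.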
as a further improvement, the graph convolution model creation module constructs a multilayer graph convolution model from the similarity matrix, the attribute matrix and the cluster center nodes. The dimensions of the similarity matrix and the attribute matrix determine the number of neurons in the input layer of the model, and the number of cluster center nodes determines the output dimension of the model, i.e., the number of groups. The model uses a deep neural network to jointly learn the node topology and node attributes while locally aggregating node features;
as a further improvement, the group discovery module implements step S5 of the method provided by the present invention. Based on the constructed multilayer graph convolution model, the module takes the label information of the cluster center nodes as input. It trains the model using graph convolution operations and the back-propagation algorithm of neural networks, which propagates the label information of the cluster center nodes to the whole network and infers the labels of unlabeled nodes. It then outputs the group labels of the unlabeled nodes through the trained softmax classifier, thereby achieving group discovery. The module output is $Z = \mathrm{softmax}(S\,\mathrm{ReLU}(S\,\mathrm{ReLU}(S X W^{(0)}) W^{(1)}) W^{(2)})$, where ReLU is a non-linear activation function and $W^{(0)}$, $W^{(1)}$ and $W^{(2)}$ are the weight matrices of the convolution layers, which are updated automatically during training;
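The forward pass of the three-layer model, together with a loss restricted to the labeled cluster center nodes, can be sketched with NumPy. This is an illustration under assumptions. The text reports using TensorFlow and back-propagation, whereas this sketch only computes the output $Z$ and a masked cross-entropy. The loss function is not stated in the text and is assumed here. Gradients and the training loop would normally come from an automatic-differentiation framework. $S$ is used as the propagation matrix exactly as in the formula, without the degree normalization common in other GCN variants. Function names and layer sizes are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(0.0, z)


def softmax(z):
    """Row-wise softmax over the |C| group columns."""
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def init_weights(dims, seed=0):
    """Random initial weight matrices W(0), W(1), W(2) for layer sizes dims."""
    rng = np.random.default_rng(seed)
    return [rng.normal(0.0, 0.1, size=(dims[k], dims[k + 1]))
            for k in range(len(dims) - 1)]


def gcn_forward(S, X, weights):
    """Z = softmax(S ReLU(S ReLU(S X W0) W1) W2)."""
    w0, w1, w2 = weights
    h1 = relu(S @ X @ w0)
    h2 = relu(S @ h1 @ w1)
    return softmax(S @ h2 @ w2)


def masked_cross_entropy(Z, labels, mask):
    """Mean cross-entropy over labelled rows only (e.g. cluster center nodes)."""
    idx = np.flatnonzero(mask)
    return -np.mean(np.log(Z[idx, labels[idx]] + 1e-12))


# Toy usage: 4 nodes, 3 attributes, hidden sizes 8 and 4, |C| = 2 groups.
S = np.eye(4)
X = np.ones((4, 3))
Z = gcn_forward(S, X, init_weights([3, 8, 4, 2], seed=1))
labels = np.array([0, 0, 1, 1])
mask = np.array([True, False, False, True])  # only two labelled nodes
loss = masked_cross_entropy(Z, labels, mask)
groups = Z.argmax(axis=1)  # predicted group of every node
```

Only the rows selected by `mask` contribute to the loss. This mirrors the idea that only the cluster center nodes carry initial labels, while the predictions for all other nodes are read from `Z`.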
as a further improvement, the evaluation and analysis module implements step S6 of the method of the present invention. The module uses clustering accuracy as an evaluation index for the group discovery results. Accuracy is the proportion of correctly grouped user nodes among all nodes. It directly reflects the effectiveness of the proposed method and is fed back to the group discovery module to further improve system performance.
To further illustrate the effectiveness and scalability of the proposed method, the following experiments were conducted.
In one embodiment of the present invention, a small real-world network is used for the experiment, and the process and effect of the proposed method are further illustrated visually. The experiment was implemented in Python with the TensorFlow deep learning framework under the Windows 7 operating system, on a machine with an Intel Pentium Dual-Core 2.0 GHz CPU and 8.00 GB of RAM. To improve computational efficiency and accuracy, the scientific computing packages NumPy and SciPy were used.
The real network used in the experiment is Zachary's karate club social network. Based on W. W. Zachary's long-term observation of the interactions among members of a karate club, an interaction network with 34 members and 78 edges was constructed, as shown in FIG. 1. During the observation period, the network eventually split into two smaller communities because of a dispute between the club manager and the coach. The network is widely used to test the effectiveness and reliability of group discovery methods.
Following the group discovery method provided by the invention, and building on the constructed network topology and member attribute information, the cluster center nodes of the network are first located according to the degree centrality and relative distances of the nodes, and labels are assigned. The two nodes inside the dashed circles indicated by arrows in FIG. 3 correspond exactly to the manager and the coach of the club. Next, the similarities between member nodes are computed to obtain the similarity matrix, and the attribute matrix is synthesized. The multilayer graph convolution model is then constructed and trained. Finally, the group discovery result is obtained, as shown in FIG. 4.
In the experiment, the proposed method effectively identifies the real split in the network, i.e., it finds two groups with a clear group structure. As shown in FIG. 4, the two groups are represented by circles and squares, respectively. Compared with the true labels of the member nodes, the result is fully consistent with the real group division.
To further verify the technical effect of the present invention, the present embodiment performs a comparative experiment on the real data set:
Table 1. Comparison of group discovery accuracy between the method of the invention and 3 representative algorithms
[Table 1 is provided as an image in the original publication.]
A large amount of user interaction data was collected from real attribute networks. For comparison with the other algorithms, 5 data sets were selected; for each data set, 30% of the data was used as the training set and 70% as the test set. The number of training labels required by the proposed method is determined by the number of cluster center nodes and their neighboring nodes, and in practice it is far below the 30% training proportion used for the baselines. The invention is compared with 3 representative methods:
Infomap, a traditional group discovery method that identifies group structure using only network topology information and performs well among algorithms of its kind;
MGAE, a graph embedding method that learns representations of node structure and attribute features through an autoencoder and then identifies groups with a traditional clustering method, performing well on small data sets;
GCN, a semi-supervised graph neural network method that considers both network topology and node attribute information, aggregates node features and infers node labels through convolution operations, and performs well when training labels are sufficient.
Table 1 compares the group discovery accuracy of the present invention with that of these methods.
Table 1 gives the accuracy of the method of the invention and of the three representative methods on the group discovery task. Compared with Infomap, MGAE and GCN, the proposed method improves identification accuracy by 22.9%, 9.58% and 7.68% on average, respectively. The method performs better for two reasons. First, it fully considers both network topology and node attribute information during group discovery, which helps ensure accuracy; the results of MGAE and GCN likewise show that adding attribute information improves performance. Second, the method uses a cluster center node positioning strategy, which lets the graph convolution model converge faster during training and obtain better results. GCN also identifies the group labels of nodes through graph convolution, but in practice it needs a large number of prior labels for training. Moreover, it is difficult to ensure that these labels are distributed evenly enough across the network to propagate label information effectively to the whole network.
The following are system embodiments of the present invention, which may be used to carry out the method embodiments of the present invention. For details not described in the system embodiments, refer to the method embodiments of the present invention.
In yet another embodiment of the present invention, a group discovery system for an attribute network is provided. The system implements the group discovery method for an attribute network. It comprises an attribute network data acquisition module, a network structure modeling module, a cluster center positioning module, a matrix conversion module, a graph convolution model creation module, a group discovery module and an evaluation analysis module.
The cluster center positioning module determines the cluster centers in the network and assigns group labels according to the degree centrality of the nodes and the relative distances between nodes. The network matrix conversion module converts the adjacency matrix into a similarity matrix based on network topology information and constructs an attribute matrix from the node attribute set. The graph convolution model creation module builds a multilayer graph convolution model based on the network topology and the node attributes. The group discovery module trains the constructed graph convolution model to perform group discovery. The evaluation analysis module evaluates the obtained group discovery result.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (3)

1. A group discovery method of an attribute network is characterized by comprising the following steps:
S1: acquiring interaction behavior data among all users in the attribute network;
S2: preprocessing the acquired data, modeling the complex network structure according to the interactions among users, and extracting the attribute information of each node;
S3: locating cluster center nodes in the topological network according to the degree centrality and relative distances of the network nodes, and assigning group labels;
S4: converting the network adjacency matrix into a similarity matrix based on the network topology and attribute information, and synthesizing the node attributes into an attribute matrix;
S5: constructing a multilayer graph convolution model for group discovery based on the similarity matrix, the node attribute matrix and the initial group labels to obtain the group discovery result of the attribute network;
S6: evaluating the obtained group discovery result using clustering accuracy and normalized mutual information;
wherein the step S3 further includes:
S31: based on the constructed attribute network, calculating the degree centrality $D$ of each node:
$$D(i) = \mathrm{Degree}(v_i), \quad i \in [1, N]$$
where $v_i$ denotes the $i$-th node in the network; $\mathrm{Degree}(v_i)$ denotes the degree centrality of node $v_i$, i.e., the number of neighbor nodes of $v_i$; and $N$ is the total number of nodes in the network;
S32: calculating the average degree centrality of the nodes in the attribute network, and adding the nodes whose degree centrality is greater than the average to a queue $C$ as candidate cluster center nodes;
S33: sorting the candidate center nodes in $C$ in descending order of degree centrality;
S34: selecting the first candidate node as the first cluster center node;
S35: setting a truncation distance parameter $d_c$, and calculating the shortest path length $d_{sp}$ between every candidate node in the current candidate queue $C$ and the cluster center node just selected (initially, the first cluster center node); if a candidate node satisfies $d_{sp} \le d_c$, deleting it from the candidate queue; otherwise, keeping it in the candidate queue as a candidate for the next cluster center;
S36: repeating step S35 until all cluster center nodes have been identified, and assigning group labels;
wherein the step S4 further includes:
S41: based on the network topology, calculating the local similarity between network user nodes; the local similarity $s_{ij}$ between nodes is calculated by the following formula:
$$s_{ij} = \frac{|N(v_i) \cap N(v_j)|}{|N(v_i) \cup N(v_j)|}$$
where $N(v_i)$ denotes the set of neighbor nodes connected to node $v_i$, and $|\cdot|$ denotes the number of elements in a set; by definition, $s_{ij} = 1$ if $i = j$;
S42: representing the network topology as the similarity matrix $S = \{s_{ij}\}$, and synthesizing all node attributes into the matrix $X = \{x_i\}$, where $x_i$ is the attribute vector of node $v_i$;
wherein the step S5 further includes:
S51: constructing a three-layer graph convolutional network model, whose output $Z$ can be expressed as:
$$Z = \mathrm{softmax}\big(S\,\mathrm{ReLU}\big(S\,\mathrm{ReLU}(S X W^{(0)})\, W^{(1)}\big)\, W^{(2)}\big)$$
where ReLU and softmax are two activation functions. Specifically, ReLU is defined as $\mathrm{ReLU}(z_i) = \max(0, z_i)$ and extracts the non-linear features of the output $z_i$ corresponding to node $v_i$; softmax is defined as
$$\mathrm{softmax}(z_i)_k = \frac{\exp(z_{ik})}{\sum_{c=1}^{|C|} \exp(z_{ic})}, \quad k = 1, \dots, |C|$$
where $|C|$ is the length of the cluster center node queue, i.e., the number of groups; $W^{(0)}$, $W^{(1)}$ and $W^{(2)}$ are the weight matrices of the respective model layers, which are randomly initialized and then updated automatically during training;
S52: based on the label set of the cluster center nodes obtained in step S3, inputting the initial label set together with the attribute matrix into the model for training;
S53: ending training once the model parameters are no longer updated, assigning nodes with the same label to the same group according to the softmax output, and finally obtaining the group discovery result of the attribute network.
2. The group discovery method of the attribute network of claim 1, wherein the attribute network type comprises at least one of:
a user relationship network in a mobile communication system;
social networks in the social media domain;
a transaction network in the field of financial risk control.
3. The group discovery method of the attribute network of claim 1, wherein the group type comprises at least one of:
consumer groups in a mobile communication network;
interest groups in a social network;
a fraud group in the field of risk control.
CN202110127755.8A 2021-01-29 2021-01-29 Group discovery method and system of attribute network Active CN112925989B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210313002.0A CN115438272A (en) 2021-01-29 2021-01-29 Group discovery system of attribute network
CN202110127755.8A CN112925989B (en) 2021-01-29 2021-01-29 Group discovery method and system of attribute network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110127755.8A CN112925989B (en) 2021-01-29 2021-01-29 Group discovery method and system of attribute network

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202210313002.0A Division CN115438272A (en) 2021-01-29 2021-01-29 Group discovery system of attribute network

Publications (2)

Publication Number Publication Date
CN112925989A CN112925989A (en) 2021-06-08
CN112925989B true CN112925989B (en) 2022-04-26

Family

ID=76168680

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210313002.0A Pending CN115438272A (en) 2021-01-29 2021-01-29 Group discovery system of attribute network
CN202110127755.8A Active CN112925989B (en) 2021-01-29 2021-01-29 Group discovery method and system of attribute network

Country Status (1)

Country Link
CN (2) CN115438272A (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113362071A * 2021-06-21 2021-09-07 浙江工业大学 Ponzi scheme identification method and system for the Ethereum platform
CN113420161B (en) * 2021-06-24 2024-07-02 平安科技(深圳)有限公司 Node text fusion method and device, computer equipment and storage medium
CN113344638B (en) * 2021-06-29 2022-05-24 云南电网有限责任公司信息中心 Power grid user group portrait construction method and device based on hypergraph
CN114095503A (en) * 2021-10-19 2022-02-25 广西综合交通大数据研究院 Block chain-based federated learning participation node selection method
CN113992718B * 2021-10-28 2022-10-04 安徽农业大学 Method and system for detecting abnormal data of sensor groups based on a dynamic-width graph neural network
CN114050975B (en) * 2022-01-10 2022-04-19 苏州浪潮智能科技有限公司 Heterogeneous multi-node interconnection topology generation method and storage medium
CN114997897A * 2022-04-07 2022-09-02 重庆邮电大学 Method for constructing profiles of fraud-vulnerable populations based on mobile data
CN114510650B * 2022-04-19 2022-07-12 湖南三湘银行股份有限公司 Risk control processing method and system for heterogeneous social networks
CN114741433B (en) * 2022-06-09 2022-09-23 北京芯盾时代科技有限公司 Community mining method, device, equipment and storage medium
CN115086179B (en) * 2022-08-19 2022-12-09 北京科技大学 Detection method for community structure in social network
CN117272345B (en) * 2023-10-09 2024-03-01 上海花小桔科技有限公司 Electronic contract encryption method and system based on cloud service
CN117252488B (en) * 2023-11-16 2024-02-09 国网吉林省电力有限公司经济技术研究院 Industrial cluster energy efficiency optimization method and system based on big data

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103325061A (en) * 2012-11-02 2013-09-25 中国人民解放军国防科学技术大学 Community discovery method and system
CN103942308A (en) * 2014-04-18 2014-07-23 中国科学院信息工程研究所 Method and device for detecting large-scale social network communities
CN106411572A (en) * 2016-09-06 2017-02-15 山东大学 Community discovery method combining node information and network structure
CN108596264A * 2018-04-26 2018-09-28 南京大学 Community discovery method based on deep learning
CN110990718A (en) * 2019-11-27 2020-04-10 国网能源研究院有限公司 Social network model building module of company image improving system

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
CN107153713B * 2017-05-27 2018-02-23 合肥工业大学 Overlapping community detection method and system based on node similarity in social networks
CN108133272A * 2018-01-15 2018-06-08 大连民族大学 Method for complex network community detection
CA3061717A1 (en) * 2018-11-16 2020-05-16 Royal Bank Of Canada System and method for a convolutional neural network for multi-label classification with partial annotations

Non-Patent Citations (1)

Title
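"Research progress on community discovery in online social networks" (在线社交网络的社区发现研究进展); Zhang Haitao et al.; Library and Information Service (图书情报工作); 2020-05-31; pp. 142-152 *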

Also Published As

Publication number Publication date
CN115438272A (en) 2022-12-06
CN112925989A (en) 2021-06-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant