CN111159483A - Social network graph summary generation method based on incremental computation - Google Patents

Social network graph summary generation method based on incremental computation

Info

Publication number
CN111159483A
Authority
CN
China
Prior art keywords
tensor
boolean
matrix
graph
decomposition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911373671.1A
Other languages
Chinese (zh)
Other versions
CN111159483B (en)
Inventor
Xie Xia
Wang Jian
Jin Hai
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201911373671.1A priority Critical patent/CN111159483B/en
Publication of CN111159483A publication Critical patent/CN111159483A/en
Application granted granted Critical
Publication of CN111159483B publication Critical patent/CN111159483B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a social network graph summary generation method based on incremental computation, belonging to the field of social networks. The method comprises the following steps: represent the social network graph within the target time period as a tensor to obtain a Boolean tensor T_G; perform tensor decomposition on the Boolean tensor T_G to obtain the decomposed node matrices N_1 and N_2, attribute matrices A_1, ..., A_{h-3}, and time matrix T; cluster the node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node; regard the cluster centers as the supernodes of the graph summary, and compute the superedge weights between supernodes to obtain the graph summary. The invention fuses the nodes, node attributes and timestamps of the social network into multidimensional data and, based on the binary nature of the social network graph and the high-dimensional expressive power of tensors, achieves a unified representation of high-dimensional graph data and a Boolean quantitative representation of a complex social network. Incremental CP decomposition is introduced, which makes full use of prior information such as the decomposition result of the old graph tensor, reduces the size of the tensor to be decomposed, and improves the decomposition efficiency of the graph summary.

Description

Social network graph summary generation method based on incremental computation
Technical Field
The invention belongs to the field of social networks, and particularly relates to a social network graph summary generation method based on incremental computation.
Background
Social network analysis has been a hot topic in the data mining community in recent years; querying and reasoning about the interactions between entities in a social network can yield interesting and deep insights into various phenomena. However, because social network data is large, dynamic, and complex, the representation and mining of social network graph data are constrained by computing resources and cost overhead. Therefore, the starting point for analyzing these complex large graphs is usually a concise representation, i.e., a graph summary, which helps in understanding these datasets and in expressing queries in a meaningful way. Graph summarization plays a very important role in graph data processing, from reducing the number of bits required to encode the original graph to supporting more complex database operations.
In recent years, tensor methods have been applied to graph summarization, enabling more accurate weighted graph summaries to be generated. A tensor is a multi-dimensional form of data storage; the number of dimensions is referred to as the order of the tensor. Since real tensor data is often high-dimensional and sparse, tensor decomposition is generally used to retain the original information while reducing computational complexity and data loss.
Current graph summarization methods focus only on either the temporal dynamics or the node attributes of graph data, whereas user nodes in a social network carry various attributes and the connection relationships between users change from moment to moment, so social network graph data exhibits both dynamics and node attributes. In addition, for time-series dynamic graphs, current methods repeatedly recompute historical data, resulting in low computational efficiency.
Disclosure of Invention
Aiming at the defects and improvement needs of the prior art, the invention provides a social network graph summary generation method based on incremental computation, which adopts an incremental computation framework to uniformly express the dynamics and node attributes of social network graph data, and introduces a Boolean tensor decomposition method to achieve scalable and efficient graph summary computation.
To achieve the above object, according to a first aspect of the present invention, there is provided a social network graph summary generation method based on incremental computation, the method comprising the following steps:
S1. Represent the social network graph within the target time period as a tensor to obtain the target Boolean tensor T_G;
S2. Perform tensor decomposition on the target Boolean tensor T_G to obtain the decomposed node matrices N_1 and N_2;
S3. Cluster the node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node;
S4. Regard the cluster centers as the supernodes of the graph summary, and compute the superedge weights between supernodes to obtain the graph summary of the social network graph.
Preferably, the social network graph is a dynamic undirected graph whose snapshots correspond one-to-one to timestamps.
Preferably, step S2 comprises the following steps:
S21. Combine the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all, whose last order is the time dimension; the old Boolean tensor T_old is the tensor representation of the social network graph for the previous time period;
S22. Perform biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23. Perform parallel distributed Boolean CP decomposition on each sub-tensor sT_i, computing the factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor;
S24. Merge the Boolean factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor sT_i with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old Boolean tensor T_old to obtain the Boolean CP decomposition result {M_all^(1), ..., M_all^(h)} of the new Boolean tensor T_all, where 1 ≤ i ≤ k indexes the sub-tensors and 1 ≤ j ≤ h indexes the factor matrices M^(j).
Preferably, step S22 comprises the following steps:
S221. For the h-order old Boolean tensor T_old, sum along every order except order j to obtain the weight vector d^(j) of the indices of order j;
S222. Divide d^(j) by the number of non-zero elements in T_old to obtain the sampling probability vector P^(j) of the indices of each order;
S223. Compute the size L_j of the sampling index set for each order of T_old according to the set sampling factor;
S224. Sample the indices of the j-th order of T_old L_j times according to the sampling probability P^(j) to obtain the sampled index set V_j;
S225. Combine the sampled index sets {V_1, V_2, ..., V_h} with the index set of the target Boolean tensor T_G to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension index of T_G;
S226. Extract the sampled sub-tensor according to the index set {V_1, V_2, ..., V_h ∪ V_new};
S227. Repeat steps S221 to S226 until k sub-tensors are generated.
Preferably, step S23 comprises the following steps:
S231. Initialize the factor matrices {M_i^(1), ..., M_i^(h)} of sub-tensor sT_i Y times, each time as Boolean matrices whose entries are non-zero with probability p, and take the factor matrices with the smallest reconstruction error as the final initialization;
S232. Perform h optimization sub-steps: in each sub-step, fix (h-1) of the factor matrices and optimize the remaining one so that the overall reconstruction error is minimized; completing all h sub-steps completes one iteration;
S233. Repeat step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and return the Boolean factor matrices {M_i^(1), ..., M_i^(h)}.
Preferably, step S24 comprises the following steps:
S241. Merge the Boolean factor matrices {M_1^(1), ..., M_1^(h)} of sub-tensor sT_1 with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old tensor T_old to obtain the merged Boolean factor matrix set {M^(1), ..., M^(h)};
S242. Merge the Boolean factor matrices {M_2^(1), ..., M_2^(h)} of sub-tensor sT_2 with the corresponding matrices of the merged set {M^(1), ..., M^(h)}, and so on, until the Boolean factor matrices {M_k^(1), ..., M_k^(h)} of sub-tensor sT_k are merged with the corresponding matrices of the set, yielding the Boolean CP decomposition matrices {M_all^(1), ..., M_all^(h)} of the new tensor T_all.
Preferably, the merging of the Boolean factor matrices comprises the following steps:
(1) Compute tensors V and U, where v_x is the x-th row of the sub-tensor factor matrix, u_x is the row of the old factor matrix at the corresponding sampled index, V is the tensor reconstructed from v_x together with the other factor matrices, and U is the tensor reconstructed from u_x together with the other factor matrices;
(2) Compute the reconstruction errors ε_1 and ε_2 of tensors V and U against the old tensor factor matrix:
ε_1 = ||V - T_x||
ε_2 = ||U - T_x||
where T_x is the slice tensor of the corresponding index row;
(3) Check whether ε_1 < ε_2 holds; if it does, update row u_x of the original tensor factor matrix with v_x, otherwise do not update.
Preferably, in step S3, the Hamming distance is selected as the distance metric, the number r of cluster centers is set, and K-Means clustering is used to obtain the cluster centers S_i, i = 1, ..., r, and the cluster to which each node belongs.
Preferably, step S4 comprises the following steps:
S41. Compute the superedge weight between the supernodes in the graph summary (the weight formula is given only as an image in the original), where S_i and S_j are the cluster centers computed by the clustering algorithm, l and m are the numbers of nodes in S_i and S_j respectively, L is the length of the Boolean tensor T_all in the time dimension, N is the number of nodes of T_all, and σ(S_i) is the number of points contained in S_i;
S42. Compute the reconstruction error of the graph summary (the error formula is likewise given only as an image in the original);
S43. Check whether the reconstruction error meets the set threshold; if so, take the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, change the number of cluster centers and return to step S3.
To achieve the above object, according to a second aspect of the present invention, there is provided a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the social network graph summary generation method based on incremental computation according to the first aspect.
Generally, through the above technical solutions conceived by the present invention, the following beneficial effects can be obtained:
(1) Aiming at the problem that existing graph summarization methods focus only on either the dynamics or the node attributes of graph data, the invention fuses the nodes, node attributes and timestamps of the social network into multidimensional data and, based on the binary nature of the social network graph and the high-dimensional expressive power of tensors, achieves a unified representation of high-dimensional graph data and a Boolean quantitative representation of a complex social network.
(2) Aiming at the low computational efficiency of existing graph summarization methods, incremental Boolean CP decomposition is introduced, which makes full use of prior information such as the decomposition result of the old graph tensor, reduces the size of the tensor to be decomposed, and improves the decomposition efficiency of the graph summary.
Drawings
Fig. 1 is a flowchart of the social network graph summary generation method based on incremental computation according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
First, some terms related to the present invention are explained.
Graph summary: a concise representation of an original graph. A large number of points and edges of the graph are aggregated into supernodes and superedges, which facilitates the visualization of large graphs and the mining of graph data. A supernode is a point set formed by aggregating several nodes of the graph, a superedge is an edge set formed by aggregating several edges of the graph, and the superedge weight is computed from the adjacency characteristics and weights of the edges in the set.
Boolean tensor: a tensor all of whose elements are 0 or 1. Owing to the binary nature of the adjacency matrix of an unweighted graph, a dynamic unweighted graph can be represented as a Boolean tensor; the order of a tensor is its number of dimensions.
Undirected unweighted graph: a graph whose edges have neither direction nor weight; a dynamic undirected unweighted graph is an undirected unweighted graph at each timestamp.
Tensor decomposition: a scheme for representing a tensor through a sequence of basic operations on other, simpler tensors, generally usable for tensor completion, dimensionality reduction, feature extraction, and so on.
CP decomposition: a common form of tensor decomposition in which the tensor is decomposed into the sum of a number of rank-1 tensors, a rank-1 tensor being a special tensor that can be expressed as the outer product of several vectors.
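For an order-3 tensor, the rank-R CP decomposition just described can be written as follows (a standard textbook formulation, stated here for reference rather than quoted from the patent); in the Boolean setting used below, the sum over rank-1 terms is replaced by an element-wise logical OR:

```latex
% Rank-R CP decomposition of an order-3 tensor (standard form):
\mathcal{T} \approx \sum_{r=1}^{R} a_r \circ b_r \circ c_r ,
\qquad
\mathcal{T}_{ijk} \approx \sum_{r=1}^{R} a_{ir}\, b_{jr}\, c_{kr} .
```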
As shown in Fig. 1, the present invention provides a social network graph summary generation method based on incremental computation, which comprises the following steps:
S1. Represent the social network graph within the target time period as a tensor to obtain the Boolean tensor T_G.
Preferably, the social network graph is a dynamic undirected graph whose snapshots correspond one-to-one to timestamps.
Users in the social network are abstracted as nodes, and the relationships between users are abstracted as edges, yielding the social network graph. For example, in a microblog social network, microblog users are nodes, each node has several node attributes such as gender, education, and occupation, and the follow relationships between users are edges. The follow relationships between users change dynamically, and therefore the social network graph data is dynamic.
In this embodiment, the target time period is 1 day, i.e., a social network graph summary within 1 day needs to be generated. In the generated graph summary of the microblog social network, user nodes with similar attributes and followees are represented by a supernode, and the connection relationships between different user supernodes are represented by superedges.
The graph data is constructed as a high-order tensor, with the node attributes and the timestamp of the graph data serving as different dimensions of the tensor. The tensor is binary: each non-zero element represents an edge of the dynamic attribute graph together with its two nodes, their attributes, and its timestamp.
For a high-order sparse tensor, storing all elements would consume a large amount of storage space, so for the graph tensor the invention stores only tuples holding the index values of non-zero elements in the different dimensions. For example, the tuple (Node1, Node2, Node1.attribute, Node2.attribute, T, ...) records that at time T there is an edge with Node1 and Node2 as endpoints, and that the attributes of Node1 and Node2 are Node1.attribute and Node2.attribute respectively. To support the computation of large-scale graph data, the graph tuples are uploaded to a distributed file system (HDFS).
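The tuple-based storage just described can be sketched as follows. This is a minimal in-memory illustration with assumed field names; in the described system the tuples would be written to HDFS rather than kept in memory:

```python
# Minimal sketch of the tuple-based Boolean tensor storage: only the index
# tuples of non-zero elements are kept. Field names are illustrative.
def build_boolean_graph_tensor(edge_records):
    """edge_records: iterable of (node1, node2, attr1, attr2, t) tuples,
    one per observed edge; returns the set of non-zero index tuples."""
    nonzeros = set()
    for n1, n2, a1, a2, t in edge_records:
        nonzeros.add((n1, n2, a1, a2, t))
        nonzeros.add((n2, n1, a2, a1, t))  # undirected: store both orientations
    return nonzeros

# Usage: an edge between users 0 and 1 at time 3, attribute codes 2 and 5.
tensor_T_G = build_boolean_graph_tensor([(0, 1, 2, 5, 3)])
```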
S2. Perform tensor decomposition on the Boolean tensor T_G to obtain the decomposed node matrices N_1, N_2, attribute matrices A_1, ..., A_{h-3}, and time matrix T.
The rows of the decomposed node matrices N_1 and N_2 are feature vectors representing the adjacency characteristics of the nodes, the rows of the attribute matrices A_1, ..., A_{h-3} are feature vectors representing the attributes of graph nodes, and the time matrix T holds the feature vectors representing the graph in the time dimension.
Preferably, step S2 comprises the following steps:
S21. Combine the old Boolean tensor T_old and the Boolean tensor T_G into a Boolean tensor T_all, whose last order is the time dimension; the Boolean tensor T_old is the tensor representation of the social network graph in the previous time period.
In this embodiment, the Boolean tensor T_old is the tensor representation of the social network graph within the previous day.
S22. Perform biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i, 1 ≤ i ≤ k.
Biased sampling of the Boolean tensor T_all according to an importance measure increases the density of non-zero entries in the sampled sub-tensors and strengthens the influence of each sub-tensor's decomposition result on the update of T_all. The following process is illustrated with h = 2.
Suppose that the old tensor is T_old = [[0, 1], [1, 1]] and the sampling factor is 0.5.
Preferably, step S22 comprises the following steps:
S221. For the h-order old Boolean tensor T_old, sum along every order except order j to obtain the weight vector d^(j) of the indices of order j.
In this embodiment, d^(1) = [1, 2] and d^(2) = [1, 2].
S222. Divide d^(j) by the number of non-zero entries in T_old to obtain the sampling probability vector P^(j) of the indices of each order.
In this embodiment, P^(1) = [0.33, 0.67] and P^(2) = [0.33, 0.67].
s223, calculating T according to the set sampling factoroldSize L of sampling index of each orderj
In this embodiment, L1=2*0.5=1,L2=2*0.5=1。
S224. Sample the indices of the j-th order of T_old L_j times according to the sampling probability P^(j) to obtain the sampled index set V_j on that order.
In this embodiment, in the first dimension the sample size is 1, the full index set is [0, 1], and the sampling probabilities of the corresponding elements are [0.33, 0.67]; in the second dimension the sample size is 1, the full index set is [0, 1], and the sampling probabilities of the corresponding elements are [0.33, 0.67]. Suppose the sampling results are V_1 = [1] and V_2 = [1].
S225. Merge the sampled index sets with the index set of the new tensor to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension index of T_G.
In this embodiment, V_new = [2, 3], so the final sampling index set is {[1], [1, 2, 3]}.
S226. Extract the sampled sub-tensor according to the sampling index set {V_1, V_2, ..., V_h ∪ V_new}.
In this embodiment, the sub-tensor is T_all[1, {1, 2, 3}] = [1 1 1].
S227. Repeat steps S221 to S226 until k sub-tensors sT_1, ..., sT_k are generated.
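A minimal single-machine sketch of the biased sampling in steps S221 to S226 is given below, assuming the Boolean tensors are dense NumPy arrays; the argument names and the use of NumPy's weighted choice are assumptions, not the patent's notation:

```python
import numpy as np

def biased_sample_subtensor(T_all, T_old, new_time_idx, sample_factor=0.5,
                            rng=None):
    """Sketch of S221-S226: sample each order of T_old with probabilities
    proportional to its per-index non-zero counts, extend the time order
    with the new indices of T_G, and slice the sub-tensor out of T_all."""
    rng = np.random.default_rng(rng)
    nnz = T_old.sum()                      # number of non-zero entries
    idx_sets = []
    for j in range(T_old.ndim):
        axes = tuple(a for a in range(T_old.ndim) if a != j)
        d_j = T_old.sum(axis=axes)         # S221: index weights of order j
        p_j = d_j / nnz                    # S222: sampling probabilities
        L_j = max(1, int(T_old.shape[j] * sample_factor))   # S223
        idx_sets.append(np.sort(rng.choice(T_old.shape[j], size=L_j,
                                           replace=False, p=p_j)))  # S224
    idx_sets[-1] = np.union1d(idx_sets[-1], new_time_idx)  # S225 (time order)
    return T_all[np.ix_(*idx_sets)]        # S226: the sampled sub-tensor
```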
S23. Perform parallel distributed Boolean CP decomposition on each sub-tensor sT_i and compute the factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor, 1 ≤ i ≤ k, 1 ≤ j ≤ h.
Preferably, step S23 comprises the following steps:
S231. Initialize the factor matrices {M_i^(1), ..., M_i^(h)} of sub-tensor sT_i Y times, each time as Boolean matrices whose entries are non-zero with probability p, and take the factor matrices with the smallest reconstruction error as the final initialization.
In this embodiment, Y is set according to actual requirements, typically an integer between 5 and 20.
S232. Perform h optimization sub-steps: in each sub-step, fix (h-1) of the factor matrices and optimize the remaining one so that the overall reconstruction error is minimized; completing all h sub-steps completes one iteration.
This embodiment uses least-squares optimization. The following process is illustrated with h = 3.
The factor matrices of the sub-tensor sT_i are M_i^(1), M_i^(2), and M_i^(3). Fix M_i^(2) and M_i^(3), and optimize M_i^(1) so that the reconstruction error is minimized; fix M_i^(1) and M_i^(3), and optimize M_i^(2) so that the reconstruction error is minimized; fix M_i^(1) and M_i^(2), and optimize M_i^(3) so that the reconstruction error is minimized.
S233. Repeat step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and return the Boolean factor matrices {M_i^(1), ..., M_i^(h)}.
In this embodiment, k and e are set according to actual requirements.
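For illustration, a minimal single-machine sketch of this alternating scheme is shown below, assuming dense NumPy Boolean arrays. The greedy bit-flip update stands in for the least-squares optimization, the distributed execution is omitted, and the function names are assumptions:

```python
import numpy as np

def boolean_reconstruct(factors):
    """OR of R rank-1 Boolean outer products; factors[j] has shape (I_j, R)."""
    h, R = len(factors), factors[0].shape[1]
    T = np.zeros(tuple(f.shape[0] for f in factors), dtype=bool)
    for r in range(R):
        rank1 = factors[0][:, r]
        for j in range(1, h):
            rank1 = np.multiply.outer(rank1, factors[j][:, r])
        T |= rank1
    return T

def boolean_cp(T, R, Y=10, max_rounds=20, e=0, p=0.1, rng=None):
    """Sketch of S231-S233 for one sub-tensor T with target Boolean rank R."""
    rng = np.random.default_rng(rng)
    h = T.ndim
    # S231: Y random Boolean initializations; keep the least-error one.
    factors, best_err = None, None
    for _ in range(Y):
        cand = [rng.random((T.shape[j], R)) < p for j in range(h)]
        err = np.sum(boolean_reconstruct(cand) ^ T)
        if best_err is None or err < best_err:
            factors, best_err = cand, err
    # S232/S233: fix h-1 factors and re-optimize the remaining one in turn,
    # here by accepting any single bit flip that lowers the overall error.
    for _ in range(max_rounds):
        for j in range(h):
            for row in range(T.shape[j]):
                for r in range(R):
                    factors[j][row, r] ^= True
                    err = np.sum(boolean_reconstruct(factors) ^ T)
                    if err < best_err:
                        best_err = err
                    else:
                        factors[j][row, r] ^= True   # revert the flip
        if best_err <= e:                            # stopping criterion
            break
    return factors, best_err
```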
S24. Merge the Boolean factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor sT_i with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old tensor T_old to obtain the Boolean CP decomposition result {M_all^(1), ..., M_all^(h)} of the new tensor T_all.
Merging the two introduces updates into the decomposition matrices of the old tensor T_old, which reduces the error of the decomposition.
Preferably, step S24 comprises the following steps:
S241. Merge the Boolean factor matrices {M_1^(1), ..., M_1^(h)} of sub-tensor sT_1 with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old tensor T_old to obtain the merged Boolean factor matrix set {M^(1), ..., M^(h)}.
S242. Merge the Boolean factor matrices {M_2^(1), ..., M_2^(h)} of sub-tensor sT_2 with the corresponding matrices of the merged set {M^(1), ..., M^(h)}, and so on, until the Boolean factor matrices {M_k^(1), ..., M_k^(h)} of sub-tensor sT_k are merged with the corresponding matrices of the set, yielding the Boolean CP decomposition matrices {M_all^(1), ..., M_all^(h)} of the new tensor T_all.
Preferably, the merging of the Boolean factor matrices comprises the following steps:
(1) Compute tensors V and U, where v_x is the x-th row of the sub-tensor factor matrix, u_x is the row of the old factor matrix at the corresponding sampled index, V is the tensor reconstructed from v_x together with the other factor matrices, and U is the tensor reconstructed from u_x together with the other factor matrices.
(2) Compute the reconstruction errors ε_1 and ε_2 of tensors V and U against the old tensor slice:
ε_1 = ||V - T_x||
ε_2 = ||U - T_x||
where T_x is the slice tensor of the corresponding index row, and || · || denotes the tensor 1-norm, i.e., the number of non-zero entries of a Boolean tensor.
(3) Check whether ε_1 < ε_2 holds. If it does, the updated row reduces the overall reconstruction error, so row u_x of the original tensor factor matrix is updated with v_x; otherwise no update is made.
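A sketch of this row-wise merge rule is given below, written for the first mode only (other modes are analogous after permuting axes) and reusing boolean_reconstruct from the sketch above; sample_idx is an assumed bookkeeping structure mapping sampled rows back to rows of the old factor matrix:

```python
import numpy as np

def merge_first_mode(old_factors, sub_factors, sample_idx, T_old):
    """Sketch of merge steps (1)-(3): keep the sub-tensor's row v_x only if
    it reconstructs the corresponding slice of T_old with a smaller error
    (the Boolean tensor 1-norm counts mismatched cells)."""
    others = old_factors[1:]
    for x, ox in enumerate(sample_idx):
        vx = sub_factors[0][x]       # candidate row v_x from the sub-tensor
        ux = old_factors[0][ox]      # current row u_x of the old factor
        Tx = T_old[ox]               # slice tensor T_x of this index row

        def slice_err(row):
            # Reconstruct the slice from `row` plus the other factor matrices,
            # then count differing cells: ||V - T_x|| resp. ||U - T_x||.
            rec = boolean_reconstruct([row[None, :]] + others)[0]
            return np.sum(rec ^ Tx)

        if slice_err(vx) < slice_err(ux):   # ε_1 < ε_2: the new row wins
            old_factors[0][ox] = vx
    return old_factors
```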
S3. Cluster the node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node.
Preferably, the clustering method in step S3 is K-Means, with the Hamming distance as the distance metric.
The method comprises the following steps:
S31. Take the set of row vectors {n_1, n_2, ..., n_l} of the node Boolean factor matrix N, where l is the number of rows of matrix N, i.e., the number of nodes in the graph.
S32. Choose any r of these vectors as the initial cluster centers, where r denotes the number of cluster centers, i.e., the number of supernodes generated in the final graph summary.
In this embodiment, r (the K of K-Means) is initialized to 100.
S33. Compute the Hamming distance from each remaining node to every cluster center, and assign the node to the nearest cluster.
S34. Update each cluster center with the element-wise rounded mean of all vectors in that cluster, completing one round of iteration.
S35. When the number of iterations reaches the specified value, output the cluster to which each point belongs.
In this embodiment, the number of iterations is specified to be 10.
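A compact sketch of steps S31 to S35 follows, assuming the rows of the node Boolean factor matrix are given as a dense NumPy Boolean array; keeping the centers Boolean via the rounded mean follows the reading of S34 above:

```python
import numpy as np

def hamming_kmeans(rows, r=100, iters=10, rng=None):
    """K-Means over Boolean row vectors under the Hamming distance."""
    rng = np.random.default_rng(rng)
    rows = rows.astype(bool)
    centers = rows[rng.choice(len(rows), size=r, replace=False)]  # S32
    labels = np.zeros(len(rows), dtype=int)
    for _ in range(iters):                                        # S35
        # S33: Hamming distance of every row to every center.
        dist = (rows[:, None, :] ^ centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(r):
            members = rows[labels == c]
            if len(members):
                # S34: rounded element-wise mean keeps the center Boolean.
                centers[c] = members.mean(axis=0) >= 0.5
    return centers, labels
```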
S4. Regard the cluster centers as the supernodes of the graph summary, and compute the superedge weights between the supernodes to obtain the complete graph summary.
Preferably, step S4 comprises the following steps:
S41. Compute the superedge weight between the supernodes in the graph summary according to the graph-node adjacency similarity formula (given only as an image in the original).
S42. Compute the reconstruction error of the graph summary according to the Euclidean distance between tensors (formula likewise given only as an image in the original).
Here S_i and S_j are the cluster centers computed by the clustering algorithm, l and m are the numbers of nodes in S_i and S_j respectively, L is the length of the Boolean tensor T_all in the time dimension, N is the number of nodes of T_all, σ(S_i) is the number of points contained in S_i, and | · | denotes the absolute value operator.
S43. Check whether the reconstruction error meets the set threshold. If it does, take the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, change the number of cluster centers and return to step S3.
In this embodiment, the reconstruction-error threshold is set to 1000. If the reconstruction error does not meet the threshold, the number of cluster centers is increased.
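Since the weight formula survives only as an image, the sketch below implements one plausible instantiation of the adjacency-similarity weight, the cross-cluster edge density of T_all averaged over the time dimension; it assumes a node × node × time Boolean tensor and is an assumption, not the patent's exact formula:

```python
import numpy as np

def superedge_weights(T_all, labels, r):
    """One assumed instantiation of S41: density of edges between two
    clusters in the node x node x time Boolean tensor T_all."""
    L = T_all.shape[-1]                        # length of the time dimension
    W = np.zeros((r, r))
    for i in range(r):
        for j in range(r):
            Si = np.flatnonzero(labels == i)   # members of supernode S_i
            Sj = np.flatnonzero(labels == j)   # members of supernode S_j
            if len(Si) and len(Sj):
                block = T_all[np.ix_(Si, Sj)]  # cross-cluster adjacency
                W[i, j] = block.sum() / (len(Si) * len(Sj) * L)
    return W
```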
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A social network graph summary generation method based on incremental computation, characterized by comprising the following steps:
S1. Represent the social network graph within the target time period as a tensor to obtain the target Boolean tensor T_G;
S2. Perform tensor decomposition on the target Boolean tensor T_G to obtain the decomposed node matrices N_1 and N_2;
S3. Cluster the node matrix N_1 or N_2 to obtain the cluster centers and the cluster of each node;
S4. Regard the cluster centers as the supernodes of the graph summary, and compute the superedge weights between supernodes to obtain the graph summary of the social network graph.
2. The method of claim 1, wherein the social network graph is a dynamic undirected graph whose snapshots correspond one-to-one to timestamps.
3. The method according to claim 1 or 2, wherein step S2 comprises the following steps:
S21. Combine the old Boolean tensor T_old and the target Boolean tensor T_G into a Boolean tensor T_all, whose last order is the time dimension; the old Boolean tensor T_old is the tensor representation of the social network graph for the previous time period;
S22. Perform biased sampling on the Boolean tensor T_all to generate k sub-tensors sT_i;
S23. Perform parallel distributed Boolean CP decomposition on each sub-tensor sT_i, computing the factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor;
S24. Merge the Boolean factor matrices {M_i^(1), ..., M_i^(h)} of each sub-tensor sT_i with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old Boolean tensor T_old to obtain the Boolean CP decomposition result {M_all^(1), ..., M_all^(h)} of the new Boolean tensor T_all, where 1 ≤ i ≤ k indexes the sub-tensors and 1 ≤ j ≤ h indexes the factor matrices M^(j).
4. The method of claim 3, wherein step S22 comprises the following steps:
S221. For the h-order old Boolean tensor T_old, sum along every order except order j to obtain the weight vector d^(j) of the indices of order j;
S222. Divide d^(j) by the number of non-zero elements in T_old to obtain the sampling probability vector P^(j) of the indices of each order;
S223. Compute the size L_j of the sampling index set for each order of T_old according to the set sampling factor;
S224. Sample the indices of the j-th order of T_old L_j times according to the sampling probability P^(j) to obtain the sampled index set V_j;
S225. Combine the sampled index sets {V_1, V_2, ..., V_h} with the index set of the target Boolean tensor T_G to obtain {V_1, V_2, ..., V_h ∪ V_new}, where V_new denotes the time-dimension index of T_G;
S226. Extract the sampled sub-tensor according to the index set {V_1, V_2, ..., V_h ∪ V_new};
S227. Repeat steps S221 to S226 until k sub-tensors are generated.
5. The method of claim 3, wherein step S23 comprises the following steps:
S231. Initialize the factor matrices {M_i^(1), ..., M_i^(h)} of sub-tensor sT_i Y times, each time as Boolean matrices whose entries are non-zero with probability p, and take the factor matrices with the smallest reconstruction error as the final initialization;
S232. Perform h optimization sub-steps: in each sub-step, fix (h-1) of the factor matrices and optimize the remaining one so that the overall reconstruction error is minimized; completing all h sub-steps completes one iteration;
S233. Repeat step S232 until the number of iteration rounds reaches k or the iteration error is smaller than e, and return the Boolean factor matrices {M_i^(1), ..., M_i^(h)}.
6. The method of claim 3, wherein step S24 comprises the following steps:
S241. Merge the Boolean factor matrices {M_1^(1), ..., M_1^(h)} of sub-tensor sT_1 with the Boolean factor matrices {M_old^(1), ..., M_old^(h)} of the old Boolean tensor T_old to obtain the merged Boolean factor matrix set {M^(1), ..., M^(h)};
S242. Merge the Boolean factor matrices {M_2^(1), ..., M_2^(h)} of sub-tensor sT_2 with the corresponding matrices of the merged set {M^(1), ..., M^(h)}, and so on, until the Boolean factor matrices {M_k^(1), ..., M_k^(h)} of sub-tensor sT_k are merged with the corresponding matrices of the set, yielding the Boolean CP decomposition matrices {M_all^(1), ..., M_all^(h)} of the new tensor T_all.
7. The method of claim 6, wherein the merging of the Boolean factor matrices comprises the following steps:
(1) Compute tensors V and U, where v_x is the x-th row of the sub-tensor factor matrix, u_x is the row of the old factor matrix at the corresponding sampled index, V is the tensor reconstructed from v_x together with the other factor matrices, and U is the tensor reconstructed from u_x together with the other factor matrices;
(2) Compute the reconstruction errors ε_1 and ε_2 of tensors V and U against the old tensor factor matrix:
ε_1 = ||V - T_x||
ε_2 = ||U - T_x||
where T_x is the slice tensor of the corresponding index row;
(3) Check whether ε_1 < ε_2 holds; if it does, update row u_x of the original tensor factor matrix with v_x, otherwise do not update.
8. The method of claim 1, wherein in step S3 the Hamming distance is selected as the distance metric, the number r of cluster centers is set, and K-Means clustering is used to obtain the cluster centers S_i, i = 1, ..., r, and the cluster to which each node belongs.
9. The method according to any one of claims 3 to 7, wherein step S4 comprises the following steps:
S41. Compute the superedge weight between the supernodes in the graph summary (the weight formula is given only as an image in the original), where S_i and S_j are the cluster centers computed by the clustering algorithm, l and m are the numbers of nodes in S_i and S_j respectively, L is the length of the Boolean tensor T_all in the time dimension, N is the number of nodes of T_all, and σ(S_i) is the number of points contained in S_i;
S42. Compute the reconstruction error of the graph summary (the error formula is likewise given only as an image in the original);
S43. Check whether the reconstruction error meets the set threshold; if so, take the clusters as the nodes of the graph summary and the superedge weights as the weights of its edges; otherwise, change the number of cluster centers and return to step S3.
10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the social network graph summary generation method based on incremental computation according to any one of claims 1 to 9.
CN201911373671.1A 2019-12-26 2019-12-26 Tensor-computation-based social network graph summary generation method Active CN111159483B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911373671.1A CN111159483B (en) 2019-12-26 2019-12-26 Tensor-computation-based social network graph summary generation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911373671.1A CN111159483B (en) 2019-12-26 2019-12-26 Tensor-computation-based social network graph summary generation method

Publications (2)

Publication Number Publication Date
CN111159483A true CN111159483A (en) 2020-05-15
CN111159483B CN111159483B (en) 2023-07-04

Family

ID=70558533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911373671.1A Active CN111159483B (en) 2019-12-26 Tensor-computation-based social network graph summary generation method

Country Status (1)

Country Link
CN (1) CN111159483B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881191A (en) * 2020-08-05 2020-11-03 厦门力含信息技术服务有限公司 Client portrait key feature mining system and method under mobile internet
CN112507245A (en) * 2020-12-03 2021-03-16 中国人民大学 Social network friend recommendation method based on graph neural network
CN113139098A (en) * 2021-03-23 2021-07-20 中国科学院计算技术研究所 Abstract extraction method and system for big homogeneous relation graph
CN113157981A (en) * 2021-03-26 2021-07-23 支付宝(杭州)信息技术有限公司 Graph network relation diffusion method and device
CN112287118B (en) * 2020-10-30 2023-06-02 西南电子技术研究所(中国电子科技集团公司第十研究所) Event mode frequent subgraph mining and prediction method

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100312797A1 (en) * 2009-06-05 2010-12-09 Xerox Corporation Hybrid tensor-based cluster analysis
CN107545509A (en) * 2017-07-17 2018-01-05 西安电子科技大学 A kind of group dividing method of more relation social networks
CN107656928A (en) * 2016-07-25 2018-02-02 长沙有干货网络技术有限公司 The method that a kind of isomery social networks of user clustering is recommended
CN107767280A (en) * 2017-10-16 2018-03-06 湖北文理学院 A kind of high-quality node detecting method based on element of time
US20180204117A1 (en) * 2017-01-19 2018-07-19 Google Inc. Dynamic-length stateful tensor array
US20180349477A1 (en) * 2017-06-06 2018-12-06 Facebook, Inc. Tensor-Based Deep Relevance Model for Search on Online Social Networks
CN109697467A (en) * 2018-12-24 2019-04-30 宁波大学 A kind of summarization methods of complex network figure

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100312797A1 (en) * 2009-06-05 2010-12-09 Xerox Corporation Hybrid tensor-based cluster analysis
CN107656928A (en) * 2016-07-25 2018-02-02 长沙有干货网络技术有限公司 The method that a kind of isomery social networks of user clustering is recommended
US20180204117A1 (en) * 2017-01-19 2018-07-19 Google Inc. Dynamic-length stateful tensor array
US20180349477A1 (en) * 2017-06-06 2018-12-06 Facebook, Inc. Tensor-Based Deep Relevance Model for Search on Online Social Networks
CN107545509A (en) * 2017-07-17 2018-01-05 西安电子科技大学 A kind of group dividing method of more relation social networks
CN107767280A (en) * 2017-10-16 2018-03-06 湖北文理学院 A kind of high-quality node detecting method based on element of time
CN109697467A (en) * 2018-12-24 2019-04-30 宁波大学 A kind of summarization methods of complex network figure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
PAULI MIETTINEN: "Walk'n'Merge: A Scalable Algorithm for Boolean Tensor Factorization", IEEE *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881191A (en) * 2020-08-05 2020-11-03 厦门力含信息技术服务有限公司 Client portrait key feature mining system and method under mobile internet
CN111881191B (en) * 2020-08-05 2021-06-11 留洋汇(厦门)金融技术服务有限公司 Client portrait key feature mining system and method under mobile internet
CN112287118B (en) * 2020-10-30 2023-06-02 西南电子技术研究所(中国电子科技集团公司第十研究所) Event mode frequent subgraph mining and prediction method
CN112507245A (en) * 2020-12-03 2021-03-16 中国人民大学 Social network friend recommendation method based on graph neural network
CN113139098A (en) * 2021-03-23 2021-07-20 中国科学院计算技术研究所 Abstract extraction method and system for big homogeneous relation graph
CN113139098B (en) * 2021-03-23 2023-12-12 中国科学院计算技术研究所 Abstract extraction method and system for homogeneity relation large graph
CN113157981A (en) * 2021-03-26 2021-07-23 支付宝(杭州)信息技术有限公司 Graph network relation diffusion method and device
CN113157981B (en) * 2021-03-26 2022-12-13 支付宝(杭州)信息技术有限公司 Graph network relation diffusion method and device

Also Published As

Publication number Publication date
CN111159483B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN111159483B (en) Tensor-computation-based social network graph summary generation method
Saldana et al. How many communities are there?
Zhu et al. Differential privacy and applications
Wyse et al. Inferring structure in bipartite networks using the latent blockmodel and exact ICL
CN112182245B (en) Knowledge graph embedded model training method and system and electronic equipment
Thirumuruganathan et al. Approximate query processing for data exploration using deep generative models
Wang et al. Multiway clustering via tensor block models
Wang et al. A united approach to learning sparse attributed network embedding
Yu et al. Zinb-based graph embedding autoencoder for single-cell rna-seq interpretations
Kacem et al. MapReduce-based k-prototypes clustering method for big data
Huang et al. Spectral clustering via adaptive layer aggregation for multi-layer networks
Li et al. Greedy optimization for K-means-based consensus clustering
Pu et al. Stochastic mirror descent for low-rank tensor decomposition under non-euclidean losses
CN110717043A (en) Academic team construction method based on network representation learning training
Dempsey et al. Hierarchical network models for exchangeable structured interaction processes
Salem et al. Clustering categorical data using the k-means algorithm and the attribute’s relative frequency
CN108154380A (en) The method for carrying out the online real-time recommendation of commodity to user based on extensive score data
Lu et al. An improved k-means distributed clustering algorithm based on spark parallel computing framework
Zhang et al. Node-level community detection within edge exchangeable models for interaction processes
CN117059284A (en) Diabetes parallel attribute reduction method based on co-evolution discrete particle swarm optimization
Ng et al. Inference and sampling for archimax copulas
Zhang et al. Perturbation analysis of randomized svd and its applications to high-dimensional statistics
US20240104387A1 (en) Learning logical rules over graph structured data using message passing
Li et al. An alternating nonmonotone projected Barzilai–Borwein algorithm of nonnegative factorization of big matrices
Chen et al. A hybrid tensor factorization approach for QoS prediction in time-aware mobile edge computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant