CN110851491A - Network link prediction method based on multiple semantic influences of multiple neighbor nodes - Google Patents

Network link prediction method based on multiple semantic influences of multiple neighbor nodes

Info

Publication number
CN110851491A
CN110851491A (application CN201910985752.0A)
Authority
CN
China
Prior art keywords
node
semantic
influence
network
nodes
Prior art date
Legal status
Granted
Application number
CN201910985752.0A
Other languages
Chinese (zh)
Other versions
CN110851491B (en)
Inventor
Bo Wang (王博)
Meixian Song (宋美贤)
Qinghua Hu (胡清华)
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201910985752.0A
Publication of CN110851491A
Application granted
Publication of CN110851491B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/24: Querying
    • G06F 16/245: Query processing
    • G06F 16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F 16/2465: Query processing support for facilitating data mining operations in structured databases
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases
    • G06F 16/288: Entity relationship models
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES
    • G06Q 50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q 50/01: Social networking
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT]
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Economics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network link prediction method based on multiple semantic influences of multiple neighbor nodes. It relates to data mining and topological structure analysis and addresses a research problem in the field of social computing. The method comprises the following steps. Data analysis: based on node behavior and node relationship data in the social network, the interest characteristics of the nodes and the network structure characteristics are analyzed. Model training: the model combines the multiple semantic influences of multiple neighbor nodes to obtain an embedding vector for each node. Prediction analysis: the similarity between the embedding vectors of a node pair is used to measure the probability that a friend link exists. Instead of using constant influence scores for neighbors, the invention models the specific semantic influence of each neighbor on the node. The method jointly models the local-level and global-level semantic influences of neighbor nodes during network embedding training, and trains for each node a joint embedding vector based on the semantic influences of all its neighbor nodes.

Description

Network link prediction method based on multiple semantic influences of multiple neighbor nodes
Technical Field
The invention relates to data mining and topological structure analysis and belongs to the field of social computing. A network link prediction method combining multiple semantic influences of multiple neighbor nodes is provided.
Background
Among the many tasks in social networks, link prediction is of great importance. This task includes two problems: the first is to infer social links that may form in the social network in the future, and the other is to recover existing links that are missing from the current snapshot of the social network. The aim of the invention is to solve the latter, i.e., to reconstruct missing links in a social network.
To implement link prediction, conventional methods widely use the topology information of the network; these are referred to as topology-based methods. Topology-based link prediction considers only the structural information of the social network. Inspired by Network Embedding (NE) techniques, a large number of topology-based models have been proposed in recent years for learning node embedding vectors, which are further used for link prediction. For example, DeepWalk[1] learns embedding vector representations of nodes by treating node sequences obtained by random walks as sentences and applying the Skip-Gram method. Topology-based methods ignore node attributes that are actually useful for link prediction. By jointly modeling topological and semantic information, hybrid approaches can provide better performance. For example, TADW[2] improves the matrix factorization underlying DeepWalk by incorporating textual information.
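As a background illustration, the walk-as-sentence idea behind DeepWalk can be sketched as follows; the graph, walk parameters, and node names are hypothetical, and the subsequent Skip-Gram training over the resulting "sentences" is omitted:

```python
import random

def random_walks(adj, num_walks=2, walk_len=5, seed=0):
    """Generate truncated random walks; each walk plays the role of a
    'sentence' whose 'words' are node ids (DeepWalk-style, before Skip-Gram)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            while len(walk) < walk_len and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks

# toy friendship graph (directed adjacency for simplicity; illustrative only)
adj = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"]}
corpus = random_walks(adj)
```

Feeding `corpus` to a word-embedding trainer would yield the node embedding vectors used for link prediction.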
The invention predicts the probability of a social link between two people by embedding different types of attributes into a uniform space and computing the similarity of the embedding vectors. The idea of predicting connections by similarity is closely related to homophily theory in sociology. To explain the similarities between individuals in social networks, homophily theory proposes two principles: selection and influence. The selection principle explains the similarity of social connections by assuming that people select others similar to themselves, while the influence principle assumes that similarity stems from the fact that people become more similar to their friends over time. Compared with the influence principle, the selection principle is more intuitive and is widely applied in current link prediction research: people tend to select friends similar to themselves in structural or semantic attributes.
However, influence also plays an important role in establishing social connections. Homophily theory in sociology indicates that people influence each other within existing relationships. Through such influence, a person's neighborhood affects his or her selection of new friends. Psychological studies also support the joint role of influence and selection in human choice behavior. In psychology, the difference between influence and selection can be understood as two kinds of motivation, intrinsic and extrinsic, which together drive choice behavior[3]. Intrinsic motivation is determined by a person's inner interests, while extrinsic motivation comes from external influence.
In the invention, the influence of the neighbors is introduced into the link prediction task. For this reason, there are two main challenges:
(1) User nodes in a social network often have different impacts on different neighbor nodes. In conventional methods, however, a user node has only a constant influence score and cannot exert slightly different influences on the different neighbor nodes around him or her. Thus, to know how a given user node is affected by its different neighbor nodes during social link establishment, the pairwise influence between friend nodes needs to be modeled.
(2) The influence within interpersonal relationships is usually semantic, such as research interests or political standpoints. Such semantics may exist at different levels. On one hand, local-level semantic influence describes the interaction of two user nodes on the semantics of specific terms. On the other hand, global-level semantic influence refers to the semantic influence of the overall interests of neighbor nodes.
References
[1] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, New York, NY, USA, August 24-27, 2014. 701–710.
[2] Cheng Yang, Zhiyuan Liu, Deli Zhao, Maosong Sun, and Edward Y. Chang. 2015. Network Representation Learning with Rich Text Information. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015. 2111–2117.
[3] Richard M. Ryan and Edward L. Deci. 2000. Intrinsic and Extrinsic Motivations: Classic Definitions and New Directions. Contemporary Educational Psychology 25, 1 (2000), 54–67.
Disclosure of Invention
The invention designs a network link prediction method combining multiple semantic influences of multiple neighbor nodes: link prediction is performed based on a network embedding method in which each neighbor of a node exerts multi-level semantic influence. The aim of the invention is to predict the probability of friend links between node pairs based on the relevant topological information and interest text information.
The invention provides a network link prediction method based on multiple semantic influences of multiple neighbor nodes, which comprises the following steps:
Step one, data analysis: analyze the node behavior data and the inter-node relationship data in the social network; extract the relevant attribute vectors from the interest attributes and the friend attributes of the nodes, respectively, obtaining the node interest characteristics and the network structure characteristics.
Step two, model training: construct the model that produces the node embedding vectors of the social network; based on the node interest characteristics and the network structure characteristics obtained by the data analysis module, the model captures the multiple semantic influences of multiple neighbor nodes to obtain the embedding vector of each node.
Step three, prediction analysis: measure the probability of a friend link between the corresponding node pairs by the similarity between the embedding vectors of the node pairs.
Furthermore, in step one of the network link prediction method based on multiple semantic influences of multiple neighbor nodes, a social network is denoted as G = (N, E, S), where every node has a text attribute carrying interest information. Here N = {u_1, u_2, ..., u_n} is the node set of the social network, E is the set of friend links in the social network, and S is the set of node text attributes. The text attribute of node u_i is represented as a word sequence S_i = (w_1, w_2, ..., w_n), where w_t is the t-th word in the sequence S_i.
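As an illustration of the notation G = (N, E, S), a minimal sketch of the data layout follows; the nodes, links, and interest words are hypothetical:

```python
# N: node set; E: undirected friend links; S: interest word sequences
nodes = ["u1", "u2", "u3"]
edges = {("u1", "u2"), ("u2", "u3")}
texts = {
    "u1": ["graph", "mining", "social"],
    "u2": ["graph", "embedding"],
    "u3": ["social", "psychology"],
}

def neighbors(u, edge_set):
    """Friend list of node u under the undirected link set E."""
    out = []
    for a, b in edge_set:
        if a == u:
            out.append(b)
        elif b == u:
            out.append(a)
    return sorted(out)
```

Each node's word sequence S_i is what the later local-level and global-level semantic modules consume.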
In step two of the network link prediction method based on multiple semantic influences of multiple neighbor nodes, the training target is to obtain the network embedding matrix V = [v_1, v_2, ..., v_n], formed by the embedding vectors of all nodes, where v_i ∈ R^d is the embedding vector of node u_i. To train the embedding vector of each node in the network, the sum of the log-probabilities of all known edges is maximized:
L = Σ_{e∈E} L(e)
where L(e) combines a topology-based objective function L_T(e) and an influence-based objective function L_I(e), so that the topology-based and influence-based embeddings are mapped into the same representation space:
L(e) = αL_T(e) + (1-α)L_I(e)
For an edge e_ij = (u_i, u_j), the topology-based objective function is
L_T(e_ij) = w_ij log σ(v_i^T · v_j^T)
and the influence-based objective function is
L_I(e_ij) = w_ij log σ(v_i^I · v_j^I)
where the superscripts T and I mark the topology-based and influence-based embeddings, σ(·) is the sigmoid function, and w_ij is the weight of an edge in the social network, representing the strength or polarity of the friend relationship, which makes the invention applicable to various networks.
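The combined per-edge objective L(e) = αL_T(e) + (1-α)L_I(e) can be sketched as follows; the concrete weighted log-sigmoid form of each term is an assumption for illustration, not necessarily the patent's exact formula:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def edge_objective(vT_i, vT_j, vI_i, vI_j, w_ij=1.0, alpha=0.5):
    """L(e) = alpha * L_T(e) + (1 - alpha) * L_I(e) for one known edge.
    Each term is taken as w_ij * log(sigmoid(v_i . v_j)) over the matching
    (topology-based or influence-based) embeddings (an assumed form)."""
    L_T = w_ij * math.log(sigmoid(dot(vT_i, vT_j)))
    L_I = w_ij * math.log(sigmoid(dot(vI_i, vI_j)))
    return alpha * L_T + (1.0 - alpha) * L_I
```

Summing `edge_objective` over all known edges gives the training target L = Σ_{e∈E} L(e).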
when an influence-based embedded vector of a node is obtained in a model training process, semantic influence of each neighbor of the node is modeled by semantics of each neighbor and an interest text of the node; the semantic influence is modeled at local and global levels respectively and is combined into a combined embedded vector based on influence; the local semantic influence is used for capturing text semantic influence of a local area, and the text of the local area can be interpreted by using certain term vocabularies in the interest text; capturing the influence caused by the global interest semantics of the neighbor, namely the semantic influence caused by the global semantics described by the whole semantics of the interest text;
all neighbors are paired to node uiIs averaged to generate a finalAs follows:
Figure BDA0002236625920000037
where m represents node uiThe number of the neighbor nodes of (1),
Figure BDA0002236625920000041
represents a neighbor node ukTo node uiInfluence-based embedding of (1); embedding by connecting local level semantic impactsAnd global level semantic impact embedding
Figure BDA0002236625920000043
Obtaining neighbor node ukTo node uiThe impact-based embedding of (1), namely:
Figure BDA0002236625920000044
wherein
Figure BDA0002236625920000045
And is
Figure BDA0002236625920000046
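The merging of neighbor influences, concatenating each neighbor's local-level and global-level vectors and averaging over the m neighbors, can be sketched with toy vectors:

```python
def influence_embedding(local_vecs, global_vecs):
    """v_i^I = (1/m) * sum_k [v_{k->i}^L ; v_{k->i}^G]: concatenate each
    neighbor's local- and global-level influence vectors, then average
    over the m neighbors of node u_i."""
    per_neighbor = [lv + gv for lv, gv in zip(local_vecs, global_vecs)]  # [L;G]
    m, dim = len(per_neighbor), len(per_neighbor[0])
    return [sum(v[d] for v in per_neighbor) / m for d in range(dim)]

# two neighbors of u_i, each with a 2-d local and a 2-d global vector (toy)
v_I = influence_embedding([[1.0, 0.0], [0.0, 1.0]], [[2.0, 2.0], [0.0, 0.0]])
```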
In step two, the embedding vector training based on local-level semantic influence is based on a Convolutional Neural Network (CNN) and an attention mechanism. The training comprises: obtaining the text sequences S_i, S_k of a pair of friend nodes u_i, u_k, and, based on the lookup layer, the convolution layer, the attention layer, and the output layer, obtaining the final embedding vectors based on local-level semantic influence.
From the text sequence S_i, the lookup layer produces the text embedding matrix X = [x_1, x_2, ..., x_n]; the convolution layer then produces the local feature matrix C^(i) = [c_1, c_2, ..., c_{n-h+1}] based on the following convolution formula:
c_i = f(W_c x_{i:i+h-1} + b)
In the same manner, the local feature matrix C^(k) of node u_k is obtained.
Combined with the attention mechanism, the local semantic relevance of a pair of friend nodes is coupled, and an attention vector is generated for each of the two local feature matrices, so that local semantic information from a neighbor node directly influences the node's embedding vector. When generating the attention vectors, the semantic matching matrix M for local semantic influence is first constructed from the local feature matrices C^(i), C^(k); the goal is to obtain semantic matching signals. It is calculated as follows:
M_xy = (c_x^(i))ᵀ c_y^(k)
where the semantic matching matrix M ∈ R^((n-h+1)×(n-h+1)) and M_xy represents the element in row x, column y of M. Mean pooling and a softmax operation are performed on the semantic matching matrix M to generate the attention vectors, calculated as follows:
a^(i) = softmax(mean_row(M))
a^(k) = softmax(mean_col(M))
where a^(i), a^(k) are the attention vectors of the local feature matrices C^(i), C^(k), respectively, and mean_row(·), mean_col(·) denote mean pooling of the matrix in the row and column directions, respectively.
The embedding vector of node u_k's local-level semantic influence on node u_i is calculated as:
v_{k→i}^L = C^(i) a^(i)
In the same manner, the embedding vector v_{i→k}^L of node u_i's local-level semantic influence on node u_k is calculated.
In step two, the embedding vector training based on global-level semantic influence uses a bidirectional GRU model (Bi-GRU) to obtain the global semantic influence, as follows.
Given node u_i, first obtain its text embedding matrix X; the t-th hidden state component of the GRU model (Gated Recurrent Unit, GRU) is computed as follows:
r_t = σ(W_xr x_t + W_hr h_{t-1})
z_t = σ(W_xz x_t + W_hz h_{t-1})
h̃_t = tanh(W_xh x_t + W_hh (r_t ⊙ h_{t-1}))
h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t
The forward hidden state h_t^f and the backward hidden state h_t^b of node u_i are obtained and concatenated into the hidden context state of the Bi-GRU model, h_t = [h_t^f; h_t^b].
Mean pooling is applied to all historical hidden states, i.e.:
h̄ = (1/n) Σ_{t=1..n} h_t
and the size of the vector is mapped to the corresponding dimension as follows:
v_{k→i}^G = W_p h̄
where the matrix W_p is a projection matrix and the vector v_{k→i}^G is the embedding vector of node u_k's global-level semantic influence on node u_i. In the same manner, the embedding vector v_{i→k}^G of node u_i's global-level semantic influence on node u_k is calculated.
In step two, model optimization is performed when training the embedding vector of each node in the network, as follows: the original objective function is accelerated with a negative sampling algorithm, i.e., for each known edge (u_i, u_k) the following objective function is specified:
L(u_i, u_k) = log σ(v_k · v_i) + Σ_{j=1..K} E_{u_j~P_n(u)} [log σ(-v_j · v_i)]
where K represents the number of corresponding negative sampling edges and σ(·) denotes the sigmoid function.
In step three of the network link prediction method based on multiple semantic influences of multiple neighbor nodes, the similarity between the embedding vectors of node pairs is used to measure the probability that a friend link exists between the corresponding node pairs. In prediction analysis, the probability that nodes u_i and u_j in the social network form a link e_ij is:
p(e_ij) = σ(v_i · v_j)
where v_i, v_j ∈ R^d are the embedding vectors of nodes u_i, u_j; the embedding vector of each node is the combination of its topology-based embedding vector v^T and its influence-based embedding vector v^I, namely:
v_i = [v_i^T; v_i^I]
Compared with the prior art, the invention has the following advantages:
(1) In the method of the invention, a joint embedding vector carrying the semantic influence of his or her neighbors is trained for each user, using the observed neighbor relations and the users' text attributes. Rather than using a constant influence score per neighbor, the invention models the specific influence of each neighbor on the user, based on the text attributes of the neighbor and the user. Finally, for any pair of nodes not connected in the current network, the missing link between them is predicted by computing the similarity between their embedding vectors.
(2) In the invention, the local-level and global-level semantic influences of neighbor nodes are jointly modeled in the network embedding training. Modeling semantic influence at multiple levels captures the semantic influence relationship between friend user pairs more fully, improving link prediction accuracy and robustness.
Drawings
FIG. 1 is a schematic diagram of the network link prediction based on multiple semantic effects of multiple neighboring nodes according to the present invention;
FIG. 2 is a diagram of a network link prediction framework based on multiple semantic effects of multiple neighboring nodes according to the present invention.
FIG. 3 is a framework diagram of the module for modeling multiple semantic influences in step two of the present invention.
Detailed Description
The technical solutions of the present invention are further described in detail with reference to the accompanying drawings and specific embodiments, which are only illustrative of the present invention and are not intended to limit the present invention.
The network link prediction method based on multiple semantic influences of multiple neighbor nodes comprises the following three steps: data parsing, model training, and predictive analysis.
1. Data analysis: the user behaviors and user relationship data in the social network are analyzed; the relevant attribute vectors are extracted from the users' interest attributes and friend attributes, respectively, obtaining the nodes' interest characteristics and the network structure characteristics. The social network is represented as G = (N, E, S), and every node in the social network has a text attribute carrying interest information, where N = {u_1, u_2, ..., u_n} is the node set of the social network, E is the set of friend links in the social network, and S is the set of node text attributes. The text attribute of node u_i is represented as a word sequence S_i = (w_1, w_2, ..., w_n), where w_t is the t-th word in the sequence S_i.
2. Model training: the model is used to construct the node embedding vectors of the social network. Based on the node interest characteristics and network structure characteristics obtained by the data analysis module, the model captures the multiple semantic influences of multiple neighbor nodes to obtain the embedding vector of each node. The training target is to obtain the network embedding matrix V = [v_1, v_2, ..., v_n], formed by the embedding vectors of all nodes, where v_i ∈ R^d is the embedding vector of node u_i. To train the embedding vector of each node in the network, the sum of the log-probabilities of all known edges is maximized:
L = Σ_{e∈E} L(e) (1)
where L(e) combines a topology-based objective function L_T(e) and an influence-based objective function L_I(e), so that the topology-based and influence-based embeddings are mapped into the same representation space, as shown in the following formula:
L(e) = αL_T(e) + (1-α)L_I(e) (2)
For an edge e_ij = (u_i, u_j), the topology-based objective function is
L_T(e_ij) = w_ij log σ(v_i^T · v_j^T)
and the influence-based objective function is
L_I(e_ij) = w_ij log σ(v_i^I · v_j^I) (3)
where the superscripts T and I mark the topology-based and influence-based embeddings, and w_ij is the weight of an edge in the social network, representing the strength or polarity of the friend relationship, which makes the invention applicable to various networks.
Further, when obtaining the influence-based embedding vector of a node in the training process, the influence of each of the node's neighbors is considered (an example of influence in a social network is shown in FIG. 1). The influence is modeled with the semantics of each neighbor and the interest text of the node. Semantic influence is modeled at the local and global levels respectively and merged into a joint influence-based embedding vector (shown in the left and middle portions of FIG. 2). Local-level semantic influence captures specific semantic influence that can be interpreted with the semantics of particular terms in the interest text, while global-level semantic influence captures the semantic influence caused by the entire interest text of the neighbor (the semantic influence modeling process is shown in FIG. 3).
The influences of all neighbors on node u_i are averaged to generate the final influence-based embedding:
v_i^I = (1/m) Σ_{k=1..m} v_{k→i}^I
By concatenating the local-level semantic influence embedding v_{k→i}^L and the global-level semantic influence embedding v_{k→i}^G, the influence-based embedding of neighbor u_k on u_i is obtained, namely:
v_{k→i}^I = [v_{k→i}^L; v_{k→i}^G]
Next, the details of the training of the influence-based embedding vector will be described.
2.1 Embedding vectors based on local-level semantic influence
The embedding vectors based on local-level semantic influence are obtained mainly with a convolutional neural network and an attention mechanism. The text sequences S_i, S_k of a pair of friend nodes u_i, u_k are obtained and passed through the lookup layer, the convolution layer, the attention layer, and the output layer to obtain the final embedding vectors based on local-level semantic influence.
From the text sequence S_i, the lookup layer produces the text embedding matrix X = [x_1, x_2, ..., x_n]; the convolution layer then produces the local feature matrix C^(i) = [c_1, c_2, ..., c_{n-h+1}] based on the following convolution formula:
c_i = f(W_c x_{i:i+h-1} + b) (4)
In the same manner, the local feature matrix C^(k) of node u_k is obtained.
Combined with an attention mechanism, the local semantic relevance of a pair of friend nodes is coupled, and an attention vector is generated for each of the two local feature matrices, so that the local semantic information from a friend node can directly influence the node's embedding vector.
To obtain the attention vectors, the semantic matching matrix M for local semantic influence is first constructed from C^(i), C^(k); the goal is to acquire semantic matching signals. It is calculated as follows:
M_xy = (c_x^(i))ᵀ c_y^(k) (5)
where the semantic matching matrix M ∈ R^((n-h+1)×(n-h+1)) and M_xy represents the element in row x, column y of M.
Mean pooling and a softmax operation are performed on the semantic matching matrix M to generate the attention vectors. The calculation is as follows:
a^(i) = softmax(mean_row(M)) (6)
a^(k) = softmax(mean_col(M)) (7)
where a^(i), a^(k) are the attention vectors of the local feature matrices C^(i), C^(k), respectively, and mean_row(·), mean_col(·) denote mean pooling of the matrix in the row and column directions, respectively.
Taking node pair u_i, u_k as an example, the embedding vector of node u_k's local-level semantic influence on node u_i is calculated as:
v_{k→i}^L = C^(i) a^(i) (8)
The embedding vector v_{i→k}^L of node u_i's local-level semantic influence on node u_k is calculated in the same way.
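A minimal sketch of the attention step of section 2.1 follows; the dot-product form of the semantic matching matrix M and the attention-weighted output are assumptions for illustration, and the convolutional features C_i, C_k are given as toy inputs rather than computed from text:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def local_influence(C_i, C_k):
    """Attention over local (convolutional) features of a friend pair.
    M[x][y] = c_x^(i) . c_y^(k) (assumed dot-product matching); attention
    vectors come from row/column mean pooling + softmax; the output is the
    attention-weighted sum of C_i's feature vectors."""
    M = [[sum(a * b for a, b in zip(cx, cy)) for cy in C_k] for cx in C_i]
    a_i = softmax([sum(row) / len(row) for row in M])
    a_k = softmax([sum(M[x][y] for x in range(len(M))) / len(M)
                   for y in range(len(M[0]))])
    dim = len(C_i[0])
    v_L = [sum(a_i[t] * C_i[t][d] for t in range(len(C_i))) for d in range(dim)]
    return v_L, a_i, a_k

C_i = [[1.0, 0.0], [0.0, 1.0]]   # toy local feature matrices
C_k = [[1.0, 0.0], [0.5, 0.5]]
v_L, a_i, a_k = local_influence(C_i, C_k)
```

Swapping the roles of C_i and C_k yields the symmetric embedding for the other direction of influence.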
2.2 Embedding vectors based on global-level semantic influence
Bidirectional GRU models (Bi-GRU) are commonly used to capture global-level semantics and have been successfully applied to various NLP tasks. A Bi-GRU models context dependencies using a forward GRU and a backward GRU. Two hidden representations are thus obtained, and the forward and backward hidden states of each word are then concatenated. Given node u_i, first obtain its text embedding matrix X; the t-th hidden state component of the GRU model is computed as follows:
r_t = σ(W_xr x_t + W_hr h_{t-1}) (9)
z_t = σ(W_xz x_t + W_hz h_{t-1}) (10)
h̃_t = tanh(W_xh x_t + W_hh (r_t ⊙ h_{t-1})) (11)
h_t = z_t ⊙ h_{t-1} + (1 - z_t) ⊙ h̃_t (12)
The forward hidden state h_t^f and the backward hidden state h_t^b of node u_i are obtained and concatenated into the hidden context state of the Bi-GRU model, h_t = [h_t^f; h_t^b].
In the invention, instead of simply using the hidden state representation of the final step as the global semantics, mean pooling is applied to all historical hidden states, i.e.:
h̄ = (1/n) Σ_{t=1..n} h_t (13)
To match the pooled vector dimension with the target dimension, the vector is mapped to the corresponding dimension:
v_{k→i}^G = W_p h̄ (14)
where the matrix W_p is a projection matrix and the vector v_{k→i}^G is the embedding vector of node u_k's global-level semantic influence on node u_i. The embedding vector v_{i→k}^G of node u_i's global-level semantic influence on node u_k is calculated in the same way.
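A minimal sketch of section 2.2's Bi-GRU with mean pooling, using scalar hidden states and hypothetical weights for brevity; a real implementation would use weight matrices and a learned projection W_p:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h, p):
    """One GRU step with scalar state for brevity:
    r = sigmoid(Wxr*x + Whr*h), z = sigmoid(Wxz*x + Whz*h),
    h_tilde = tanh(Wxh*x + Whh*(r*h)), h' = z*h + (1-z)*h_tilde."""
    r = sigmoid(p["Wxr"] * x + p["Whr"] * h)
    z = sigmoid(p["Wxz"] * x + p["Whz"] * h)
    h_tilde = math.tanh(p["Wxh"] * x + p["Whh"] * (r * h))
    return z * h + (1.0 - z) * h_tilde

def bigru_mean_pool(xs, p):
    """Run forward and backward passes over the word embeddings xs; the
    pooled vector is the mean of the [forward; backward] hidden states."""
    fwd, h = [], 0.0
    for x in xs:
        h = gru_step(x, h, p)
        fwd.append(h)
    bwd, h = [], 0.0
    for x in reversed(xs):
        h = gru_step(x, h, p)
        bwd.append(h)
    n = len(xs)
    return [sum(fwd) / n, sum(bwd) / n]

p = {"Wxr": 0.5, "Whr": 0.3, "Wxz": -0.2, "Whz": 0.4, "Wxh": 0.8, "Whh": 0.1}
pooled = bigru_mean_pool([1.0, -0.5, 0.2], p)  # 1-d word "embeddings" (toy)
```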
2.3 Model optimization
The invention aims to maximize the conditional probability of each known edge (u_i, u_k). To reduce the computational cost, the original objective function is accelerated with a negative sampling algorithm, i.e., for each known edge (u_i, u_k) the following objective function is specified:
L(u_i, u_k) = log σ(v_k · v_i) + Σ_{j=1..K} E_{u_j~P_n(u)} [log σ(-v_j · v_i)] (15)
where K represents the number of corresponding negative sampling edges, P_n(u) is the noise distribution from which negative nodes are drawn, and σ(·) denotes the sigmoid function.
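The negative-sampling objective of section 2.3 can be sketched as follows; the log σ(v_k · v_i) + Σ log σ(-v_j · v_i) shape is the standard negative-sampling surrogate and is assumed here as the concrete form:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_objective(v_i, v_k, neg_vecs):
    """For a known edge (u_i, u_k): log sigmoid(v_k . v_i) plus, for each
    of the K sampled negative nodes u_j, log sigmoid(-v_j . v_i)."""
    obj = math.log(sigmoid(dot(v_k, v_i)))
    for v_j in neg_vecs:
        obj += math.log(sigmoid(-dot(v_j, v_i)))
    return obj
```

Maximizing this objective pulls linked embeddings together while pushing sampled non-neighbors apart, at a cost linear in K rather than in the network size.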
3. Prediction analysis: the similarity between the embedding vectors of node pairs is used to measure the probability that a friend link exists between the corresponding node pairs.
The probability is measured by the similarity between the embedding vectors of a pair of user nodes, and link prediction is performed accordingly (as shown in the right part of FIG. 2). For example, the probability that nodes u_i and u_j in the social network form a link edge e_ij is:
p(e_ij) = σ(v_i · v_j) (16)
where v_i, v_j are the embedding vectors of nodes u_i, u_j; each user's embedding vector is the combination of the topology-based embedding vector and the user's influence-based embedding vector, namely:
v_i = [v_i^T; v_i^I]
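A sketch of the prediction step: the sigmoid-of-dot-product similarity over the concatenated embeddings [v^T; v^I] is an assumed concrete mapping from similarity to probability:

```python
import math

def link_probability(vT_i, vI_i, vT_j, vI_j):
    """p(e_ij) from the similarity of the concatenated embeddings
    v = [v^T ; v^I]; sigmoid(v_i . v_j) is the assumed similarity-to-
    probability mapping."""
    v_i, v_j = vT_i + vI_i, vT_j + vI_j          # concatenation [v^T; v^I]
    s = sum(a * b for a, b in zip(v_i, v_j))     # dot-product similarity
    return 1.0 / (1.0 + math.exp(-s))
```

Ranking all currently unconnected pairs by `link_probability` yields the candidate missing links.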
the experimental material of the invention has four social network data sets, and these types of data sets are widely used in related research, which are respectively: the Cora citation network, the HepTh citation network, the Twitter social network, and the Coauthorship corporate network. The diversity of the data sets helps to verify the robustness of the present invention. Table 1 summarizes the relevant information for the four data sets.
Table 1 data set information statistics
(table image not reproduced in the text extraction)
Through the link prediction algorithm, a similarity score between the embedding vectors of each pair of nodes in the network is obtained. Although a higher similarity score indicates a higher probability that a link exists between the nodes, a corresponding evaluation index is still needed to assess the feasibility and accuracy of the link prediction algorithm. To test accuracy, the link edges in the network are typically divided into a training set and a test set, and the edges in the test set together with the edges absent from the network are referred to as unknown edges. After computation by the link prediction algorithm, each unknown edge receives a similarity score; the higher the score, the more likely the edge exists.
The index currently in common use for evaluating the accuracy of link prediction algorithms is AUC. AUC refers to the area under the ROC curve and is often used in signal detection theory to evaluate classifiers. The traditional AUC measure determines the value by plotting the ROC curve and computing its area. When AUC is used to evaluate a link prediction algorithm, it can be interpreted as the probability that a randomly selected nonexistent edge in the network scores lower than a randomly selected edge from the test set.
When using the AUC evaluation index, one edge is drawn each time from the nonexistent edges and one from the test set; if the score of the nonexistent edge is less than the score of the test-set edge, 1 point is added; if the two scores are equal, 0.5 point is added. After n independent comparisons, if 1 point was added n′ times and 0.5 point was added n″ times, the value of AUC is defined as:
AUC = (n′ + 0.5·n″) / n
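This sampling-based AUC can be computed directly from the comparison rule above; the following sketch assumes score lists as inputs (names are illustrative):

```python
import random

def sampled_auc(test_scores, absent_scores, n=10000, seed=0):
    """AUC by the sampling rule above: draw one test-set edge and one
    nonexistent edge per round; score +1 when the test edge ranks
    higher, +0.5 on a tie; AUC = (n' + 0.5*n'') / n."""
    rng = random.Random(seed)
    hits = ties = 0
    for _ in range(n):
        s_test = rng.choice(test_scores)
        s_absent = rng.choice(absent_scores)
        if s_test > s_absent:
            hits += 1
        elif s_test == s_absent:
            ties += 1
    return (hits + 0.5 * ties) / n

print(sampled_auc([0.9, 0.8], [0.1, 0.2]))  # 1.0 -- perfectly separated scores
```

An AUC of 0.5 corresponds to random guessing; values approaching 1.0 indicate that the predictor reliably ranks true edges above nonexistent ones.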
when the data set is separated into a training set and a test set, subsets of different proportions, namely 20%, 40%, 60% and 80%, are randomly selected from the data set to the training network. For each part of the training set, the embedding vector is first trained with the training set. The remaining instances are then used as test data sets for testing the network for evaluating the performance of the link prediction method. Tables 2, 3, 4 and 5 show the experimental effects of the present invention on four actual data sets, and compare the corresponding effects with the performance effects of the existing DeepWalk and TADW models.
TABLE 2 AUC performance indicators based on the Coauthorship collaboration network dataset
(table image not reproduced in the text extraction)
TABLE 3 AUC Performance indicators based on Cora citation network datasets
(table image not reproduced in the text extraction)
TABLE 4 AUC performance indicators based on HepTh citation network dataset
(table image not reproduced in the text extraction)
TABLE 5 AUC Performance indicators based on Twitter social network dataset
(table image not reproduced in the text extraction)
The performance evaluation results show that the invention achieves a significant improvement over the baseline models across the different data sets and training proportions.
The above embodiments are merely illustrative, and not restrictive, and various changes and modifications may be made by those skilled in the art without departing from the spirit and scope of the invention, and all equivalent technical solutions are intended to be included within the scope of the invention.

Claims (7)

1. A network link prediction method based on multiple semantic influences of multiple neighbor nodes is characterized by comprising the following steps:
step one, data analysis, which analyzes the node behavior data and the relationship data among nodes in the social network; the related attribute vectors are extracted from the interest attributes and the friend attributes of the nodes respectively, obtaining node interest features and network structure features;
step two, model training, which constructs a model for obtaining the node embedding vectors in the social network; based on the node interest features and network structure features obtained by the data analysis module, the model models the multiple semantic influences of multiple neighbor nodes to obtain the embedding vector of each node;
and step three, prediction analysis, namely measuring the probability of friend links between corresponding node pairs by using the similarity between the embedding vectors of the node pairs.
2. The method according to claim 1, wherein in step one, the social network is represented as G = (N, E, S), and the nodes in the social network all have text attributes that imply interest information, where N = {u1, u2, ..., un} is the node set of the social network, E is the friend link set in the social network, and S is the text attribute set of the nodes; the text attribute of node ui is represented as a word sequence Si = (w1, w2, ..., wn), where wt is the t-th word in the word sequence Si.
3. The method for predicting network links based on multiple semantic influences of multiple neighboring nodes according to claim 1, wherein in step two, the training objective is to obtain a network embedding matrix V = [v1, v2, ..., vn] formed by combining the embedding vectors of all nodes, where vi is the embedding vector of node ui; to train the embedding vector of each node in the network, the sum of the probabilities of all known edges is maximized, as follows:

L = Σe∈E L(e)
wherein L (e) is a topology-based objective function LT(e) And an objective function L based on influenceI(e) Topology-based and impact-based embedding are mapped into the same representation space;
L(e)=αLT(e)+(1-α)LI(e)
wherein the topology-based objective function is LT(e) = wij·log p(vj^T | vi^T), and the influence-based objective function is LI(e) = wij·log p(vj^I | vi^I); wij is the weight of an edge in the social network, representing the strength or polarity of the friend relationship, which makes the invention applicable to various networks;
when the influence-based embedding vector of a node is obtained in the model training process, the semantic influence of each neighbor of the node is modeled from the semantics of that neighbor and the interest text of the node; the semantic influence is modeled at the local and global levels respectively and combined into a joint influence-based embedding vector; the local semantic influence captures the text semantic influence of a local area, where the text of a local area can be interpreted by certain terms in the interest text; the global semantic influence captures the influence caused by the neighbor's global interest semantics, namely the semantic influence caused by the global semantics described by the whole interest text;
The influence-based embeddings of all neighbors with respect to node ui are averaged to generate the final vi^I, as follows:

vi^I = (1/m)·Σk=1..m v^I(k→i)
where m represents the number of neighbor nodes of node ui, and v^I(k→i) represents the influence-based embedding of neighbor node uk with respect to node ui; the influence-based embedding of neighbor node uk with respect to node ui is obtained by connecting the local-level semantic influence embedding v^l(k→i) and the global-level semantic influence embedding v^g(k→i), namely:

v^I(k→i) = [v^l(k→i); v^g(k→i)]

where v^l(k→i) and v^g(k→i) are the local-level and global-level semantic influence embeddings, respectively.
4. The method of claim 3, wherein the embedding vector training based on local semantic influence is based on a convolutional neural network and an attention mechanism; the training comprises the following steps: obtaining the text information sequences Si, Sk of a pair of friend nodes ui, uk, and obtaining the final embedding vector based on local semantic influence through a lookup layer, a convolution layer, an attention layer and an output layer;
based on the text information sequence Si, the lookup layer obtains a text embedding matrix X = [x1, x2, ..., xn]; then, based on the following convolution formula, a local feature matrix C(i) = [c1, c2, ..., cn−h+1] is obtained:
ci=f(Wcxi:i+h-1+b)
In the same manner, the local feature matrix C(k) of node uk is obtained;
An attention mechanism is combined to couple the local semantic relevance of a pair of friend nodes, and an attention vector is generated for each of the two local feature matrices, so that local semantic information from a neighbor node directly influences the embedding vector of the node; when generating the attention vectors, first a semantic matching matrix M ∈ R(n−h+1)×(n−h+1) for local semantic influence is constructed from the local feature matrices C(i), C(k), with the goal of obtaining semantic matching signals, where Mxy, the element in row x and column y of M, measures the semantic match between the x-th local feature of C(i) and the y-th local feature of C(k); mean pooling and a softmax operation are then performed on the semantic matching matrix M to generate the attention vectors, computed as follows:
a(i)=softmax(meanrow(M))
a(k)=softmax(meancol(M))
wherein a(i) and a(k) are respectively the attention vectors of the local feature matrices C(i) and C(k), and meanrow(·) and meancol(·) denote mean pooling of the matrix in the row and column directions, respectively;
node ukTo node uiEmbedded vector based on local level semantic influenceThe calculation is as follows:
in the same manner, node u is calculatediTo node ukEmbedded vector based on local level semantic influence
Figure FDA00022366259100000215
5. The method of claim 3, wherein the embedding vector training based on global semantic influence obtains the global semantic influence using a Bi-GRU model, comprising the following steps:
given node ui, first the text embedding matrix X corresponding to node ui is obtained, and the t-th hidden state component of the GRU model is calculated as follows:
rt=σ(Wxrxt+Whrht-1)
zt=σ(Wxzxt+Whzht-1)
h̃t = tanh(Wxh·xt + Whh·(rt ⊙ ht−1))
ht = zt ⊙ ht−1 + (1 − zt) ⊙ h̃t
obtaining the forward hidden state →ht and the backward hidden state ←ht of node ui, and connecting →ht and ←ht to obtain the hidden-layer context state of the Bi-GRU model, ht = [→ht; ←ht];
applying mean pooling over all historical hidden states, i.e.:

g = (1/n)·Σt=1..n ht
the size of the vector is mapped to the corresponding dimension as follows:

v^g(k→i) = Wp·g
wherein the matrix Wp is a projection matrix, and the vector v^g(k→i) is the embedding of node uk's global-level semantic influence on node ui; in the same manner, the embedding vector v^g(i→k) of node ui's global-level semantic influence on node uk is calculated.
6. The method of claim 3, wherein the network link prediction method based on multiple semantic effects of multiple neighboring nodes comprises: model optimization is carried out on the embedded vector of each node in the training network, and the model optimization comprises the following steps:
the original target function is accelerated by adopting a negative sampling algorithm, namely each known edge (u)i,uk) The following objective functions are specified:
Figure FDA00022366259100000313
wherein K represents the number of corresponding negative sampling edges; σ (-) denotes the sigmoid function.
7. The method for network link prediction based on multiple semantic influences of multiple neighboring nodes according to claim 1, wherein in step three, the similarity between the embedding vectors of a node pair is used to measure the probability of a friend link existing between the corresponding pair; when predictive analysis is performed, the probability that nodes ui and uj in the social network form a link eij is:

p(eij) = σ(vi·vj)

wherein vi and vj are respectively the embedding vectors of nodes ui and uj; the embedding vector of each node is a combination of the topology-based embedding vector vi^T and the node's influence-based embedding vector vi^I, namely:

vi = [vi^T; vi^I]
CN201910985752.0A 2019-10-17 2019-10-17 Network link prediction method based on multiple semantic influence of multiple neighbor nodes Active CN110851491B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910985752.0A CN110851491B (en) 2019-10-17 2019-10-17 Network link prediction method based on multiple semantic influence of multiple neighbor nodes

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910985752.0A CN110851491B (en) 2019-10-17 2019-10-17 Network link prediction method based on multiple semantic influence of multiple neighbor nodes

Publications (2)

Publication Number Publication Date
CN110851491A true CN110851491A (en) 2020-02-28
CN110851491B CN110851491B (en) 2023-06-30

Family

ID=69597634

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910985752.0A Active CN110851491B (en) 2019-10-17 2019-10-17 Network link prediction method based on multiple semantic influence of multiple neighbor nodes

Country Status (1)

Country Link
CN (1) CN110851491B (en)



Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102348250A (en) * 2010-07-29 2012-02-08 华为技术有限公司 Routing method and node device of delay tolerant network
CN104615608A (en) * 2014-04-28 2015-05-13 腾讯科技(深圳)有限公司 Data mining processing system and method
CN106952167A (en) * 2017-03-06 2017-07-14 浙江工业大学 A kind of catering trade good friend Lian Bian influence force prediction methods based on multiple linear regression
CN107784124A (en) * 2017-11-23 2018-03-09 重庆邮电大学 A kind of LBSN super-networks link Forecasting Methodology based on time-space relationship
CN109189936A (en) * 2018-08-13 2019-01-11 天津科技大学 A kind of label semanteme learning method measured based on network structure and semantic dependency
CN109992725A (en) * 2019-04-10 2019-07-09 哈尔滨工业大学(威海) A kind of social networks representation method based on two-way range internet startup disk
CN110110094A (en) * 2019-04-22 2019-08-09 华侨大学 Across a network personage's correlating method based on social networks knowledge mapping

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TU C., LIU H., et al.: "CANE: Context-Aware Network Embedding for Relation Modeling", Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017) *
YANG C., LIU Z., et al.: "Network Representation Learning with Rich Text", Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111489192A (en) * 2020-03-27 2020-08-04 北京理工大学 Product share trend prediction method integrating ICT supply chain network topological features
CN112100514A (en) * 2020-08-31 2020-12-18 浙江工业大学 Social network service platform friend recommendation method based on global attention mechanism representation learning
CN112100514B (en) * 2020-08-31 2021-10-26 浙江工业大学 Friend recommendation method based on global attention mechanism representation learning
CN112446542A (en) * 2020-11-30 2021-03-05 西安电子科技大学 Social network link prediction method based on attention neural network
CN112446542B (en) * 2020-11-30 2023-04-07 山西大学 Social network link prediction method based on attention neural network
CN112507246B (en) * 2020-12-13 2022-09-13 天津大学 Social recommendation method fusing global and local social interest influence
CN112507246A (en) * 2020-12-13 2021-03-16 天津大学 Social recommendation method fusing global and local social interest influence
CN113052712A (en) * 2021-03-05 2021-06-29 浙江师范大学 Social data analysis method and system and storage medium
CN113052712B (en) * 2021-03-05 2022-05-31 浙江师范大学 Social data analysis method and system and storage medium
CN113784380A (en) * 2021-07-28 2021-12-10 南昌航空大学 Topology prediction method adopting graph attention network and fusion neighborhood
CN113784380B (en) * 2021-07-28 2023-05-23 南昌航空大学 Topology prediction method adopting graph attention network and fusion neighborhood
CN114932582A (en) * 2022-06-16 2022-08-23 上海交通大学 Robot small-probability failure prediction method based on Bi-GRU self-encoder
CN114932582B (en) * 2022-06-16 2024-01-23 上海交通大学 Robot small probability failure prediction method based on Bi-GRU self-encoder
CN115345262A (en) * 2022-10-18 2022-11-15 南京工业大学 Neural network model key data mining method based on influence score and application
CN115345262B (en) * 2022-10-18 2022-12-27 南京工业大学 Neural network model key data mining method based on influence scores
CN117010409A (en) * 2023-10-07 2023-11-07 成都中轨轨道设备有限公司 Text recognition method and system based on natural language semantic analysis
CN117010409B (en) * 2023-10-07 2023-12-12 成都中轨轨道设备有限公司 Text recognition method and system based on natural language semantic analysis

Also Published As

Publication number Publication date
CN110851491B (en) 2023-06-30


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant