CN111931903A - Network alignment method based on double-layer graph attention neural network - Google Patents

Network alignment method based on double-layer graph attention neural network

Info

Publication number: CN111931903A (granted as CN111931903B)
Application number: CN202010654776.0A
Authority: CN (China)
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 卢美莲, 戴银龙
Applicant and assignee: Beijing University of Posts and Telecommunications
Prior art keywords: user, node, vector, network, social network

Classifications

    • G06N3/04 Architecture, e.g. interconnection topology; G06N3/045 Combinations of networks
    • G06N20/00 Machine learning
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06Q50/01 Social networking
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a network alignment method based on a double-layer graph attention neural network, comprising two stages: network embedded representation and embedding-vector-space alignment. In the network embedded representation stage, a double-layer graph attention neural network is proposed to perform network representation learning and extract an embedding vector for each user in a social network. In the embedding-vector-space alignment stage, a classification model is built from the learned embedding vectors of the social network user nodes and a partially known set of anchor links to predict anchor links between different social networks, and a bidirectional embedding-vector-space alignment strategy is proposed to satisfy the one-to-one matching constraint on user entities across different social networks. With this arrangement, the method can effectively capture the different influence weights between a user in a social network and its neighbor users, and between the user's various features, so that an accurate representation of each user in the social network is learned and the accuracy of anchor-link prediction between different social networks is improved.

Description

Network alignment method based on double-layer graph attention neural network
Technical Field
The invention relates to the technical field of data mining and machine learning, in particular to a network alignment method based on a double-layer graph attention neural network.
Background
With the rapid development of the internet and mobile devices, online social networks have become indispensable platforms for people to share and exchange information. Because different social networking platforms provide different services, a person will typically register accounts on multiple platforms at the same time to meet different needs. The users shared by different platforms naturally form anchor links connecting the different social networks, facilitating information interaction between them. Mining information interaction across multiple social domains can be effectively applied to many downstream social network applications, such as cross-domain link prediction, cross-domain recommendation, and cross-domain information diffusion. However, these social networking platforms are typically maintained separately by different companies, with some degree of information isolation from each other. Aligning accounts belonging to the same user across different social platforms has therefore become an urgent research topic. Current research on network alignment methods can be divided into two main categories: unsupervised network alignment methods and supervised network alignment methods. Specifically:
(1) Unsupervised network alignment methods: unsupervised network alignment models attempt to align user accounts between different social networks without any known anchor links. In this type of method, researchers typically measure user similarity between different social networks from the rarity of a user's username and the consistency of the neighborhood structure, and then predict anchor links using a greedy method or a method that minimizes the structural inconsistency of the two social networks.
(2) Supervised network alignment methods: the general idea of supervised network alignment models is to convert the alignment problem between different social networks into a classification problem over anchor links, i.e. to determine whether any two users from different social networks have an anchor-link relationship. Early research built classification models by manually extracting certain features of users in social networks; although this can, to a certain extent, solve the alignment problem for some users in some social network scenarios, it still has serious limitations. First, manually extracting user features is very laborious: people usually cannot directly judge which features are effective, and the effective features may differ between social network scenarios. Second, social networking platforms often hide part of a user's real information to protect user privacy, so manually extracted user features often suffer partial information loss, which affects the accuracy of the anchor-link prediction task.
In recent years, inspired by the widespread success of network representation learning in single-social-network analysis tasks, some researchers have begun to apply network representation learning to the network alignment task across multiple social networks. This type of approach attempts to learn a common embedding vector space for users in different social networks without manually extracting effective user features. While these methods attempt to model a user's behavior in a social network from multiple aspects, such as the user's social structure and profile information, when learning a user node's representation they ignore the differing influence weights of its different neighboring user nodes, as well as the differing influence weights of different attributes on user information interaction.
In summary, in view of the importance of network alignment research on multiple social network analysis tasks and some limitations of existing research, the present invention aims to provide a network alignment method based on a two-layer graph attention neural network, which enables a model to learn an accurate representation of a user in a social network by combining a user-level attention mechanism and a feature-level attention mechanism, thereby improving the prediction accuracy of anchor links.
Disclosure of Invention
In view of this, the present invention provides a network alignment method based on a two-layer graph attention neural network, which can effectively capture different influence weights between a user and neighboring users and between various features in a social network while modeling a user social behavior by using information such as an attribute, a local social structure, and a global social structure of the user in the social network, so as to learn an accurate representation of the user in the social network, and improve accuracy of anchor link prediction between different social networks.
Based on the above object, the present invention provides a network alignment method based on a two-layer graph attention neural network, which is characterized by comprising:
basic definition: a social network is abstracted as a directed graph G = (V, E, X), where V = {v_i | i = 1, …, N} represents the set of user nodes in the social network and N is the number of user nodes; E = {e_{i,j} = (v_i, v_j) | v_i ∈ V, v_j ∈ V} represents the set of relationships between users in the social network, where e_{i,j} = (v_i, v_j) indicates that an association exists between user v_i and user v_j; X = {x_i | i = 1, …, N} represents the set of feature vectors of all users, where each user node v_i has a corresponding node feature vector x_i that can be extracted from the user node's profile, behavior, and social-structure information. Without loss of generality, the two networks to be aligned are named the source social network and the target social network, denoted G^s and G^t respectively;
for any two users from different social networks
Figure BDA0002576285310000031
And
Figure BDA0002576285310000032
we use
Figure BDA0002576285310000033
Representing an anchor link relationship between the source social network and the target social network, wherein
Figure BDA0002576285310000034
And
Figure BDA0002576285310000035
the same user is respectively in different social networks GsAnd GtAccount (2) of (1); anchor links are one-to-one between two users in different social networksThe link relation is that the condition that two anchor links share the same user account of the same social network does not exist;
two different social networks GsAnd GtThe set formed by all the anchor link relations between the anchor links is called as an anchor link set and is used
Figure BDA0002576285310000036
Is shown in which
Figure BDA0002576285310000037
Representing a user account in the source social network,
Figure BDA0002576285310000038
representing a user account in a target social network; for two different social networks Gs=(Vs,Es,Xs) And Gt=(Vt,Et,Xt) Network alignment aims to discover a set of anchor-linked sets T between two social networks, where any element e 'in the set T'ijE T denotes two user accounts in different social networks
Figure BDA0002576285310000039
And
Figure BDA00025762853100000310
an anchor link between;
s1, a network preprocessing module: preprocessing a social network according to the input network type and the contained user attribute information, and constructing an initialized user node characteristic vector matrix;
s2, network embedded representation module: taking an initialized user node characteristic vector matrix obtained by a network preprocessing module and an adjacency matrix of a social network as input, and capturing a complex information interaction relation of a user in the social network through a double-layer graph attention neural network so as to learn potential information of user nodes in the social network and obtain an accurate user node embedded vector;
s3, embedding vector space alignment module: constructing a classification model according to the user node embedded vectors of the source social network and the target social network learned in the step S2 to predict anchor links, and adopting a bidirectional embedded vector space alignment strategy to meet the one-to-one matching constraint of user accounts among different social networks;
and S4, taking the intersection of the candidate anchor links obtained in the two alignment directions to complete network alignment.
Preferably, the step S2 includes the following: user v_i's feature vector is denoted x_i; several feature vectors of the user are extracted according to the network type and stacked horizontally to generate the user's initialized feature vector x_i ∈ R^d, where d is the dimension of the user's initialized feature vector (d', d'', d''' appearing hereinafter denote other dimensions). The initialized feature vectors of all users in the social network form a state matrix X in which each row is the feature vector of a specific user node: X = (x_1, x_2, …, x_N)^T.
Preferably, if the network type is a topological network, the users' feature vectors are randomly initialized as a random matrix, and the weight parameters of the random matrix are learned during the training stage of the double-layer graph attention neural network model.
Preferably, if the network type is an attribute network, the user attributes are vectorized as follows: profile information such as the user name is randomly initialized by word embedding to obtain a username feature vector; a Doc2Vec model mines the user's language style from the user's long text to learn a text feature vector; the user's trajectory information is vector-initialized by spatial clustering to obtain a spatial feature vector; and the user's rating and check-in counts are used directly as feature dimensions and vector-initialized.
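As a toy illustration of the horizontal stacking described above, the sketch below builds an initialized user vector from a username embedding plus the numeric attributes. The username embedding here is only a stand-in (a deterministic pseudo-random vector) for the word-embedding layer; all function names and dimensions are hypothetical, not from the patent.

```python
import random

def username_vec(name, dim=4):
    # Stand-in for the word-embedding initialization of the username:
    # a deterministic pseudo-random vector seeded by the name.
    rng = random.Random(name)
    return [rng.uniform(-1, 1) for _ in range(dim)]

def numeric_vec(score_count, checkin_count):
    # Numeric attributes (rating count, check-in count) are used directly
    # as feature dimensions, as the method describes.
    return [float(score_count), float(checkin_count)]

def init_user_vector(name, score_count, checkin_count):
    # Horizontally stack (concatenate) the per-attribute vectors to form
    # the user's initialized feature vector x_i of dimension d.
    return username_vec(name) + numeric_vec(score_count, checkin_count)

x = init_user_vector("alice", 12, 3)
print(len(x))  # d = 4 + 2 = 6
```

Stacking the resulting x_i row-wise over all users then gives the state matrix X = (x_1, …, x_N)^T.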
Preferably, the step S2 includes the following:
s2.1, embedding a user layer node into a representation submodule: the system is responsible for capturing different influence weights among users to perform weighted aggregation on local neighborhood information of the users in the social network, so that node embedded vectors of user levels are learned;
s2.2, embedding the feature layer nodes into a representation submodule: the method is used for learning the influence weight among different features of a user so as to capture the interactive relation among the features with finer granularity, thereby learning the node embedding vector of the user at the feature level;
s2.3, embedding a vector fusion submodule: the method is responsible for reserving and resetting user embedded vectors from different levels of user levels and feature levels so as to fuse node embedded vectors of multiple visual angles and improve the accuracy of network alignment tasks.
Preferably, S2.1 includes the following: a learnable transformation matrix W1 ∈ R^(d'×d) converts the input vector into a higher-dimensional vector: h_i = W1 · x_i.

Following the graph attention network formulation, for any two user nodes v_i and v_j, the relationship strength e_ij between the two user nodes is first computed:

e_ij = LeakyReLU(a^T · [W2 · h_i^(l) || W2 · h_j^(l)])

where h_i^(l) and h_j^(l) are the embedding vectors of user nodes v_i and v_j at layer l, W2 and a are the weight parameters of layer l, "||" is the concatenation operator that splices two vectors horizontally, and LeakyReLU(·) is a neuron activation function. When aggregating the neighborhood information of user node v_i, the information contribution from different neighbors is obtained by normalizing, with the softmax(·) function, the relationship strengths between the node and all its neighbor user nodes v_k ∈ N(v_i):

α_ij = exp(e_ij) / Σ_{v_k ∈ N(v_i)} exp(e_ik)

α_ij is called the attention coefficient between user nodes v_i and v_j; the larger α_ij, the closer the relationship between the two users. From the computed attention coefficients between user node v_i and all its neighbor nodes (including itself), the new embedding vector of each user node v_i can be defined as:

h_i^(l+1) = σ(Σ_{v_j ∈ N(v_i)} α_ij · W2 · h_j^(l))

where σ(·) is a neuron activation function. By linearly aggregating each user's neighborhood information with these different influence weights, the user-level node embedding vector h_i of every user in the social network is obtained, forming the user-level vector matrix M = (h_1, h_2, …, h_N)^T.
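The user-level attention steps (relationship strength, softmax normalization, weighted aggregation) can be sketched in plain Python. This is a minimal illustration: the learnable transform W and the output activation σ are omitted (identity) for brevity, and all names are illustrative rather than the patent's.

```python
import math

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def user_level_attention(h, neighbors, a):
    """h: {node: embedding}; neighbors: {node: [neighbor ids, incl. self]};
    a: attention vector of length 2*dim. Returns the attention coefficients
    and the aggregated embeddings (identity activation for clarity)."""
    alphas, new_h = {}, {}
    for i, nbrs in neighbors.items():
        # relationship strength e_ij = LeakyReLU(a . [h_i || h_j])
        e = {j: leaky_relu(dot(a, h[i] + h[j])) for j in nbrs}
        z = sum(math.exp(v) for v in e.values())
        alpha = {j: math.exp(v) / z for j, v in e.items()}  # softmax over N(v_i)
        alphas[i] = alpha
        dim = len(h[i])
        # weighted linear aggregation of neighbor embeddings
        new_h[i] = [sum(alpha[j] * h[j][k] for j in nbrs) for k in range(dim)]
    return alphas, new_h

h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
nbrs = {0: [0, 1, 2], 1: [1, 0], 2: [2, 0]}
a = [0.5, -0.2, 0.3, 0.1]
alphas, new_h = user_level_attention(h, nbrs, a)
print(round(sum(alphas[0].values()), 6))  # 1.0: coefficients over a neighborhood sum to 1
```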
Preferably, S2.2 includes the following: the user-level vector matrix M = (h_1, h_2, …, h_N)^T obtained in S2.1 is taken as input, and multi-dimensional attention between the features of any two user nodes in the social network is considered, i.e. an attention coefficient is computed for each corresponding dimension of the two user node vectors.

Suppose h_i and h_j respectively denote the node embedding vectors of two user nodes v_i and v_j in the social network; then the relationship between the two node embedding vectors can be defined as:

f(h_i, h_j) = W5 · tanh(W4 · h_i + W3 · h_j + b2) + b1

where W3, W4 and W5 are parameter matrices, b1 and b2 are bias terms, and tanh(·) is a neuron activation function.

A feed-forward neural network computes the feature-level dependency between any two user nodes from f(h_i, h_j). Let β_ij denote the feature-level attention coefficient vector between user nodes v_i and v_j, with k-th component [β_ij]_k. To make the attention coefficients of corresponding feature dimensions comparable across different coefficient vectors, the attention coefficient vectors of all of a user's neighbors are normalized per feature dimension with the softmax(·) function:

[β_ij]_k = exp([f(h_i, h_j)]_k) / Σ_{v_m ∈ N(v_i)} exp([f(h_i, h_m)]_k)

After computing the attention coefficient [β_ij]_k for each dimension of any two users, these coefficients are assembled by feature dimension into the attention coefficient vector β_ij = ([β_ij]_1, [β_ij]_2, …, [β_ij]_{d''}) between the two users. The attention vector has the same dimension as the user node vector, and each dimension [β_ij]_k is the influence weight of the corresponding dimension of the node vector; the larger [β_ij]_k, the stronger the correlation between the k-th dimension features of nodes v_i and v_j. Finally, for any user node v_i in the social network, the embedding vectors of its neighbor users are aggregated with the learned attention coefficient vectors. Unlike the aggregation of the user-level attention mechanism, the aggregation function of the feature-level attention mechanism combines neighborhood information by element-wise multiplication:

h̃_i = σ(Σ_{v_j ∈ N(v_i)} β_ij ⊙ h_j)

where ⊙ is the element-wise product of two vectors of the same shape, yielding a vector of that shape, and σ(·) is a neuron activation function. Linearly aggregating each user's neighborhood information with these per-feature influence weights yields the feature-level node embedding h̃_i of every user in the social network, forming the feature-level vector matrix M̃ = (h̃_1, h̃_2, …, h̃_N)^T.
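A minimal sketch of the per-dimension (feature-level) attention follows. As an illustrative simplification, the parameter matrices W3 and W4 are reduced to scalars and W5, b1, b2 and the output activation are dropped; only the per-dimension softmax and the element-wise aggregation are demonstrated.

```python
import math

def feature_score(hi, hj, w3=0.5, w4=0.5):
    # Vector-valued relation f(h_i, h_j); W3 and W4 are reduced to
    # scalars here for brevity (illustrative assumption).
    return [math.tanh(w4 * a + w3 * b) for a, b in zip(hi, hj)]

def feature_level_attention(h, neighbors, i):
    nbrs = neighbors[i]
    scores = {j: feature_score(h[i], h[j]) for j in nbrs}
    dim = len(h[i])
    # softmax per feature dimension k across the neighborhood
    beta = {j: [0.0] * dim for j in nbrs}
    for k in range(dim):
        z = sum(math.exp(scores[j][k]) for j in nbrs)
        for j in nbrs:
            beta[j][k] = math.exp(scores[j][k]) / z
    # element-wise weighted aggregation: h~_i = sum_j beta_ij ⊙ h_j
    return [sum(beta[j][k] * h[j][k] for j in nbrs) for k in range(dim)]

h = {0: [1.0, -1.0], 1: [0.5, 0.5], 2: [-0.5, 1.0]}
nbrs = {0: [0, 1, 2]}
ht = feature_level_attention(h, nbrs, 0)
print(len(ht))  # 2: same dimension as the input vectors
```

Because the per-dimension coefficients sum to 1 over the neighborhood, each output dimension is a convex combination of the neighbors' corresponding feature values.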
Preferably, S2.3 includes the following: the embedding-vector fusion submodule takes the user-level vector matrix M = (h_1, h_2, …, h_N)^T and the feature-level vector matrix M̃ = (h̃_1, h̃_2, …, h̃_N)^T as input, and uses a gating mechanism to automatically learn the weight parameters of the same user's node embedding vectors at the different levels, so as to effectively retain and reset the information representations from the two levels.

For any user node v_i in the social network, the module first computes the weight relationship vector between the user-level node embedding vector h_i and the feature-level node embedding vector h̃_i:

F = sigmoid(W6 · h_i + W7 · h̃_i + b3)

where W6 and W7 are parameter matrices of the gated neural network, b3 is a bias term, and sigmoid(·) is a neuron activation function. Based on the learned weight relationship vector F, user node embedding vectors from the two levels are selectively retained and reset, and the final user node embedding vector is:

z_i = F ⊙ h_i + (1 − F) ⊙ h̃_i

where F ⊙ h_i selectively retains the user-level node embedding vector and (1 − F) ⊙ h̃_i selectively resets to the feature-level node embedding vector; here 1 − F is a vector operation that subtracts each dimension of F from 1.

Fusing each user's user-level and feature-level node embedding representations with this gating mechanism yields the final node embedding z_i of every user in the social network, forming the node embedding vector matrix Z = (z_1, z_2, …, z_N)^T.
For any given pair of user nodes v_i and v_j in a social network with node embedding vectors z_i and z_j, the probability that an edge exists between the two nodes can be expressed as:

p(v_i, v_j) = σ(z_i^T · z_j)

where σ(x) = 1 / (1 + e^(−x)) is the sigmoid function.

To optimize the model parameters of the double-layer graph attention neural network, we define an objective function whose goal is to maximize the probability of the observable edges in the social network, i.e.:

max Σ_{e_{i,j} ∈ E} log p(v_i, v_j)

To avoid a trivial solution, for each observable edge e_{i,j} = (v_i, v_j) we use a negative sampling technique and maximize the objective:

log σ(z_i^T · z_j) + Σ_{k=1}^{K} E_{v_n ∼ P_n(v)} [log σ(−z_i^T · z_n)]

The first term models the positive examples in the social network; the second term models negative examples by randomly generating edges incident to the node via negative sampling, where each sampled node follows the noise distribution P_n(v) ∝ d_v^(3/4), K denotes the number of sampled negative edges, and d_v is the degree of user node v. With this objective function, the parameters of the double-layer graph attention neural network model can be learned with a back-propagation optimization algorithm, yielding the node vector matrices of the source and target social networks, Z^s ∈ R^(|V^s| × d''') and Z^t ∈ R^(|V^t| × d'''), where |V^s| is the number of user nodes in the source social network and |V^t| is the number of user nodes in the target social network; z_i^s is the node embedding vector of user v_i^s in the source social network, and z_j^t is the node embedding vector of user v_j^t in the target social network. Z^s and Z^t are also referred to as the embedding vector spaces of the source social network and the target social network.
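A minimal sketch of the negative-sampling loss (the negation of the objective above, so that smaller is better), assuming the standard d_v^(3/4) noise distribution; gradient computation and back-propagation are out of scope, and all names are illustrative.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def neg_sampling_loss(z, edges, degrees, K=2, rng=random.Random(0)):
    """Negative-sampling loss over the observed edges.
    Negative nodes are drawn with probability proportional to d_v^(3/4)."""
    nodes = list(degrees)
    weights = [degrees[v] ** 0.75 for v in nodes]
    loss = 0.0
    for i, j in edges:
        # positive term: the observed edge should be likely
        loss -= math.log(sigmoid(dot(z[i], z[j])))
        # K negative samples: random edges should be unlikely
        for _ in range(K):
            n = rng.choices(nodes, weights=weights)[0]
            loss -= math.log(sigmoid(-dot(z[i], z[n])))
    return loss

z = {0: [0.1, 0.2], 1: [0.3, -0.1], 2: [-0.2, 0.4]}
loss = neg_sampling_loss(z, [(0, 1), (1, 2)], {0: 2, 1: 2, 2: 1})
print(loss > 0)  # True: a sum of -log probabilities is positive
```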
Preferably, the step S3 includes the following: based on step S2, the node embedding vector matrix Z^s of the source social network and the node embedding vector matrix Z^t of the target social network are obtained. Each row of a node vector matrix is the node embedding vector of one user in that social network, and the whole node vector matrix is also called the embedding vector space of the social network.

We define a mapping function M that maps user node vectors from one embedding vector space to another. Suppose we now project the source social network into the target social network to find target nodes matching the source nodes; given a partially known anchor link set T as supervision, the objective function can be defined as:

min_θ Σ_{(v_i^s, v_j^t) ∈ T} ‖ M^(s→t)(z_i^s; θ) − z_j^t ‖

where M^(s→t)(·) is the mapping function from the source social network to the target social network, constructed as a multilayer perceptron with weight parameters θ. The objective minimizes, for each user pair with an anchor-link relationship, the distance between the source user node mapped into the target social network and the corresponding target user node; a classification model is then built to predict whether any two users in different social networks have an anchor link, and the target user node closest to the projected source node is selected to form a candidate anchor link.
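The bidirectional strategy can be sketched as follows: propose nearest-neighbor candidate links in both directions (source→target and target→source) and keep only the pairs confirmed by both, which enforces the one-to-one constraint. An identity map stands in for the learned multilayer perceptron M^(s→t); all names are illustrative.

```python
def nearest(v, space):
    # id of the embedding in `space` closest to v (squared Euclidean distance)
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(space, key=lambda j: d2(v, space[j]))

def bidirectional_align(zs_mapped, zt_mapped, zs, zt):
    """zs_mapped: source embeddings mapped into the target space;
    zt_mapped: target embeddings mapped into the source space.
    Keep only pairs proposed in both directions (set intersection)."""
    s_to_t = {(i, nearest(zs_mapped[i], zt)) for i in zs_mapped}
    t_to_s = {(nearest(zt_mapped[j], zs), j) for j in zt_mapped}
    return s_to_t & t_to_s

zs = {0: [0.0, 0.0], 1: [1.0, 1.0]}
zt = {0: [0.1, 0.0], 1: [0.9, 1.1]}
# identity mapping stands in for the learned multilayer perceptron
links = bidirectional_align(zs, zt, zs, zt)
print(sorted(links))  # [(0, 0), (1, 1)]
```

A pair can appear in the intersection at most once per account, which is exactly the one-to-one matching constraint described above.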
From the above, it can be seen that the complex interaction behavior of the user in the social network is modeled by using the graph neural network and the information of the user in the social network, such as the attributes, the local social structure and the global social structure, so that a more accurate user node embedding vector can be obtained;
the invention provides a double-layer attention neural network to learn attention coefficients between users at a user level and a characteristic level respectively, and capture differences of influence weights between different users from multiple visual angles, so that the learned user node embedded representation is more in line with the actual situation of the users in a social network;
the invention provides a bidirectional embedded vector space alignment strategy to predict anchor links among different social networks, so that users among different social networks are aligned and meet a one-to-one matching constraint relationship. Meanwhile, further confirmation of the bidirectional embedded vector space alignment strategy also improves the accuracy of anchor link prediction.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a method framework diagram of the present invention;
FIG. 2 is a fused representation of user node embedded representations from different perspectives of the present invention;
FIG. 3 is a schematic diagram of the spatial alignment strategy of the bi-directional embedded vector of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
It is to be noted that technical terms or scientific terms used in the embodiments of the present invention should have the ordinary meanings as understood by those having ordinary skill in the art to which the present disclosure belongs, unless otherwise defined. The use of "first," "second," and similar terms in this disclosure is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", and the like are used merely to indicate relative positional relationships, and when the absolute position of the object being described is changed, the relative positional relationships may also be changed accordingly.
As shown in Figs. 1 to 3, in this embodiment:
first, the present invention describes the social network alignment problem as follows:
a social network is abstracted as a directed graph G = (V, E, X), where V = {v_i | i = 1, …, N} represents the set of user nodes in the social network and N is the number of user nodes; E = {e_{i,j} = (v_i, v_j) | v_i ∈ V, v_j ∈ V} represents the set of relationships between users, where e_{i,j} = (v_i, v_j) indicates an association between user v_i and user v_j; X = {x_i | i = 1, …, N} represents the set of feature vectors of all users, each user node v_i having a corresponding node feature vector x_i extractable from the node's profile, behavior, structural attributes, and so on. For each edge e_{i,j} in the relationship set, let w_{i,j} denote the weight of the edge: if two users have a link relationship in the social network, w_{i,j} = 1; otherwise w_{i,j} = 0. The matrix A ∈ R^(N×N) with entries w_{i,j} is called the adjacency matrix of graph G. Without loss of generality, we name the two networks to be aligned the source social network and the target social network, denoted G^s and G^t respectively.
For any two users v_i^s ∈ V^s and v_j^t ∈ V^t from different social networks, we use e'_{i,j} = (v_i^s, v_j^t) to represent the association relationship, with w'_{i,j} denoting the relationship weight. If v_i^s and v_j^t are two accounts of the same user in the source and target social networks respectively, we let w'_{i,j} = 1, denoting that there is an anchor link relationship between the two users; otherwise, w'_{i,j} = 0. The goal of network alignment is to find the set of anchor links between the different social networks, T = {e'_{i,j} = (v_i^s, v_j^t) | w'_{i,j} = 1}, where v_i^s represents a user node from the source social network, v_j^t represents a user node from the target social network, and w'_{i,j} = 1 denotes that the two user nodes v_i^s and v_j^t belong to the same user entity in the real world.
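To make the notation concrete, the following minimal sketch (toy data of our own invention, not from the patent; the set layout for the anchor links is our own choice) builds the adjacency matrix A with w_ij = 1 for linked users, alongside a hypothetical anchor-link set T:

```python
import numpy as np

# Toy source network with N = 4 user nodes and a directed edge set E.
N = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]  # each pair is one e_ij = (v_i, v_j)

# Adjacency matrix A: w_ij = 1 if users v_i and v_j are linked, else 0.
A = np.zeros((N, N), dtype=int)
for i, j in edges:
    A[i, j] = 1

# A hypothetical anchor-link set T between a source and a target network:
# each pair (i, j) says source user i and target user j are the same entity.
T = {(0, 0), (2, 1)}

print(A.sum())  # number of edges, here 4
```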
Secondly, referring to fig. 1, the present invention provides a network alignment method framework, which mainly comprises a network preprocessing module, a network embedded representation module, and an embedded vector space alignment module, wherein:
A network preprocessing module: for a social network G = (V, E), we first preprocess the network according to the type of the input network and the information it contains, thereby constructing an initialized user vector matrix. Common network types can be divided into topological networks and attribute networks. For a topological network, the invention adopts an embedding layer to randomly initialize the node embedding vector x_i of each user, and learns the weight parameters of the embedding layer during the training stage of the double-layer graph attention neural network model. For an attribute network, user attributes are usually vectorized in different ways according to the information they contain. For short text attributes such as the user name, an embedding layer can be adopted for random initialization; for long text attributes such as user comments, methods such as topic models are usually adopted to learn the topic preference of the user; for the check-in information of the user, considering that users' access preferences for merchants in the same area are similar, a spatial clustering method is adopted to initialize the check-in information; numerical attributes such as the number of user ratings and check-ins can be used directly as one dimension of the user attributes. After extracting the attribute vectors of the various user attributes, we stack these attribute vectors horizontally to generate the initialization vector of the user in the attribute network. Thus, for each user in the social network, we ultimately generate an initialized representation x_i ∈ R^d, where d represents the dimension of the user initialization vector. The initialization vectors of all users in the social network form a state matrix X, where each row is the feature vector of a specific user node: X = (x_1, x_2, …, x_N)^T.
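The horizontal stacking of per-attribute vectors into the initialization matrix X can be sketched as follows; the attribute names and dimensions are illustrative placeholders standing in for the embedding-layer, topic-model, and clustering outputs described above:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3  # number of users

# Hypothetical per-attribute vectors: a name embedding (random init), a topic
# vector learned from long text, and one scaled numeric dimension (check-ins).
name_vec  = rng.normal(size=(N, 4))                    # short-text attribute
topic_vec = rng.normal(size=(N, 8))                    # long-text attribute
counts    = rng.integers(0, 100, size=(N, 1)) / 100.0  # numeric attribute

# Stack the attribute vectors horizontally: one initialization row per user.
X = np.hstack([name_vec, topic_vec, counts])  # shape (N, d) with d = 4+8+1
print(X.shape)  # (3, 13)
```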
Network embedded representation module: this module takes the initialized user node feature vector matrix X = (x_1, x_2, …, x_N)^T obtained by the network preprocessing module and the adjacency matrix A of the social network as input, and captures the complex information interaction relationships of users in the social network through a double-layer graph attention neural network, so as to learn the latent information representation z_i of each user node in the social network. The module can be further subdivided into three submodules: a user-layer node embedded representation submodule, a feature-layer node embedded representation submodule, and an embedded vector fusion submodule.
1. User layer node embedded representation submodule
The user-layer node embedded representation submodule is responsible for capturing the different influence weights between users to perform weighted aggregation of each user's local neighborhood information in the social network, thereby learning a user-level node embedded representation. To ensure that the node vectors have sufficient information representation capability, the invention first uses a learnable transformation matrix W_1 ∈ R^{d'×d} to convert the input vector into a high-dimensional vector, namely:
h_i^(0) = W_1 · x_i
For any two user nodes v_i and v_j, the strength e_{ij} of the relationship between the two users is first calculated as:
e_{ij} = LeakyReLU(a^T · [W_2^(l) h_i^(l) ∥ W_2^(l) h_j^(l)])
where h_i^(l) and h_j^(l) denote the vector representations of user nodes v_i and v_j at the l-th layer, W_2^(l) denotes the weight parameter of the l-th layer, "∥" is a concatenation operator denoting the horizontal concatenation of two vectors, and LeakyReLU(·) is the activation function of the neurons. When aggregating the neighborhood information of user node v_i, to calculate the information contribution proportions from different neighbors, the softmax(·) function is adopted to normalize the relationship strengths between the user node and all its neighbor user nodes v_k ∈ N(v_i):
α_{ij} = softmax(e_{ij}) = exp(e_{ij}) / Σ_{v_k ∈ N(v_i)} exp(e_{ik})
α_{ij} is called the attention coefficient between user nodes v_i and v_j; the larger the value of α_{ij}, the closer the relationship between the two users. According to the calculated attention coefficients between user node v_i and all its neighbor nodes (including itself), the new latent information representation of each user node v_i can be defined as:
h_i^(l+1) = σ(Σ_{v_j ∈ N(v_i)} α_{ij} · W_2^(l) h_j^(l))
where σ(·) is the activation function of the neurons. By linearly aggregating each user's neighborhood information with these different influence weights, the user-level node embedded representation h_i of each user in the social network can be obtained, forming the user-level vector matrix M = (h_1, h_2, …, h_N)^T.
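A minimal NumPy sketch of one user-layer attention step under the formulas above; all sizes are toy values, and tanh stands in for the unspecified activation σ(·):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def user_level_gat_layer(H, A, W2, a):
    """One user-layer attention step: e_ij = LeakyReLU(a^T [W2 h_i || W2 h_j]),
    softmax over each node's neighborhood (self included), then
    h_i' = sigma(sum_j alpha_ij * W2 h_j)."""
    N = H.shape[0]
    WH = H @ W2.T                       # transformed vectors W2 h_i
    A_hat = A + np.eye(N)               # treat each node as its own neighbor
    E = np.full((N, N), -np.inf)        # -inf => zero weight after softmax
    for i in range(N):
        for j in range(N):
            if A_hat[i, j] > 0:
                E[i, j] = leaky_relu(a @ np.concatenate([WH[i], WH[j]]))
    expE = np.exp(E - E.max(axis=1, keepdims=True))
    alpha = expE / expE.sum(axis=1, keepdims=True)  # attention coefficients
    return np.tanh(alpha @ WH)          # tanh stands in for sigma(.)

rng = np.random.default_rng(1)
N, d_in, d_out = 4, 5, 6
H = rng.normal(size=(N, d_in))          # layer-l user vectors
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])            # toy adjacency matrix
W2 = rng.normal(size=(d_out, d_in))
a = rng.normal(size=2 * d_out)          # attention parameter vector
M = user_level_gat_layer(H, A, W2, a)
print(M.shape)  # (4, 6)
```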
2. Feature level node embedded representation submodule
The feature-layer node embedded representation submodule is responsible for learning the influence weights between different features of users, so as to capture finer-grained interaction relationships between features and thereby learn a feature-level node embedded representation for each user. The submodule takes the user-level vector matrix M = (h_1, h_2, …, h_N)^T obtained in the previous stage as input and considers the multi-dimensional attention between the features of any two nodes in the social network, i.e., an attention coefficient is calculated for each corresponding dimension of the two user node vectors.
Suppose h_i and h_j respectively represent the embedding vectors of two user nodes v_i and v_j in the social network; the relationship between the two user node embedding vectors can be defined as:
f(h_i, h_j) = W_5 · tanh(W_4 · h_i + W_3 · h_j + b_2) + b_1
where W_3, W_4 and W_5 are parameter matrices, b_1 and b_2 are bias terms, and tanh(·) is the activation function of the neurons. A feed-forward neural network operating according to f(h_i, h_j) calculates the feature-level dependency relationship of any two user nodes. Let β_{ij} denote the feature-layer attention coefficient vector of user nodes v_i and v_j, and [β_{ij}]_k denote the k-th dimension of the attention vector. Similarly, in order to compare the attention coefficients of the features of corresponding dimensions between different attention coefficient vectors, the attention coefficient vectors of all neighbors of a user are normalized along each feature dimension using the softmax(·) function:
[β_{ij}]_k = exp([f(h_i, h_j)]_k) / Σ_{v_l ∈ N(v_i)} exp([f(h_i, h_l)]_k)
After calculating the attention coefficient [β_{ij}]_k for each dimension of any two users, these attention coefficients can be combined according to the corresponding feature dimensions into the attention coefficient vector between the two users, β_{ij} = ([β_{ij}]_1, [β_{ij}]_2, …, [β_{ij}]_d). The attention vector and the user node vector have the same dimensionality; each dimension [β_{ij}]_k corresponds to the influence weight of that dimension of the user node vector, and the larger [β_{ij}]_k is, the stronger the correlation of the k-th-dimension features of the two user nodes v_i and v_j.
Finally, for any user node v_i in the social network, the latent information representations of its neighbor users are linearly aggregated with weights given by the learned attention coefficient vectors between different user nodes. Unlike the aggregation mode of the user-layer attention mechanism, the aggregation function of the feature-layer attention mechanism aggregates the neighborhood information by element-wise multiplication:
h̃_i = σ(Σ_{v_j ∈ N(v_i)} β_{ij} ⊙ h_j)
where ⊙ denotes the element-wise product of two vectors of the same shape, yielding a vector of the same shape, and σ(·) is the activation function of the neurons. By linearly aggregating each user's neighborhood information with these per-feature influence weights, the feature-level node embedded representation h̃_i of each user in the social network can be obtained, forming the feature-level vector matrix M̃ = (h̃_1, h̃_2, …, h̃_N)^T.
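The feature-layer attention can be sketched in the same style; here the softmax is taken over each node's neighbors independently in every feature dimension, tanh again stands in for the unspecified activation σ(·), and all parameter shapes are toy values:

```python
import numpy as np

def feature_level_layer(M, A, W3, W4, W5, b1, b2):
    """Per-dimension attention: f(h_i, h_j) = W5 tanh(W4 h_i + W3 h_j + b2) + b1,
    softmax over each node's neighbors separately in every feature dimension,
    then element-wise weighted aggregation h~_i = sigma(sum_j beta_ij ⊙ h_j)."""
    N, d = M.shape
    A_hat = A + np.eye(N)                # include the node itself
    out = np.zeros_like(M)
    for i in range(N):
        nbrs = np.where(A_hat[i] > 0)[0]
        # f(h_i, h_j) for every neighbor j: one score per feature dimension
        F = np.array([W5 @ np.tanh(W4 @ M[i] + W3 @ M[j] + b2) + b1
                      for j in nbrs])
        beta = np.exp(F - F.max(axis=0))  # softmax per feature dimension
        beta /= beta.sum(axis=0)
        out[i] = np.tanh((beta * M[nbrs]).sum(axis=0))  # element-wise product
    return out

rng = np.random.default_rng(2)
N, d, dh = 4, 6, 5
M = rng.normal(size=(N, d))              # user-level vectors h_i
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
W3, W4 = rng.normal(size=(dh, d)), rng.normal(size=(dh, d))
W5 = rng.normal(size=(d, dh))
b1, b2 = rng.normal(size=d), rng.normal(size=dh)
M_tilde = feature_level_layer(M, A, W3, W4, W5, b1, b2)
print(M_tilde.shape)  # (4, 6)
```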
3. Embedded vector fusion submodule
The embedded vector fusion submodule is responsible for retaining and resetting latent user information from the two different levels, the user level and the feature level, so as to fuse the multi-view node embedded representations and improve the accuracy of the subsequent network alignment task. The submodule takes the user-level vector matrix M = (h_1, h_2, …, h_N)^T and the feature-level vector matrix M̃ = (h̃_1, h̃_2, …, h̃_N)^T as input, and uses a gating mechanism to automatically learn the weight parameters of the different-level node embedded representations of the same user, so as to efficiently retain and reset the information representations from different levels.
As shown in fig. 2, for any user node v_i in the social network, the module first computes the weight relationship vector between the user-level node embedded representation h_i and the feature-level node embedded representation h̃_i as follows:
F = sigmoid(W_6 · h_i + W_7 · h̃_i + b_3)
where W_6 and W_7 are parameter matrices of the gated neural network, b_3 is a bias term, and sigmoid(·) is the activation function of the neurons. According to the learned weight relationship vector F, the latent user information representations from the different layers can be selectively retained and reset, and the final user node embedding is represented as:
z_i = F ⊙ h_i + (1 − F) ⊙ h̃_i
where F ⊙ h_i represents the selective retention of the user-level node embedded representation, and (1 − F) ⊙ h̃_i represents the selective resetting of the feature-level node embedded representation; here 1 − F is a vector operation, denoting that each dimension of the vector F is subtracted from 1.
By fusing the user-level and feature-level node embedded representations of each user in the social network with this gating mechanism, the final node embedded representation z_i of each user can be obtained, forming the node embedding vector matrix Z = (z_1, z_2, …, z_N)^T.
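A sketch of the gating fusion for a single user node; the parameter names W6, W7 and b3 are our own labels for the gate's parameter matrices and bias, since the original symbols are illegible in this extraction:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h, h_tilde, W6, W7, b3):
    """Gate F = sigmoid(W6 h + W7 h~ + b3); the final embedding
    z = F ⊙ h + (1 - F) ⊙ h~ retains user-level information where the gate
    is near 1 and resets toward feature-level information where it is near 0."""
    F = sigmoid(W6 @ h + W7 @ h_tilde + b3)
    return F * h + (1.0 - F) * h_tilde

rng = np.random.default_rng(3)
d = 6
h, h_tilde = rng.normal(size=d), rng.normal(size=d)  # the two views of one user
W6, W7 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b3 = rng.normal(size=d)
z = gated_fusion(h, h_tilde, W6, W7, b3)
print(z.shape)  # (6,)
```

Because the gate lies strictly in (0, 1), each dimension of z is a convex combination of the corresponding dimensions of h and h̃.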
For any given pair of user nodes v_i and v_j in a social network, with node embedding vectors z_i and z_j respectively, the probability of an edge existing between the two nodes can be expressed as:
p(v_i, v_j) = σ(z_i^T · z_j)
where σ(x) = 1 / (1 + e^(−x)) is the sigmoid function.
In order to optimize the model parameters of the double-layer graph attention neural network, we need to define an objective function for the model, whose goal is to maximize the probability of the observable edges occurring in the social network, i.e.:
O = Σ_{e_{i,j} ∈ E} log p(v_i, v_j)
To avoid a trivial solution, for each observable edge e_{i,j} = (v_i, v_j) we use a negative sampling technique to maximize the objective function, namely:
O = Σ_{e_{i,j} ∈ E} ( log σ(z_i^T · z_j) + Σ_{k=1}^{K} E_{v_k ∼ P_n(v)} [log σ(−z_i^T · z_k)] )
The first term models the positive examples in the social network; the second term models negative examples by randomly generating edges associated with a node through the negative sampling technique, where the probability of each node being sampled satisfies P_n(v) ∝ d_v^(3/4), K denotes the number of sampled negative edges, and d_v denotes the degree of user node v. According to this objective function, the parameters of the double-layer graph attention neural network model can be learned with a back-propagation optimization algorithm, so as to obtain the node vector matrices Z^s = (z_1^s, z_2^s, …, z_{|V^s|}^s)^T and Z^t = (z_1^t, z_2^t, …, z_{|V^t|}^t)^T of the source social network and the target social network respectively, where |V^s| denotes the number of user nodes in the source social network and |V^t| denotes the number of user nodes in the target social network; z_i^s denotes the node embedding vector of user v_i^s in the source social network, and z_j^t denotes the node embedding vector of user v_j^t in the target social network. Z^s and Z^t are also referred to as the embedded vector spaces of the source and target social networks.
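The edge-probability model and negative-sampling objective can be sketched as follows; the d_v^(3/4) noise distribution follows the common convention for negative sampling and is an assumption where the patent's formula is illegible in this extraction:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_objective(Z, edges, K, rng, degrees):
    """Negative-sampling objective over the observed edges:
    log sigma(z_i . z_j) for each positive edge plus K sampled negatives
    log sigma(-z_i . z_k), with nodes drawn proportional to d_v^(3/4)."""
    p = degrees ** 0.75
    p = p / p.sum()                      # noise distribution P_n(v)
    total = 0.0
    for i, j in edges:
        total += np.log(sigmoid(Z[i] @ Z[j]))        # positive term
        for k in rng.choice(len(Z), size=K, p=p):
            total += np.log(sigmoid(-Z[i] @ Z[k]))   # negative term
    return total  # maximized during training (so -total is the loss)

rng = np.random.default_rng(4)
Z = rng.normal(size=(5, 8))              # toy node embedding matrix
edges = [(0, 1), (1, 2), (3, 4)]         # observed edges
degrees = np.array([2.0, 2.0, 1.0, 1.0, 1.0])
val = edge_objective(Z, edges, K=3, rng=rng, degrees=degrees)
print(val < 0)  # a sum of log-probabilities is negative -> True
```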
An embedded vector space alignment module: based on the above modules, the node embedded vector matrix Z^s of the source social network and the node embedded vector matrix Z^t of the target social network can be obtained respectively. Each row of a node vector matrix is the node embedded representation of one user in the social network, and the whole node vector matrix is called the embedded vector space of that social network. In order for the two embedded vector spaces to be aligned effectively, we need to project the embedded vector space of the source social network and that of the target social network into a common vector space.
First, we define a mapping function M that maps user node vectors from one embedded vector space into another. Suppose we project the source social network into the target social network to find, for each source user node, a matching target node. Given a partially known set of anchor links T as supervision information, the objective function can be defined as:
L(θ) = Σ_{(v_i^s, v_j^t) ∈ T} ‖M_{s→t}(z_i^s; θ) − z_j^t‖
where M_{s→t}(·) denotes the mapping function from the source social network to the target social network; the invention adopts a multi-layer perceptron to construct the mapping function, and θ is the weight parameter of the multi-layer perceptron. The objective is to minimize, for each user pair with an anchor link relationship, the distance between the source user node, after being mapped into the target social network, and the target user node, so as to construct a classification model that predicts whether any two users across different social networks have an anchor link. Since a user typically has only one active account in each social network platform, the target user node closest to the projected source user node is selected here to construct a candidate anchor link.
Since the user alignment problem between different social networks typically satisfies a one-to-one matching constraint, the same user entity has at most one active account in each social network platform. However, as shown in fig. 3(a), a one-way embedded vector space mapping may produce one-to-many matching relationships between social networks, which contradicts the actual network scenario. Therefore, the invention proposes a bidirectional embedded vector space alignment strategy to ensure that the network alignment task between the two social networks satisfies the one-to-one matching constraint. Referring to fig. 3, the specific implementation flow of this module is described with a concrete example in the following steps:
Step 1: based on the known anchor link set T, construct a multi-layer perceptron model M_{s→t}(·; θ_1) that projects from the source social network to the target social network, and learn the weight parameter θ_1 by minimizing the distance between each source user node in an anchor link, after projection into the target social network, and its corresponding target user node.
Step 2: based on the learned multi-layer perceptron model M_{s→t}(·; θ_1), for each user node v_i^s in the source social network, as shown in fig. 3(a), first project it into the target embedded vector space, then find the target user node closest to the projected node in the target social network to form an anchor link with the source user node, and add this link to the candidate anchor link set A_1.
Step 3: based on the known anchor link set T, construct a multi-layer perceptron model M_{t→s}(·; θ_2) that projects from the target social network to the source social network, and learn the weight parameter θ_2 by minimizing the distance between each target user node in an anchor link, after projection into the source social network, and its corresponding source user node.
Step 4: based on the learned multi-layer perceptron model M_{t→s}(·; θ_2), for each user node v_j^t in the target social network, as shown in fig. 3(b), first project it into the source embedded vector space, then find the source user node closest to the projected node in the source social network to form an anchor link with the target user node, and add this link to the candidate anchor link set A_2.
Step 5: take the intersection of the candidate anchor link sets A_1 and A_2 as the final predicted anchor link set: A = A_1 ∩ A_2.
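Steps 1 to 5 can be sketched end to end; for brevity the two trained multi-layer perceptrons are replaced here by known linear maps on toy embeddings (an assumption of ours), so the sketch shows only the projection, nearest-neighbour candidate construction, and intersection:

```python
import numpy as np

def nearest(z, Z):
    """Index of the row of Z closest to vector z (Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(Z - z, axis=1)))

def bidirectional_align(Zs, Zt, Ms_t, Mt_s):
    """Steps 2, 4 and 5: project each source node into the target space and
    vice versa, keep the nearest-neighbour candidate links A1 and A2 from
    both directions, and intersect them to enforce one-to-one matching."""
    A1 = {(i, nearest(Ms_t(Zs[i]), Zt)) for i in range(len(Zs))}
    A2 = {(nearest(Mt_s(Zt[j]), Zs), j) for j in range(len(Zt))}
    return A1 & A2

# Toy spaces where the target is a scaled copy of the source, so the true
# mapping is linear; in the method a trained MLP would play this role.
rng = np.random.default_rng(5)
Zs = rng.normal(size=(4, 3))             # source embedded vector space
Zt = 2.0 * Zs                            # target embeddings of the same users
align = bidirectional_align(Zs, Zt, lambda z: 2.0 * z, lambda z: 0.5 * z)
print(align == {(0, 0), (1, 1), (2, 2), (3, 3)})  # True
```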
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the idea of the invention, also features in the above embodiments or in different embodiments may be combined, steps may be implemented in any order, and there are many other variations of the different aspects of the invention as described above, which are not provided in detail for the sake of brevity.
In addition, well known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures for simplicity of illustration and discussion, and so as not to obscure the invention. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the invention, and also in view of the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the present invention is to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the invention, it should be apparent to one skilled in the art that the invention can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present invention has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures, such as Dynamic RAM (DRAM), may use the discussed embodiments.
The embodiments of the invention are intended to embrace all such alternatives, modifications and variances that fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements and the like that may be made without departing from the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (9)

1. A network alignment method based on a double-layer graph attention neural network is characterized by comprising the following steps:
basic definition: the social network is abstracted as a directed graph G = (V, E, X), where V = {v_i | i = 1, …, N} represents the set of user nodes in the social network, and N is the number of user nodes in the social network; E = {e_{i,j} = (v_i, v_j) | v_i ∈ V, v_j ∈ V} represents the set of relationships between users in the social network, where e_{i,j} = (v_i, v_j) indicates that there is an association between user v_i and user v_j; X = {x_i | i = 1, …, N} represents the set of feature vectors of all users, where each user node v_i corresponds to a node feature vector x_i that can be extracted from the user node's profile, behavior, and social structure information; without loss of generality, the two networks to be aligned are named the source social network and the target social network, denoted G^s and G^t respectively;
for any two users v_i^s and v_j^t from different social networks, we use e'_{i,j} = (v_i^s, v_j^t) to represent an anchor link relationship between the source social network and the target social network, where v_i^s and v_j^t are the accounts of the same user in the different social networks G^s and G^t respectively; an anchor link connects two accounts of the same user in different social networks, and no two anchor links share a user account of the same social network;
the set formed by all anchor link relationships between two different social networks G^s and G^t is called the anchor link set and is denoted T = {e'_{i,j} = (v_i^s, v_j^t)}, where v_i^s represents a user account in the source social network and v_j^t represents a user account in the target social network; for two different social networks G^s = (V^s, E^s, X^s) and G^t = (V^t, E^t, X^t), network alignment aims to discover the anchor link set T between the two social networks, where any element e'_{i,j} ∈ T denotes an anchor link between two user accounts v_i^s and v_j^t in the different social networks;
S1, a network preprocessing module: preprocessing a social network according to the input network type and the contained user attribute information, and constructing an initialized user node feature vector matrix;
S2, a network embedded representation module: taking the initialized user node feature vector matrix obtained by the network preprocessing module and the adjacency matrix of the social network as input, and capturing the complex information interaction relationships of users in the social network through a double-layer graph attention neural network, so as to learn the latent information of user nodes in the social network and obtain accurate user node embedding vectors;
S3, an embedded vector space alignment module: constructing a classification model according to the user node embedding vectors of the source social network and the target social network learned in step S2 to predict anchor links, and adopting a bidirectional embedded vector space alignment strategy to satisfy the one-to-one matching constraint of user accounts between different social networks;
S4, taking the intersection to complete the network alignment.
2. The method for network alignment based on the double-layer graph attention neural network as claimed in claim 1, wherein step S1 comprises the following contents: the feature vector of user v_i is expressed as x_i; a plurality of feature vectors of the user are extracted according to the network type and stacked horizontally to generate the user's initialized feature vector representation x_i ∈ R^d, where d represents the dimension of the user's initialized feature vector and d', d'' and d''' appearing hereinafter represent other dimensions; the initialized feature vectors of all users in the social network form a state matrix X, where each row is the feature vector of a specific user node: X = (x_1, x_2, …, x_N)^T.
3. The method according to claim 2, wherein if the network type is a topological network, the feature vector of the user is initialized randomly in a random-matrix manner, and the weight parameters of the random matrix are learned during the training phase of the double-layer graph attention neural network model.
4. The method of claim 2, wherein if the network type is an attribute network, the user attributes are vectorized in the following manner: profile information such as the user name is randomly initialized in a word-embedding manner to obtain a user-name feature vector; a Doc2Vec model is adopted to mine the user's language style from the user's long text information and learn the user's text feature vector; vector initialization is performed on the user's trajectory information through spatial clustering to obtain the user's spatial feature vector; and the user's rating and check-in counts are directly taken as one feature dimension of the user and vector-initialized.
5. The method for network alignment based on the double-layer graph attention neural network as claimed in claim 2, wherein step S2 comprises the following steps:
S2.1, a user-layer node embedded representation submodule: responsible for capturing the different influence weights between users to perform weighted aggregation of each user's local neighborhood information in the social network, thereby learning a user-level node embedding vector;
S2.2, a feature-layer node embedded representation submodule: used for learning the influence weights between different features of a user to capture finer-grained interaction relationships between features, thereby learning a feature-level node embedding vector for the user;
S2.3, an embedded vector fusion submodule: responsible for retaining and resetting the user embedding vectors from the user level and the feature level, so as to fuse the multi-view node embedding vectors and improve the accuracy of the network alignment task.
6. The method for network alignment based on the double-layer graph attention neural network as claimed in claim 5, wherein S2.1 comprises the following contents: a learnable transformation matrix W_1 ∈ R^{d'×d} is used to convert the input vector into a high-dimensional vector, namely:
h_i^(0) = W_1 · x_i
according to the theory of graph attention neural networks, for any two user nodes v_i and v_j, the relationship strength e_{ij} between the two user nodes is first calculated:
e_{ij} = LeakyReLU(a^T · [W_2^(l) h_i^(l) ∥ W_2^(l) h_j^(l)])
where h_i^(l) and h_j^(l) denote the embedding vectors of user nodes v_i and v_j at the l-th layer, W_2^(l) denotes the weight parameter of the l-th layer, "∥" is a concatenation operator denoting the horizontal concatenation of two vectors, and LeakyReLU(·) is the activation function of the neurons; when aggregating the neighborhood information of user node v_i, to calculate the information contribution proportions from different neighbors, the softmax(·) function is adopted to normalize the relationship strengths between the user node and all its neighbor user nodes v_k ∈ N(v_i):
α_{ij} = softmax(e_{ij}) = exp(e_{ij}) / Σ_{v_k ∈ N(v_i)} exp(e_{ik})
α_{ij} is called the attention coefficient between user nodes v_i and v_j; the larger the value of α_{ij}, the closer the relationship between the two users; according to the calculated attention coefficients between user node v_i and all its neighbor nodes (including itself), the new embedding vector of each user node v_i can be defined as:
h_i^(l+1) = σ(Σ_{v_j ∈ N(v_i)} α_{ij} · W_2^(l) h_j^(l))
where σ(·) is the activation function of the neurons; by linearly aggregating each user's neighborhood information with these different influence weights, the user-level node embedding vector h_i of each user in the social network can be obtained, forming the user-level vector matrix M = (h_1, h_2, …, h_N)^T.
7. The method for network alignment based on the double-layer graph attention neural network as claimed in claim 6, wherein S2.2 comprises the following contents: the user-level vector matrix M obtained in S2.1 is (h)1,h2,...,hN)TAs input, and taking into account the multidimensional attention between the features of any two user nodes in the social network, i.e. calculating an attention coefficient for each corresponding dimension of the two user node vectors,
suppose hiAnd hjDistribution represents two user nodes v in a social networkiAnd vjThe embedded vectors of two user nodes, the relationship between the embedded vectors of two user nodes can be defined as:
f(hi,hj)=W5·tanh(W4·hi+W3·hj+b2)+b1
wherein W3,W4And
Figure FDA0002576285300000041
is a parameter matrix, b1
Figure FDA0002576285300000042
Is a bias term, tanh (-) is an activation function of a neuron,
using a feed-forward neural network to operate according to f (h)i,hj) Calculating the dependency relationship of any two user nodes on the characteristic level, and enabling betaijRepresenting a user node viAnd vjThe attention coefficient vector of the feature layer of [ beta ]ij]kSimilarly, in order to compare the attention coefficients of the features of the corresponding dimensions between different attention coefficient vectors, the attention coefficient vectors of all the neighbors of the user are normalized according to the corresponding feature dimensions by using the softmax (·) function, and then:
Figure FDA0002576285300000043
calculating attention coefficient [ beta ] between each dimension of any two usersij]kThese attention coefficients can then be combined into an attention coefficient vector β between the two users according to the corresponding feature dimensionsij=([βij]1,[βij]2,...,[βij]d″) The dimensions of the attention vector and the user node vector are the same, each dimension [ β [ ]ij]kCorresponding to the weight of influence of each dimension of the user node vector, [ beta ]ij]kThe larger, the two user nodes v are representediAnd vjThe stronger the correlation of the features of the k-th dimension of (a),
finally, for any user node v_i in the social network, the embedding vectors of its neighbor users are weighted and linearly aggregated according to the learned attention coefficient vectors between the different user nodes; different from the aggregation of the user-level attention mechanism, the aggregation function of the feature-level attention mechanism aggregates the neighborhood information by element-wise multiplication:

h′_i = σ(Σ_{j∈N_i} β_ij ⊙ h_j)

where ⊙ denotes the element-wise product of two vectors of the same shape, yielding a vector of that same shape, and σ(·) is the activation function of the neuron; by linearly aggregating the neighborhood information of each user according to the influence weights of the different features, the feature-level node embedding representation h′_i of every user in the social network is obtained, forming the feature-level vector matrix M′ = (h′_1, h′_2, ..., h′_N)^T.
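The element-wise aggregation above can be sketched as follows (the choice of tanh for the activation σ(·) is an assumption for the example; the claim does not fix it):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
neigh = rng.normal(size=(3, d))          # embedding vectors h_j of three neighbors
beta = np.abs(rng.normal(size=(3, d)))
beta /= beta.sum(axis=0)                 # normalized attention coefficient vectors

sigma = np.tanh                          # stand-in for the activation sigma(.)

# h'_i = sigma( sum_{j in N_i} beta_ij ⊙ h_j ), with ⊙ the element-wise product
h_feat = sigma((beta * neigh).sum(axis=0))
```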
8. The network alignment method based on the double-layer graph attention neural network as claimed in claim 7, wherein S2.3 comprises the following contents: the embedding vector fusion submodule takes the user-level vector matrix M = (h_1, h_2, ..., h_N)^T and the feature-level vector matrix M′ = (h′_1, h′_2, ..., h′_N)^T as input, and automatically learns the weight parameters of the node embedding vectors of the different levels of the same user with a gating mechanism, so as to effectively retain and reset the information representations from the different levels;

for any user node v_i in the social network, the submodule first calculates the weight relationship vector between the user-level node embedding vector h_i and the feature-level node embedding vector h′_i, computed as follows:

F = sigmoid(W_6 · h_i + W_7 · h′_i + b_3)
where W_6 and W_7 are parameter matrices of the gated neural network, b_3 is a bias term, and sigmoid(·) is the activation function of the neuron; according to the learned weight relationship vector F, the user node embedding vectors from the different levels can be selectively retained and reset, and the final user node embedding vector is expressed as:

z_i = F ⊙ h_i + (1 − F) ⊙ h′_i
where F ⊙ h_i represents the selective retention of the user-level node embedding vector, and (1 − F) ⊙ h′_i represents the selective resetting of the feature-level node embedding vector; 1 − F is a vector operation that subtracts each dimension of the vector F from 1;

in this way, the user-level node embedding representation and the feature-level node embedding representation of each user in the social network are fused with the gating mechanism, yielding the final node embedding representation z_i of every user in the social network and forming the node embedding vector matrix Z = (z_1, z_2, ..., z_N)^T.
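A minimal sketch of this gated fusion, assuming the gate parameters are named W6, W7 and b3 (the claim shows them only as formula images, so these names are a continuation of its numbering):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
W6, W7 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
b3 = rng.normal(size=d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse(h_user, h_feat):
    # F = sigmoid(W6 · h_user + W7 · h_feat + b3): a per-dimension gate in (0, 1)
    F = sigmoid(W6 @ h_user + W7 @ h_feat + b3)
    # z_i = F ⊙ h_user + (1 - F) ⊙ h_feat: retain user-level, reset feature-level
    return F * h_user + (1.0 - F) * h_feat

z = fuse(rng.normal(size=d), rng.normal(size=d))
```

Because the gate is convex per dimension, fusing a vector with itself returns that vector unchanged, which is a quick sanity check on the mechanism.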
For any given pair of user nodes v_i and v_j in a social network, with node embedding vectors z_i and z_j respectively, the probability that an edge exists between the two nodes can be expressed as:

p(v_i, v_j) = σ(z_i^T · z_j)

where σ(·) is the sigmoid function;
in order to optimize the model parameters of the double-layer graph attention neural network, an objective function of the model needs to be defined, whose goal is to maximize the probability of the observable edges in the social network, i.e.:

O = Σ_{e_{i,j}∈E} log p(v_i, v_j)

to avoid a trivial solution, for each observable edge e_{i,j} = (v_i, v_j), a negative sampling technique is used to maximize the objective function, namely:

log σ(z_i^T · z_j) + Σ_{k=1}^{K} E_{v_n∼P_n(v)} [log σ(−z_i^T · z_n)]

the first term models the positive examples in the social network, and the second term models negative examples by randomly generating edges associated with the nodes through the negative sampling technique, where the probability of each node being sampled satisfies P_n(v) ∝ d_v^{3/4};
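A hedged sketch of one edge's negative-sampling loss under these definitions (embeddings and degrees are random stand-ins; the d_v^{3/4} sampling exponent follows the claim's sampling distribution):

```python
import numpy as np

rng = np.random.default_rng(3)
N, d, K = 20, 8, 5
Z = rng.normal(size=(N, d))                      # node embedding matrix
deg = rng.integers(1, 10, size=N).astype(float)  # node degrees d_v
P_n = deg ** 0.75 / (deg ** 0.75).sum()          # P_n(v) ∝ d_v^{3/4}

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_loss(i, j):
    # positive term log σ(z_i·z_j) plus K negatively sampled terms log σ(-z_i·z_n)
    pos = np.log(sigmoid(Z[i] @ Z[j]))
    negs = rng.choice(N, size=K, p=P_n)          # sample negatives from P_n(v)
    neg = np.log(sigmoid(-(Z[negs] @ Z[i]))).sum()
    return -(pos + neg)                          # negated, so it is minimized

loss = edge_loss(0, 1)
```

In training, this per-edge loss would be summed over observed edges and minimized by back-propagation, matching the optimization described next in the claim.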
K denotes the number of negatively sampled edges, and d_v denotes the degree of the user node v; according to this objective function, the parameters of the double-layer graph attention neural network model can be learned with a back-propagation optimization algorithm, so as to obtain the node vector matrices of the source social network and of the target social network respectively, Z^s = (z^s_1, z^s_2, ..., z^s_{|V^s|})^T and Z^t = (z^t_1, z^t_2, ..., z^t_{|V^t|})^T, where |V^s| denotes the number of user nodes in the source social network and |V^t| denotes the number of user nodes in the target social network; z^s_i denotes the node embedding vector of user v^s_i in the source social network, and z^t_j denotes the node embedding vector of user v^t_j in the target social network; Z^s and Z^t are also referred to as the embedding vector spaces of the source social network and the target social network.
9. The network alignment method based on the double-layer graph attention neural network as claimed in claim 8, wherein the step S3 comprises the following contents: based on the step S2, the node embedding vector matrix Z^s of the source social network and the node embedding vector matrix Z^t of the target social network are obtained, where each row of a node vector matrix is the node embedding vector of one user in the social network, and the whole node vector matrix is also called the embedding vector space of the social network;

a mapping function M is defined that maps user node vectors from one embedding vector space to the other; assuming the source social network is projected into the target social network to find the target nodes matching it, and given a partially known set of anchor links T as the supervision information, the objective function can be defined as:

min_θ Σ_{(v^s_i, v^t_j)∈T} ‖M_{s→t}(z^s_i; θ) − z^t_j‖

where M_{s→t}(·) denotes the mapping function from the source social network to the target social network, the mapping function is constructed with a multi-layer perceptron, and θ is the weight parameter of the multi-layer perceptron; the objective function aims to minimize the distance between a source user node in a user pair with an anchor-link relationship, after it is mapped into the target social network, and the corresponding target user node; a classification model is constructed to predict whether any two users of the different social networks have an anchor link, and the target user node closest to the projected source user node is selected to construct a candidate anchor link.
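The projection-and-match step can be sketched as follows (the one-hidden-layer perceptron and the Euclidean distance are illustrative stand-ins for M_{s→t}(·; θ) and the norm in the objective; the weight names Wa and Wb are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8
Zs = rng.normal(size=(10, d))   # Z^s: source-network embedding matrix
Zt = rng.normal(size=(12, d))   # Z^t: target-network embedding matrix
Wa, Wb = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # hypothetical MLP weights

def map_s_to_t(z):
    # a one-hidden-layer perceptron standing in for M_{s->t}(.; theta)
    return Wb @ np.tanh(Wa @ z)

def candidate_target(i):
    # project source user i into the target space, then pick the closest target node
    proj = map_s_to_t(Zs[i])
    return int(np.argmin(np.linalg.norm(Zt - proj, axis=1)))

j = candidate_target(0)         # index of the candidate anchor-link partner
```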
CN202010654776.0A 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network Active CN111931903B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010654776.0A CN111931903B (en) 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network


Publications (2)

Publication Number Publication Date
CN111931903A true CN111931903A (en) 2020-11-13
CN111931903B CN111931903B (en) 2023-07-07

Family

ID=73312715

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010654776.0A Active CN111931903B (en) 2020-07-09 2020-07-09 Network alignment method based on double-layer graph attention neural network

Country Status (1)

Country Link
CN (1) CN111931903B (en)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180349477A1 (en) * 2017-06-06 2018-12-06 Facebook, Inc. Tensor-Based Deep Relevance Model for Search on Online Social Networks
CN109636658A (en) * 2019-01-17 2019-04-16 电子科技大学 A kind of social networks alignment schemes based on picture scroll product


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MEILIAN LU, ET AL.: "A Unified Link Prediction Framework for Predicting Arbitrary Relations in Heterogeneous Academic Networks", IEEE ACCESS, pages 124967-124987 *

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396492A (en) * 2020-11-19 2021-02-23 天津大学 Conversation recommendation method based on graph attention network and bidirectional long-short term memory network
CN112395466A (en) * 2020-11-27 2021-02-23 上海交通大学 Fraud node identification method based on graph embedded representation and recurrent neural network
CN112395466B (en) * 2020-11-27 2023-05-12 上海交通大学 Fraud node identification method based on graph embedded representation and cyclic neural network
CN112446542A (en) * 2020-11-30 2021-03-05 西安电子科技大学 Social network link prediction method based on attention neural network
CN112446542B (en) * 2020-11-30 2023-04-07 山西大学 Social network link prediction method based on attention neural network
CN112381179B (en) * 2020-12-11 2024-02-23 杭州电子科技大学 Heterogeneous graph classification method based on double-layer attention mechanism
CN112381179A (en) * 2020-12-11 2021-02-19 杭州电子科技大学 Heterogeneous graph classification method based on double-layer attention mechanism
CN112507246A (en) * 2020-12-13 2021-03-16 天津大学 Social recommendation method fusing global and local social interest influence
CN112507246B (en) * 2020-12-13 2022-09-13 天津大学 Social recommendation method fusing global and local social interest influence
CN112507247A (en) * 2020-12-15 2021-03-16 重庆邮电大学 Cross-social network user alignment method fusing user state information
CN112507247B (en) * 2020-12-15 2022-09-23 重庆邮电大学 Cross-social network user alignment method fusing user state information
CN112667920A (en) * 2020-12-29 2021-04-16 复旦大学 Text perception-based social influence prediction method, device and equipment
CN112860810A (en) * 2021-02-05 2021-05-28 中国互联网络信息中心 Domain name multi-graph embedded representation method, device, electronic equipment and medium
CN112860810B (en) * 2021-02-05 2023-07-14 中国互联网络信息中心 Domain name multiple graph embedded representation method, device, electronic equipment and medium
CN112818257A (en) * 2021-02-19 2021-05-18 北京邮电大学 Account detection method, device and equipment based on graph neural network
CN113127752A (en) * 2021-03-18 2021-07-16 中国人民解放军战略支援部队信息工程大学 Social network account aligning method and system based on user naming habit mapping learning
CN113095948B (en) * 2021-03-24 2023-06-06 西安交通大学 Multi-source heterogeneous network user alignment method based on graph neural network
CN113095948A (en) * 2021-03-24 2021-07-09 西安交通大学 Multi-source heterogeneous network user alignment method based on graph neural network
CN112800770A (en) * 2021-04-15 2021-05-14 南京樯图数据研究院有限公司 Entity alignment method based on heteromorphic graph attention network
CN113065045A (en) * 2021-04-20 2021-07-02 支付宝(杭州)信息技术有限公司 Method and device for carrying out crowd division and training multitask model on user
CN113238885B (en) * 2021-05-08 2023-07-07 长安大学 Method and equipment for predicting implicit deviation instruction based on graph attention network
CN113238885A (en) * 2021-05-08 2021-08-10 长安大学 Implicit deviation instruction prediction method and device based on graph attention network
CN113409157A (en) * 2021-05-19 2021-09-17 桂林电子科技大学 Cross-social network user alignment method and device
CN113409157B (en) * 2021-05-19 2022-06-28 桂林电子科技大学 Cross-social network user alignment method and device
CN113407784B (en) * 2021-05-28 2022-08-12 桂林电子科技大学 Social network-based community dividing method, system and storage medium
CN113407784A (en) * 2021-05-28 2021-09-17 桂林电子科技大学 Social network-based community dividing method, system and storage medium
CN113240098A (en) * 2021-06-16 2021-08-10 湖北工业大学 Fault prediction method and device based on hybrid gated neural network and storage medium
CN113628059B (en) * 2021-07-14 2023-09-15 武汉大学 Associated user identification method and device based on multi-layer diagram attention network
CN113628059A (en) * 2021-07-14 2021-11-09 武汉大学 Associated user identification method and device based on multilayer graph attention network
CN113807012A (en) * 2021-09-14 2021-12-17 杭州莱宸科技有限公司 Water supply network division method based on connection strengthening
CN113901831A (en) * 2021-09-15 2022-01-07 昆明理工大学 Parallel sentence pair extraction method based on pre-training language model and bidirectional interaction attention
CN113901831B (en) * 2021-09-15 2024-04-26 昆明理工大学 Parallel sentence pair extraction method based on pre-training language model and bidirectional interaction attention
CN113779406A (en) * 2021-09-16 2021-12-10 浙江网商银行股份有限公司 Data processing method and device
CN113792937A (en) * 2021-09-29 2021-12-14 中国人民解放军国防科技大学 Social network influence prediction method and device based on graph neural network
CN114662143A (en) * 2022-02-28 2022-06-24 北京交通大学 Sensitive link privacy protection method based on graph embedding
CN114662143B (en) * 2022-02-28 2024-05-03 北京交通大学 Sensitive link privacy protection method based on graph embedding
CN115063251A (en) * 2022-05-30 2022-09-16 华侨大学 Social communication propagation dynamic network representation method based on relationship strength and feedback mechanism
CN116049695B (en) * 2022-12-20 2023-07-04 中国科学院空天信息创新研究院 Group perception and standing analysis method, system and electronic equipment crossing social network
CN116049695A (en) * 2022-12-20 2023-05-02 中国科学院空天信息创新研究院 Group perception and standing analysis method, system and electronic equipment crossing social network
CN115861822A (en) * 2023-02-07 2023-03-28 海豚乐智科技(成都)有限责任公司 Target local point and global structured matching method and device
CN116776193A (en) * 2023-05-17 2023-09-19 广州大学 Method and device for associating virtual identities across social networks based on attention mechanism
CN116566743A (en) * 2023-07-05 2023-08-08 北京理工大学 Account alignment method, equipment and storage medium
CN116566743B (en) * 2023-07-05 2023-09-08 北京理工大学 Account alignment method, equipment and storage medium
CN117670572A (en) * 2024-02-02 2024-03-08 南京财经大学 Social behavior prediction method, system and product based on graph comparison learning
CN117670572B (en) * 2024-02-02 2024-05-03 南京财经大学 Social behavior prediction method, system and product based on graph comparison learning

Also Published As

Publication number Publication date
CN111931903B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111931903B (en) Network alignment method based on double-layer graph attention neural network
CN111488734B (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
CN111462282B (en) Scene graph generation method
Li et al. Deep learning-based classification methods for remote sensing images in urban built-up areas
CN110427877B (en) Human body three-dimensional posture estimation method based on structural information
CN112925989B (en) Group discovery method and system of attribute network
CN108920678A (en) A kind of overlapping community discovery method based on spectral clustering with fuzzy set
CN111695415A (en) Construction method and identification method of image identification model and related equipment
CN113255895B (en) Structure diagram alignment method and multi-diagram joint data mining method based on diagram neural network representation learning
CN111325243B (en) Visual relationship detection method based on regional attention learning mechanism
CN113761250A (en) Model training method, merchant classification method and device
CN112200266A (en) Network training method and device based on graph structure data and node classification method
CN113065974A (en) Link prediction method based on dynamic network representation learning
CN112862147A (en) Comprehensive pipe rack operation and maintenance risk evaluation model and method based on BP neural network
CN116129286A (en) Method for classifying graphic neural network remote sensing images based on knowledge graph
Xie et al. Temporal‐enhanced graph convolution network for skeleton‐based action recognition
CN116385660A (en) Indoor single view scene semantic reconstruction method and system
Zou et al. Application of facial symmetrical characteristic to transfer learning
CN116010813A (en) Community detection method based on influence degree of fusion label nodes of graph neural network
CN110209860B (en) Template-guided interpretable garment matching method and device based on garment attributes
Zhang et al. End‐to‐end generation of structural topology for complex architectural layouts with graph neural networks
CN114254738A (en) Double-layer evolvable dynamic graph convolution neural network model construction method and application
Cao et al. QMEDNet: A quaternion-based multi-order differential encoder–decoder model for 3D human motion prediction
Xiao et al. Model transferability from ImageNet to lithography hotspot detection
Kim et al. Automated door placement in architectural plans through combined deep-learning networks of ResNet-50 and Pix2Pix-GAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant