CN113434782B - Cross-social network user identity recognition method based on joint embedded learning model - Google Patents

Info

Publication number
CN113434782B
Authority
CN
China
Prior art keywords
user
upg
representing
node
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110718740.9A
Other languages
Chinese (zh)
Other versions
CN113434782A (en
Inventor
王李冬
关佶红
常乐
曹世华
胡克用
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dayu Chuangfu Technology Co ltd
Original Assignee
Qianjiang College of Hangzhou Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qianjiang College of Hangzhou Normal University filed Critical Qianjiang College of Hangzhou Normal University
Priority to CN202110718740.9A priority Critical patent/CN113434782B/en
Publication of CN113434782A publication Critical patent/CN113434782A/en
Application granted granted Critical
Publication of CN113434782B publication Critical patent/CN113434782B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a cross-social-network user identity recognition method based on a joint embedded learning model. First, candidate paired user pairs are selected from two social networks using user-name similarity and network structure. Next, a user pair network graph (UPG) is constructed with all candidate paired user pairs as nodes. Then, on the basis of the constructed UPG and the labeled user pair data, a joint embedded learning model is built that fuses the label information of labeled paired users, structure information and attribute information; the model is designed as a deep neural network with 1 input and 2 outputs. Finally, the loss function of the joint embedded model is minimized using a stochastic gradient descent algorithm; once learning is complete, the learned model parameters are used to predict each user pair to be predicted, and the output indicates whether the two accounts belong to the same user. The method can effectively predict whether two users from different networks are the same user, which plays a vital role in commercial cross-social-network applications.

Description

Cross-social network user identity recognition method based on joint embedded learning model
Technical Field
The invention relates to the field of user relationship mining for social networks, and in particular to a cross-social-network user identity recognition method based on a joint embedded learning model.
Background
From early email and BBS to today's social media networks (SMNs), more and more users have become accustomed to interacting and obtaining information on social networks every day. People often need to register with a website in order to use its services, so it is common for one person to own virtual accounts on several different social networking sites. Because each site is independent, data are not shared, and no uniform identifier exists on the network to identify a person uniquely, the accounts that belong to the same person on different sites are not directly linked. To obtain a complete profile of a user, the user's data on different social networks must be integrated, which requires associating user identities across social platforms, i.e., identifying the accounts of the same user on multiple social networks. In recent years, identification methods based on representation learning have become prevalent, and researchers have begun to identify users across multiple social networks with network-embedding algorithms. However, the following problems remain in cross-social-network user identification based on representation learning:
1. Existing representation-learning methods are either supervised or unsupervised. The former requires a large amount of labeled data, which is difficult to obtain and costly in manual effort; the latter requires no labeled data, but its results are often unsatisfactory.
2. Comprehensively using multi-modal data such as user attribute information, network structure information and user label information can improve the accuracy of user identity recognition, but embedding this information into a unified vector space is a difficult problem.
3. Existing representation-learning methods for user identity association usually split the task into two separate steps, node embedding learning and identity recognition, so the users' label information cannot be integrated effectively.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a cross-social-network user identity association method based on a joint embedding model.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
Step 1: for the users of social networks $G^A$ and $G^B$, select candidate paired user pairs from the two social networks using user-name similarity and network structure;
Step 2: take all candidate paired user pairs $P = \{p_i\}$ as nodes; if the two users in $p_i$ are respectively neighbors of the two users in $p_j$, an edge exists between $p_i$ and $p_j$; the user pair network graph (UPG) is constructed on this principle;
Step 3: on the basis of the constructed UPG and the labeled user pair data, build a joint embedded learning model that fuses the label information of labeled paired users, structure information and attribute information, and design the model as a deep neural network with 1 input and 2 outputs;
Step 4: minimize the loss function of the joint embedded learning model using a stochastic gradient descent algorithm; after learning is complete, use the model to predict each user pair to be predicted and output whether the two users are the same user.
Further, the step 1 is specifically realized as follows:
1-1. $G^A = (U^A, E^A, X^A)$ represents social network A, where $U^A$ is the set of users of social network A, $E^A$ is the set of user relationships of social network A, and $X^A$ is the user attribute matrix of social network A;
Figure BDA0003136107610000021
represents user i in social network A; $G^B = (U^B, E^B, X^B)$ represents social network B, and its parameters are defined analogously;
1-2, acquiring data of different social network platforms by using a crawler;
1-3. For a pair of users drawn from social networks $G^A$ and $G^B$ respectively,
Figure BDA0003136107610000022
compute the similarity of their user names $n_k$ and $n_j$ according to formula (1), and add every user pair whose similarity is greater than 0.8 to the candidate paired user pair set P;
Figure BDA0003136107610000023
where $lev(n_k, n_j)$ denotes the Levenshtein distance and $l(n_k)$ denotes the character length of user name $n_k$;
1-4. Taking each user pair in P as a seed user pair, expand to its neighbor nodes: from the neighbor nodes of the seed user pair, select the user pairs that have r common neighbors (known pairs) and add them to P; the value of r is set according to the data set.
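The user-name filter of step 1-3 can be sketched in Python as follows. This is only an illustrative sketch: formula (1) is not reproduced in this text, so the normalization used here (one minus the Levenshtein distance divided by the longer name's length) is an assumption, chosen because it reproduces the similarity of 0.5 that the description gives for "vio" and "violet". The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def name_similarity(n_k: str, n_j: str) -> float:
    """ASSUMED form of formula (1): 1 - lev / max(l(n_k), l(n_j))."""
    longest = max(len(n_k), len(n_j))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(n_k, n_j) / longest


def candidate_pairs(names_a, names_b, threshold=0.8):
    """Keep every cross-network name pair whose similarity exceeds the threshold."""
    return [(a, b) for a in names_a for b in names_b
            if name_similarity(a, b) > threshold]


pairs = candidate_pairs(["violet"], ["violet", "violets", "vio"])
```

The brute-force double loop is quadratic in the number of users; it is kept here only for clarity.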
Further, the step 2 is specifically realized as follows:
2-1. $UPG = (U_{UPG}, E_{UPG})$ represents the user pair network graph, where $U_{UPG}$ is the set of nodes and $E_{UPG}$ is the set of relationships between nodes; each candidate paired user pair $p_i$ is taken as a node of the UPG and denoted $u'_i$, $u'_i \in U_{UPG}$;
2-2. Suppose
Figure BDA0003136107610000031
and
Figure BDA0003136107610000032
are two nodes in the UPG; an edge exists between the two nodes if the following relationship holds between them:
Figure BDA0003136107610000033
where
Figure BDA0003136107610000034
denotes the set of neighbor nodes of user
Figure BDA0003136107610000035
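The edge rule of step 2 can be sketched in Python as follows. The exact relation is given in a formula image that is not reproduced here, so this sketch follows the prose of step 2 instead: two candidate pairs are linked when their users are neighbors in network A and also neighbors in network B. Symmetric (undirected) neighbor sets are assumed, and `build_upg`, `neighbors_a` and `neighbors_b` are hypothetical names.

```python
from itertools import combinations


def build_upg(pairs, neighbors_a, neighbors_b):
    """Return the UPG edge set as index pairs (i, j) with i < j.

    pairs       -- list of candidate pairs (user_in_A, user_in_B); each is a node
    neighbors_a -- dict: user in A -> set of that user's neighbors in A
    neighbors_b -- dict: user in B -> set of that user's neighbors in B
    """
    edges = set()
    for i, j in combinations(range(len(pairs)), 2):
        (a_i, b_i), (a_j, b_j) = pairs[i], pairs[j]
        linked_in_a = a_j in neighbors_a.get(a_i, set())
        linked_in_b = b_j in neighbors_b.get(b_i, set())
        if linked_in_a and linked_in_b:
            edges.add((i, j))
    return edges


# Toy data (invented for illustration only).
toy_pairs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
toy_na = {"a1": {"a2", "a3"}, "a2": {"a1"}, "a3": {"a1"}}
toy_nb = {"b1": {"b2"}, "b2": {"b1"}, "b3": set()}
toy_edges = build_upg(toy_pairs, toy_na, toy_nb)
```

In the toy data, nodes 0 and 2 are not linked because b1 and b3 are not neighbors in network B, even though a1 and a3 are neighbors in network A.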
Further, the step 3 is specifically realized as follows:
3-1. Using part of the user attribute information collected by the crawler, label each user's exact corresponding account in the other network by text analysis and matching techniques combined with manual judgment; the labeled matching user pairs serve as the supervision information for model training;
3-2. For the two users
Figure BDA0003136107610000036
and
Figure BDA0003136107610000037
of each pair in the candidate paired user set generated in step 2-1, the attributes are converted into features by one-hot encoding, denoted
Figure BDA0003136107610000038
and
Figure BDA0003136107610000039
respectively; the attributes comprise user name, gender, graduation institution and geographic location;
3-3. Construct the joint embedded learning model for the constructed user pair network graph; the attribute vectors of the two users in a node,
Figure BDA00031361076100000310
are concatenated, denoted
Figure BDA00031361076100000311
and $d_i$ is used as the input of the joint embedded learning model; the output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the probabilities that the node label $y_i$ is 0 or 1, where 1 means the two users in the node are the same user and 0 means they are different users; the right branch uses a skip-gram model to output the predicted probabilities of the context nodes;
The m-th layer of the skip-gram model is represented as:
Figure BDA00031361076100000312
Figure BDA00031361076100000313
Figure BDA0003136107610000041
where δ(·) denotes the sigmoid function, and $W_m$ and $b_m$ are the weight and bias parameters of the m-th layer; formula (4) and formula (5) represent the (m+1)-th layers of the left and right branches, respectively;
Figure BDA0003136107610000042
denotes the weight parameter of the left branch at the (m+1)-th layer,
Figure BDA0003136107610000043
denotes the weight parameter of the right branch at the (m+1)-th layer,
Figure BDA0003136107610000044
and
Figure BDA0003136107610000045
are defined analogously;
The last layer of the left branch of the model is designed as a softmax layer, and the input of this layer is:
Figure BDA0003136107610000046
The last layer of the right branch of the model is designed as a softmax layer, and the input of this layer is:
Figure BDA0003136107610000047
where k denotes the number of hidden layers of the left branch and k' denotes the number of hidden layers of the right branch.
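The "1 input, 2 outputs" structure of step 3-3 can be illustrated with a small forward pass in plain Python. This is a sketch under assumptions rather than the patented network: the layer equations are images not reproduced here, so the code assumes shared sigmoid layers followed by two branches that each end in a softmax, as the surrounding prose describes. Layer sizes, the random initialization and all function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def affine(W, b, x):
    """Compute W x + b for a row-major weight matrix W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_layer(n_out, n_in, rng):
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def run_branch(layers, h):
    """Sigmoid hidden layers, then a final softmax layer."""
    for W, b in layers[:-1]:
        h = [sigmoid(v) for v in affine(W, b, h)]
    W, b = layers[-1]
    return softmax(affine(W, b, h))


def forward(d, shared, left, right):
    """One input d; returns (label probabilities, context-node probabilities)."""
    h = d
    for W, b in shared:
        h = [sigmoid(v) for v in affine(W, b, h)]
    return run_branch(left, h), run_branch(right, h)


rng = random.Random(0)
n_in, n_hidden, n_nodes = 6, 4, 5  # toy sizes, chosen only for illustration
shared = [init_layer(n_hidden, n_in, rng)]
left = [init_layer(n_hidden, n_hidden, rng), init_layer(2, n_hidden, rng)]
right = [init_layer(n_hidden, n_hidden, rng), init_layer(n_nodes, n_hidden, rng)]
d_i = [1, 0, 0, 1, 1, 0]  # concatenated attribute vector of one UPG node
p_label, p_context = forward(d_i, shared, left, right)
```

Here `p_label` holds the two probabilities for labels 0 and 1, and `p_context` holds one probability per UPG node; sharing the early layers is what lets label supervision and graph structure shape the same embedding.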
Further, the step 4 is specifically realized as follows:
4-1. The left branch of the joint embedded learning model is a multilayer perceptron, and the loss function of this branch is defined as:
Figure BDA0003136107610000048
where
Figure BDA0003136107610000049
denotes the labeled nodes in the UPG, and $p(y_i \mid d_i)$ denotes the probability of $y_i$ given $d_i$, calculated as follows:
Figure BDA00031361076100000410
The right branch adopts a negative sampling mechanism, and its loss function is defined as:
Figure BDA00031361076100000411
where δ(·) denotes the sigmoid function, $n = |U_{UPG}|$, $u'$ denotes a context node of node $u'_i$, and
Figure BDA00031361076100000412
denotes t randomly selected negative samples;
4-2. The parameters are computed by mini-batch gradient descent; the batch size $b_1$ of the left branch is set to 200 and the batch size $b_2$ of the right branch is set to 200; randomly sample $b_1$ labeled nodes from $U_{UPG}$, compute the gradient of $L^{(L)}$, and update the parameters $W_m$, $b_m$,
Figure BDA0003136107610000051
and
Figure BDA0003136107610000052
according to the gradient values;
4-3. Randomly sample $b_2$ nodes from $U_{UPG}$, compute
Figure BDA0003136107610000053
and update the parameters $W_m$, $b_m$,
Figure BDA0003136107610000054
and
Figure BDA0003136107610000055
according to the gradient values;
4-4. Return to step 4-2 and iterate 100 times;
4-5. For a node $u'_j$ to be predicted in the UPG, compute the attribute vectors of the two users in the node as in step 3-2, concatenate them to obtain the vector $d_j$, input $d_j$ into the joint embedded learning model, and compute the label of the node $u'_j$.
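The two loss terms of step 4-1 can be illustrated as follows. Because the loss formulas are images that are not reproduced in this text, the code uses the standard forms these descriptions usually refer to: an averaged negative log-likelihood for the labeled branch, and the usual skip-gram negative-sampling objective for the context branch. Both forms, the averaging, and the function names are assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def label_loss(probs, labels):
    """Mean negative log-likelihood over labeled nodes.

    probs[i]  -- [p(y=0 | d_i), p(y=1 | d_i)] from the left branch
    labels[i] -- true label y_i (0 or 1)
    """
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def negative_sampling_loss(h, context_vec, negative_vecs):
    """Loss for one (node, context) pair with t sampled negatives.

    h             -- hidden representation of the node
    context_vec   -- output vector of a true context node
    negative_vecs -- output vectors of t randomly drawn negative nodes
    """
    loss = -math.log(sigmoid(dot(h, context_vec)))
    for v in negative_vecs:
        loss -= math.log(sigmoid(-dot(h, v)))
    return loss


toy_label_loss = label_loss([[0.5, 0.5]], [1])
toy_context_loss = negative_sampling_loss([0.0, 0.0], [1.0, 1.0], [[1.0, 1.0]])
```

With a zero hidden vector every sigmoid evaluates to 0.5, so the toy context loss equals 2·ln 2, which makes the sketch easy to check by hand.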
The invention has the following beneficial effects:
The invention addresses how to use a network embedding method to integrate the key factors of user identity recognition effectively and thereby identify users across two social platforms. Cross-platform identity association plays a crucial role in commercial cross-social-network applications, such as user behavior analysis over multiple social networks, cross-network information service push, cross-platform friend recommendation, and network security governance for government offices and enterprises. The method can effectively predict whether two users from different networks are the same user.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is an example of candidate paired user pair generation;
FIG. 3 is an example of user pair network graph generation;
FIG. 4 is an exemplary diagram of a joint embedding model;
Detailed Description
The invention will be further explained with reference to the drawings.
As shown in FIG. 1, the method for identifying the user identity across the social network based on the joint embedded learning model comprises the following steps:
Step 1: for the users of social networks $G^A$ and $G^B$, select candidate paired user pairs from the two social networks using user-name similarity and network structure;
Step 2: take all candidate paired user pairs $P = \{p_i\}$ as nodes; if the two users in $p_i$ are respectively neighbors of the two users in $p_j$, an edge exists between $p_i$ and $p_j$; the User Pair network Graph (UPG) is constructed on this principle;
Step 3: on the basis of the constructed UPG and the labeled user pair data (labeled user pairs), build a joint embedded learning model that fuses the label information of labeled paired users, structure information and attribute information, and design the model as a deep neural network with 1 input and 2 outputs;
Step 4: minimize the loss function of the joint embedded model using a stochastic gradient descent algorithm; after learning is complete, use the model to predict each user pair to be predicted and output whether the two users are the same user.
The specific implementation process of the step 1 is as follows:
1-1. $G^A = (U^A, E^A, X^A)$ represents social network A, where $U^A$ is the set of users of social network A, $E^A$ is the set of user relationships of social network A, and $X^A$ is the user attribute matrix of social network A;
Figure BDA0003136107610000061
represents a user in social network A; $G^B = (U^B, E^B, X^B)$ represents social network B, and its parameters are defined analogously. The invention uses web crawlers to collect data from Sina Weibo ($G^A$) and the "known net" ($G^B$). The Sina network contains about $1.23 \times 10^5$ user nodes, and the "human network" contains about $1.95 \times 10^5$ user records. The user information common to the two networks includes user name, gender, graduation institution and location.
1-2. Data from the different social network platforms are acquired using a crawler.
1-3. For a pair of users drawn from social networks $G^A$ and $G^B$ respectively,
Figure BDA0003136107610000062
compute the similarity of their user-name strings $n_k$ and $n_j$ according to the following formula, and add the user pairs whose similarity is greater than 0.8 to the candidate paired user pair set P:
Figure BDA0003136107610000063
Figure BDA0003136107610000064
where $lev(n_k, n_j)$ denotes the Levenshtein distance and $l(n_k)$ denotes the character length of user name $n_k$. For example, the user names "vio" and "violet" have a similarity of 0.5.
1-4. Taking each user pair in P as a seed user pair, expand to its neighbor nodes: from the neighbor nodes of the seed user pair, select the user pairs that have r common neighbors (known pairs) and add them to P; the value of r is set according to the data set. FIG. 2 gives an example of this step. In FIG. 2, assume that
Figure BDA0003136107610000065
are user pairs whose user-name similarity is greater than 0.8, and let r = 2; this step then adds the four user pairs
Figure BDA0003136107610000071
to P as candidate paired user pairs, and finally
Figure BDA0003136107610000072
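One possible reading of the neighbor expansion in step 1-4 is sketched below in Python. The description does not spell out the procedure in full, so several points are assumptions: a candidate is formed from one neighbor of each user in a seed pair, "r common neighbors" is read as "at least r already-known pairs whose two users are neighbors of the candidate's two users", and only a single expansion pass is made. All names and the toy data are hypothetical and are not the data of FIG. 2.

```python
def expand_candidates(seed_pairs, neighbors_a, neighbors_b, r=2):
    """Return the seed pairs plus the neighbor pairs supported by >= r known pairs."""
    known = set(seed_pairs)
    added = set()
    for a, b in seed_pairs:
        for a2 in neighbors_a.get(a, set()):
            for b2 in neighbors_b.get(b, set()):
                cand = (a2, b2)
                if cand in known:
                    continue
                # Count known pairs (x, y) adjacent to the candidate in both networks.
                support = sum(
                    1 for x, y in known
                    if x in neighbors_a.get(a2, set())
                    and y in neighbors_b.get(b2, set())
                )
                if support >= r:
                    added.add(cand)
    return known | added


# Toy example: a3 is linked to a1 and a2; b3 is linked to b1 and b2.
toy_seeds = [("a1", "b1"), ("a2", "b2")]
toy_na = {"a1": {"a3"}, "a2": {"a3"}, "a3": {"a1", "a2"}}
toy_nb = {"b1": {"b3"}, "b2": {"b3"}, "b3": {"b1", "b2"}}
expanded = expand_candidates(toy_seeds, toy_na, toy_nb, r=2)
```

Raising r makes the expansion stricter: with r = 3 the toy candidate ("a3", "b3") would no longer be added.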
The specific implementation process of the step 2 is as follows:
2-1. $UPG = (U_{UPG}, E_{UPG})$ represents the user pair network graph, where $U_{UPG}$ is the set of nodes and $E_{UPG}$ is the set of relationships between nodes. Each candidate paired user pair $p_i$ is taken as a node of the UPG and denoted $u'_i$, $u'_i \in U_{UPG}$.
2-2. Suppose
Figure BDA0003136107610000073
and
Figure BDA0003136107610000074
are two nodes in the UPG; an edge exists between the two nodes if the following relationship holds between them:
Figure BDA0003136107610000075
where
Figure BDA0003136107610000076
denotes the set of neighbor nodes of user
Figure BDA0003136107610000077
Applying step 2 to the two social networks shown in FIG. 2 yields the user pair network graph shown in FIG. 3. According to step 2-1 and step 2-2, the generated user pair network graph contains 6 nodes and 8 edges.
The specific implementation process of the step 3 is as follows:
3-1. Using part of the user attribute information collected by the crawler (such as accounts on other platforms, mobile phone numbers and e-mail addresses that users provide in their personal introductions), label each user's exact corresponding account in the other network by text analysis and matching techniques combined with manual judgment. The labeled matching user pairs serve as the supervision information for model training.
3-2. For the two users
Figure BDA0003136107610000078
and
Figure BDA0003136107610000079
of each pair in the candidate paired user set generated in step 2-1, the attributes (user name, gender, graduation institution and geographic location) are converted into features by one-hot encoding, denoted
Figure BDA00031361076100000710
and
Figure BDA00031361076100000711
respectively. Specifically, for the user-name attribute, Chinese characters are converted to pinyin, uppercase letters are converted to lowercase, and special characters such as underscores are removed; several character substrings
Figure BDA00031361076100000712
are then extracted from the user name and one-hot encoded. For example, from the user name "violet", the length-3 substrings {"vio", "iol", "ole", "let"} can be extracted. Categorical attributes such as gender, geographic location and graduation institution are one-hot encoded directly. For example, gender has only two options, "male" and "female", so the "male" attribute can be encoded as {10} and the "female" attribute as {01}; the remaining attributes are handled similarly.
3-3. As shown in FIG. 4, a joint embedding model is built for the constructed user pair network graph. The attribute vectors of the two users in a node (denoted
Figure BDA0003136107610000081
) are concatenated, denoted
Figure BDA0003136107610000082
and used as the input of the joint embedding model. The output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the predicted probabilities that the node label $y_i$ is 0 or 1 (1 means the two users in the node are the same user; 0 means they are different users), and the right branch uses a skip-gram model to output the predicted probabilities of the context nodes. The m-th layer of the model is represented as:
Figure BDA0003136107610000083
Figure BDA0003136107610000084
Figure BDA0003136107610000085
where δ(·) denotes the sigmoid function, and $W_m$ and $b_m$ are the weight and bias parameters of the m-th layer. The latter two formulas represent the (m+1)-th layers of the left branch and the right branch, respectively;
Figure BDA0003136107610000086
denotes the weight parameter of the left branch at the (m+1)-th layer,
Figure BDA0003136107610000087
denotes the weight parameter of the right branch at the (m+1)-th layer,
Figure BDA0003136107610000088
and
Figure BDA0003136107610000089
are defined analogously.
The last layer of the left branch (node label prediction) of the model is designed as a softmax layer, and the input of this layer is:
Figure BDA00031361076100000810
The last layer of the right branch (context node prediction) of the model is designed as a softmax layer, and the input of this layer is:
Figure BDA00031361076100000811
where k denotes the number of hidden layers of the left branch and k' denotes the number of hidden layers of the right branch.
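The attribute encoding of step 3-2 and the concatenation that produces the model input in step 3-3 can be sketched as follows. The description gives the length-3 substrings of "violet" and the gender codes {10} and {01}; how the substring codes are combined is not stated, so the multi-hot vector over a substring vocabulary used here is an assumption. Pinyin conversion is omitted, which means non-ASCII characters are simply dropped in this sketch. All names are hypothetical.

```python
import re


def normalize_name(name):
    """Lowercase and strip special characters (pinyin conversion is omitted)."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def substrings(name, n=3):
    """Length-n character substrings of the normalized user name."""
    s = normalize_name(name)
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def one_hot(value, vocabulary):
    return [1 if value == v else 0 for v in vocabulary]


def multi_hot(values, vocabulary):
    """ASSUMPTION: substring codes are merged into a single multi-hot vector."""
    present = set(values)
    return [1 if v in present else 0 for v in vocabulary]


def encode_user(user, name_vocab, gender_vocab):
    return (multi_hot(substrings(user["name"]), name_vocab)
            + one_hot(user["gender"], gender_vocab))


def encode_pair(user_a, user_b, name_vocab, gender_vocab):
    """Concatenate the two users' attribute vectors into the model input d_i."""
    return (encode_user(user_a, name_vocab, gender_vocab)
            + encode_user(user_b, name_vocab, gender_vocab))


name_vocab = ["vio", "iol", "ole", "let"]
gender_vocab = ["male", "female"]
d_example = encode_pair({"name": "Vio_let", "gender": "female"},
                        {"name": "vio", "gender": "female"},
                        name_vocab, gender_vocab)
```

Only two of the four attributes are encoded to keep the sketch short; graduation institution and location would be further `one_hot` blocks appended in the same way.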
The specific implementation process of the step 4 is as follows:
4-1. The left branch of the joint embedding model is a multilayer perceptron, and the loss function of this branch is defined as:
Figure BDA00031361076100000812
where
Figure BDA00031361076100000813
denotes the labeled nodes in the UPG, and $p(y_i \mid d_i)$ denotes the probability of $y_i$ given $d_i$, calculated as follows:
Figure BDA0003136107610000091
The right branch adopts a negative sampling mechanism, and its loss function is defined as:
Figure BDA0003136107610000092
where δ(·) denotes the sigmoid function, $n = |U_{UPG}|$, $u'$ denotes a context node of node $u'_i$, and
Figure BDA0003136107610000093
denotes t randomly selected negative samples. The remaining parameters are as defined in step 3-3.
4-2. The parameters are computed by mini-batch gradient descent. The batch size $b_1$ of the left branch is set to 200 and the batch size $b_2$ of the right branch is set to 200. Randomly sample $b_1$ labeled nodes, compute the gradient $\nabla L^{(L)}$, and update the parameters $W_m$, $b_m$,
Figure BDA0003136107610000094
and
Figure BDA0003136107610000095
according to the gradient values;
4-3. Sample $b_2$ nodes from $U_{UPG}$, compute
Figure BDA0003136107610000096
and update the parameters $W_m$, $b_m$,
Figure BDA0003136107610000097
and
Figure BDA0003136107610000098
according to the gradient values;
4-4. Return to step 4-2 and iterate 100 times.
4-5. For a node $u'_j$ to be predicted in the UPG, compute the attribute vectors of the two users in the node as in step 3-2, concatenate them to obtain the vector $d_j$, input $d_j$ into the joint embedding model, and compute the label of the node $u'_j$.
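The alternating schedule of steps 4-2 to 4-4 can be sketched as a small training skeleton. The gradient computations themselves are left to two caller-supplied functions, `left_step` and `right_step`, which are hypothetical placeholders for the updates described in steps 4-2 and 4-3; sampling without replacement and capping the batch at the population size are also assumptions.

```python
import random


def train(labeled_nodes, all_nodes, left_step, right_step,
          b1=200, b2=200, iterations=100, seed=0):
    """Alternate one labeled mini-batch update and one structural mini-batch update."""
    rng = random.Random(seed)
    for _ in range(iterations):
        # Step 4-2: b1 labeled nodes drive the left (label) branch update.
        left_step(rng.sample(labeled_nodes, min(b1, len(labeled_nodes))))
        # Step 4-3: b2 nodes from the whole UPG drive the right (context) branch update.
        right_step(rng.sample(all_nodes, min(b2, len(all_nodes))))


# Usage with counting stubs in place of real gradient updates.
calls = {"left": 0, "right": 0, "left_batch": 0, "right_batch": 0}


def count_left(batch):
    calls["left"] += 1
    calls["left_batch"] = len(batch)


def count_right(batch):
    calls["right"] += 1
    calls["right_batch"] = len(batch)


train(list(range(300)), list(range(1000)), count_left, count_right)
```

Because both branches share the lower layers, each iteration updates the shared parameters twice, once from the label loss and once from the context loss.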
In step 4, taking the crawled Sina Weibo user data and "known net" user data as an example, 7325 user pairs are extracted, of which 2213 are labeled; 30% of the labeled data are used as model training data and the remainder as test data. For this pair of networks, the user pair network graph is constructed, the joint embedding model is built according to FIG. 4, and the model parameters are learned. User identity association is then performed on the test pairs and the accuracy is computed; the final accuracy reaches 84.7%.

Claims (3)

1. The cross-social network user identity recognition method based on the joint embedded learning model is characterized by comprising the following steps of:
step 1, for the users of social networks $G^A$ and $G^B$, selecting candidate paired user pairs from the two social networks using user-name similarity and network structure;
step 2, taking all candidate paired user pairs $P = \{p_i\}$ as nodes, wherein if the two users in $p_i$ are respectively neighbors of the two users in $p_j$, an edge exists between $p_i$ and $p_j$, and constructing the user pair network graph UPG on this principle;
step 3, on the basis of the constructed user pair network graph UPG and the labeled user pair data, building a joint embedded learning model that fuses the label information of labeled paired users, structure information and attribute information, and designing the joint embedded learning model as a deep neural network with 1 input and 2 outputs;
step 4, minimizing the loss function of the joint embedded learning model using a stochastic gradient descent algorithm, predicting each user pair to be predicted with the model after learning is complete, and outputting whether the two users are the same user;
the step 3 is realized as follows:
3-1. using part of the user attribute information collected by the crawler, labeling each user's exact corresponding account in the other network by text analysis and matching techniques combined with manual judgment; the labeled matching user pairs serve as the supervision information for model training;
3-2. for the two users
Figure FDA0003466563320000011
and
Figure FDA0003466563320000012
of each pair in the candidate paired user set generated in step 2-1, converting the attributes into features by one-hot encoding, denoted
Figure FDA0003466563320000013
and
Figure FDA0003466563320000014
respectively; the attributes comprise user name, gender, graduation institution and geographic location;
3-3. constructing the joint embedded learning model for the constructed user pair network graph; the attribute vectors of the two users in a node,
Figure FDA0003466563320000015
are concatenated, denoted
Figure FDA0003466563320000016
and $d_i$ is used as the input of the joint embedded learning model; the output has a left branch and a right branch: the left branch uses a multilayer perceptron to output the probabilities that the node label $y_i$ is 0 or 1, where 1 means the two users in the node are the same user and 0 means they are different users; the right branch uses a skip-gram model to output the predicted probabilities of the context nodes;
the m-th layer of the skip-gram model is represented as:
Figure FDA0003466563320000021
Figure FDA0003466563320000022
Figure FDA0003466563320000023
where δ(·) denotes the sigmoid function, and $W_m$ and $b_m$ are the weight and bias parameters of the m-th layer; formula (4) and formula (5) represent the (m+1)-th layers of the left and right branches, respectively;
Figure FDA0003466563320000024
denotes the weight parameter of the left branch at the (m+1)-th layer,
Figure FDA0003466563320000025
denotes the weight parameter of the right branch at the (m+1)-th layer,
Figure FDA0003466563320000026
and
Figure FDA0003466563320000027
are defined analogously;
the last layer of the left branch of the model is designed as a softmax layer, and the input of this layer is:
Figure FDA0003466563320000028
the last layer of the right branch of the model is designed as a softmax layer, and the input of this layer is:
Figure FDA0003466563320000029
where k denotes the number of hidden layers of the left branch and k' denotes the number of hidden layers of the right branch.
2. The method for identifying the user identity across the social network based on the joint embedded learning model according to claim 1, wherein the step 1 is implemented as follows:
1-1.GA=(UA,EA,XA) Representing social networks A, UASet of users representing social network A, EASet of user relationships, X, representing social network AAA matrix of user attributes representing social network a,
Figure FDA00034665633200000210
representing user i in social network A; gB=(UB,EB,XB) Representing a social network B, and the rest parameters have similar meanings;
1-2, acquiring data of different social network platforms by using a crawler;
1-3. For a user pair
Figure FDA00034665633200000211
taken from social networks G_A and G_B respectively, the similarity of the user names n_k and n_j is calculated according to formula (1), and every user pair whose similarity is greater than 0.8 is added to the candidate paired user pair set P;
Figure FDA00034665633200000212
where lev(n_k, n_j) denotes the Levenshtein distance between n_k and n_j, and l(n_k) denotes the character length of user name n_k;
1-4. Taking each user pair in the set P as a seed user pair, the neighbor nodes are expanded: user pairs that have r common neighbors (known pairs) are selected from the neighbor nodes of the seed user pair and added to P; the value of r is set separately for each data set;
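Steps 1-3 and 1-4 can be sketched as follows. Formula (1) is available here only as a figure reference, so the normalisation `1 - lev / max(length)` is an assumption chosen to be consistent with the symbols lev(n_k, n_j) and l(n_k); the reading of "r common neighbors" as "at least r already-known pairs among the neighbors" is likewise an assumption, and all function names are hypothetical.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def name_similarity(n_k, n_j):
    # ASSUMPTION: 1 - lev / max(length); formula (1) is not reproduced in the text.
    longest = max(len(n_k), len(n_j))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(n_k, n_j) / longest

def candidate_pairs(names_a, names_b, threshold=0.8):
    """Step 1-3: names_a / names_b map user id -> user name; returns the set P."""
    return {(ua, ub)
            for ua, na in names_a.items()
            for ub, nb in names_b.items()
            if name_similarity(na, nb) > threshold}

def expand_pairs(P, neigh_a, neigh_b, r):
    """Step 1-4: add neighbor pairs of each seed that share at least r known pairs.

    ASSUMPTION: a "common neighbor" of (a2, b2) is a known pair (x, y) in P
    with x adjacent to a2 in network A and y adjacent to b2 in network B.
    """
    expanded = set(P)
    for a, b in P:
        for a2 in neigh_a.get(a, ()):
            for b2 in neigh_b.get(b, ()):
                if (a2, b2) in expanded:
                    continue
                common = sum(1 for x, y in P
                             if x in neigh_a.get(a2, ()) and y in neigh_b.get(b2, ()))
                if common >= r:
                    expanded.add((a2, b2))
    return expanded
```

Comparing every name in A with every name in B is quadratic in the number of users; a real crawl would likely need blocking or indexing before this step, which the claim does not address.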
step 2 is implemented as follows:
2-1. UPG = (U_UPG, E_UPG) represents the user pair graph, where U_UPG is the set of nodes and E_UPG is the set of relationships between nodes; each candidate paired user pair p_i is taken as a node of the UPG and denoted u'_i, with u'_i ∈ U_UPG;
2-2. Suppose
Figure FDA0003466563320000031
and
Figure FDA0003466563320000032
are two nodes in the UPG; an edge exists between them if the following relationship holds:
Figure FDA0003466563320000033
where
Figure FDA0003466563320000034
denotes, for user
Figure FDA0003466563320000035
the set of its neighboring nodes.
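The construction of the user pair graph in step 2 can be sketched as below. The edge condition of formula (2) is only a figure reference here, so the rule used in the code is an assumption: two pair nodes are linked when their members are neighbors in both social networks. The function name `build_upg` and the adjacency dictionaries are hypothetical, and the neighbor sets are assumed to be symmetric.

```python
from itertools import combinations

def build_upg(P, neigh_a, neigh_b):
    """Return (nodes, edges) of the user pair graph.

    ASSUMPTION: pair nodes (a1, b1) and (a2, b2) are connected when a2 is a
    neighbor of a1 in network A and b2 is a neighbor of b1 in network B.
    """
    nodes = sorted(P)  # each candidate pair p_i becomes one node u'_i
    edges = set()
    for (a1, b1), (a2, b2) in combinations(nodes, 2):
        if a2 in neigh_a.get(a1, ()) and b2 in neigh_b.get(b1, ()):
            edges.add(((a1, b1), (a2, b2)))
    return nodes, edges
```

Each edge is stored once as an ordered tuple of the two pair nodes, which is enough for an undirected graph when the input adjacency is symmetric.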
3. The cross-social network user identity recognition method based on the joint embedded learning model according to claim 2, wherein step 4 is implemented as follows:
4-1. The left branch of the joint embedding learning model is a multilayer perceptron, and the loss function of this branch is defined as:
Figure FDA0003466563320000036
where
Figure FDA0003466563320000037
denotes the labeled nodes in the UPG, and p(y_i | d_i) denotes the probability of y_i given d_i, which is calculated as follows:
Figure FDA0003466563320000038
the right branch uses a negative sampling mechanism, and its loss function is defined as:
Figure FDA0003466563320000039
where δ(·) denotes the sigmoid function, n = |U_UPG|, u' denotes a context node of node u'_i, and
Figure FDA00034665633200000310
denotes the t randomly selected negative samples;
4-2. The parameters are computed by mini-batch gradient descent; the batch size b_1 of the left branch is set to 200, and the batch size b_2 of the right branch is set to 200; b_1 labeled nodes are randomly sampled from U_UPG, the gradient of the left-branch loss L^(l) is computed, and according to the gradient values the parameters W_m and b_m,
Figure FDA0003466563320000041
and
Figure FDA0003466563320000042
are updated;
4-3. b_2 nodes are randomly sampled from U_UPG, the gradient of
Figure FDA0003466563320000043
is computed, and according to the gradient values the parameters W_m and b_m,
Figure FDA0003466563320000044
and
Figure FDA0003466563320000045
are updated;
4-4. Return to step 4-2 and iterate 100 times;
4-5. For a node u'_j to be predicted in the UPG, the attribute vectors of the two users in the node are computed according to step 3-2 and concatenated to obtain the vector d_j, which is input into the joint embedding learning model to compute the label of the node u'_j.
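The alternating schedule of steps 4-2 to 4-4 can be sketched as below, together with a negative-sampling loss term. The right-branch loss formula is only a figure reference here, so `negative_sampling_loss` uses the standard skip-gram negative-sampling form as an assumption and may differ from the patented formula (for example in normalisation). The callbacks `left_step` and `right_step` are hypothetical placeholders for the gradient updates of the two branches.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def negative_sampling_loss(e_i, w_context, w_negatives):
    """Loss for one node: -log δ(w_c·e_i) - Σ log δ(-w_neg·e_i).

    ASSUMPTION: standard skip-gram negative-sampling form; e_i is the hidden
    representation of node u'_i, w_context the parameters of a context node,
    and w_negatives the parameters of the t negative samples.
    """
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))
    loss = -math.log(sigmoid(dot(w_context, e_i)))
    for w_neg in w_negatives:
        loss -= math.log(sigmoid(-dot(w_neg, e_i)))
    return loss

def train(labeled_nodes, all_nodes, left_step, right_step,
          b1=200, b2=200, iterations=100, seed=0):
    """Alternate left-branch and right-branch mini-batch updates.

    labeled_nodes and all_nodes are lists; left_step / right_step are
    hypothetical callbacks that compute gradients on a batch and update
    the shared and branch-specific parameters.
    """
    rng = random.Random(seed)
    for _ in range(iterations):
        # Step 4-2: sample b1 labeled nodes and update with the left-branch loss.
        left_step(rng.sample(labeled_nodes, min(b1, len(labeled_nodes))))
        # Step 4-3: sample b2 nodes and update with the right-branch loss.
        right_step(rng.sample(all_nodes, min(b2, len(all_nodes))))
```

The batch size is capped at the number of available nodes so that small graphs do not make the sampler fail; the claim itself only states the values 200, 200 and 100.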
CN202110718740.9A 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model Active CN113434782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110718740.9A CN113434782B (en) 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110718740.9A CN113434782B (en) 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model

Publications (2)

Publication Number Publication Date
CN113434782A CN113434782A (en) 2021-09-24
CN113434782B true CN113434782B (en) 2022-03-01

Family

ID=77755095

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110718740.9A Active CN113434782B (en) 2021-06-28 2021-06-28 Cross-social network user identity recognition method based on joint embedded learning model

Country Status (1)

Country Link
CN (1) CN113434782B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663245A (en) * 2022-03-16 2022-06-24 南京信息工程大学 Cross-social network identity matching method
CN114817757B (en) * 2022-04-02 2023-07-21 广州大学 Cross-social network virtual identity association method based on graph convolutional network
CN116776193A (en) * 2023-05-17 2023-09-19 广州大学 Method and device for associating virtual identities across social networks based on attention mechanism

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140108152A1 (en) * 2012-10-12 2014-04-17 Google Inc. Managing Social Network Relationships Between A Commercial Entity and One or More Users
CN109753602B (en) * 2018-12-04 2020-12-25 中国科学院计算技术研究所 Cross-social network user identity recognition method and system based on machine learning
CN110347932B (en) * 2019-06-04 2021-11-23 中国科学院信息工程研究所 Cross-network user alignment method based on deep learning

Also Published As

Publication number Publication date
CN113434782A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN113434782B (en) Cross-social network user identity recognition method based on joint embedded learning model
CN110097125B (en) Cross-network account association method based on embedded representation
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
CN106295796B (en) entity link method based on deep learning
CN108268643A (en) A kind of Deep Semantics matching entities link method based on more granularity LSTM networks
CN109753602B (en) Cross-social network user identity recognition method and system based on machine learning
CN109857871B (en) User relationship discovery method based on social network mass contextual data
WO2018112696A1 (en) Content pushing method and content pushing system
CN112988917B (en) Entity alignment method based on multiple entity contexts
CN107391542A (en) A kind of open source software community expert recommendation method based on document knowledge collection of illustrative plates
CN112084373B (en) Graph embedding-based multi-source heterogeneous network user alignment method
CN113628059B (en) Associated user identification method and device based on multi-layer diagram attention network
CN110472226A (en) A kind of network security situation prediction method and device of knowledge based map
CN113095948B (en) Multi-source heterogeneous network user alignment method based on graph neural network
CN112884045B (en) Classification method of random edge deletion embedded model based on multiple visual angles
CN113806630A (en) Attention-based multi-view feature fusion cross-domain recommendation method and device
CN109960755B (en) User privacy protection method based on dynamic iteration fast gradient
CN109492027B (en) Cross-community potential character relation analysis method based on weak credible data
CN110569355B (en) Viewpoint target extraction and target emotion classification combined method and system based on word blocks
CN112749566B (en) Semantic matching method and device for English writing assistance
CN115910232A (en) Multi-view drug pair response prediction method, device, equipment and storage medium
CN116049527A (en) Social network specific target account mining method oriented to military field
Ma et al. Friend closeness based user matching cross social networks
CN113343100B (en) Smart city resource recommendation method and system based on knowledge graph
CN113283243B (en) Entity and relationship combined extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20230726

Address after: Room 801, 85 Kefeng Road, Huangpu District, Guangzhou City, Guangdong Province

Patentee after: Guangzhou Dayu Chuangfu Technology Co.,Ltd.

Address before: Hangzhou City, Zhejiang province 310036 Xiasha Higher Education Park forest Street No. 16

Patentee before: HANGZHOU NORMAL UNIVERSITY QIANJIANG College