CN107609469B - Social network associated user mining method and system - Google Patents


Info

Publication number: CN107609469B
Authority: CN (China)
Prior art keywords: user, social network, similarity, attribute, attributes
Legal status: Expired - Fee Related
Application number: CN201710633081.2A
Other languages: Chinese (zh)
Other versions: CN107609469A
Inventors: 周小平 (Zhou Xiaoping), 赵吉超 (Zhao Jichao)
Current assignee: Beijing University of Civil Engineering and Architecture
Original assignee: Beijing University of Civil Engineering and Architecture
Events: application CN201710633081.2A filed by Beijing University of Civil Engineering and Architecture; published as CN107609469A; application granted and published as CN107609469B; legal status: Expired - Fee Related.

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a social network associated user mining method and system that can improve the accuracy, recall and computational efficiency of associated user mining in a big data environment. The method comprises the following steps: calculating the similarities of the user attributes of users of a first social network and a second social network to be fused, and fusing these similarities to obtain a user attribute similarity matrix; forming a user relationship fusion matrix of the first and second social networks using the tensor product; converting the user attribute similarity matrix into a user attribute similarity vector with a vectorization operator, and constructing an associated user mining model that fuses the user relationship fusion matrix with the user attribute similarity vector; and determining the associated users of the first and second social networks by solving the associated user mining model.

Description

Social network associated user mining method and system
Technical Field
The invention relates to the field of associated user mining, in particular to a social network associated user mining method and system.
Background
Large-scale social networks are a central subject of current and future social network research, and fusing them is valuable for many kinds of social network studies. Mining associated users (accounts of the same person in different networks) across large-scale social networks is an important part of social network fusion. Current research progress on mining associated users for large-scale social network fusion can be summarized as follows:
(1) Mining associated users from user attributes (including user behaviors) is currently the most studied and most effective approach. However, in large social networks user attribute values are often shared by many users, sparse, false, or inconsistent across networks, so methods that rely on user attributes alone are not robust enough, are vulnerable to malicious users, and leave room to improve recall;
(2) Associated user mining based on user relationships has mostly been studied in the field of de-anonymization. On identical or similar graphs these methods can accurately match high-degree nodes, but they are poorly suited to direct use in associated user mining. Moreover, most existing relationship-based methods require seed users, which are increasingly difficult to obtain in many social networks, and there is little research on modeling associated user mining without seed users;
(3) There is little research on fusing user attributes and user relationships to mine associated users accurately and comprehensively;
(4) There is little research on fast associated user mining models for large social networks.
Disclosure of Invention
To address these shortcomings of the prior art, the embodiments of the invention provide a social network associated user mining method and system.
In one aspect, an embodiment of the invention provides a social network associated user mining method comprising the following steps:
S1, calculating the similarities of the user attributes of users of a first social network and a second social network to be fused, and fusing these similarities to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image attribute, a position attribute, a time attribute and a text attribute;
S2, forming a user relationship fusion matrix of the first social network and the second social network using the tensor product;
S3, converting the user attribute similarity matrix into a user attribute similarity vector with a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector;
S4, determining the associated users of the first social network and the second social network by solving the associated user mining model.
On the other hand, an embodiment of the present invention provides a social network associated user mining system, including:
the fusion unit is used for calculating the similarity of user attributes of users of a first social network and a second social network to be fused, and fusing the similarity of the user attributes to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image attribute, a position attribute, a time attribute and a text attribute;
the forming unit is used for forming a user relationship fusion matrix of the first social network and the second social network by adopting tensor products;
the construction unit is used for converting the user attribute similarity matrix into a user attribute similarity vector by adopting a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector;
and the solving unit is used for determining the associated users of the first social network and the second social network by solving the associated user mining model.
In the social network associated user mining method and system provided by the embodiments of the invention, user relationships are incorporated into the user attributes when constructing the associated user mining model, which makes the model harder for malicious users to attack and improves its accuracy; and user attributes are incorporated into the user relationships, so that low-degree users can be identified more accurately, improving the accuracy and recall of the model. By fusing user attributes and user relationships, the scheme therefore both yields an associated user mining model that resists attack and improves the model's accuracy and recall.
Drawings
FIG. 1 is a flow chart illustrating an embodiment of a social network associated user mining method according to the present invention;
FIG. 2 is a diagram of an overall research framework of another embodiment of a social network associated user mining method of the present invention;
FIG. 3 is a flowchart of an avatar processing method according to yet another embodiment of the social network associated user mining method of the present invention;
FIG. 4 is a schematic structural diagram of an embodiment of a social network associated user mining system according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments, but not all embodiments, of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, this embodiment discloses a social network associated user mining method, comprising:
S1, calculating the similarities of the user attributes of users of a first social network and a second social network to be fused, and fusing these similarities to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image attribute, a position attribute, a time attribute and a text attribute;
S2, forming a user relationship fusion matrix of the first social network and the second social network using the tensor product;
S3, converting the user attribute similarity matrix into a user attribute similarity vector with a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector;
S4, determining the associated users of the first social network and the second social network by solving the associated user mining model.
As in the summary above, incorporating user relationships into the user attributes makes the model harder for malicious users to attack, and incorporating user attributes into the user relationships allows low-degree users to be identified more accurately; fusing the two therefore yields an attack-resistant associated user mining model with higher accuracy and recall.
The social network associated user mining method of the present invention is explained in detail below.
The invention models user attributes and user relationships jointly in order to mine associated users across large-scale social networks accurately, comprehensively and quickly, thereby tightly integrating those networks. The overall research framework is shown in FIG. 2. In FIG. 3, $a$ and $b$ are two users from social networks $SN_A$ and $SN_B$, respectively. First, a user attribute utility evaluation system for associated user mining is constructed and used to analyze the various user attributes comprehensively; then different similarity calculation models are applied to different user attributes, and finally the similarities of the heterogeneous user attributes are fused. On this basis, the invention fuses the user relationships of the different social networks through the tensor product to form a user relationship fusion matrix; it then converts the user attribute similarity matrix into a user attribute similarity vector with a vectorization operator and constructs the associated user mining model and method by fusing the user relationship fusion matrix with the user attribute similarity vector. Finally, it improves the efficiency of associated user mining in two ways: an approximate solution method and a parallel computing method.
1. User attribute utility evaluation system oriented to associated user mining
For two social networks $SN_A$ and $SN_B$ to be fused, the invention constructs a user attribute utility evaluation system from four aspects, namely density, consistency, falsity and identifiability, and uses it to analyze comprehensively the utility of each user attribute in associated user mining modeling.
Density means that the attribute information should be dense enough across the social networks, i.e., most users have a value for the attribute, as is the case for user names and avatars. Only sufficiently dense attributes are suitable for associated user mining. The density of any attribute $p$ can be defined as the probability that, for associated users $a \in SN_A$ and $b \in SN_B$, attribute $p$ is non-empty for both users:
$$D_p = \Pr\left(p(a) \neq \varnothing,\ p(b) \neq \varnothing \,\middle|\, a, b \text{ associated}\right) \quad (1)$$
Consistency means that users tend to use the same or similar attribute values, such as user names, in different social networks. Attributes with poor consistency are not suitable for associated user mining across social networks. The consistency of any attribute $p$ can be defined as the probability that, for associated users $a \in SN_A$ and $b \in SN_B$ whose values of attribute $p$ are both non-empty, the similarity of $a$ and $b$ on attribute $p$ is greater than a set threshold $t_c$:
$$C_p = \Pr\left(\mathrm{sim}_p(a,b) > t_c \,\middle|\, a, b \text{ associated},\ p(a) \neq \varnothing,\ p(b) \neq \varnothing\right) \quad (2)$$
where $\mathrm{sim}_p(a,b)$ is the similarity value of users $a$ and $b$ on attribute $p$.
Falsity refers to users assigning attribute values that do not match reality. If many users have false values for an attribute, the error rate of associated user mining increases, and the attribute is not suitable for associated user mining. The falsity of any attribute $p$ can be defined as the probability that, for associated users $a \in SN_A$ and $b \in SN_B$ whose values of attribute $p$ are both non-empty, the similarity of $a$ and $b$ on attribute $p$ is less than a set threshold $t_f$:
$$F_p = \Pr\left(\mathrm{sim}_p(a,b) < t_f \,\middle|\, a, b \text{ associated},\ p(a) \neq \varnothing,\ p(b) \neq \varnothing\right) \quad (3)$$
Identifiability refers to whether an attribute value distinguishes a user clearly from other users. City, for example, is poorly identifying, because many unrelated users share the same city. The identifiability of any attribute $p$ can be defined as the probability that, for non-associated users $a \in SN_A$ and $b \in SN_B$ whose values of attribute $p$ are both non-empty, the similarity of $a$ and $b$ on attribute $p$ is less than a set threshold $t_i$:
$$I_p = \Pr\left(\mathrm{sim}_p(a,b) < t_i \,\middle|\, a, b \text{ not associated},\ p(a) \neq \varnothing,\ p(b) \neq \varnothing\right) \quad (4)$$
From the definitions of density, consistency, falsity and identifiability it follows that, if the classification rule of a classifier (algorithm) is
$$a \text{ and } b \text{ are associated} \iff \mathrm{sim}_p(a,b) > t$$
with $t = t_c = t_i$, then the recall of the classifier is
$$\mathrm{recall} = D_p \cdot C_p \quad (5)$$
and its precision satisfies
$$\mathrm{precision} \leq \frac{\mathrm{recall}}{\mathrm{recall} + 1 - I_p} \quad (6)$$
This analysis shows that evaluating user attributes in terms of density, consistency, falsity and identifiability, and selecting attributes that are dense, highly consistent, rarely false and highly identifying, plays an important role in improving the recall and accuracy of associated user mining. The invention therefore performs an empirical analysis of user attribute characteristics from these four aspects on real data sets (such as Renren and Sina Weibo data sets). Attributes that are sparse, poorly consistent, frequently false or poorly identifying are not suitable for associated user mining.
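The indicators (1) to (4) and the figures (5) and (6) can be estimated from a manually labelled sample of cross-network user pairs. The Python sketch below is illustrative rather than the patent's own procedure: `PairSample`, `attribute_utility` and the sample values are hypothetical, and it assumes a single classifier threshold $t = t_c = t_i$.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PairSample:
    """A labelled (a, b) pair: similarity on attribute p, or None if either value is empty."""
    sim: Optional[float]
    associated: bool


def attribute_utility(pairs: Sequence[PairSample], t_c: float, t_f: float, t_i: float) -> dict:
    """Estimate D_p, C_p, F_p, I_p, the recall (5) and the precision bound (6)."""
    linked = [x for x in pairs if x.associated]
    linked_sims = [x.sim for x in linked if x.sim is not None]
    other_sims = [x.sim for x in pairs if not x.associated and x.sim is not None]

    density = len(linked_sims) / len(linked)                               # eq. (1)
    consistency = sum(s > t_c for s in linked_sims) / len(linked_sims)     # eq. (2)
    falsity = sum(s < t_f for s in linked_sims) / len(linked_sims)         # eq. (3)
    identifiability = sum(s < t_i for s in other_sims) / len(other_sims)   # eq. (4)

    recall = density * consistency                                         # eq. (5)
    precision_bound = recall / (recall + 1 - identifiability)              # eq. (6)
    return {"D": density, "C": consistency, "F": falsity, "I": identifiability,
            "recall": recall, "precision_bound": precision_bound}


# Hypothetical sample: four associated and four non-associated pairs.
sample = [PairSample(0.9, True), PairSample(0.8, True), PairSample(0.2, True), PairSample(None, True),
          PairSample(0.1, False), PairSample(0.6, False), PairSample(0.2, False), PairSample(None, False)]
metrics = attribute_utility(sample, t_c=0.5, t_f=0.3, t_i=0.5)
```

With these values the sketch gives $D_p = 0.75$, $C_p = 2/3$, a recall of $0.5$ and a precision bound of $0.6$. As far as we can reconstruct it, bound (6) assumes that non-associated pairs with non-empty values are at least as numerous as associated pairs, which is the usual situation because their number grows with $|SN_A| \cdot |SN_B|$.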
The attributes retained after screening contribute differently to associated user mining. For each screened attribute $i$, the invention therefore uses a unified utility index $CP_i$. Select $M$ user pairs ($M$ sufficiently large) whose similarity on attribute $i$ is greater than a given threshold; if $C_i$ of these pairs are associated users, then
$$CP_i = C_i / M \quad (7)$$
Clearly, attributes with higher $CP_i$ values make associated user mining more effective. In social network data sets without seed users, $C_i$ can be determined by manual identification.
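Equation (7) is a simple ratio over manually verified pairs. A minimal sketch, assuming the candidate pairs for attribute $i$ are given as (similarity, verified label) tuples; `utility_index` and the data are hypothetical:

```python
from typing import Iterable, Tuple


def utility_index(scored_pairs: Iterable[Tuple[float, bool]], threshold: float) -> float:
    """CP_i = C_i / M over the pairs whose similarity on attribute i exceeds the threshold."""
    selected = [is_linked for sim, is_linked in scored_pairs if sim > threshold]
    if not selected:
        raise ValueError("no pair exceeds the threshold, so M would be zero")
    return sum(selected) / len(selected)  # C_i (verified associated pairs) / M


# Hypothetical manually labelled pairs for two attributes.
username_pairs = [(0.95, True), (0.9, True), (0.85, False), (0.7, True), (0.2, False)]
city_pairs = [(1.0, False), (1.0, True), (1.0, False), (1.0, False), (0.0, False)]
cp = {"username": utility_index(username_pairs, 0.5), "city": utility_index(city_pairs, 0.5)}
```

Here the user name gets $CP = 0.75$ while city, which many unrelated users share, gets $0.25$, in line with the identifiability argument above.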
2. Associated user mining model and method integrating user attributes and user relationships
(1) Attribute similarity calculation model
Different attributes call for different similarity measures, so the invention selects a similarity calculation method according to the characteristics of each attribute.
User name. The user name is an attribute that is ubiquitous in social networks, highly consistent and highly identifying, which is why much current research uses user names for associated user modeling. Liu et al. discovered associated users through alias disambiguation based on user names, which is currently the best model for mining associated users from user names. The method computes user name similarity from various behavioral characteristics of user names, and the invention uses it to calculate user name similarity.
Image attributes. The image attributes include the user avatar, and the similarity calculation is described here using the avatar as an example. The user avatar helps in mining associated users, but avatars are also very noisy, so the invention considers only avatars that contain faces. For users $a$ and $b$ from social networks $SN_A$ and $SN_B$, respectively, an image detector first checks whether each avatar is an image; if it is, a face detector checks whether it contains a face; finally, face features are extracted and a classifier outputs a face similarity in the interval $[0, 1]$ (the avatar processing flow is shown in FIG. 3). For image detection, face detection, feature extraction and the similarity classifier, the invention directly uses the face recognition work from Carnegie Mellon University (http://www.briancbecker.com/bcms/site/proj/facerec/fbextract.html).
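The avatar flow of FIG. 3 is a short chain of checks. The sketch below fixes only the control flow; the three detector callables are hypothetical stand-ins for the external face recognition package, and no real library API is implied.

```python
from typing import Callable, Optional

# Hypothetical component signatures for the external detectors and classifier.
IsImage = Callable[[bytes], bool]
HasFace = Callable[[bytes], bool]
FaceSimilarity = Callable[[bytes, bytes], float]


def avatar_similarity(avatar_a: Optional[bytes], avatar_b: Optional[bytes],
                      is_image: IsImage, has_face: HasFace,
                      face_similarity: FaceSimilarity) -> Optional[float]:
    """Image check, then face check, then a face similarity clipped to [0, 1].

    Returns None when the pair cannot be used, i.e. the attribute is treated as empty.
    """
    for avatar in (avatar_a, avatar_b):
        if avatar is None or not is_image(avatar):
            return None      # missing avatar or not an image
        if not has_face(avatar):
            return None      # image without a face: too noisy to use
    score = face_similarity(avatar_a, avatar_b)
    return min(1.0, max(0.0, score))


# Stub detectors standing in for a real face recognition library.
def stub_is_image(data: bytes) -> bool:
    return data.startswith(b"IMG")


def stub_has_face(data: bytes) -> bool:
    return b"FACE" in data


def stub_similarity(x: bytes, y: bytes) -> float:
    return 1.0 if x == y else 0.3


same = avatar_similarity(b"IMG-FACE-1", b"IMG-FACE-1", stub_is_image, stub_has_face, stub_similarity)
no_face = avatar_similarity(b"IMG-LOGO", b"IMG-FACE-1", stub_is_image, stub_has_face, stub_similarity)
```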
Location attributes: people often publish UGC (user-generated content) at home, at the office, at cafés they frequent, and so on, so the locations from which UGC is published often carry distinctive features usable for associated user mining. Taking locations as latitude-longitude coordinates, the invention calculates the location similarity of users $a$ and $b$ in three ways: ① the number of location areas the two users share; ② the cosine similarity of their location areas; ③ the average distance between their locations.
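The three location measures can be sketched as follows. The source does not say how location areas are formed or how the average distance is defined, so this sketch assumes a fixed latitude/longitude grid (`cell_deg`) for the areas and, for ③, the mean distance from each of $a$'s posts to the nearest post of $b$; all names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

LatLon = Tuple[float, float]
EARTH_RADIUS_KM = 6371.0


def area_of(point: LatLon, cell_deg: float) -> Tuple[int, int]:
    """Map a coordinate to a grid cell (an assumed definition of a 'location area')."""
    lat, lon = point
    return math.floor(lat / cell_deg), math.floor(lon / cell_deg)


def haversine_km(p: LatLon, q: LatLon) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def location_similarity(a: List[LatLon], b: List[LatLon], cell_deg: float = 0.01):
    """Return (shared areas, cosine of area counts, mean nearest distance in km); a, b non-empty."""
    ca = Counter(area_of(p, cell_deg) for p in a)
    cb = Counter(area_of(p, cell_deg) for p in b)
    common = ca.keys() & cb.keys()
    shared = len(common)                                                     # measure 1
    dot = sum(ca[k] * cb[k] for k in common)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    cosine = dot / norm if norm else 0.0                                     # measure 2
    mean_dist = sum(min(haversine_km(p, q) for q in b) for p in a) / len(a)  # measure 3 (assumed)
    return shared, cosine, mean_dist


posts_a = [(39.9, 116.4), (39.9, 116.4), (31.2, 121.5)]
posts_b = [(39.9, 116.4), (31.2, 121.5), (22.5, 114.1)]
shared, loc_cos, mean_km = location_similarity(posts_a, posts_b, cell_deg=1.0)
bj_sh_km = haversine_km((39.9, 116.4), (31.2, 121.5))
```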
Time attributes: the temporal distribution of UGC also reflects people's habits in using social networks. The invention calculates the time similarity of users $a$ and $b$ in two ways: ① the number of time periods the two users share; ② the cosine similarity of their time periods.
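A minimal sketch of the two time measures, assuming a "time period" is the hour of the day (24 bins); the source does not fix the granularity, and `time_similarity` is a hypothetical name.

```python
import math
from collections import Counter
from datetime import datetime
from typing import List


def time_similarity(a: List[datetime], b: List[datetime]):
    """Return (number of shared time periods, cosine similarity of period histograms)."""
    ha = Counter(t.hour for t in a)   # posting histogram of user a over hour-of-day bins
    hb = Counter(t.hour for t in b)
    common = ha.keys() & hb.keys()
    dot = sum(ha[h] * hb[h] for h in common)
    norm = math.sqrt(sum(v * v for v in ha.values())) * math.sqrt(sum(v * v for v in hb.values()))
    return len(common), (dot / norm if norm else 0.0)


posts_a = [datetime(2017, 7, 1, 8, 5), datetime(2017, 7, 2, 8, 40), datetime(2017, 7, 2, 22, 10)]
posts_b = [datetime(2017, 7, 3, 8, 15), datetime(2017, 7, 3, 13, 0), datetime(2017, 7, 4, 22, 30)]
shared_periods, time_cos = time_similarity(posts_a, posts_b)
```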
Other text attributes: for users $a$ and $b$, the invention uses the TF-IDF model to build a bag-of-words space vector for each text attribute and then computes the similarity of the text attribute as the cosine similarity of these vectors.
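A self-contained TF-IDF and cosine sketch for one text attribute. It uses whitespace tokenization and the smoothed IDF $\ln\frac{1+N}{1+\mathrm{df}} + 1$ (the default of scikit-learn's `TfidfVectorizer`); the source does not specify the TF-IDF variant, so both choices are assumptions. Chinese text would need a word segmenter instead of `split()`.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """Bag-of-words TF-IDF vectors: raw term counts times smoothed IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [{term: count * idf[term] for term, count in Counter(tokens).items()} for tokens in tokenized]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical text attribute values (e.g. profile descriptions) of three users.
docs = ["Beijing architecture campus", "beijing Architecture campus", "football match tonight"]
vecs = tfidf_vectors(docs)
same_text = cosine(vecs[0], vecs[1])
different_text = cosine(vecs[0], vecs[2])
```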
(2) Attribute similarity fusion
Existing well-performing attribute-based associated user mining methods make the decision with machine learning and do not need multi-attribute fusion. Linear weighting is the conventional way to fuse multi-attribute similarities: if $n$ attributes are selected for associated user mining by the analysis above, the similarity of users $a$ and $b$ is
$$\mathrm{sim}(a,b) = \sum_{i=1}^{n} \alpha_i\, \mathrm{sim}_i(a,b) \quad (8)$$
where $\alpha_i$ is the weight of the similarity of attribute $i$ and $\mathrm{sim}_i(a,b)$ is the similarity of $a$ and $b$ on attribute $i$. In associated user mining, however, $a$ and $b$ need not be highly similar on all attributes to be associated; when one or a few attributes have high values, $a$ and $b$ are very likely to be associated. The invention therefore fuses the user attribute similarities with a logit (logistic) regression model, i.e.
$$\mathrm{sim}(a,b) = \frac{1}{1 + \exp\left(-\sum_{i=1}^{n} \alpha_i\, \mathrm{sim}_i(a,b)\right)} \quad (9)$$
Because the weight $\alpha_i$ of attribute $i$ is directly related to its utility index $CP_i$, $\alpha_i$ in equations (8) and (9) is calculated as:
(10) [formula image not available: $\alpha_i$ as a function of $CP_i$ and a small constant $\varepsilon$]
where $\varepsilon$ is a very small positive value that prevents overfitting.
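A sketch of the two fusion rules, assuming (9) is the logistic function of the weighted sum with no intercept term. The weights below are hypothetical, because the formula for $\alpha_i$ in (10) is not reproduced in the source text.

```python
import math
from typing import Sequence


def linear_fusion(sims: Sequence[float], alphas: Sequence[float]) -> float:
    """Eq. (8): weighted sum of the per-attribute similarities sim_i(a, b)."""
    if len(sims) != len(alphas):
        raise ValueError("one weight per attribute is required")
    return sum(alpha * sim for alpha, sim in zip(alphas, sims))


def logistic_fusion(sims: Sequence[float], alphas: Sequence[float]) -> float:
    """Assumed form of eq. (9): sigmoid of the weighted sum, computed without overflow."""
    z = linear_fusion(sims, alphas)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical weights for (user name, avatar, text); eq. (10) would derive them from CP_i.
alphas = [2.0, 4.0, 1.0]
one_strong = [0.1, 0.95, 0.1]     # only the avatar matches strongly
neutral = [0.0, 0.0, 0.0]
score_strong = logistic_fusion(one_strong, alphas)
score_neutral = logistic_fusion(neutral, alphas)
```

Because the logistic function is monotonic, this assumed form orders candidate pairs exactly as the linear score (8) does and mainly rescales the scores into $(0, 1)$.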
(3) User relationship fusion modeling
Friendship is attribute information that is ubiquitous in social networks. In follow-based social networks (e.g., microblogs), a friend relationship means mutual following. Compared with one-way following, mutual following is harder to fake and more stable, so the invention uses friend relationships for associated user mining. Unless otherwise stated, user relationships below refer to friend relationships.
In a social network, user relationships are the paths along which information propagates; social networks are generally modeled as graphs, with the user relationships described by adjacency matrices. Let $A$ be the adjacency matrix of the friend network of $SN_A$; $A$ is symmetric. Let $A'$ be the normalized matrix of $A$, i.e.
$$A' = A D_A^{-1}, \qquad D_A = \mathrm{diag}\big(d(1), \ldots, d(|A|)\big)$$
where $|A|$ is the number of users in $SN_A$ and $d(i)$ is the degree of node $i$. $A'$ can be regarded as a score transfer matrix within a single social network: in $y = A'x$, where $y$ and $x$ are column vectors of length $|A|$, each matrix-vector multiplication means that every node $i$ pushes the score $x_i / d(i)$ to each of its neighbors, where $x_i$ is the score of node $i$.
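The push reading of $y = A'x$ can be checked numerically, assuming the column normalization $A' = A D_A^{-1}$ that the description of pushing $x_i / d(i)$ implies. A minimal NumPy sketch on a three-user path graph (the function name is hypothetical):

```python
import numpy as np


def normalize_adjacency(adj) -> np.ndarray:
    """A' = A D^{-1}: divide column j by the degree d(j); isolated nodes keep a zero column."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=0)
    inv_deg = np.divide(1.0, deg, out=np.zeros_like(deg), where=deg > 0)
    return adj * inv_deg  # broadcasting scales column j by 1 / d(j)


# Path graph 0 - 1 - 2 (symmetric adjacency matrix).
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
A_norm = normalize_adjacency(A)
pushed_from_0 = A_norm @ np.array([1.0, 0.0, 0.0])   # node 0 (degree 1) pushes its whole score to node 1
pushed_from_1 = A_norm @ np.array([0.0, 1.0, 0.0])   # node 1 (degree 2) pushes half to each neighbour
```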
In a single social network it is generally assumed that "a node is important because its neighbors are important"; on this basis, important nodes are mined by repeatedly iterating $x \leftarrow A'x$ (the PageRank algorithm). Similarly, network-based associated user mining generally assumes that "a pair of users whose neighbors are all associated users is also an associated pair". To fuse the user relationships of $SN_A$ and $SN_B$, the invention uses the tensor product to form the user relationship fusion matrix of the two social networks:
$$W = B' \otimes A' \quad (11)$$
where $A'$ and $B'$ are the normalized adjacency matrices of $SN_A$ and $SN_B$, respectively. For second-order tensors, the tensor product $\otimes$ is also called the Kronecker product.
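The fusion matrix never has to be formed explicitly. The sketch below checks the identity $\mathrm{vec}(A' S B'^{\top}) = (B' \otimes A')\,\mathrm{vec}(S)$, assuming $S$ is $|A| \times |B|$ with rows for $SN_A$ users and that $\mathrm{vec}$ stacks columns; the graphs and helper names are hypothetical.

```python
import numpy as np


def column_normalize(adj) -> np.ndarray:
    """A' = A D^{-1} (assumes every user has at least one friend)."""
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=0)


def vec(m: np.ndarray) -> np.ndarray:
    """Column-wise vectorization operator vec(.)."""
    return m.reshape(-1, order="F")


A_norm = column_normalize([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # triangle in SN_A
B_norm = column_normalize([[0, 1], [1, 0]])                    # one friendship in SN_B
W = np.kron(B_norm, A_norm)              # (|A||B|) x (|A||B|) fusion matrix

S = np.arange(6, dtype=float).reshape(3, 2)   # any |A| x |B| relevance matrix
lhs = W @ vec(S)                         # one propagation step on user pairs
rhs = vec(A_norm @ S @ B_norm.T)         # the same step without forming W
```

Forming $W$ needs $(|A||B|)^2$ entries, so for large networks the right-hand matrix form is the practical one.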
(4) Consistent modeling of user attributes and user relationships
Consistent modeling of user attributes and user relationships is one of the key problems the invention addresses. Let the matrix $S \in \mathbb{R}^{|A| \times |B|}$ be the relevance matrix of the users of $SN_A$ and $SN_B$: the higher an entry of $S$, the more likely the corresponding two users are associated users. Let $s = \mathrm{vec}(S)$ be the column-wise expansion of $S$, where $\mathrm{vec}(\cdot)$ is the vectorization operator. Fusing equation (11), an association model based on user relationships can then be constructed as:
$$s = (B' \otimes A')\, s \quad (12)$$
which, since $\mathrm{vec}(A' S B'^{\top}) = (B' \otimes A')\,\mathrm{vec}(S)$, is equivalent to $S = A' S B'^{\top}$.
On this basis, let the matrix $P$ be the user attribute similarity matrix of $SN_A$ and $SN_B$, $p = \mathrm{vec}(P)$, and $p'$ the normalized form of $p$, i.e.
$$p' = \frac{p}{\mathbf{1}^{\top} p}$$
The invention fuses user attributes and user relationships by linear weighting:
$$s = \beta\,(B' \otimes A')\, s + (1 - \beta)\, p' \quad (13)$$
where $\beta \in [0, 1]$ is a weight.
Equation (13) is a typical Sylvester equation. After solving equation (13) for $s$, the user pairs corresponding to the entries of $s$ whose values exceed a set threshold are the associated users to be mined. For the solution of the Sylvester equation, see Grasedyck L. Existence of a low rank or ℋ-matrix approximant to the solution of a Sylvester equation [J]. Numerical Linear Algebra with Applications, 2004, 11(4): 371-.
In addition, equation (13) requires no prior knowledge such as seed users, which sidesteps the difficulty of obtaining seed users and removes the dependence of associated user mining on their quality and quantity.
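On a toy example, equation (13) can be solved in two ways: as the dense linear system $(I - \beta W)s = (1-\beta)p'$, or by the fixed-point iteration $S \leftarrow \beta A' S B'^{\top} + (1-\beta) P'$, which is its matrix form. The sketch assumes the form of (13) given here, column-normalized adjacency matrices and a sum-normalized $P'$; the graphs, similarities and function names are hypothetical.

```python
import numpy as np


def column_normalize(adj) -> np.ndarray:
    adj = np.asarray(adj, dtype=float)
    return adj / adj.sum(axis=0)   # assumes no isolated users


def solve_direct(a_norm, b_norm, p_norm, beta):
    """Solve (I - beta W) s = (1 - beta) p' exactly; only feasible for tiny networks."""
    w = np.kron(b_norm, a_norm)
    p_vec = p_norm.reshape(-1, order="F")
    s = np.linalg.solve(np.eye(w.shape[0]) - beta * w, (1 - beta) * p_vec)
    return s.reshape(p_norm.shape, order="F")


def solve_fixed_point(a_norm, b_norm, p_norm, beta, tol=1e-12, max_iter=10_000):
    """Iterate S <- beta A' S B'^T + (1 - beta) P' until the L1 change is below tol."""
    s = p_norm.copy()
    for _ in range(max_iter):
        nxt = beta * (a_norm @ s @ b_norm.T) + (1 - beta) * p_norm
        if np.abs(nxt - s).sum() < tol:
            return nxt
        s = nxt
    return s


A_norm = column_normalize([[0, 1, 0], [1, 0, 1], [0, 1, 0]])   # path graph in SN_A
B_norm = column_normalize([[0, 1, 1], [1, 0, 1], [1, 1, 0]])   # triangle in SN_B
P = np.array([[0.9, 0.1, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.1, 0.7]])          # hypothetical attribute similarities
P_norm = P / P.sum()                     # p' sums to 1
S_exact = solve_direct(A_norm, B_norm, P_norm, beta=0.8)
S_iter = solve_fixed_point(A_norm, B_norm, P_norm, beta=0.8)
```

Both results agree, and the entries of $S$ sum to 1 because $W$ is column-stochastic when no user is isolated.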
(5) Associated user mining method
Since $s$ is normalized, $\mathbf{1}^{\top} s = 1$, where $\mathbf{1}$ is the all-ones vector. Equation (13) can therefore be converted into:
$$s = \left[\beta\,(B' \otimes A') + (1-\beta)\, p' \mathbf{1}^{\top}\right] s \quad (14)$$
Solving for $s$ thus amounts to finding the eigenvector with eigenvalue 1 (the principal eigenvector) of the matrix
$$M = \beta\,(B' \otimes A') + (1-\beta)\, p' \mathbf{1}^{\top}$$
which can be computed iteratively:
$$s^{(k+1)} = M\, s^{(k)} \quad (15)$$
When the iteration converges, $s^{(k)}$ is the solution $s$. The user pairs corresponding to the entries with higher values in $s$ are potentially associated users of $SN_A$ and $SN_B$. Associated users may be related one-to-one or many-to-many, and the invention handles the two cases as follows:
① One-to-one association. A highest-similarity-first strategy is used: the entries of $s$ are sorted from high to low, and each time the entry with the largest value is taken out, its two users being regarded as associated users;
② Many-to-many association. All user pairs corresponding to entries of $s$ whose values exceed a set threshold are associated users.
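Both extraction rules are easy to sketch on a relevance matrix $S$ (rows: users of $SN_A$, columns: users of $SN_B$). The one-to-one rule below also skips users that are already matched, which is an assumption of this sketch rather than something the source states; the optional `min_score` cut-off and all names are likewise hypothetical.

```python
from typing import List, Sequence, Tuple


def one_to_one(scores: Sequence[Sequence[float]],
               min_score: float = float("-inf")) -> List[Tuple[int, int]]:
    """Highest-similarity-first matching; each user is matched at most once."""
    ranked = sorted(((v, i, j) for i, row in enumerate(scores) for j, v in enumerate(row)), reverse=True)
    used_a, used_b, pairs = set(), set(), []
    for value, i, j in ranked:
        if value <= min_score:
            break                   # optional cut-off, none by default
        if i in used_a or j in used_b:
            continue                # one of the two users is already matched
        pairs.append((i, j))
        used_a.add(i)
        used_b.add(j)
    return pairs


def many_to_many(scores: Sequence[Sequence[float]], threshold: float) -> List[Tuple[int, int]]:
    """All user pairs whose score exceeds the set threshold."""
    return [(i, j) for i, row in enumerate(scores) for j, v in enumerate(row) if v > threshold]


S = [[0.90, 0.80, 0.10],
     [0.85, 0.20, 0.30],
     [0.10, 0.05, 0.40]]
matches = one_to_one(S)
links = many_to_many(S, threshold=0.5)
```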
The invention turns the above procedure into an algorithm and proves theoretically its convergence, its convergence rate, its independence of the initial value $s^{(0)}$, and the influence of the parameter $\beta$ on convergence.
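The iteration (15) and its independence of $s^{(0)}$ can be illustrated without building $M$: since $\mathbf{1}^{\top}s = 1$, $Ms = \beta W s + (1-\beta)\,p'\,(\mathbf{1}^{\top}s)$. The sketch below is a numerical check under the form of (13) and (14) given above, not the proof referred to; the data and the name `power_iteration` are hypothetical.

```python
import numpy as np


def power_iteration(w, p_vec, beta, s0, tol=1e-12, max_iter=10_000):
    """Iterate s <- M s with M = beta W + (1 - beta) p' 1^T, never forming M."""
    s = s0 / s0.sum()                                         # start from a normalized vector
    for _ in range(max_iter):
        nxt = beta * (w @ s) + (1 - beta) * p_vec * s.sum()   # p' 1^T s = p' * sum(s)
        if np.abs(nxt - s).sum() < tol:
            return nxt
        s = nxt
    return s


A_norm = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_norm /= A_norm.sum(axis=0)                                  # column-normalize SN_A
B_norm = np.array([[0, 1], [1, 0]], dtype=float)              # already column-normalized
W = np.kron(B_norm, A_norm)
p_vec = np.array([0.3, 0.05, 0.05, 0.05, 0.25, 0.3])          # hypothetical p', sums to 1

rng = np.random.default_rng(0)
s_uniform = power_iteration(W, p_vec, 0.8, np.ones(6))
s_random = power_iteration(W, p_vec, 0.8, rng.random(6) + 0.1)
s_exact = np.linalg.solve(np.eye(6) - 0.8 * W, 0.2 * p_vec)
```

Different starting vectors converge to the same $s$, which also matches the direct solution of (13).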
3. Approximate solution and parallel computation of the associated user mining model
(1) Approximate solution
The approximate solution of the associated user mining model is one of the key scientific problems the invention addresses. Expanding equation (13) iteratively gives
$$s^{(n)} = \beta^{n} (B' \otimes A')^{n} s^{(0)} + (1-\beta) \sum_{k=0}^{n-1} \beta^{k} (B' \otimes A')^{k} p' \quad (16)$$
From the properties of the Kronecker product, $(B' \otimes A')^{k} = B'^{k} \otimes A'^{k}$ and $(B'^{k} \otimes A'^{k})\,\mathrm{vec}(X) = \mathrm{vec}\big(A'^{k} X (B'^{\top})^{k}\big)$, so that in matrix form
$$S^{(n)} = \beta^{n} A'^{n} S^{(0)} (B'^{\top})^{n} + (1-\beta) \sum_{k=0}^{n-1} \beta^{k} A'^{k} P' (B'^{\top})^{k} \quad (17)$$
where $p' = \mathrm{vec}(P')$. As $n \to \infty$, $S^{(n)}$ becomes independent of $S^{(0)}$, so one can set $S^{(0)} = P'$. Applying a low-rank decomposition such as SVD to $P'$ then improves the execution efficiency of the algorithm. Specifically, if $P'$ is decomposed by SVD, i.e.
$$P' = \sum_{i=1}^{r} \sigma_i u_i v_i^{\top} \quad (18)$$
where $r \ll \min(|A|, |B|)$ is the rank of $P'$, $\sigma_i > 0$ are its singular values, and $u_i$ and $v_i$ are the corresponding left and right singular vectors, the invention substitutes equation (18) into equation (17) to construct an approximate solution method for $S$, turns it into an algorithm, and analyzes its computational complexity theoretically.
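Substituting (18) into (17) means only the $r$ factor vectors need to be propagated, since $A'^{k} P' (B'^{\top})^{k} = \sum_i \sigma_i (A'^{k}u_i)(B'^{k}v_i)^{\top}$. A sketch under these assumptions: a dense NumPy SVD on a toy matrix stands in for the sparse truncated SVD a large network would need, and the dense sum is formed only so the result can be compared with the exact solution; the function name is hypothetical.

```python
import numpy as np


def low_rank_solution(a_norm, b_norm, p_norm, beta, rank, n_terms):
    """Evaluate eq. (17) with S(0) = P' using the rank-r SVD factors of P' (eq. 18)."""
    u, sig, vt = np.linalg.svd(p_norm, full_matrices=False)
    left = u[:, :rank] * sig[:rank]     # columns sigma_i u_i
    right = vt[:rank].T                 # columns v_i
    approx = np.zeros_like(p_norm)
    for k in range(n_terms):
        approx += (1 - beta) * beta**k * (left @ right.T)   # beta^k A'^k P' (B'^T)^k
        left = a_norm @ left            # only r matrix-vector products per network and step
        right = b_norm @ right
    return approx + beta**n_terms * (left @ right.T)        # beta^n A'^n S(0) (B'^T)^n


A_norm = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_norm /= A_norm.sum(axis=0)
B_norm = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
B_norm /= B_norm.sum(axis=0)
P = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.7]])
P_norm = P / P.sum()

W = np.kron(B_norm, A_norm)
S_exact = np.linalg.solve(np.eye(9) - 0.8 * W, 0.2 * P_norm.reshape(-1, order="F")).reshape(3, 3, order="F")
S_full_rank = low_rank_solution(A_norm, B_norm, P_norm, beta=0.8, rank=3, n_terms=200)
S_rank_1 = low_rank_solution(A_norm, B_norm, P_norm, beta=0.8, rank=1, n_terms=200)
```

With the full rank the sketch reproduces the exact solution; a smaller `rank` trades accuracy for speed.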
(2) Parallel implementation
Distributed computing is the most commonly used way to reduce the running time of the associated user mining model, and associated user mining in the invention consists mainly of matrix multiplications. HAMA (https://hama.apache.org/) is a parallel computing framework based on the BSP (Bulk Synchronous Parallel) model and is used for large-scale scientific computations (e.g., on matrices, graphs and networks). The main advantage of BSP is that it speeds up iteration, so a feasible solution to problems such as shortest paths can be obtained quickly. HAMA also offers simple programming interfaces, including a flexible model and the traditional message-passing model, and is compatible with several distributed storage systems such as HDFS and HBase, so researchers can run HAMA BSP on existing Hadoop clusters. Because HAMA's matrix operations are relatively mature, the invention uses HAMA as the parallel computing framework and verifies the method experimentally on big data.
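HAMA itself is a Java framework, and its API is not reproduced here. The Python sketch below only imitates the BSP superstep structure (local computation, exchange of results, barrier synchronization) on one machine with a thread pool, applied to the iteration $S \leftarrow \beta A' S B'^{\top} + (1-\beta) P'$; the row-block partitioning and all names are assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def local_update(rows, s_prev, a_norm, b_norm, p_norm, beta):
    """Local computation of one peer: recompute its own rows of S from the last global S."""
    return beta * (a_norm[rows] @ s_prev @ b_norm.T) + (1 - beta) * p_norm[rows]


def bsp_solve(a_norm, b_norm, p_norm, beta, n_peers=2, n_supersteps=200):
    """Simulate BSP supersteps for the fixed-point iteration of eq. (13)."""
    row_blocks = np.array_split(np.arange(a_norm.shape[0]), n_peers)   # rows of S owned by each peer
    s = p_norm.copy()
    with ThreadPoolExecutor(max_workers=n_peers) as pool:
        for _ in range(n_supersteps):
            futures = [pool.submit(local_update, rows, s, a_norm, b_norm, p_norm, beta)
                       for rows in row_blocks]
            # Barrier: wait for every peer, then assemble the new global S for the next superstep.
            s = np.vstack([f.result() for f in futures])
    return s


A_norm = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]], dtype=float)
A_norm /= A_norm.sum(axis=0)                     # 4-user cycle in SN_A
B_norm = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
B_norm /= B_norm.sum(axis=0)                     # triangle in SN_B
P = np.array([[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.1, 0.7], [0.2, 0.2, 0.2]])
P_norm = P / P.sum()

S_bsp = bsp_solve(A_norm, B_norm, P_norm, beta=0.8)
W = np.kron(B_norm, A_norm)
S_exact = np.linalg.solve(np.eye(12) - 0.8 * W, 0.2 * P_norm.reshape(-1, order="F")).reshape(4, 3, order="F")
```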
Referring to FIG. 4, the present embodiment discloses a social network associated user mining system, including:
the fusion unit 1 is used for calculating the similarities of user attributes of users of a first social network and a second social network to be fused, and fusing these similarities to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image attribute, a position attribute, a time attribute and a text attribute;
the forming unit 2 is used for forming a user relationship fusion matrix of the first social network and the second social network by adopting tensor products;
the construction unit 3 is used for converting the user attribute similarity matrix into a user attribute similarity vector by adopting a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector;
and the solving unit 4 is used for determining the associated users of the first social network and the second social network by solving the associated user mining model.
As with the method, the system incorporates user relationships into the user attributes, which makes the associated user mining model harder for malicious users to attack, and incorporates user attributes into the user relationships, which allows low-degree users to be identified more accurately; fusing the two therefore yields an attack-resistant model with higher accuracy and recall.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (8)

1. A social network associated user mining method is characterized by comprising the following steps:
S1, calculating the similarity of the user attributes of users of a first social network and a second social network to be fused, and fusing the user attribute similarities to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image class attribute, a location class attribute, a time class attribute and a text class attribute; for users a and b from the first social network and the second social network, respectively, the user attribute similarity is

$$\mathrm{sim}(a,b)=\sum_{i=1}^{n}\alpha_i\,\mathrm{sim}_i(a,b)$$

wherein $\alpha_i$ is the weight of the similarity of user attribute i, computed from $CP_i$, the utility index of user attribute i, together with a very small positive constant (the weighting formula is given only as an image in the original); n is the number of user attributes; and $\mathrm{sim}_i(a,b)$ is the similarity of user attribute i between user a and user b;
S2, forming a user relationship fusion matrix of the first social network and the second social network by using a tensor product;
S3, converting the user attribute similarity matrix into a user attribute similarity vector by using a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector; the associated user mining model is as follows:
[formula shown only as an image in the original]
wherein A' and B' are the normalized matrices of the adjacency matrices A and B of the first and second social networks, respectively; ⊗ denotes the Kronecker product; s = vec(S), where S is the matrix of association degrees between users of the first social network and users of the second social network; β ∈ [0, 1] is a weight; p' is the normalized form of p, where p = vec(P) and P is the user attribute similarity matrix; and vec(·) is the vectorization operator;
and S4, determining the associated users of the first social network and the second social network by solving the associated user mining model.
2. The method of claim 1, wherein, for users a and b from the first social network and the second social network, respectively, the similarity of the image class attribute is calculated by:
detecting, with an image detector, whether the avatars of user a and user b are images;
when the avatars of user a and user b are images, detecting whether the avatars of user a and user b contain a human face;
when the avatars of user a and user b contain a face, extracting facial features from the avatars of user a and user b, inputting the extracted facial features into a preset classifier to obtain the similarity of the avatars of user a and user b, and taking that similarity as the similarity of the image class attribute of user a and user b.
3. The method of claim 1, wherein the similarity of the location class attribute is calculated based on the number of location areas in which both users publish content, the cosine similarity of the users' content publishing location areas, and the average distance between the users' content publishing locations.
4. The method of claim 1, wherein the similarity of the time class attribute is calculated based on the number of time periods in which both users publish content and the cosine similarity of the users' content publishing time periods.
5. The method of claim 1, wherein, for users a and b from the first social network and the second social network, respectively, the similarity of the text class attribute is calculated by:
constructing bag-of-words space vectors of user a and user b by using a TF-IDF model, and calculating the similarity of the text class attributes of user a and user b by cosine similarity.
6. The method of claim 1, wherein the user relationship fusion matrix C of the first social network and the second social network is

$$C = A' \otimes B'$$

wherein A' and B' are the normalized matrices of the adjacency matrices A and B of the first and second social networks, respectively, and ⊗ denotes the Kronecker product.
7. The method of claim 1, wherein the associated user mining model is solved by using an approximate solution method and the HAMA framework.
8. A social network associated user mining system, comprising:
the fusion unit is used for calculating the similarity of the user attributes of users of a first social network and a second social network to be fused, and fusing the user attribute similarities to obtain a user attribute similarity matrix, wherein the user attributes comprise a user name, an image class attribute, a location class attribute, a time class attribute and a text class attribute; for users a and b from the first social network and the second social network, respectively, the user attribute similarity is

$$\mathrm{sim}(a,b)=\sum_{i=1}^{n}\alpha_i\,\mathrm{sim}_i(a,b)$$

wherein $\alpha_i$ is the weight of the similarity of user attribute i, computed from $CP_i$, the utility index of user attribute i, together with a very small positive constant (the weighting formula is given only as an image in the original); n is the number of user attributes; and $\mathrm{sim}_i(a,b)$ is the similarity of user attribute i between user a and user b;
the forming unit is used for forming a user relationship fusion matrix of the first social network and the second social network by adopting tensor products;
the construction unit is used for converting the user attribute similarity matrix into a user attribute similarity vector by using a vectorization operator, and constructing an associated user mining model by fusing the user relationship fusion matrix and the user attribute similarity vector; the associated user mining model is as follows:
[formula shown only as an image in the original]
wherein A' and B' are the normalized matrices of the adjacency matrices A and B of the first and second social networks, respectively; ⊗ denotes the Kronecker product; s = vec(S), where S is the matrix of association degrees between users of the first social network and users of the second social network; β ∈ [0, 1] is a weight; p' is the normalized form of p, where p = vec(P) and P is the user attribute similarity matrix; and vec(·) is the vectorization operator;
and the solving unit is used for determining the associated users of the first social network and the second social network by solving the associated user mining model.
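The attribute side of claims 1, 4, 5 and 8 can be sketched in the same hedged way. The sketch rests on three assumptions:

- a smoothed TF-IDF variant computed over one corpus containing the users of both networks;
- hour-of-day buckets as the "time periods" of claim 4;
- caller-supplied weights α_i, because the CP_i-based weighting formula is not reproduced in this text.

Claims 3 and 4 name the ingredients of the location and time similarities but not how those ingredients are combined. The sketch therefore only returns the time ingredients and, for illustration, fuses the cosine part. Image and location attributes are omitted, and all names and weights are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def tfidf_vectors(docs: Sequence[List[str]]) -> List[Dict[str, float]]:
    """Bag-of-words TF-IDF vectors; the smoothed IDF log((1 + N) / (1 + df)) + 1 is an assumed variant."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts, length = Counter(doc), max(len(doc), 1)
        vectors.append({t: (c / length) * (math.log((1 + n) / (1 + df[t])) + 1)
                        for t, c in counts.items()})
    return vectors


def cosine(u: Dict, v: Dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def time_features(hours_a: List[int], hours_b: List[int]) -> Tuple[int, float]:
    """Claim 4 ingredients, ASSUMING hour-of-day buckets as the time periods:
    (number of periods in which both users post, cosine of their posting histograms)."""
    hist_a, hist_b = Counter(hours_a), Counter(hours_b)
    return len(hist_a.keys() & hist_b.keys()), cosine(dict(hist_a), dict(hist_b))


def similarity_matrix(n_a: int, n_b: int,
                      sim_fns: Dict[str, Callable[[int, int], float]],
                      weights: Dict[str, float]) -> List[List[float]]:
    """P[a][b] = sum_i alpha_i * sim_i(a, b); the weights alpha_i are supplied by the caller."""
    return [[sum(weights[name] * fn(i, j) for name, fn in sim_fns.items())
             for j in range(n_b)] for i in range(n_a)]


if __name__ == "__main__":
    texts_a = [["graph", "mining", "tensor"], ["football", "match"]]
    texts_b = [["football", "goal", "match"], ["tensor", "graph", "mining"]]
    vecs = tfidf_vectors(texts_a + texts_b)       # one shared corpus for both networks
    hours_a = [[9, 10, 21], [22, 23]]
    hours_b = [[22, 23, 23], [9, 21]]
    fns = {
        "text": lambda i, j: cosine(vecs[i], vecs[len(texts_a) + j]),
        # Hypothetical: only the cosine part of the time features is fused here.
        "time": lambda i, j: time_features(hours_a[i], hours_b[j])[1],
    }
    P = similarity_matrix(2, 2, fns, {"text": 0.7, "time": 0.3})  # hypothetical weights
    print(P)
```

The resulting matrix P plays the role of the user attribute similarity matrix that step S3 vectorizes into p = vec(P).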
CN201710633081.2A 2017-07-28 2017-07-28 Social network associated user mining method and system Expired - Fee Related CN107609469B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710633081.2A CN107609469B (en) 2017-07-28 2017-07-28 Social network associated user mining method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710633081.2A CN107609469B (en) 2017-07-28 2017-07-28 Social network associated user mining method and system

Publications (2)

Publication Number Publication Date
CN107609469A CN107609469A (en) 2018-01-19
CN107609469B true CN107609469B (en) 2020-12-04

Family

ID=61059633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710633081.2A Expired - Fee Related CN107609469B (en) 2017-07-28 2017-07-28 Social network associated user mining method and system

Country Status (1)

Country Link
CN (1) CN107609469B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108494849A (en) * 2018-03-20 2018-09-04 石家庄正和网络有限公司 Accurate push method and system for recycling websites
CN110839242B (en) * 2018-08-17 2023-07-04 ***通信集团广东有限公司 Abnormal number identification method and device
CN109685125A (en) * 2018-12-14 2019-04-26 大连海事大学 Daily behavior feature mining and calculation method based on frequent Sensor Events sequence
CN110188148B (en) * 2019-05-23 2021-02-02 北京建筑大学 Entity identification method and device for multimodal heterogeneous features
CN110837598B (en) * 2019-11-11 2021-03-19 腾讯科技(深圳)有限公司 Information recommendation method, device, equipment and storage medium
CN111738817B (en) * 2020-05-15 2022-12-23 苏宁金融科技(南京)有限公司 Method and system for identifying risk community
CN112383510B (en) * 2020-10-23 2022-10-11 北京易观智库网络科技有限公司 Method and device for uniquely identifying user association

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645210B2 (en) * 2010-05-17 2014-02-04 Xerox Corporation Method of providing targeted communications to a user of a printing system
US20140172608A1 (en) * 2012-12-13 2014-06-19 Science Media, Llc System, method, and computer program product for placing an order via a social communications network
CN106372072B (en) * 2015-07-20 2019-11-01 北京大学 Location-based method for identifying user relationships in mobile social networks
US10409823B2 (en) * 2015-12-29 2019-09-10 Facebook, Inc. Identifying content for users on online social networks
CN106339948A (en) * 2016-08-26 2017-01-18 微梦创科网络科技(中国)有限公司 Associated user mining method and device based on social network
CN106649657B (en) * 2016-12-13 2020-11-17 重庆邮电大学 Social network oriented tensor decomposition based context awareness recommendation system and method

Also Published As

Publication number Publication date
CN107609469A (en) 2018-01-19

Similar Documents

Publication Publication Date Title
CN107609469B (en) Social network associated user mining method and system
Kim et al. A review of dynamic network models with latent variables
CN108287864B (en) Interest group dividing method, device, medium and computing equipment
Papalexakis et al. Tensors for data mining and data fusion: Models, applications, and scalable algorithms
Wang et al. NEIWalk: Community discovery in dynamic content-based networks
CN106844665B (en) Thesis recommendation method based on reference relation distributed expression
Li et al. Location inference for non-geotagged tweets in user timelines
CN108304380A (en) A method of scholar's name disambiguation of fusion academic
US10942939B2 (en) Systems and methods for unsupervised streaming feature selection in social media
Kong et al. Entity matching across multiple heterogeneous data sources
Ran et al. MGAT-ESM: Multi-channel graph attention neural network with event-sharing module for rumor detection
Li et al. Multi-layer network community detection model based on attributes and social interaction intensity
Wu et al. A Tensor CP decomposition method for clustering heterogeneous information networks via stochastic gradient descent algorithms
Marshall et al. A neural network approach for truth discovery in social sensing
Zhu et al. A hybrid time-series link prediction framework for large social network
CN116450938A (en) Work order recommendation realization method and system based on map
Stanhope et al. Group link prediction
Zhao Utilizing citation network structure to predict citation counts: A deep learning approach
Liu et al. UGCC: Social media user geolocation via cyclic coupling
Bohra et al. Popularity Prediction of Social Media Post Using Tensor Factorization.
Mankad et al. Discovery of path-important nodes using structured semi-nonnegative matrix factorization
Chu et al. Noise-aware network embedding for multiplex network
Moussaoui et al. Clustering social network profiles using possibilistic c-means algorithm
Oyama et al. Link prediction across time via cross-temporal locality preserving projections
Wang et al. Detection of social groups in class by affinity propagation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201204