CN109242710B - Social network node influence ordering method and system - Google Patents

Publication number
CN109242710B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810931729.9A
Other languages
Chinese (zh)
Other versions
CN109242710A (en)
Inventor
熊菲
杨佳佩
刘云
张振江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jiaotong University
Original Assignee
Beijing Jiaotong University
Application filed by Beijing Jiaotong University
Priority to CN201810931729.9A
Publication of CN109242710A
Application granted
Publication of CN109242710B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01: Social networking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/06: Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063: Operations research, analysis or management
    • G06Q10/0639: Performance analysis of employees; Performance analysis of enterprise or organisation operations

Landscapes

  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a social network node influence ranking method and system, relating to the technical field of digital information processing. The method first collects user personal homepage information, user posting information and user pair information, and preprocesses them to form a training set and a test set. Next, a transfer matrix model of posts is established from the training set, and simulation calculation is carried out on the transfer matrix model to obtain the optimal training parameter. Finally, a test transfer matrix for post forwarding is established from the test set and the optimal training parameter, and the test transfer matrix is evaluated to obtain the social network node influence ranking result. The method can discover the attention (follow) possibility of hidden nodes, so that influence ranking analysis can be carried out on networks whose dynamic information is incomplete and severely missing; the influence of social network nodes can thus be analyzed more accurately when social network data is missing.

Description

Social network node influence ordering method and system
Technical Field
The invention relates to the technical field of digital information processing, in particular to a social network node influence sequencing method and system.
Background
In the information era, the analysis of relationships between people is one of the cornerstones of measuring personal value, promoting related products, monitoring public opinion and planning related construction. According to statistics, as of March 2017 the Android and Apple application markets together offered about 5 million apps. How to better recommend apps to users is a key problem for improving user experience and increasing enterprise revenue, yet existing app recommendation strategies are mainly based on users' personal information and do not consider the influence that user groups in a social network exert on app choice. Users are, however, inevitably affected by their social network friends when downloading or purchasing an app. Whether and how to incorporate social influence among network users into personalized recommendation algorithms is therefore a matter of concern for social network analysis. A social network includes both the offline network of relationships around an individual and the social relationships established through online social applications; such networks can be classified into weak-tie networks based on one-way following and strong-tie networks based on mutual friendship.
The study of relationships between people has gradually spread from traditional sociology and psychology into the field of the information internet, where online social network analysis can draw on large-scale information acquisition, a large body of data mining algorithms and related ranking algorithms.
A user's influence in online interaction can be expressed through user activity, that is, the actions and thinking of a network user are influenced and changed by the actions and thinking of others. Highly influential individuals play a key role in many stages of a network, such as its construction, propagation and transfer. Therefore, how to evaluate the influence of social network users and rank them to identify the important influential user nodes is the most basic problem in the study of personal influence in online networks. Node influence and ranking are often the basis for subsequent social network research.
Early methods for analyzing network node influence and ranking mainly relied on offline, non-data-driven means such as questionnaires and telephone surveys, which suffer from small amounts of collected data, long delays and many other problems.
With the rapid development of internet technology and personal mobile network technology, mass data from online social networks now serves as data support. Typical approaches analyze the structure of the follow-relationship network, forwarding records and user activity or content semantics to extract the likelihood of message transfer between users; influence is learned by counting, in chronological order, the number of successful transfers between user pairs, and the transfer probability between users is estimated as influence through a Bernoulli model and a Jaccard index model.
In the meantime, many excellent algorithms have been proposed and applied. For example, the PageRank algorithm assigns the same initial value to all nodes and then iterates until the values are substantially unchanged from one iteration to the next; the final value of a node is the basis for the final ranking, and the larger the value, the greater the node's influence. Because PageRank applies a correction factor when resolving problems such as non-unique rankings, which considerably distorts the matrix structure, the improved LeaderRank algorithm was proposed to reduce the effect of this correction and keep the result reliable. The analysis of a social network is a complex problem that cannot be solved by any single method; various factors must be considered together and combined in an optimized way to identify the final social roles and influence.
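For illustration only, the PageRank iteration described above might be sketched as follows. This is a minimal sketch and not part of the claimed method: the damping value 0.85, the even redistribution of rank from nodes without out-links, and the names `pagerank`, `out_links` and `graph` are assumptions introduced here.

```python
def pagerank(out_links, damping=0.85, tol=1e-10, max_iter=1000):
    """Rank the nodes of a directed graph given as {node: [successor, ...]}."""
    nodes = sorted(set(out_links) | {v for vs in out_links.values() for v in vs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # same initial value for every node
    for _ in range(max_iter):
        # Assumption: rank held by nodes without out-links is spread evenly.
        dangling = sum(rank[u] for u in nodes if not out_links.get(u))
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                new_rank[v] += damping * rank[u] / len(targets)
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:  # values substantially unchanged: stop iterating
            break
    return rank


# Hypothetical toy graph: an edge u -> v means "u points to v".
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
scores = pagerank(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph node `d` receives no links and therefore ends up last, while node `a`, which is pointed to by two nodes, ends up first.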
With the growing emphasis on network security and the risk of information leakage, traditional algorithms face problems in information collection. Taking information crawled from Sina Weibo (Xinlang microblog) as an example, the platform currently does not allow crawling of users' followee and follower information, in order to protect user information and avoid leakage, so building the set of follow relationships faces complicated conditions. Protection of users' historical posts is also becoming more standardized and strict, so that a large amount of posting and forwarding information may be unavailable. Yet this information is often necessary input for classical influence analysis algorithms.
Disclosure of Invention
The invention aims to provide a method and a system that integrate social network user information to accurately determine the influence and ranking of social user nodes, so as to solve the technical problems described in the background, namely that existing social network node influence analysis methods take a one-sided view and give inaccurate results.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, the invention provides a social network node influence ranking method, which comprises the following steps:
step S110: collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information, the user posting information and the user pair information to form a training set and a testing set;
step S120: establishing a transfer matrix model of the posts according to the training set, and carrying out simulation calculation on the transfer matrix model to obtain optimal training parameters;
step S130: and establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix, and obtaining the social network node influence sequencing result.
Further, the step S110 specifically includes:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information comprises at least a user ID, the user's total number of posts, the user's active duration, the number of the user's followers (fans) and the number of people the user follows;
the user posting information comprises at least the number of times posts were forwarded and the number of comments on posts;
the user pair information comprises the follow relationships between users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set includes test set user personal information and test set user pair information.
Further, in step S120, the establishing a transition matrix model of posts according to the training set specifically includes:
step S121: determining a user pair impact factor $f_1$ for forwarding posts in the training set:
Figure GDA0003327738580000031
where $I_U$ denotes the number of people user U is interested in, and $S_V$ denotes the number of people of interest of user V;
step S122: determining a user self influence factor $f_2$ for forwarding posts in the training set:
Figure GDA0003327738580000032
where X represents the social network's trade-off between the importance of forwards and comments for user U, $M_U$ represents the total number of posts of user U, $T_U$ represents the active duration of user U, $Z_U$ represents the number of times the posts of user U were forwarded, and $P_U$ represents the number of comments on the posts of user U;
step S123: determining a total impact factor $f_{uv}$ for forwarding posts in the training set:
$f_{uv} = 1 - \exp\left(-(f_1)^{m} \times (f_2)^{1-m}\right)$, where m represents a training parameter, namely the trade-off parameter between $f_1$ and $f_2$;
step S124: obtaining the probability $p_{uu}$ that user U in the training set forwards his or her own post, using a K-shell decomposition algorithm:
Figure GDA0003327738580000033
where n represents the number of user nodes and $Ks_u$ represents the K-shell value of user U;
step S125: determining the probability $P_{uv}$ that a post of user U in the training set is forwarded by another user V:
Figure GDA0003327738580000034
step S126: obtaining the training transfer matrix P from $p_{uu}$ and $P_{uv}$:
Figure GDA0003327738580000041
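As an illustrative sketch of step S123 only, the total impact factor can be computed from already-known values of the two factors. The formulas for the factors themselves are given in the equation images and are not reproduced here, so `f1` and `f2` are plain inputs; the function name `total_impact_factor` and the restriction of m to the interval [0, 1] are assumptions made for this example.

```python
import math


def total_impact_factor(f1, f2, m):
    """Return f_uv = 1 - exp(-(f1**m * f2**(1 - m))).

    f1 is the user pair impact factor, f2 the user self influence factor,
    and m the trade-off parameter (assumed here to lie in [0, 1]).
    """
    if f1 < 0 or f2 < 0:
        raise ValueError("impact factors must be non-negative")
    if not 0.0 <= m <= 1.0:
        raise ValueError("m is assumed to lie in [0, 1]")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


# Hypothetical factor values, for illustration only.
balanced = total_impact_factor(2.0, 0.5, 0.5)   # both factors weighted equally
pair_only = total_impact_factor(2.0, 0.5, 1.0)  # only f1 contributes
self_only = total_impact_factor(2.0, 0.5, 0.0)  # only f2 contributes
```

The exponential maps the weighted geometric mean of the two factors into the interval [0, 1), which is what allows the result to be used when building forwarding probabilities.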
further, in step S120, the simulating the transition matrix model to obtain the optimal training parameters specifically includes:
selecting, in order of the average number of times their posts were forwarded, C users in the training set together with their real forwarding counts $M_c$; a plurality of different m values correspond to a plurality of different training transfer matrices P, and an independent cascade model is used to carry out a propagation simulation experiment on each P to obtain the expected average transfer number $F_c$ of the C users;
determining the error MAPE value:
Figure GDA0003327738580000042
where $c \in \{1, \ldots, C\}$;
and selecting the training parameter corresponding to the P with the minimum MAPE value as the optimal training parameter.
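The parameter selection described above might be sketched as follows. The MAPE formula itself is contained in the equation image, so this sketch assumes the standard mean absolute percentage error, expressed as a fraction; the names `mape`, `select_best_m`, `real` and `simulated`, and all numbers, are hypothetical.

```python
def mape(real_counts, expected_counts):
    """Mean absolute percentage error between real counts M_c and simulated F_c.

    Assumption: the standard definition (1/C) * sum(|M_c - F_c| / M_c).
    """
    if len(real_counts) != len(expected_counts) or not real_counts:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(abs(m_c - f_c) / m_c for m_c, f_c in zip(real_counts, expected_counts))
    return total / len(real_counts)


def select_best_m(real_counts, expected_by_m):
    """Return (m, error) for the candidate m with the smallest MAPE."""
    errors = {m: mape(real_counts, f) for m, f in expected_by_m.items()}
    best_m = min(errors, key=errors.get)
    return best_m, errors[best_m]


# Hypothetical real forwarding counts for C = 3 users and simulated counts
# obtained with three candidate values of m.
real = [100.0, 50.0, 20.0]
simulated = {
    0.2: [60.0, 45.0, 30.0],
    0.5: [90.0, 55.0, 18.0],
    0.8: [150.0, 80.0, 40.0],
}
best_m, best_error = select_best_m(real, simulated)
```

With these made-up numbers the candidate m = 0.5 deviates from every real count by 10 percent and is therefore selected.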
Further, in step S130, the establishing a test transfer matrix for forwarding posts according to the test set and the optimal training parameters specifically includes:
taking the optimal training parameter as the trade-off parameter between $f_1$ and $f_2$ for the test set, and establishing the test transfer matrix according to the method of steps S121 to S126.
Further, in step S130, the calculating the test transfer matrix to obtain the social network node influence ranking result specifically includes:
setting the initial value of every entry of the user value vector $S_t$ to 1, and obtaining a stable convergence value by Markov iteration, calculated as follows:
$S_t = (1 \ldots 1)_{1 \times n} \times P_m$
repeating the following process until the Euclidean norm of the difference between two successive user value vectors is smaller than the preset precision, then stopping the iteration to obtain the stable convergence algorithm value S:
$S = (S_t)_{1 \times n} \times P_m$
and taking each entry of the stable convergence algorithm value S as the algorithm value of the corresponding user, and comparing the algorithm values to obtain the social network node influence ranking result.
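The Markov iteration described above can be sketched as repeated multiplication of a row vector by the transfer matrix. This is a minimal sketch under stated assumptions: the 3 x 3 matrix `P` is a made-up row-stochastic example rather than a matrix produced by steps S121 to S126, and the names `markov_iterate`, `tol` and `max_iter` are introduced here for illustration.

```python
import math


def markov_iterate(P, tol=1e-9, max_iter=10000):
    """Repeat S <- S x P from the all-ones row vector until S stabilises."""
    n = len(P)
    s = [1.0] * n  # initial value of every user is 1
    for _ in range(max_iter):
        nxt = [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]
        # Euclidean norm of the difference between successive vectors.
        diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if diff < tol:
            break
    return s


# Hypothetical transfer matrix; each row sums to 1.
P = [
    [0.5, 0.5, 0.0],
    [0.2, 0.6, 0.2],
    [0.0, 0.5, 0.5],
]
S = markov_iterate(P)
# Users ordered from most to least influential by their converged value.
ranking = sorted(range(len(S)), key=lambda u: S[u], reverse=True)
```

Because each row of the example matrix sums to 1, the total of the value vector stays at n throughout the iteration, and only its distribution over users changes.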
In another aspect, the present invention provides a social network node influence ranking system, including:
the data preprocessing module is used for collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a testing set;
the training module is used for establishing a post transfer matrix model according to the training set, carrying out simulation calculation on the transfer matrix model and obtaining optimal training parameters;
and the test module is used for establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix and obtaining the social network node influence sequencing result.
Further, the data preprocessing module specifically includes:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information comprises at least a user ID, the user's total number of posts, the user's active duration, the number of the user's followers (fans) and the number of people the user follows;
the user posting information comprises at least the number of times posts were forwarded and the number of comments on posts;
the user pair information comprises the follow relationships between users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set includes test set user personal information and test set user pair information.
Further, the training module comprises:
the user pair influence factor determining unit is used for determining a user pair influence factor for forwarding the post according to the number of people to be attended of one user and the number of people to be attended of the other user in the user pair;
the user self influence factor determining unit is used for determining the user self influence factor forwarded by the posts according to the posting total number, the active duration, the forwarded number of the posts and the commented number of the posts of the user;
a total influence factor determining unit, configured to determine a total influence factor for forwarding the post according to a trade-off parameter between the user pair influence factor and the user own influence factor;
the user self-forwarding probability determining unit is used for acquiring the probability of forwarding the self post by the user by utilizing a K-shell decomposition algorithm;
a post forwarded probability determination unit for determining the probability of the post of the user being forwarded by other users;
the transition matrix model establishing unit is used for establishing a transition matrix model of the post according to the probability of the user forwarding the post and the forwarded probability of the post;
the optimal training parameter establishing unit is used for adopting an independent cascade model to carry out propagation simulation experiments on the transfer matrix model respectively to obtain an expected average transfer number, determining an error MAPE value and selecting a training parameter corresponding to the transfer matrix with the minimum MAPE value as the optimal training parameter;
further, the test module includes:
the test transfer matrix establishing unit is used for establishing a test transfer matrix according to the test set and the optimal training parameters;
and the influence sequencing establishing unit is used for calculating the test transfer matrix to obtain the social network node influence sequencing result.
The beneficial effects of the invention are as follows: the method can discover the attention (follow) possibility of hidden nodes, so that influence ranking analysis can be performed on networks whose dynamic information is incomplete and severely missing; it provides a supplementary scheme when common algorithms cannot perform the analysis because of missing data, and it analyzes the influence of social network nodes more accurately.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic block diagram of a social network node influence ranking system according to a first embodiment of the present invention.
Fig. 2 is a schematic flow chart of a social network node influence ranking method according to a second embodiment of the present invention.
Fig. 3 is a schematic block diagram of a social network node influence ranking system according to a third embodiment of the present invention.
Fig. 4 is a schematic diagram of a training set K-shell value and a corresponding user transfer condition according to the fourth embodiment of the present invention.
Fig. 5 is a diagram of a calculation result of the optimal transition matrix parameter m according to the fourth embodiment of the present invention.
Fig. 6 is a schematic diagram of a test set K-shell value and a corresponding user transfer condition according to the fourth embodiment of the present invention.
Fig. 7 is a verification diagram of the effect of the contrast image according to the fifth embodiment of the present invention.
Fig. 8 is a comparative Kendall verification diagram according to the fifth embodiment of the present invention.
Fig. 9 is a comparative specific-ranking verification diagram according to the fifth embodiment of the present invention.
Fig. 10 is a flowchart illustrating a social network node influence ranking method according to the fourth embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or modules having the same or similar functionality throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
For the convenience of understanding of the embodiments of the present invention, the following description will be further explained by taking specific embodiments as examples with reference to the drawings, and the embodiments are not to be construed as limiting the embodiments of the present invention.
It will be understood by those of ordinary skill in the art that the figures are merely schematic representations of one embodiment and that the elements or devices in the figures are not necessarily required to practice the present invention.
Example one
As shown in fig. 1, an embodiment of the present invention provides a social network node influence ranking system, where the system includes:
the data preprocessing module is used for collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a testing set;
the training module is used for establishing a post transfer matrix model according to the training set, carrying out simulation calculation on the transfer matrix model and obtaining optimal training parameters;
and the test module is used for establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix and obtaining the social network node influence sequencing result.
In a first embodiment of the present invention, the data preprocessing module specifically includes:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information comprises at least a user ID, the user's total number of posts, the user's active duration, the number of the user's followers (fans) and the number of people the user follows;
the user posting information comprises at least the number of times posts were forwarded and the number of comments on posts;
the user pair information comprises the follow relationships between users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set includes test set user personal information and test set user pair information.
In practical application, the data preprocessing module in the first embodiment of the present invention is mainly used to obtain a data set containing three types of data: the first type is personal homepage information, which comprises at least a user ID, the user's total number of posts, the user's level or active duration, the number of the user's fans (if user A follows user B, A is called a fan of B) and the number of the user's followees (if user A follows user B, B is called a followee of A); the second type is user posting information, which comprises at least the forwarded counts and comment counts of some posts; the third type is the follow relationships between users, comprising at least the follow relationships between some users;
the data set is then given simple preprocessing to produce the required tables: it is cleaned by advertisement filtering and similar steps, cut into a training set and a test set as required, and the following tables are generated for each set: a personal homepage statistics table, obtained by combining the third-type data with the first-type data and adding two items to the table, namely the user's average forwarded count and average comment count per post; and a cleaned user follow-relationship table, in which both the fan and the followee of every follow pair appear in the user personal information table.
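The cleaning of the follow-relationship table and the cutting of the data set might be sketched as follows. The text does not specify how the data set is cut, so the random split by user shown here, the field names in `profiles`, and the function names `clean_follow_pairs` and `split_data_set` are all assumptions made for illustration.

```python
import random


def clean_follow_pairs(profiles, follow_pairs):
    """Keep only (fan, followee) pairs whose two users both have a profile."""
    return [(fan, followee) for fan, followee in follow_pairs
            if fan in profiles and followee in profiles]


def split_data_set(profiles, follow_pairs, train_fraction=0.5, seed=0):
    """Split users at random (an assumed rule) and keep pairs inside each part."""
    users = sorted(profiles)
    random.Random(seed).shuffle(users)
    cut = int(len(users) * train_fraction)
    result = []
    for group in (users[:cut], users[cut:]):
        group_profiles = {u: profiles[u] for u in group}
        result.append((group_profiles, clean_follow_pairs(group_profiles, follow_pairs)))
    return result[0], result[1]


# Hypothetical personal homepage table and follow pairs (fan, followee).
profiles = {
    "u1": {"posts": 120, "fans": 30, "followees": 12},
    "u2": {"posts": 15, "fans": 4, "followees": 40},
    "u3": {"posts": 60, "fans": 9, "followees": 9},
    "u4": {"posts": 8, "fans": 2, "followees": 5},
}
follow_pairs = [("u1", "u2"), ("u2", "u3"), ("u3", "u4"), ("u4", "u9")]

cleaned = clean_follow_pairs(profiles, follow_pairs)  # drops the pair with unknown "u9"
(train_profiles, train_pairs), (test_profiles, test_pairs) = split_data_set(profiles, cleaned)
```

Filtering the pairs again inside each part guarantees that every follow pair in the training or test set refers only to users whose personal information is in the same set.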
In a first embodiment of the present invention, the training module includes:
the user pair influence factor determining unit is used for determining a user pair influence factor for forwarding the post according to the number of people to be attended of one user and the number of people to be attended of the other user in the user pair;
the user self influence factor determining unit is used for determining the user self influence factor forwarded by the posts according to the posting total number, the active duration, the forwarded number of the posts and the commented number of the posts of the user;
a total influence factor determining unit, configured to determine a total influence factor for forwarding the post according to a trade-off parameter between the user pair influence factor and the user own influence factor;
the user self-forwarding probability determining unit is used for acquiring the probability of forwarding the self post by the user by utilizing a K-shell decomposition algorithm;
a post forwarded probability determination unit for determining the probability of the post of the user being forwarded by other users;
the transition matrix model establishing unit is used for establishing a transition matrix model of the post according to the probability of the user forwarding the post and the forwarded probability of the post;
the optimal training parameter establishing unit is used for adopting an independent cascade model to carry out propagation simulation experiments on the transfer matrix model respectively to obtain an expected average transfer number, determining an error MAPE value and selecting a training parameter corresponding to the transfer matrix with the minimum MAPE value as the optimal training parameter;
in a first embodiment of the present invention, the test module includes:
the test transfer matrix establishing unit is used for establishing a test transfer matrix according to the test set and the optimal training parameters;
and the influence sequencing establishing unit is used for calculating the test transfer matrix to obtain the social network node influence sequencing result.
Example two
As shown in fig. 2, a method for ranking influence of social network nodes by using the system according to the first embodiment of the present invention includes the following steps:
step S110: collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a testing set;
step S120: establishing a transfer matrix model of the posts according to the training set, and carrying out simulation calculation on the transfer matrix model to obtain optimal training parameters;
step S130: and establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix, and obtaining the social network node influence sequencing result.
In a second embodiment of the present invention, the step S110 specifically includes:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information comprises at least a user ID, the user's total number of posts, the user's active duration, the number of the user's followers (fans) and the number of people the user follows;
the user posting information comprises at least the number of times posts were forwarded and the number of comments on posts;
the user pair information comprises the follow relationships between users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set includes test set user personal information and test set user pair information.
In a second embodiment of the present invention, in the step S120, the establishing a transition matrix model of posts according to the training set specifically includes:
step S121: determining a user pair impact factor $f_1$ for forwarding posts in the training set:
Figure GDA0003327738580000091
where $I_U$ denotes the number of followers (fans) of user U, and $S_V$ denotes the number of users followed by user V;
step S122: determining a user self influence factor $f_2$ for forwarding posts in the training set:
Figure GDA0003327738580000092
where X denotes the weight that the social network places on the forwards of user U's posts relative to the comments on them, $M_U$ denotes the total number of posts of user U, $T_U$ denotes the active duration of user U, $Z_U$ denotes the number of times the posts of user U were forwarded, and $P_U$ denotes the number of comments on the posts of user U;
step S123: determining a total impact factor $f_{uv}$ for forwarding posts in the training set:
$f_{uv} = 1 - \exp\left(-(f_1)^{m} \times (f_2)^{1-m}\right)$, where m denotes a training parameter, namely the trade-off parameter between $f_1$ and $f_2$;
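As an illustration only, the total impact factor of step S123 can be sketched in Python. The function name total_impact_factor is hypothetical, and f1 and f2 are treated as already-computed inputs, because their defining formulas are given only in the figures above.

```python
import math


def total_impact_factor(f1: float, f2: float, m: float) -> float:
    """Return f_uv = 1 - exp(-(f1 ** m) * (f2 ** (1 - m))).

    f1 is the user pair impact factor, f2 the user self influence factor,
    and m in [0, 1] the trade-off parameter between them.
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m must lie in [0, 1]")
    if f1 < 0 or f2 < 0:
        raise ValueError("f1 and f2 must be non-negative")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


# m = 1 uses only f1, m = 0 uses only f2, values in between blend the two.
for m in (0.0, 0.5, 1.0):
    print(m, total_impact_factor(2.0, 0.5, m))
```

Because the exponent is never positive, the result always lies in [0, 1), so it can be read as a probability-like score.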
step S124: obtaining the probability $p_{uu}$ that user U in the training set forwards his or her own post, by using a K-shell decomposition algorithm:
Figure GDA0003327738580000093
where n denotes the number of user nodes and $Ks_u$ denotes the K-shell value of user U;
step S125: determining the probability $P_{uv}$ that a post of user U in the training set is forwarded:
Figure GDA0003327738580000094
step S126: obtaining the training transfer matrix P according to $p_{uu}$ and $P_{uv}$:
Figure GDA0003327738580000101
in a second specific embodiment of the present invention, in step S120, the simulating the transition matrix model to obtain the optimal training parameters specifically includes:
selecting, in order of the average number of times their posts were forwarded, C users in the training set together with their real forward counts $M_c$; a plurality of different m values correspond to a plurality of different training transfer matrices P, and a propagation simulation experiment is carried out on each P using an independent cascade model to obtain the expected average forward count $F_c$ of each of the C users;
determining the error MAPE value:
Figure GDA0003327738580000102
where $c \in \{1, \ldots, C\}$;
and selecting the training parameter corresponding to the P with the minimum MAPE value as the optimal training parameter.
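A minimal Python sketch of the parameter selection described above. It assumes the usual definition of the mean absolute percentage error, namely the mean over the C users of |F_c - M_c| / M_c, which may differ in detail from the formula in the figure. The function names and the dictionary layout are hypothetical.

```python
def mape(simulated, real):
    """Mean absolute percentage error between simulated F_c and real M_c.

    Assumes every real value M_c is non-zero.
    """
    if len(simulated) != len(real) or not real:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(f - r) / r for f, r in zip(simulated, real)) / len(real)


def select_best_m(simulated_by_m, real):
    """Return the m whose simulated forward counts give the smallest MAPE.

    simulated_by_m maps each candidate m to the list of F_c values obtained
    from the propagation simulation run with the matching transfer matrix.
    """
    return min(simulated_by_m, key=lambda m: mape(simulated_by_m[m], real))


# Eleven equally spaced candidates: 0.0, 0.1, ..., 1.0
candidate_ms = [round(0.1 * k, 1) for k in range(11)]
```

In use, each candidate m would be paired with the F_c values produced by the simulation for its transfer matrix before calling select_best_m.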
In a second embodiment of the present invention, in the step S130, the establishing a test transfer matrix for forwarding posts according to the test set and the optimal training parameters specifically includes:
establishing the test transfer matrix from the test set according to steps S121 to S126, with the optimal training parameter used as the trade-off parameter between $f_1$ and $f_2$.
In a second embodiment of the present invention, in the step S130, the calculating the test transfer matrix to obtain the social network node influence ranking result specifically includes:
setting the initial value of every component of the user value vector $S_t$ to 1, and obtaining a stable convergence value by Markov iteration, the calculation process being as follows:
$S_t = (1 \ldots 1)_{1 \times n} \times P_m$
repeating the following process until the Euclidean norm of the difference between two successive user value vectors is smaller than the preset precision, then stopping the iteration to obtain the stable convergence algorithm value S:
$S = (S_t)_{1 \times n} \times P_m$
and taking each component of the stable convergence algorithm value S as the algorithm value of the corresponding user, and comparing the algorithm values to obtain the social network node influence ranking result.
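The iteration above can be sketched in plain Python as a power iteration on a row-stochastic matrix. This is a sketch under stated assumptions rather than the patented implementation: the function name markov_rank, the tolerance, the iteration cap and the 2 x 2 example matrix are all illustrative.

```python
import math


def markov_rank(P, tol=1e-10, max_iter=1000):
    """Iterate S <- S x P from the all-ones row vector until the Euclidean
    norm of the change between two successive vectors drops below tol."""
    n = len(P)
    s = [1.0] * n
    for _ in range(max_iter):
        nxt = [sum(s[i] * P[i][j] for i in range(n)) for j in range(n)]
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if delta < tol:
            break
    return s


# Illustrative matrix only: each row is non-negative and sums to 1.
P = [[0.5, 0.5],
     [0.2, 0.8]]
scores = markov_rank(P)
# Users sorted by descending algorithm value.
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Since every row of P sums to 1, the components of S keep summing to n throughout the iteration, so the values stay comparable between users.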
Example three
As shown in fig. 3, a third embodiment of the present invention provides a social network node influence ranking system, where the system includes:
the data preprocessing unit 21 acquires the Sina Weibo data set microblogPCU from the network, performs simple preprocessing, and generates the training set and the test set in the form required by the design, four tables in total: the training set user personal information table, the training set user attention information table, the test set user personal information table and the test set user attention information table;
the first training unit 22 generates transfer matrices Pm for post forwarding according to the training set, where m takes 11 equally spaced sample values from 0 to 1 at intervals of 0.1, each generating a different transfer matrix Pm;
the second training unit 23 performs propagation simulation experiments on the number of times posts are forwarded for the 11 different transfer matrices Pm, and selects the optimal transfer matrix Pm according to the MAPE value to obtain the training value m;
the first test unit 24 generates a transfer matrix P according to the test set and the training result m, and generates the ranking of the present algorithm;
the second testing unit 25 generates the rankings of other algorithms according to the test set and the other algorithms;
the third testing unit 26 performs a consistency check between the ranking result of the present algorithm and the ranking results of the other algorithms to demonstrate the superiority of the present algorithm's ranking;
the data preprocessing unit 21 specifically includes:
the data set obtaining subunit 211 obtains the Sina Weibo data set microblogPCU, which was collected from Sina Weibo by Jun Liu et al. on 2015.3.17. The data set mainly comprises 4 files: weibo_user.csv (user personal information), follow-followee.csv (user attention information), user_post.csv (post content information) and post.csv (post content information). weibo_user.csv contains information on 700+ users, such as user ID, name, gender, grade, private information, zip code, fan value and total number of people followed; follow-followee.csv records about 140,000 fan-attention relationship pairs, including users not recorded in weibo_user.csv; user_post.csv and post.csv record the post content, post ID, poster ID, number of forwards, number of comments and the like; the data set has undergone simple cleaning to remove interference such as zombie accounts and alternate accounts, but many missing values remain and need to be removed manually;
a data set preprocessing subunit 212, which performs simple preprocessing to generate the training set and the test set: the user personal information table is processed by adding the average number of forwards and the average number of comments, ensuring that the following items exist for each user: ID, name, user level, number of posts, number of fans, number of people followed, and average number of times the user's posts were forwarded/commented on; the user attention information table is cleaned to ensure that, for each pair of attention information, both the fan and the followed user are in the user personal information table. At this point, 4 tables have been generated: the training set user personal information table, the training set user attention information table, the test set user personal information table and the test set user attention information table.
The first training unit 22 specifically includes:
the transition matrix Pm is used for describing possible post transfer probability, and when the transfer probability is larger than 0, even if no transfer record or attention relation among users is observed at present, transfer possibility exists in the future;
the kshell value calculation subunit 221 calculates the kshell values of the training set. The kshell values are calculated as follows: first, nodes with degree 0 in the network are peeled off; next, all nodes with degree 1 are peeled off, which changes the degrees of some of the nodes in the remaining network, so all nodes that now have degree 1 are peeled off in turn, and this is repeated until the remaining network contains no more nodes that can be peeled; all nodes peeled off in this stage form the 1-shell; the procedure is repeated to obtain the 2-shell, ..., k-shell until all nodes have been peeled off, so that each node has an integer kshell value of its own;
as shown in fig. 4, which plots the kshell values against the forward counts of the corresponding users in the training set of the present invention, the kshell values and the forward counts show no obvious correlation; the kshell value performs poorly and cannot be used alone to rank this kind of network.
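The peeling procedure for kshell values can be sketched in Python as follows. This is an illustrative sketch, not the patented code: it assumes an undirected, symmetric adjacency structure (the text does not state how the directed attention relationships are converted to degrees) and uses the common formulation in which stage k peels every node whose remaining degree is at most k. The name k_shell and the toy graph are hypothetical.

```python
def k_shell(adj):
    """Return a dict mapping each node to its k-shell value.

    adj maps every node to the set of its neighbours and must be symmetric.
    """
    degree = {u: len(neighbours) for u, neighbours in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        stack = [u for u in remaining if degree[u] <= k]
        if not stack:
            k += 1          # nothing left to peel at this level
            continue
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue    # already peeled via a duplicate stack entry
            remaining.remove(u)
            shell[u] = k
            for v in adj[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)
    return shell


# Toy graph: a triangle a-b-c, a pendant node d on a, an isolated node e.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
shells = k_shell(graph)
```

In the toy graph the isolated node falls in shell 0, the pendant node in shell 1 and the triangle in shell 2.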
The training parameter simulation subunit 222 takes equally spaced values of m from 0 to 1, for example at intervals of 0.1, and completes the following process for each of the 11 values;
fuv and Puu compute subunit 223, compute fuv and Puu;
definition 1, user pair impact factor f 1:
Figure GDA0003327738580000121
wherein Iu represents the fan number of user u, and Sv represents the attention number of user v;
definition 2, user self-influence factor f 2:
Figure GDA0003327738580000122
where x is a parameter between 0 and 1 representing the weight the social application places on forwards relative to comments; without loss of generality it can be set to 0.5;
definition 3, transition probability fuv:
$f_{uv} = 1 - \exp\{-(f_1)^{m} \cdot (f_2)^{1-m}\}$
where m ranges from 0 to 1 and is sampled at 11 values at intervals of 0.1; it represents the balance between f1 and f2 and needs to be trained;
definition 4, transition probability Puu:
Figure GDA0003327738580000123
wherein n is the number of user nodes, and ksu is the kshell value.
Puv calculation subunit 224, calculate Puv;
definition 5, forwarding probability Puv:
Figure GDA0003327738580000131
a transition matrix P generation subunit 225 for generating a transition probability matrix P, where different m is represented as Pm;
according to Puv and Puu, a transition matrix Pm is obtained:
Figure GDA0003327738580000132
the second training unit 23 specifically includes:
the data extraction subunit 231 extracts from the data set the list of the top 20 users ranked by the number of times their posts were forwarded, together with their forward counts;
a propagation experiment subunit 232: different m give different transfer matrices Pm; for each transfer matrix Pm, an independent cascade propagation experiment is run with each of the 20 users as a single starting point, and each experiment is repeated 10 times to obtain the average value Fc;
the MAPE value calculation subunit 233 calculates the MAPE value from Fc and Mc for each Pm;
Figure GDA0003327738580000133
where C is the number of users, c is a specific user, and MAPE indicates the error between the predicted data and the actual data;
a training value selecting subunit 234, which selects the m corresponding to the minimum MAPE value as the training result;
as shown in fig. 5, which is a graph of the calculation result of the optimal training value m in embodiment two of the present invention, m takes 11 sample values from 0 to 1 at intervals of 0.1, giving 11 transition probability matrices Pm for m = 0, 0.1, ..., 1; the MAPE value is then calculated for each, and the m with the minimum MAPE value is taken as the optimal m, which in this embodiment is 0.5.
The first test unit 24 specifically includes:
a kshell value calculation subunit 241, which calculates the kshell values of the test set;
as shown in fig. 6, which plots the kshell values against the forward counts of the corresponding users in the test set of embodiment two of the present invention, the kshell value performs poorly and cannot be used alone to rank this kind of network.
The transfer matrix calculation subunit 242 obtains the test set transfer matrix P from the training value m, the test set data and the formulas of the transfer matrix model;
the sorting calculation subunit 243 sets the initial value vector to the all-ones vector, multiplies it by the transition probability matrix P, and iterates until convergence to obtain the algorithm values and the top 10 ranks. The stable convergence value may be obtained iteratively using a Markov approach. Let every initial value be 1 and calculate:
$S_t = (1 \ldots 1)_{1 \times n} \times P_{n \times n}$
repeating the following process until the error Δ meets the precision requirement, to obtain the stable convergence algorithm value vector S:
$S_t = (S_t)_{1 \times n} \times P_{n \times n}$
Δ may be computed with the two-norm (Euclidean norm): when the length of the difference vector between two successive values of St meets the precision requirement, the iteration is considered to have converged.
The second testing unit 25 specifically includes:
degree centrality is used as the representative local influence algorithm for reference, and betweenness centrality and closeness centrality are used as the representative global influence algorithms for reference; embodiment two therefore applies these three indexes to the test set data and obtains the top ten ranks of each for subsequent comparison.
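Two of the three reference indexes can be sketched in a few lines of plain Python. This is an illustrative sketch only: it assumes an undirected graph given as a dictionary of neighbour sets, normalises degree by n - 1 and closeness by the number of reachable nodes, and omits betweenness centrality for brevity. The function names are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {u: 0.0 for u in adj}
    return {u: len(neighbours) / (n - 1) for u, neighbours in adj.items()}


def closeness_centrality(adj):
    """Reachable-node count divided by the sum of shortest-path distances."""
    result = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Toy path graph a - b - c
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
deg = degree_centrality(path)
clo = closeness_centrality(path)
```

Sorting either dictionary by value in descending order gives the corresponding baseline ranking.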
The third testing unit 26 specifically includes:
a comparison image effect verification subunit 261, which plots the different algorithms against the real data and compares the plots; first, after the present algorithm has been applied to the test set, the top 10 ranked users are obtained and their average forward counts are taken from the data set; the algorithm value of each user is then used as the abscissa and the user's average forward count as the ordinate, and it is observed whether the two are positively correlated or consistent; for the other algorithms, such as degree centrality, betweenness centrality and closeness centrality, a consistency check graph can likewise be drawn with the algorithm or index value on the horizontal axis and the corresponding user's forward count on the vertical axis, to observe whether each shows a positive correlation or consistency; the plots thus show whether the present method is consistent with the real data and, intuitively, whether it outperforms the other methods;
as shown in fig. 7, the comparison image effect verification graph of embodiment two of the present invention compares the consistency with the real forwarding data of the top 10 of the present algorithm (res.myalgo), the top 10 by degree centrality (res.indgcent), the top 10 by closeness centrality (res.closed element) and the top 10 by betweenness centrality (res.betweencent). The comparison shows that closeness centrality performs very poorly, with no obvious consistency; betweenness centrality and degree centrality are acceptable, since the forward count does not decrease as the index value increases, i.e. the relationship is monotonic, but degree centrality lacks resolution and betweenness centrality, owing to its higher complexity, is clearly costly in engineering applications; under the present method the algorithm value and the forward count are consistent, so high-influence users are not hidden among users with low algorithm values (considered low-ranked) or medium algorithm values (considered medium-ranked); the algorithm narrows the search for high-influence users to the region of large algorithm values and achieves a good ranking effect;
a comparison kendall test verification subunit 262, which calculates the kendall test between each algorithm and the real data and examines the resulting values (when the same sample is ranked by different methods, the similarity of each pair of rankings can be compared and calculated, and the kendall consistency test is one way of doing so; the calculation considers concordant and discordant pairs: if user a ranks higher than user b under method A and also ranks higher than user b under method B, the pair is concordant and its sign is positive, otherwise the pair is discordant and its sign is negative; the positive and negative signs are summed, and a larger sum indicates more concordant pairs and closer rankings; the larger the kendall value between a method and the real ranking, the more accurate the ranking obtained by that method); the top 10 ranking obtained by applying the present algorithm to the test set, the top 10 rankings obtained by the other algorithms, and the top 10 ranking of the real data (the 10 users with the highest forward counts) are compared by the kendall consistency test; the resulting value measures the consistency between two rankings: the larger the kendall value, the more consistent the two orderings, and the larger the kendall coefficient computed against the real data, the better the algorithm's ranking matches the real data, so the effectiveness of the method is expressed numerically;
as shown in fig. 8, the comparison kendall test verification graph of embodiment two of the present invention, the consistency between closeness centrality (clCent) and the real forward-count ranking (realRepo) is poor, and closeness centrality cannot produce a correct ranking at all; betweenness centrality (bwCent) is difficult and costly to implement in engineering because of its high complexity; the present algorithm differs little from degree centrality (dgCent) and yields a good correlation trend, so it can narrow the search for high-influence users.
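The concordant/discordant pair counting behind the kendall consistency test can be sketched in Python as below. Dividing the signed sum by the number of pairs (the tau-a form) is an assumption made here so that the value lies in [-1, 1]; the text only describes summing the signs. The function name kendall_tau and the user labels are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall consistency between two rankings of the same users.

    rank_a and rank_b map each user to a rank position (1 = best).
    Tied pairs contribute nothing to either count.
    """
    users = list(rank_a)
    if len(users) < 2:
        raise ValueError("at least two users are required")
    concordant = discordant = 0
    for x, y in combinations(users, 2):
        sign = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if sign > 0:
            concordant += 1      # both methods order x and y the same way
        elif sign < 0:
            discordant += 1      # the two methods disagree on x and y
    pairs = len(users) * (len(users) - 1) // 2
    return (concordant - discordant) / pairs


algo = {"u1": 1, "u2": 2, "u3": 3}
real = {"u1": 1, "u2": 3, "u3": 2}
tau = kendall_tau(algo, real)
```

A value near 1 means the two rankings order almost every pair of users the same way, and a value near -1 means they are almost exactly reversed.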
The comparison specific ranking verification subunit 263 outputs the top 10 users of the different algorithms and of the real data, together with the algorithm/real values, for a specific comparison; the top 10 ranks obtained by applying the present algorithm to the test set are listed alongside the top 10 of the other algorithms and of the real data, each entry comprising the ranked user ID and the algorithm value or real average forward count, and the algorithm effect is analysed from the specific ranking results and algorithm values.
As shown in fig. 9, the comparison specific ranking verification graph of the embodiment of the present invention, the top 3 users in the real data are also the top 3 in the ranking of the present algorithm, which indicates that high-influence users receive high algorithm values and that the invention is effective.
A time counting subunit 264 sets time stamps at the beginning and end of the run of the embodiment, calculates the time difference and outputs the total running time of the program, so that it can be judged whether the time is acceptable; as can be seen from FIG. 8, the network analysis of embodiment two, which covers 700+ interconnected users, runs for 42 seconds, which matches the O(n^2) complexity of the theoretical analysis; the time can be reduced by further optimization, because embodiment two uses the python language, which runs relatively slowly, and much of the time is spent outputting the comparison verification images.
Example four
As shown in fig. 10, a fourth embodiment of the present invention provides a method for ranking influence of social network nodes by using the system described in the third embodiment. The method mainly comprises the following steps:
step 11, acquiring a data set, and simply preprocessing the data set to generate a training set and a test set in a form required by the design;
step 12, generating a transfer matrix Pm for forwarding posts according to the training set, wherein different training parameters m generate different transfer matrices Pm;
step 13, carrying out a simulation experiment of post propagation according to different transfer matrixes Pm, and screening an optimal value in the transfer matrixes Pm;
step 14, generating a transfer matrix P according to the test set and the training result m, and generating the ranking of the present algorithm.
The step 11 comprises:
step 111, obtaining a data set, wherein the data comprise 3 types: the first type is personal homepage information, which at least comprises a user ID, the user's total number of posts, the user's level or active duration, the user's number of fans (if user A follows user B, A is called a fan of B), and the user's number of followees (if user A follows user B, B is called a followee of A); the second type is user posting information, which at least comprises the forward count and comment count of some posts; the third type is the attention relationships among users, which at least comprise the attention relationships among some users;
step 112, simply preprocessing the data set into the required form: the data set is simply cleaned, for example by advertisement filtering, and split into a training set and a test set as required, and the following tables are then generated for each: the personal homepage statistical information table is processed by combining the third type data with the first type data and adding two items to the table, the user's average number of forwards and average number of comments; the user attention relation table is cleaned to ensure that, for each pair of attention information, both the fan and the followed user are in the user personal information table.
The step 12 specifically includes:
the transition matrix Pm is used for describing possible post transfer probability, and when the transfer probability is larger than 0, even if no transfer record or attention relation among users is observed at present, transfer possibility exists in the future;
the generation method of the transfer matrix Pm is as follows:
definition 1, user pair impact factor f1: the more fans I a user has, the stronger the user's influence; the more users a user follows (attention number S), the more receptive the user is, but the larger S is, the more the influence of any single node on that user is diluted; the parameter is therefore set as:
Figure GDA0003327738580000161
where Iu denotes the fan number of user U, Sv denotes the attention number of user V, and f1 describes the user-pair influence of user U on user V; division by zero and the case f1 = 0 would cause subsequent problems and must be taken into account;
definition 2, user self-influence factor f2: on the one hand, the activity of the node, namely the number of posts divided by the total active time, can be considered, and the total active time can be reflected by the user level; on the other hand, the post quality can be reflected by the number of forwards and the number of comments; the parameter is therefore set as:
Figure GDA0003327738580000171
The parameter x lies between 0 and 1 and represents the weight the social application places on forwards relative to comments; without loss of generality it can be set to 0.5; if several posts of a user have been captured, the average forward count and the average comment count are used; division by zero and the case f2 = 0 would cause subsequent problems and must be taken into account;
definition 3, transition probability fuv, which indicates the influence of user U on user V, i.e. the probability that V forwards a post of U:
$f_{uv} = 1 - \exp\{-(f_1)^{m} \cdot (f_2)^{1-m}\}$
The exponential form is adopted because fuv then increases as f1 and f2 increase; m is the key training parameter, with a value between 0 and 1, which apportions the weights of the user pair impact factor and the user self-influence factor in the propagation process, i.e. the balance between f1 and f2;
definition 4, transition probability Puu, which represents the probability that a user points to itself; the more likely a user's posts are to be forwarded by other users, the less the user points to itself, and such a user is generally a core user of the network, i.e. a node with a large kshell decomposition value; the self-pointing probability is therefore set smaller for such users:
Figure GDA0003327738580000172
Wherein n is the number of user nodes, Ksu is a kshell value, and when the kshell value is larger, the node is more positioned in the network center, and the possibility that the node points to the node is relatively smaller;
definition 5, forwarding probability Puv: the matrix can be obtained from definition 3 and definition 4, but by the definition of a probability transfer matrix it must satisfy
$\sum_{v \neq u} P_{uv} + P_{uu} = 1$
Thus
Figure GDA0003327738580000173
Considering that Puv is generally too small to facilitate subsequent training, when Sv < average(Sv) (the average of Sv), fuv is set to the minimum value.
According to Puv and Puu, a transition matrix Pm is obtained:
Figure GDA0003327738580000174
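One way to assemble a matrix whose rows satisfy the constraint that the forwarding probabilities plus the self-pointing probability sum to 1 is sketched below in Python. The scaling used here, which spreads the remaining mass 1 - Puu over the other users in proportion to fuv, is an assumption made for illustration and is not necessarily the formula shown in the figures. The function name and the example numbers are hypothetical.

```python
def build_transition_matrix(f, p_self):
    """Assemble a row-stochastic matrix from impact factors and self-probabilities.

    f[u][v] is the total impact factor f_uv (diagonal entries are ignored);
    p_self[u] is the self-pointing probability P_uu.
    Assumed scaling: P_uv = (1 - P_uu) * f_uv / sum of f_uw over w != u.
    """
    n = len(f)
    P = [[0.0] * n for _ in range(n)]
    for u in range(n):
        off_diag = sum(f[u][v] for v in range(n) if v != u)
        if off_diag <= 0:
            raise ValueError(f"row {u} has no positive off-diagonal factor")
        P[u][u] = p_self[u]
        for v in range(n):
            if v != u:
                P[u][v] = (1.0 - p_self[u]) * f[u][v] / off_diag
    return P


# Illustrative numbers only.
f = [[0.0, 0.2, 0.6],
     [0.5, 0.0, 0.5],
     [0.1, 0.3, 0.0]]
p_self = [0.2, 0.5, 0.1]
P = build_transition_matrix(f, p_self)
```

With strictly positive fuv and 0 < Puu < 1, every entry of the resulting matrix is positive and every row sums to 1.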
the step 13 comprises:
performing a propagation simulation experiment of the posts and selecting the optimal transfer matrix Pm; the propagation simulation experiment adopts an independent cascade model; the top 20 users ranked by average forward count are taken from the corresponding data set to obtain a user list and the corresponding real forward counts Mc;
introduction to the independent cascade model: in the independent cascade model, each node has two states, activated and not activated, where activation indicates that the node has accepted or propagated some information (e.g., forwarding or liking on a microblog) [Libang, Chuya Nu, Von Jian Hua, Xuyao Strong. [J]. Computer Science, 2016, 39(04): 643-656]; in the model, when a node u is infected, it tries to infect each neighbor node v with probability Puv; between a pair of users the attempt can be made only once in each direction; u's attempts on its different neighbors v do not interfere with one another, nor do the attempts of different users on the same v; once u has tried to infect all of its neighbors v, the newly infected users v are processed in turn in the same way; an activated node cannot be activated again, i.e. the same user cannot forward the information twice; the message forwarding process under the independent cascade model comprises the following steps:
given one or more initial users, these are infected and become the starting points; if user u is infected, u gets one chance to infect each of its friends, each attempt succeeding with probability Puv independently of the others; the larger Puv is, the more likely u is to infect v;
if node w is not infected at time t, all infected neighbors of w that have not yet made their attempt try to infect w, and if w is infected it switches to the infected state at time t + 1;
the process is repeated until all infection attempts have been made, i.e. the maximum infectable range has been reached; the infected range is the maximum transmission range of the information from the initial node, and the results are averaged;
for each m and its transfer matrix Pm, the independent cascade propagation experiment is run 10 times to obtain the average forward count Fc;
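The independent cascade steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: every other user is treated as a potential neighbor with success probability P[u][v], a single seed user is used, and the spread is counted as the number of users activated besides the seed. The function names and the seeding of the random generator are illustrative choices.

```python
import random


def independent_cascade(P, seed, rng):
    """Run one cascade from a single seed and return how many other users
    were activated. Each active user gets exactly one attempt on each
    inactive user v, succeeding with probability P[u][v]."""
    n = len(P)
    active = {seed}
    frontier = [seed]
    while frontier:
        newly_active = []
        for u in frontier:
            for v in range(n):
                if v not in active and rng.random() < P[u][v]:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active   # only new activations try in the next round
    return len(active) - 1


def expected_spread(P, seed, runs=10, rng=None):
    """Average spread over several independent runs (10, as in the text)."""
    if rng is None:
        rng = random.Random(0)    # fixed seed for repeatable illustration
    return sum(independent_cascade(P, seed, rng) for _ in range(runs)) / runs
```

Running expected_spread once per seed user and per candidate transfer matrix gives the simulated counts that are compared against the real forward counts.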
definition 6, error MAPE value: it expresses the error between the predicted data and the real data; the smaller the calculated result, the smaller the error, in other words the better and more acceptable the corresponding P and m;
Figure GDA0003327738580000181
where C is the number of users and c is a specific user;
Fc and MAPE are calculated for the different m, and the m with the smallest MAPE is selected as the optimal training value.
The step 14 comprises:
step 141, generating a transition matrix P, and generating the transition matrix P according to the data of the test set and the optimal training value of m by the method of step 12;
step 142, generating the ranking of the present algorithm: the initial value of every component of the user value vector St is set to 1, and a stable convergence value is obtained by Markov iteration, the calculation process being as follows:
$S_t = (1 \ldots 1)_{1 \times n} \times P_{n \times n}$
repeating the following process until the error Δ meets the precision requirement, to obtain the stable convergence algorithm value vector S:
$S_t = (S_t)_{1 \times n} \times P_{n \times n}$
the Euclidean norm error Δ between two successive user value vectors is computed, and the iteration stops when Δ is smaller than the preset precision; the components of the resulting user value vector St are taken as the algorithm values of the users and compared to obtain the user ranking of the present algorithm;
convergence of this process can be demonstrated: for the iteration with the transition matrix P to converge, 3 conditions need to be satisfied:
P is a stochastic matrix;
P is irreducible;
P is aperiodic;
for the first condition, the stochastic matrix: let Pij be the element in row i and column j of P; for any i = 1, 2, ..., n and j = 1, 2, ..., n, Pij ≥ 0, and for any i = 1, 2, ..., n the sum of Pij over j equals 1; the matrix P is clearly non-negative and each of its rows sums to 1;
for the second condition, the matrix P is irreducible if and only if the directed graph of the network corresponding to P is strongly connected (any two nodes can reach each other), that is, a path can be found between any two nodes; since all elements of the transfer matrix P of the present algorithm are positive, such a path must exist, so P satisfies the irreducibility condition;
for the third condition, periodicity means that the iterated values repeat in a regular cycle; it is known that aperiodicity is equivalent to the matrix being primitive, where a primitive matrix is one for which some power is a positive matrix; since all elements of P are positive, P is primitive, so the third, aperiodicity condition is satisfied;
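The three conditions can be checked numerically with a short Python sketch. It relies on the argument given above: a row-stochastic matrix whose entries are all strictly positive is both irreducible and aperiodic, so two simple checks suffice in that case. The function names and the tolerance are illustrative.

```python
def is_row_stochastic(P, tol=1e-9):
    """Condition 1: every entry is non-negative and every row sums to 1."""
    return all(
        all(x >= 0 for x in row) and abs(sum(row) - 1.0) <= tol
        for row in P
    )


def is_strictly_positive(P):
    """Sufficient for conditions 2 and 3: all entries positive means the
    graph is strongly connected and the matrix is primitive."""
    return all(x > 0 for row in P for x in row)


def iteration_converges(P):
    """True when the sufficient conditions used in the text all hold."""
    return is_row_stochastic(P) and is_strictly_positive(P)


ok = iteration_converges([[0.5, 0.5], [0.2, 0.8]])        # positive and stochastic
cyclic = iteration_converges([[0.0, 1.0], [1.0, 0.0]])    # periodic 2-cycle
```

Note that strict positivity is only a sufficient test; a matrix with zero entries may still be irreducible and aperiodic, which this sketch does not attempt to detect.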
meanwhile, the running time of the method is spent mainly on the calculation of the transfer matrix, the independent cascade model and the Markov iteration; every element of the transition probability matrix has to be computed separately, which takes O(n^2) time; each element is simple to compute, since the data can be read directly from a table and only simple addition, division and exponential operations are needed, so the time complexity of generating the transfer matrix is O(n^2); in the independent cascade model, the worst case is that one user is infected at a time until all users are infected, which also takes time of order O(n^2), so its time complexity is O(n^2). Although the Markov iteration is relatively time-consuming, the process is similar to that of the classical Pagerank transfer matrix, with the transfer probability of a user to itself approximately halved, and studies have shown that Pagerank generally converges within 50-75 iterations; in conclusion, the total time complexity of the method is O(n^2), which is acceptable.
In summary, the social network node influence ranking system provided by the embodiment of the present invention is applicable to a social network with incomplete dynamic information. The method comprises the steps that a training set and a test set in the form required by the system are generated by obtaining a data set containing personal information, post information and attention relationship information of a social network user and performing simple preprocessing such as data cleaning and project merging; establishing a social network post transfer matrix generation and screening model according to the training set, and generating a post forwarding transfer matrix by sequentially considering the personal network position of a user, the local network influence of the user, the self post transfer probability of the user and the inter-user post transfer probability; respectively carrying out simulation experiments of post propagation according to different transition matrixes, comparing relative errors of actual values and simulated values of post propagation ranges, and selecting the minimum error to screen out the optimal transition matrix and the corresponding training parameters; and generating a transfer matrix by using the same modeling method according to the test set and the training parameters, and finally obtaining stable influence sequencing.
From the above description of the embodiments, it is clear to those skilled in the art that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a storage medium, such as ROM/RAM, a magnetic disk or an optical disk, and which includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.
The above-described embodiments of the apparatus and system are merely illustrative, and the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (8)

1. A social network node influence sequencing method is used for recommending APP downloading or purchasing to a user, and is characterized by comprising the following steps:
step S110: collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information, the user posting information and the user pair information to form a training set and a testing set;
step S120: establishing a transfer matrix model of the posts according to the training set, and carrying out simulation calculation on the transfer matrix model to obtain optimal training parameters;
step S130: establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix, and obtaining the social network node influence sequencing result;
the step S110 specifically includes:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information at least comprises a user ID, the total number of posts of the user, the active time of the user, the number of people the user follows and the number of people following the user;
the user posting information at least comprises the forwarded number of the posts and the commented number of the posts;
the user pair information comprises the follow relationship between the users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set comprises test set user personal information and test set user pair information;
in step S120, the establishing a transition matrix model of posts according to the training set specifically includes:
step S121: determining a user pair impact factor $f_1$ for forwarding posts in the training set:
Figure FDA0003458816650000011
where $I_U$ represents the number of people the user U is interested in, and $S_V$ represents the number of people of interest of user V;
step S122: determining a user self influence factor $f_2$ for forwarding the posts in the training set:
Figure FDA0003458816650000012
where X represents the social network's trade-off between the importance of forwarding and of comments on the posts of user U, $M_U$ represents the total number of posts of user U, $T_U$ represents the active duration of user U, $Z_U$ represents the forwarded number of the posts of user U, and $P_U$ represents the number of comments on the posts of user U;
step S123: determining a total impact factor $f_{uv}$ for forwarding posts in the training set:
$f_{uv} = 1 - \exp\left(-(f_1)^{m} \times (f_2)^{1-m}\right)$, where m represents a training parameter, namely the trade-off parameter between $f_1$ and $f_2$;
step S124: obtaining the probability $p_{uu}$ that the user U in the training set forwards the user's own post by using a K-shell decomposition algorithm:
Figure FDA0003458816650000021
where n represents the number of user nodes, and $Ks_u$ represents the K-shell value of user U;
step S125: determining the probability $P_{uv}$ that the posts of user U in the training set are forwarded:
Figure FDA0003458816650000022
step S126: obtaining a training transfer matrix P according to $p_{uu}$ and $P_{uv}$:
Figure FDA0003458816650000023
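As an illustration of steps S123 and S124, the following is a minimal Python sketch, not the patented implementation. It evaluates the total impact factor from the formula given in step S123 and computes K-shell values by the standard peeling procedure. The function names, the restriction of m to [0, 1], the non-negativity of $f_1$ and $f_2$, and the treatment of the follow graph as an undirected adjacency dictionary are assumptions made for this example; the formulas for $f_1$, $f_2$, $p_{uu}$ and $P_{uv}$ are given only as figures and are therefore not reproduced.

```python
import math


def total_influence(f1: float, f2: float, m: float) -> float:
    """Total impact factor f_uv = 1 - exp(-(f1^m * f2^(1-m))) from step S123.

    Assumption: f1 and f2 are non-negative and m lies in [0, 1].
    """
    if not 0.0 <= m <= 1.0:
        raise ValueError("m is assumed to lie in [0, 1]")
    return 1.0 - math.exp(-(f1 ** m) * (f2 ** (1.0 - m)))


def k_shell(adjacency: dict[str, set[str]]) -> dict[str, int]:
    """Standard K-shell decomposition of an undirected graph.

    adjacency maps each node to the set of its neighbours (hypothetical
    input format). Nodes are peeled in order of increasing degree; a node
    removed while the threshold is k receives K-shell value k.
    """
    degree = {node: len(nbrs) for node, nbrs in adjacency.items()}
    remaining = set(adjacency)
    shell: dict[str, int] = {}
    k = 0
    while remaining:
        peel = [node for node in remaining if degree[node] <= k]
        if not peel:
            k += 1  # nothing left at this level, move to the next shell
            continue
        while peel:
            node = peel.pop()
            if node not in remaining:
                continue  # already removed via another neighbour
            remaining.discard(node)
            shell[node] = k
            for nbr in adjacency[node]:
                if nbr in remaining:
                    degree[nbr] -= 1
                    if degree[nbr] <= k:
                        peel.append(nbr)
    return shell
```

For example, in a triangle of users a, b and c with a fourth user d connected only to a, the sketch assigns K-shell value 2 to a, b and c and value 1 to d.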
2. The method according to claim 1, wherein in the step S120, the simulating the transition matrix model to obtain the optimal training parameters specifically includes:
sequentially selecting C users in the training set and the corresponding real reprint numbers $M_c$ according to the average rank of the forwarded number of posts; a plurality of different m values correspond to a plurality of different training transfer matrices P, and an independent cascade model is adopted to carry out a propagation simulation experiment on each P to obtain the expected average transfer number $F_c$ of the C users;
determining the error MAPE value:
Figure FDA0003458816650000024
where $c \in \{1, \ldots, C\}$;
and selecting the training parameter corresponding to the P with the minimum MAPE value as the optimal training parameter.
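The selection procedure of claim 2 can be sketched in Python as follows. This is a hedged illustration under several assumptions: the MAPE formula appears only as a figure, so the standard definition (the mean of |M_c - F_c| / M_c over the C users) is used; the transfer matrix is a plain list of lists indexed by user position; the diagonal self-forwarding entries are ignored during propagation; and the function names, the number of simulation runs and the seeded random generator are hypothetical choices.

```python
import random


def independent_cascade(P, seed_user, rng):
    """One run of the independent cascade model starting from seed_user.

    P[u][v] is taken as the probability that a post of u is forwarded by v.
    Each newly activated user gets a single chance to activate every
    inactive user. Returns the number of users reached besides the seed.
    """
    n = len(P)
    active = {seed_user}
    frontier = [seed_user]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in range(n):
                if v != u and v not in active and rng.random() < P[u][v]:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active) - 1


def expected_spread(P, seed_user, runs=200, rng=None):
    """Average forwarding count over repeated cascade simulations."""
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    total = sum(independent_cascade(P, seed_user, rng) for _ in range(runs))
    return total / runs


def mape(actual, simulated):
    """Mean absolute percentage error; actual values must be non-zero."""
    return sum(abs(a - s) / a for a, s in zip(actual, simulated)) / len(actual)


def select_best_m(matrices_by_m, seed_users, actual_counts):
    """Return the m whose transfer matrix gives the smallest MAPE."""
    def error(m):
        P = matrices_by_m[m]
        simulated = [expected_spread(P, u) for u in seed_users]
        return mape(actual_counts, simulated)
    return min(matrices_by_m, key=error)
```

Here `matrices_by_m` is a dictionary from each candidate m value to the training transfer matrix built with that value, `seed_users` are the indices of the C selected users, and `actual_counts` are their real reprint numbers $M_c$.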
3. The method according to claim 2, wherein in step S130, the establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters specifically comprises:
selecting, according to the test set, the optimal training parameter as the trade-off parameter between $f_1$ and $f_2$, and establishing the test transfer matrix according to the method from the step S121 to the step S126.
4. The method according to claim 3, wherein in the step S130, the calculating the test transfer matrix to obtain the social network node influence ranking result specifically includes:
setting the initial value of each element of the user value vector $S_t$ to 1, and obtaining a stable convergence value by using Markov iteration, wherein the calculation process is as follows:
$S_t = (1 \ldots 1)_{1 \times n} \times P_m$
repeating the following process until the Euclidean norm of the difference between the user value vectors of two successive iterations is smaller than the preset precision, then stopping the iteration process and obtaining a stable convergence algorithm value S:
$S = (S_t)_{1 \times n} \times P_m$
and taking each value of the obtained stable convergence algorithm value S as an algorithm value of each user, and comparing the algorithm values to obtain the social network node influence sequencing result.
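The Markov iteration of claim 4 can be illustrated with the following Python sketch. It starts from the all-ones row vector, multiplies repeatedly by the test transfer matrix, and stops when the Euclidean norm of the change between two successive vectors falls below a preset precision. The function names, the default tolerance and the iteration cap are assumptions for this example, and the matrix is assumed to be a square list of lists.

```python
import math


def markov_rank(P, tol=1e-9, max_iter=1000):
    """Iterate S <- S x P from S = (1, ..., 1) until the change is below tol.

    P is the n x n test transfer matrix. The stopping rule uses the
    Euclidean norm of the difference between successive vectors.
    """
    n = len(P)
    s = [1.0] * n
    for _ in range(max_iter):
        # Row vector times matrix: entry v sums s[u] * P[u][v] over all u.
        nxt = [sum(s[u] * P[u][v] for u in range(n)) for v in range(n)]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(nxt, s)))
        s = nxt
        if err < tol:
            break
    return s


def rank_users(scores):
    """Return user indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Each iteration is a vector-matrix product, so its cost is proportional to n^2. Convergence depends on the properties of the transfer matrix, which is why the sketch also caps the number of iterations.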
5. A social network node impact ranking system for performing the method of any of claims 1 to 4, the system comprising:
the data preprocessing module is used for collecting user personal homepage information, user posting information and user pair information, and preprocessing the personal homepage information and the user posting information to form a training set and a testing set;
the training module is used for establishing a post transfer matrix model according to the training set, carrying out simulation calculation on the transfer matrix model and obtaining optimal training parameters;
and the test module is used for establishing a test transfer matrix for post forwarding according to the test set and the optimal training parameters, calculating the test transfer matrix and obtaining the social network node influence sequencing result.
6. The system of claim 5, wherein the data preprocessing module specifically comprises:
collecting personal homepage information, user posting information and user pair information to form a data set; the personal homepage information at least comprises a user ID, the total number of posts of the user, the active time of the user, the number of people the user follows and the number of people following the user;
the user posting information at least comprises the forwarded number of the posts and the commented number of the posts;
the user pair information comprises the follow relationship between the users;
cutting the data set into a training set and a testing set according to requirements, wherein the training set comprises training set user personal information and training set user pair information; the test set includes test set user personal information and test set user pair information.
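The data set and its division into a training set and a test set could be represented as in the following Python sketch. The claims only state that the data set is cut "according to requirements", so the random split by user, the split fraction, the record types and all field names are hypothetical choices made for illustration; the unit of the active time is likewise left unspecified.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class UserProfile:
    """Personal homepage information (field names are assumptions)."""
    user_id: str
    total_posts: int
    active_time: float
    following: int
    followers: int


@dataclass(frozen=True)
class FollowEdge:
    """User pair information: follower_id follows followee_id."""
    follower_id: str
    followee_id: str


def split_dataset(users, edges, train_fraction=0.7, seed=0):
    """Split users at random and keep only edges inside each subset.

    Returns ((train_users, train_edges), (test_users, test_edges)).
    """
    ids = sorted(u.user_id for u in users)
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * train_fraction)
    train_ids, test_ids = set(ids[:cut]), set(ids[cut:])

    def subset(keep):
        kept_users = [u for u in users if u.user_id in keep]
        kept_edges = [e for e in edges
                      if e.follower_id in keep and e.followee_id in keep]
        return kept_users, kept_edges

    return subset(train_ids), subset(test_ids)
```

Keeping only the edges whose two endpoints fall in the same subset means that each set carries its own user personal information and user pair information, as the claim requires.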
7. The system of claim 6, wherein the training module comprises:
the user pair influence factor determining unit is used for determining a user pair influence factor for forwarding the post according to the number of people to be attended of one user and the number of people to be attended of the other user in the user pair;
the user self influence factor determining unit is used for determining the user self influence factor forwarded by the posts according to the posting total number, the active duration, the forwarded number of the posts and the commented number of the posts of the user;
a total influence factor determining unit, configured to determine a total influence factor for forwarding the post according to a trade-off parameter between the user pair influence factor and the user own influence factor;
the user self-forwarding probability determining unit is used for acquiring the probability of forwarding the self post by the user by utilizing a K-shell decomposition algorithm;
a post forwarded probability determination unit for determining the probability of the post of the user being forwarded by other users;
the transition matrix model establishing unit is used for establishing a transition matrix model of the post according to the probability of the user forwarding the post and the forwarded probability of the post;
and the optimal training parameter establishing unit is used for adopting the independent cascade model to carry out propagation simulation experiments on the transfer matrix model respectively to obtain an expected average transfer number, determining an error MAPE value and selecting the training parameter corresponding to the transfer matrix with the minimum MAPE value as the optimal training parameter.
8. The system of claim 7, wherein the test module comprises:
the test transfer matrix establishing unit is used for establishing a test transfer matrix according to the test set and the optimal training parameters;
and the influence sequencing establishing unit is used for calculating the test transfer matrix to obtain the social network node influence sequencing result.
CN201810931729.9A 2018-08-16 2018-08-16 Social network node influence ordering method and system Active CN109242710B (en)

Publications (2)

Publication Number Publication Date
CN109242710A CN109242710A (en) 2019-01-18
CN109242710B true CN109242710B (en) 2022-03-11

Family

ID=65070531

Country Status (1)

Country Link
CN (1) CN109242710B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant