CN112035545A - Method for maximizing competitive influence considering non-active nodes and community boundaries - Google Patents

Method for maximizing competitive influence considering non-active nodes and community boundaries

Info

Publication number
CN112035545A
Authority
CN
China
Prior art keywords
node
nodes
competitor
influence
community
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010891298.5A
Other languages
Chinese (zh)
Other versions
CN112035545B (en)
Inventor
谢晓芹
李家辉
王巍
杨武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University
Priority to CN202010891298.5A
Publication of CN112035545A
Application granted
Publication of CN112035545B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Fuzzy Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • Medical Informatics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of social network analysis and data mining, and particularly relates to a method for maximizing competitive influence considering non-active nodes and community boundaries. The invention addresses problems in previous research, such as neglecting the influence of inactive nodes and the way community homogeneity blocks information propagation in community-based influence maximization algorithms. The invention provides a new propagation model for competitive environments, CIMWIB, which can effectively simulate the role that inactive users of a social network play in information propagation. To solve the problem that community homogeneity blocks information propagation, the invention provides a new node influence evaluation index, BI. On this basis, the invention provides a two-stage seed node selection algorithm, CBCIM, which can help merchants promote themselves more effectively in a competitive environment and obtain higher returns by exploiting the word-of-mouth effect.

Description

Method for maximizing competitive influence considering non-active nodes and community boundaries
Technical Field
The invention belongs to the technical field of social network analysis and data mining, and particularly relates to a method for maximizing competitive influence considering non-active nodes and community boundaries.
Background
With the emergence of social networks such as Facebook, WeChat, and Weibo, more and more users share their opinions and other information online, and research on the propagation of influence in networks has become a hot topic in social media network analysis. Prior research has the following problems: 1) Traditional community-based influence maximization algorithms ignore homogeneity in social networks. Homogeneity creates information barriers: information spreads easily inside a community but has difficulty spreading outside it. In the real world, new things come from different communities, so how to spread information across different communities, and as widely as possible over the whole network, is a problem that needs to be considered. 2) Existing propagation models assume that an inactive node has no influence on its neighbor nodes, whereas in reality inactive nodes play a certain role in the information propagation process. 3) Existing methods use a greedy algorithm to select the seed nodes, which is inefficient. How to design a new seed node selection algorithm that improves selection efficiency while preserving accuracy remains to be considered.
Disclosure of Invention
The invention aims to solve the problem that community homogeneity blocks information propagation, and provides a method for maximizing competitive influence considering non-active nodes and community boundaries.
The purpose of the invention is achieved by the following technical scheme, which comprises the following steps:
Step 1: Input a social network graph $G = (V, E)$, where V is the set of all nodes and E is the set of all edges in the social network; determine a first competitor and a second competitor;
Step 2: Use a community discovery algorithm to divide the social network into n communities $C = \{C_1, C_2, \ldots, C_n\}$;
Step 3: Calculate the BI value of each node and the weight of each edge in the social network graph;
For any node $v \in V$, the BI value of node v, BI(v), is:
$$BI(v) = outDegree(v) + \left(1 - e^{-inf(v)}\right) + cimp(v)$$
Figure BDA0002657084630000011
where outDegree(v) is the out-degree of node v; inf(v) is the sum of the influence exerted by node v on all of its inactive neighbor nodes; cimp(v) is the connectivity between node v and other communities; num(v) is the number of neighbor nodes of v outside the community in which v is located; and $\min_{v_i \in V} num(v_i)$ and $\max_{v_i \in V} num(v_i)$ are, respectively, the minimum and maximum values of $num(v_i)$ over all nodes $v_i$ in the social network G, that is, the smallest and largest numbers of neighbor nodes that any node has outside its own community;
The weight $p_{v_a, v_b}$ of the edge between any two nodes $v_a \in V$ and $v_b \in V$ is:
$$p_{v_a, v_b} = \frac{|nb(v_a) \cap nb(v_b)|}{|nb(v_a) \cup nb(v_b)|}$$
where $nb(v_a)$ and $nb(v_b)$ are the sets of all neighbor nodes of node $v_a$ and node $v_b$, respectively;
Step 4: Use a seed node selection algorithm to select the initial seed set $S_1$ of the first competitor from the set V. The remaining nodes, other than the activated nodes in $S_1$, are all in the inactive state and form the node set $S_3$:
Figure BDA0002657084630000023
$S_3 \cap S_1 = \varnothing$;
Step 5: Calculate the seed set $S_2$ of the second competitor.
Step 5.1: Add the node $v_j$ with the largest BI value in the node set $S_3$ to the seed set $S_2$;
Step 5.2: Based on the seed set $S_2$, propagate influence in the social network graph G and update the states of the nodes in the set $S_3$. The states are updated as follows:
Step 5.2.1: If the influence exerted on an inactive node $v_s$ by its in-neighbors reaches the activation threshold of $v_s$,
Figure BDA0002657084630000024
that is,
Figure BDA0002657084630000025
then node $v_s$ changes from the inactive state to the thinking state, where $aci(v_s)$ is the sum of the influence of all activated nodes among the in-neighbors of $v_s$, and $ini(v_s)$ is the sum of the influence of all inactive nodes among the in-neighbors of $v_s$;
Step 5.2.2: After d time steps, node $v_s$ in the thinking state enters an activated state and chooses whether to become a follower of the second competitor;
If
Figure BDA0002657084630000026
then node $v_s$ becomes a follower of the second competitor; otherwise, node $v_s$ becomes a follower of the first competitor;
where
Figure BDA0002657084630000027
is the sum of the influence exerted on node $v_s$ at time t + d by the followers of the first competitor, and
Figure BDA0002657084630000028
is the sum of the influence exerted on node $v_s$ at time t + d by the followers of the second competitor:
Figure BDA0002657084630000029
Figure BDA00026570846300000210
where
Figure BDA00026570846300000211
is the set of followers of the first competitor among the in-neighbors of node $v_s$ at time t + d;
Figure BDA0002657084630000031
is the set of followers of the second competitor among the in-neighbors of node $v_s$ at time t + d; and $A_{t+d}(v_s)$ is the set of inactive nodes among the in-neighbors of node $v_s$ at time t + d;
Step 5.3: Update the BI values of all inactive nodes in the set $S_3$;
Step 5.4: Repeat steps 5.1 to 5.3 k times, where
Figure BDA0002657084630000032
and c is a preset heuristic parameter;
Step 5.5: If $f(S_2) > f(S_1)$, output the seed set $S_2$ of the second competitor and end the calculation; otherwise, execute step 5.6. Here $f(\cdot)$ is the measure of edge influence, and $f(S_1)$ and $f(S_2)$ are the influence ranges of the sets $S_1$ and $S_2$ on the social network graph G;
Step 5.6: Calculate the edge influence obtained by adding each inactive node to the set $S_2$, and add the node with the largest edge influence to $S_2$:
Figure BDA0002657084630000033
Step 5.7: Update the states of all nodes in the social network graph G and return to step 5.5;
Step 6: Output the seed set $S_2$ of the second competitor, i.e., the users targeted for advertising promotion.
The invention has the beneficial effects that:
The invention provides a method for maximizing competitive influence considering non-active nodes and community boundaries. It addresses problems in previous research, such as neglecting the influence of inactive nodes and the way community homogeneity blocks information propagation in community-based influence maximization algorithms. The invention provides a new propagation model for competitive environments, CIMWIB, which can effectively simulate the role that inactive users of a social network play in information propagation. To solve the problem that community homogeneity blocks information propagation, the invention provides a new node influence evaluation index, BI. On this basis, the invention provides a two-stage seed node selection algorithm, CBCIM, which can help merchants promote themselves more effectively in a competitive environment and obtain higher returns by exploiting the word-of-mouth effect.
Drawings
FIG. 1 is a graph showing the results of comparative experiments between the CIMWIB model and the conventional model DCM.
FIG. 2 is an exemplary segment diagram of a social networking graph structure.
FIG. 3 is a graph of the results of the CBCIM algorithm on three real data sets (assuming that the HPG method is used to select the seed node of competitor 1).
FIG. 4 is a flow chart of a two-stage seed node selection algorithm CBCIM.
FIG. 5 is a pseudo code diagram of the overall process of the present invention.
FIG. 6 is a notation table of symbolic meanings in the present invention.
FIG. 7 is a pseudo code diagram of a CBCIM seed node selection algorithm in the present invention.
FIG. 8 is a pseudo code diagram of an influence propagation algorithm ActiveNodes in the present invention.
FIG. 9 is a table of experimental results in examples of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The invention overcomes the defects of the prior art and provides a method for maximizing competitive influence considering non-active nodes and community boundaries. The invention provides a new propagation model for competitive environments, CIMWIB, which can effectively simulate the role that inactive users of a social network play in information propagation. To solve the problem that community homogeneity blocks information propagation, the invention provides a new index for evaluating node influence. A two-stage seed node selection algorithm, CBCIM, is then provided on this basis. The algorithm can help merchants promote themselves more effectively in a competitive environment and obtain higher returns by exploiting the word-of-mouth effect.
As shown in FIG. 5, a method for maximizing competitive influence considering non-active nodes and community boundaries includes the following steps:
Step 1: Acquire social network data, where a graph $G = (V, E)$ represents the social network, V is the set of nodes in the network, and E is the set of edges in the network;
Step 2: Use a community discovery algorithm to divide the social network into several communities $C = \{C_1, C_2, \ldots, C_n\}$;
Step 3: Preprocess the data to obtain the degree of each node and the initial boundary influence (BI) value of each node;
Step 4: Use an existing seed node selection algorithm to select the initial seed set $S_1$ of the first competitor from the set V. The remaining nodes, other than the activated nodes in $S_1$, are all in the inactive state and form the node set $S_3$:
Figure BDA0002657084630000041
$S_3 \cap S_1 = \varnothing$;
Step 5: Taking the graph G, $S_1$, and the divided communities $C = \{C_1, C_2, \ldots, C_n\}$ as input, execute the heuristic stage of the CBCIM algorithm, looping
Figure BDA0002657084630000042
times and each time adding the node with the largest BI value to $S_2$;
Step 6: Execute the greedy stage of the CBCIM algorithm: in each iteration, add the node v with the largest edge influence to $S_2$ and update the states of all nodes, repeating until the influence range of $S_2$ is greater than that of $S_1$, i.e., $f(S_2) > f(S_1)$ (edge influence can be measured by the number of nodes affected or by other methods);
Step 7: Output the seed set $S_2$ of competitor 2. This set can serve as the users targeted by advertising promotion in subsequent applications.
The features in the above steps are mainly expressed in the following aspects:
(1) In step 3, the calculation of the boundary influence (BI) value considers both the structure inside a community and the connections between communities. For any node $v \in V$, the BI value BI(v) of node v is:
$$BI(v) = outDegree(v) + \left(1 - e^{-inf(v)}\right) + cimp(v)$$
where outDegree(v) is the out-degree of node v; inf(v) is the sum of the influence that node v exerts on all of its inactive neighbor nodes; and cimp(v) represents the connectivity between node v and other communities, calculated as follows:
Figure BDA0002657084630000051
where, if node v belongs to community $C_i$, num(v) is the number of neighbor nodes of v outside $C_i$; V is the set of all nodes in graph G; and $\min_{v_i \in V} num(v_i)$ and $\max_{v_i \in V} num(v_i)$ are, respectively, the minimum and maximum values of $num(v_i)$ over all nodes $v_i$ in the social network G, that is, the smallest and largest numbers of neighbor nodes that any node has outside its own community.
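As a minimal sketch of how the BI value could be evaluated, the Python function below computes BI(v) for every node of a small directed graph. The formula for cimp(v) is available only as a figure, so the sketch assumes that cimp(v) is the min-max normalization of num(v) between the minimum and maximum described above. That assumption, the use of out-neighbors as the "neighbors" of a node, and all names (`bi_values`, `succ`, `community`, `weight`, `inactive`) are illustrative and not taken from the patent.

```python
import math

def bi_values(succ, community, weight, inactive):
    """Return {node: BI(node)} with BI(v) = outDegree(v) + (1 - e^{-inf(v)}) + cimp(v).

    succ      -- {node: list of out-neighbours}
    community -- {node: community label}
    weight    -- {(u, v): influence of u on v}
    inactive  -- set of nodes currently in the inactive state
    """
    # num(v): neighbours of v that lie outside v's own community
    num = {v: sum(1 for u in nbrs if community[u] != community[v])
           for v, nbrs in succ.items()}
    lo, hi = min(num.values()), max(num.values())

    bi = {}
    for v, nbrs in succ.items():
        # inf(v): total influence of v on its inactive neighbours
        inf_v = sum(weight[(v, u)] for u in nbrs if u in inactive)
        # ASSUMPTION: cimp(v) is num(v) min-max normalised to [0, 1]
        cimp = (num[v] - lo) / (hi - lo) if hi > lo else 0.0
        bi[v] = len(nbrs) + (1.0 - math.exp(-inf_v)) + cimp
    return bi

# Toy graph: "a" and "b" share a community, "c" is in another one.
succ = {"a": ["b", "c"], "b": ["a"], "c": []}
community = {"a": 0, "b": 0, "c": 1}
weight = {("a", "b"): 0.5, ("a", "c"): 0.2, ("b", "a"): 0.5}
bi = bi_values(succ, community, weight, inactive={"a", "b", "c"})
best = max(bi, key=bi.get)  # the node the heuristic stage would pick first
```

In this toy graph node "a" has the largest out-degree and is the only node with a neighbor in another community, so all three terms favor it.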
(2) In step 5, when the seed set of competitor 2 is selected, the new model CIMWIB is used as the propagation model during influence propagation. The CIMWIB model considers that an inactive node still exerts a weak influence on its neighbor nodes.
The new model CIMWIB activates inactive nodes according to the following steps:
1) If the influence exerted on an inactive node $v_s$ by its in-neighbors reaches the activation threshold of $v_s$,
Figure BDA0002657084630000054
that is,
Figure BDA0002657084630000055
then node $v_s$ changes from the inactive state to the thinking state, where $aci(v_s)$ is the sum of the influence of all activated nodes among the in-neighbors of $v_s$, and $ini(v_s)$ is the sum of the influence of all inactive nodes among the in-neighbors of $v_s$;
2) After d time steps, node $v_s$ in the thinking state enters an activated state and chooses whether to become a follower of the second competitor;
If
Figure BDA0002657084630000056
then node $v_s$ becomes a follower of the second competitor; otherwise, node $v_s$ becomes a follower of the first competitor;
where
Figure BDA0002657084630000057
is the sum of the influence exerted on node $v_s$ at time t + d by the followers of the first competitor, and
Figure BDA0002657084630000058
is the sum of the influence exerted on node $v_s$ at time t + d by the followers of the second competitor:
Figure BDA0002657084630000059
Figure BDA00026570846300000510
where
Figure BDA0002657084630000061
is the set of followers of the first competitor among the in-neighbors of node $v_s$ at time t + d;
Figure BDA0002657084630000062
is the set of followers of the second competitor among the in-neighbors of node $v_s$ at time t + d; and $A_{t+d}(v_s)$ is the set of inactive nodes among the in-neighbors of node $v_s$ at time t + d;
(3) FIG. 6 lists the symbols used by the CBCIM algorithm mentioned in step 5 and their meanings; the specific implementation steps are as follows.
The aim of the invention is, given the initial seed nodes of competitor 1, to select as few seed nodes as possible for competitor 2, as quickly as possible, such that competitor 2 finally achieves a wider propagation range than competitor 1.
In the conventional CI2 algorithm, the most influential node in each community is added to the candidate set each time, which causes two problems when selecting seed nodes: 1) the most influential node in each community does not represent the global optimum, so the selected nodes are limited and the propagation of information between communities is affected; 2) running a greedy algorithm within each community every time makes the algorithm inefficient. To address these problems, the invention improves the CI2 algorithm and provides a community-based two-stage seed node selection algorithm, CBCIM. The algorithm has two main parts:
1) The boundary influence BI proposed by the invention measures the influence of a node from several angles. The degree of a node v contributes most to the influence of v, while the sum of the influence of v on all inactive nodes and the connectivity of v between communities contribute slightly less. The research therefore combines these three evaluation indexes in the definition of BI, evaluates node influence both locally and globally, and breaks through the information propagation barrier of traditional methods.
2) CBCIM is divided into a heuristic stage and a greedy stage. The heuristic stage repeatedly selects the node with the largest BI value and adds it to the target set $S_2$. In each iteration of the greedy stage, the edge influence obtained by adding each inactive node to $S_2$ is calculated, and the node with the largest edge influence is added to $S_2$:
Figure BDA0002657084630000063
This process continues until the influence range of $S_2$ is greater than that of $S_1$, i.e., $f(S_2) > f(S_1)$. Note that the nodes added to $S_2$ are not contained in $S_1$. Experiments show that the greedy stage of the CBCIM algorithm is more efficient than the corresponding stage of the CI2 algorithm. The influence range of a node v is the number of nodes that v can influence and activate, and the influence range of a set S is the total number of nodes that the nodes in S can influence and activate.
The CBCIM algorithm takes $G = (V, E)$, the seed set $S_1$ of competitor 1, the divided communities $C = \{C_1, C_2, \ldots, C_n\}$, and
Figure BDA0002657084630000064
as input, where k is the number of loop iterations in the heuristic stage and c is the heuristic factor ($c \in (0, 1]$). In the greedy stage, CBCIM each time greedily selects the node with the largest edge influence in the whole social network and adds it to $S_2$, until $f(S_2) > f(S_1)$.
In the heuristic stage, the node with the largest BI value is added to the set $S_2$ each time; influence is then propagated from $S_1$ and $S_2$, and the states of the inactive nodes in the social network are updated. This process is repeated k times, and by the end of the heuristic stage a large number of nodes in the network have been activated. The greedy stage runs a greedy algorithm over the nodes that are still inactive. Because of the activation performed in the heuristic stage, fewer nodes are inactive at this point than in the original network, so the CBCIM algorithm runs more efficiently in the greedy stage. The time complexity of the CBCIM algorithm is much lower than that of the CI2 algorithm.
The specific implementation steps of the two-stage seed node selection algorithm CBCIM are as follows:
Step 5.1: Initialize the target set $S_2$ to be empty;
Steps 5.2-5.5 are the heuristic stage of the CBCIM algorithm:
Step 5.2: Add the node $v_j$ with the largest BI value in the node set $S_3$ to the seed set $S_2$;
Step 5.3: Propagate influence using the ActiveNodes algorithm and update the states of the nodes in the set $S_3$. The ActiveNodes algorithm takes the social network graph G, the seed set $S_1$ of competitor 1, and the seed set $S_2$ of competitor 2 as input, and outputs a labeled social network graph G = ((V, Vstate), E), where V is the set of nodes in G, E is the set of edges between nodes, and Vstate is the state of each node. The final state of a node is active+, active-, or inactive, denoting support for competitor 1, support for competitor 2, and neutrality, respectively;
Step 5.4: Update the BI values of all inactive nodes in the set $S_3$;
Step 5.5: Repeat steps 5.2 to 5.4 k times, where
Figure BDA0002657084630000071
and c is a preset heuristic parameter. Experiments on different heuristic factors, carried out in this research, verify the influence of the heuristic factor on the experimental results; the inventors found that CBCIM achieves good results when c lies in [0.4, 0.6].
Steps 5.6-5.9 are the greedy stage of the CBCIM algorithm.
Step 5.6: If $f(S_2) > f(S_1)$, output the seed set $S_2$ of the second competitor and end the calculation; otherwise, execute step 5.7. Here $f(\cdot)$ is the measure of edge influence, and $f(S_1)$ and $f(S_2)$ are the influence ranges of the sets $S_1$ and $S_2$ on the social network graph G;
Step 5.7: Calculate the edge influence obtained by adding each inactive node to the set $S_2$, and add the node with the largest edge influence to $S_2$:
Figure BDA0002657084630000072
where $\arg\max(\cdot)$ denotes the argument at which $(\cdot)$ takes its maximum value;
Step 5.8: Update the states of all nodes using the ActiveNodes algorithm;
Steps 5.6 to 5.8 are repeated until the influence range of the set $S_2$ is larger than that of the set $S_1$.
Step 5.9: Output the seed set $S_2$; the algorithm ends.
The ActiveNodes algorithm adopts CIMWIB as its propagation model; the specific steps are shown in FIG. 8.
As shown in FIG. 8, steps 1-4 of Algorithm 2 set the states of the seed nodes of competitor 1 and competitor 2: the seed nodes of competitor 1 are set to active+ and those of competitor 2 to active-. Step 5 sets the stopping parameter of the algorithm. Steps 6-17 propagate influence: in steps 11-13, a node that reaches the activation condition first changes to the thinking state, and after d time steps, steps 14-17 determine whether it changes to active+ or active-. The algorithm terminates when no inactive node in the network reaches the activation condition.
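The two-stage structure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the pseudo code of FIG. 7: the number of heuristic iterations `k` is passed in directly because its formula is only given as a figure; `bi` and `propagate` are caller-supplied stand-ins for the BI computation and the ActiveNodes algorithm; and f(·) is taken to be the number of nodes each competitor has activated, one of the measures the text allows. `toy_propagate` is a deliberately simple one-hop rule used only to make the sketch runnable; it is not CIMWIB.

```python
def count(states, label):
    """Number of nodes whose state equals label."""
    return sum(1 for s in states.values() if s == label)

def cbcim(nodes, s1, k, bi, propagate):
    """Two-stage seed selection: k heuristic picks by BI, then greedy picks."""
    s2 = set()
    states = propagate(s1, s2)

    def candidates():
        return [v for v in nodes
                if states[v] == "inactive" and v not in s1 and v not in s2]

    # Heuristic stage: add the inactive node with the largest BI, k times.
    for _ in range(k):
        pool = candidates()
        if not pool:
            break
        s2.add(max(pool, key=lambda v: bi(v, states)))
        states = propagate(s1, s2)   # BI is re-evaluated on the new states

    # Greedy stage: repeat until f(S2) > f(S1).
    while count(states, "active-") <= count(states, "active+"):
        pool = candidates()
        if not pool:
            break
        best = max(pool, key=lambda v: count(propagate(s1, s2 | {v}), "active-"))
        s2.add(best)
        states = propagate(s1, s2)
    return s2

# Hypothetical toy network and one-hop propagation rule (not CIMWIB).
succ = {1: [2, 3], 2: [], 3: [], 4: [5, 6, 7], 5: [], 6: [], 7: [], 8: [9], 9: []}

def toy_propagate(s1, s2):
    states = {v: "inactive" for v in succ}
    for seeds, label in ((s1, "active+"), (s2, "active-")):
        for v in seeds:
            states[v] = label
    for seeds, label in ((s1, "active+"), (s2, "active-")):
        for v in seeds:
            for u in succ[v]:
                if states[u] == "inactive":
                    states[u] = label
    return states

def toy_bi(v, states):
    return len(succ[v])              # out-degree only, for illustration

seeds = cbcim(list(succ), {4}, 1, toy_bi, toy_propagate)
```

With competitor 1 seeded at node 4, the heuristic stage picks node 1 (largest toy BI among inactive nodes), which is not yet enough to overtake competitor 1, so the greedy stage adds node 8.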
In addition, the research examined the effect of the heuristic factor on the experimental results by varying the value of c over multiple sets of comparative experiments. The findings are: 1) For the same $S_1$, the CBCIM algorithm obtains a smaller target set $S_2$ than the CI2 algorithm in most cases, except when c = 1. When c = 1, all nodes are selected statically according to their BI values and the information propagation process is ignored, so $S_2$ needs more seed nodes to compete with $S_1$. For example, when $|S_1| = 40$ and c = 1, the $S_2$ selected by the CBCIM algorithm contains 24.5% more nodes than that selected by the CI2 algorithm. 2) For different numbers of nodes in $S_1$, the smallest $S_2$ is obtained when c = 0.5. When $|S_1| = 50$ and c = 0.5, the $S_2$ selected by the CBCIM algorithm contains 11.1% fewer nodes than that selected by the CI2 algorithm. Therefore, the following experiments consider only the case c = 0.5.
Example 1:
data preprocessing
This embodiment uses three real datasets: facebook_combined, Wiki_vote, and NetHEPT. First, the Jaccard similarity is used to compute the weight of each edge. The Jaccard coefficient compares the similarity and difference between finite sample sets, and is used, for example, for similarity between sets, string similarity, object detection similarity, and document deduplication. The Jaccard coefficient is the ratio of the size of the intersection to the size of the union:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
The higher the Jaccard value, the higher the similarity; when both sets A and B are empty, J(A, B) = 1. The invention uses the Jaccard coefficient to calculate the weight of each edge, namely:
$$p_{u,v} = \frac{|nb(u) \cap nb(v)|}{|nb(u) \cup nb(v)|}$$
where $p_{u,v}$ is the influence of node u on node v, and nb(u) is the set of all neighbors of u.
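A short Python sketch of this preprocessing step is given below. The function and variable names (`jaccard_weight`, `edge_weights`, `nb`) are illustrative; the convention J = 1 for two empty sets follows the text above.

```python
def jaccard_weight(nb_u, nb_v):
    """Jaccard coefficient of two neighbour sets; defined as 1 when both are empty."""
    union = nb_u | nb_v
    if not union:
        return 1.0
    return len(nb_u & nb_v) / len(union)

def edge_weights(nb):
    """Weight p[(u, v)] for every edge u -> v, given nb = {node: set of neighbours}."""
    return {(u, v): jaccard_weight(nb[u], nb[v]) for u in nb for v in nb[u]}

# Triangle u - v - x: each pair of nodes shares exactly one common neighbour.
nb = {"u": {"v", "x"}, "v": {"u", "x"}, "x": {"u", "v"}}
p = edge_weights(nb)
```

Because the Jaccard coefficient is symmetric, this weighting assigns the same value to both directions of an edge.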
Community discovery
Communities are partitioned with CDSCN, a community discovery algorithm based on the similarity of common neighbor nodes, and the community modularity Q is used to measure the relative quality of a partition. Other non-overlapping community discovery algorithms can also be used to divide the communities in the method for maximizing competitive influence. The modularity of a real social network generally lies between 0.3 and 0.7; the larger the value of Q, the more pronounced the community structure. The algorithm is described as follows:
Step 1: Form the initial communities.
Step 1.1: For a given social network, calculate the n × n similarity matrix S (n is the number of network nodes) according to the following formula, and initially place each node in its own community:
Figure BDA0002657084630000091
where X and Y denote the star neighborhood subgraphs of the social network centered on nodes X and Y, respectively, each formed by the central node together with all of its neighbor nodes. H(X) is the set of all nodes in the star neighborhood network X, including the central node X and all of its neighbor nodes; H(X | Y) is the set of all nodes that belong to the star neighborhood network X but not to the star neighborhood network Y; H(X, Y) is the set of all nodes contained in the star neighborhood networks X and Y; and I(X, Y) is the set of all common neighbor nodes of the star neighborhood networks X and Y, i.e., the set of all common neighbors of nodes X and Y.
Step 1.2: According to the local influence table of the nodes, select the node i with the largest influence as the current node. In the invention, the local influence of a node is evaluated using its neighbor nodes and second-order neighbor nodes.
Step 1.3: According to the similarity matrix S, select the node j most similar to node i and determine whether node j belongs to the community in which node i is located. If so, return to step 1.2; if not, merge the communities of node i and node j and set node j as the current node;
Step 1.4: Stop clustering once all nodes have been visited; this yields the initial community division.
Step 2: Merge communities.
Step 2.1: Calculate the modularity of the current community division according to the following formula and record it as $Q_c$:
Figure BDA0002657084630000092
where $e_{ij}$ is the ratio of the number of internal edges in community i and community j to the total number of edges in the social network G, and $a_i$ is the ratio of the number of all edges attached to nodes inside community i to the total number of edges.
Step 2.2: Based on the community structure obtained in step 1, calculate the maximum community modularity obtainable by merging any two communities and record it as $Q_m$;
Step 2.3: If $Q_m > Q_c$, set $Q_c = Q_m$ and merge the two communities;
Step 2.4: Repeat steps 2.2 and 2.3 until $Q_m \le Q_c$, which yields the final community division.
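The merging loop of step 2 can be sketched as below. The modularity formula itself is only available as a figure, so this sketch assumes the standard Newman form for an undirected graph, $Q = \sum_i (e_{ii} - a_i^2)$, where $e_{ii}$ is the fraction of edges inside community i and $a_i$ is the fraction of edge endpoints attached to community i. That formula, and the names `modularity` and `merge_communities`, are assumptions made for illustration.

```python
def modularity(edges, community):
    """Newman modularity Q = sum_i (e_ii - a_i^2) of an undirected edge list (assumed form)."""
    m = len(edges)
    inside, ends = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1      # one edge endpoint in cu
        ends[cv] = ends.get(cv, 0) + 1      # one edge endpoint in cv
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)

def merge_communities(edges, community):
    """Repeatedly merge the pair of communities that maximizes Q (steps 2.2-2.4)."""
    community = dict(community)
    q_c = modularity(edges, community)
    while True:
        labels = sorted(set(community.values()))
        q_m, pair = q_c, None
        for i, a in enumerate(labels):
            for b in labels[i + 1:]:
                trial = {n: (a if c == b else c) for n, c in community.items()}
                q = modularity(edges, trial)
                if q > q_m:
                    q_m, pair = q, (a, b)
        if pair is None:                    # Q_m <= Q_c: stop
            return community, q_c
        a, b = pair
        community = {n: (a if c == b else c) for n, c in community.items()}
        q_c = q_m

# Two triangles joined by the edge (2, 3); node 2 starts in its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
start = {0: "A", 1: "A", 2: "B", 3: "C", 4: "C", 5: "C"}
final, q = merge_communities(edges, start)
```

On this example the loop merges node 2 into its triangle and then stops, because merging the two triangles would lower Q.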
CIMWIB model introduction
In consideration of the role of the inactive node in information propagation, the invention provides a new propagation model CIMWIB in a competitive environment.
The model is based on the linear threshold (LT) model: when a node tries to activate an inactive node and fails, the attempt is not discarded but accumulated. FIG. 1 compares the new propagation model CIMWIB with the traditional propagation model DCM on the data set facebook_combined. Because CIMWIB takes into account the effect of inactive nodes on information propagation, some nodes on the verge of activation are successfully activated, so the simulation of information propagation under the CIMWIB model is more accurate and the propagation range is wider.
In CIMWIB, each node can be in one of four states: inactive, thinking, active+ and active-.
When the sum of the influence of the neighbor nodes of node v on v reaches the threshold θ_v of v, node v changes from the inactive state to the thinking state. Node v remains in the thinking state for d time steps and then becomes active+ or active- according to the following rule:
Figure BDA0002657084630000101
Figure BDA0002657084630000102
and
Figure BDA0002657084630000103
represent the sum of the influence of the followers of competitor 1 and competitor 2, respectively, on node v at time t + d.
As shown in FIG. 2, if the state of node v is determined according to the following conventional model, node v becomes active+; according to the CIMWIB model provided by the invention, however, node v becomes active-. The conventional model activation method is as follows:
Figure BDA0002657084630000104
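The node states described above can be sketched as a small state machine. This is a minimal illustration, not the definition of the model: the decision rule of the invention is given in formula images that are not reproduced here, so the sketch assumes that a thinking node joins the side with the larger summed influence, that ties go to active+, and that the weak influence of inactive neighbors counts only toward the threshold. The names State, Node and step are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    INACTIVE = "inactive"
    THINKING = "thinking"
    ACTIVE_PLUS = "active+"
    ACTIVE_MINUS = "active-"


@dataclass
class Node:
    threshold: float               # theta_v
    state: State = State.INACTIVE
    wait: int = 0                  # thinking steps still to go


def step(node, infl_plus, infl_minus, infl_inactive, d):
    """Advance one node by one time step and return its new state.

    infl_plus / infl_minus: summed influence of active+ / active- neighbors.
    infl_inactive: accumulated weak influence of inactive neighbors.
    """
    if node.state is State.INACTIVE:
        # Inactive neighbors also count toward the threshold (CIMWIB idea).
        if infl_plus + infl_minus + infl_inactive >= node.threshold:
            node.state = State.THINKING
            node.wait = d
    elif node.state is State.THINKING:
        node.wait -= 1
        if node.wait <= 0:
            # Assumed rule: join the side with the larger summed influence.
            if infl_plus >= infl_minus:
                node.state = State.ACTIVE_PLUS
            else:
                node.state = State.ACTIVE_MINUS
    return node.state


n = Node(threshold=1.0)
history = [
    step(n, 0.5, 0.25, 0.125, d=2),  # 0.875 < 1.0, stays inactive
    step(n, 0.5, 0.25, 0.25, d=2),   # threshold reached, starts thinking
    step(n, 0.5, 0.75, 0.25, d=2),   # still thinking
    step(n, 0.5, 0.75, 0.25, d=2),   # d steps elapsed, decides
]
```

In the second call the active neighbors alone contribute only 0.75, so the node would stay inactive without the contribution of its inactive neighbors.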
The effects of the present invention can be further illustrated by the following experiments:
(1) Comparison experiment of the CIMWIB model and the traditional DCM model
The CBCIM algorithm is run on the DCM model and the CIMWIB model using the data set facebook_combined. Both models are assumed to take the seed set S_1 computed by the HPG algorithm as input, and the influence range in this experiment is expressed as the number of active nodes (including active+ and active-) in the final social network. The experimental results are shown in FIG. 1.
As can be seen from FIG. 1: 1) under both the DCM model and the CIMWIB model, the influence propagation range grows as the number of seed nodes of competitor 1 increases; 2) under the same conditions, information propagated according to the CIMWIB model reaches a larger influence range than under the DCM model. The difference is largest when |S_1| = 10, where the influence range under the CIMWIB model is 61.2% higher than under the DCM model.
Analysis of the experimental results shows that, on the same social network graph and with the same initial seed nodes, the influence propagation range is wider under the CIMWIB model, because many nodes on the verge of activation become active owing to the weak influence of inactive nodes; that influence becomes the "last straw" that tips the node into a new state. In real life, when users hesitate over whether to buy a certain commodity, the CIMWIB model can better help them make a decision; from the merchant's perspective, the merchant can select potential users that cannot be found with the traditional model when promoting commodities, achieving a more favorable sales result.
(2) Comparison experiment of seed node selection algorithm CBCIM algorithm and existing CI2 algorithm:
Compared with the existing CI2 algorithm, the CBCIM algorithm provided by the invention is more efficient in the heuristic stage. Three real data sets were collected for this study from the Stanford University data collection website: facebook_combined, Wiki_vote and NetHEPT; multiple groups of experiments were then designed and carried out. Comparing and analyzing the experimental data shows that the CBCIM algorithm provided by the invention improves both running efficiency and influence range.
To facilitate comparison of the CBCIM algorithm with the existing CI2 algorithm, we split the CI2 algorithm into two phases. AI represents the average influence and is used as a tool for comparing the CBCIM algorithm to the CI2 algorithm. For example, if the initial seed node number is 10 and 50 nodes are finally activated in the social network diagram G, the AI value is 5.0 (50/10).
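The AI value from the example above can be written as a one-line helper. This is only a sketch of the arithmetic; the function name average_influence is hypothetical.

```python
def average_influence(num_seeds, num_activated):
    """AI = number of finally activated nodes / number of initial seed nodes."""
    if num_seeds <= 0:
        raise ValueError("num_seeds must be positive")
    return num_activated / num_seeds


ai = average_influence(10, 50)  # the example from the text: 50 / 10
```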
As can be seen in FIG. 9, the AI values of the heuristic stage of the CBCIM algorithm are smaller than those of the corresponding stage 1 of the CI2 algorithm, while the AI values of the greedy stage are larger than those of the corresponding stage 2. The experimental data show that, if the heuristic factor c is chosen properly, the CBCIM algorithm can achieve better experimental results than CI2.
The reason why the AI value of the heuristic stage of the CBCIM algorithm is smaller than that of the corresponding stage 1 of the CI2 algorithm is that the seed nodes are statically selected by the heuristic stage CBCIM according to the value of the boundary influence BI, and the information transmission process is not considered dynamically; the AI value of the greedy phase is greater than that of the corresponding phase 2 mainly because although the nodes selected in the CBCIM heuristic phase cannot activate more nodes, the nodes accumulate a large amount of boundary influences, and in the greedy phase, after some nodes are activated, the boundary influences accumulated in the heuristic phase are rapidly activated, so that more nodes can be activated in the greedy phase.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (1)

1. A method for maximizing competitive influence in consideration of inactive nodes and community boundaries, comprising the steps of:
step 1: inputting a social network graph G = (V, E), wherein V is the set of all nodes in the social network graph and E is the set of all edges in the social network graph; determining a first competitor and a second competitor;
step 2: dividing the social network into n communities C = {C_1, C_2, ..., C_n} using a community discovery algorithm;
step 3: calculating the BI value of each node and the weight of each edge in the social network graph;
for any node v ∈ V, the BI value of node v, BI(v), is:
BI(v) = outDegree(v) + (1 - e^(-inf(v))) + cimp(v)
Figure FDA0002657084620000011
wherein outDegree(v) is the out-degree of node v; inf(v) is the sum of the influence exerted by node v on all its inactive neighbor nodes; cimp(v) is the connectivity between node v and other communities; num(v) is the number of neighbor nodes of node v outside the community in which node v is located;
Figure FDA0002657084620000012
and
Figure FDA0002657084620000013
are respectively the minimum and maximum values of num(v_i) over the nodes v_i in the social network G, num(v_i) being the number of neighbor nodes of v_i outside the community in which it is located;
for any two nodes v_a ∈ V and v_b ∈ V, the weight of the edge between them,
Figure FDA0002657084620000014
is:
Figure FDA0002657084620000015
wherein nb(v_a) and nb(v_b) are the sets of all neighbor nodes of node v_a and node v_b, respectively;
step 4: selecting an initial seed set S_1 of the first competitor from the set V using a seed node selection algorithm; the nodes in S_1 are in the activated state, and all remaining nodes are in the inactive state and form the node set S_3,
Figure FDA0002657084620000016
S_3 ∩ S_1 = ∅;
step 5: calculating the seed set S_2 of the second competitor;
step 5.1: adding the node v_j with the largest BI value in node set S_3 to seed set S_2;
step 5.2: propagating influence in the social network graph G based on seed set S_2 and updating the states of the nodes in set S_3; the state is updated as follows:
step 5.2.1: if the sum of the influence exerted on an inactive node v_s by its in-edge neighbor nodes reaches the activation threshold
Figure FDA0002657084620000017
of node v_s, namely
Figure FDA0002657084620000018
then node v_s changes from the inactive state to the thinking state; wherein aci(v_s) is the sum of the influence of all activated nodes among the in-edge neighbor nodes of node v_s, and ini(v_s) is the sum of the influence of all inactive nodes among the in-edge neighbor nodes of node v_s;
step 5.2.2: after d time steps, node v_s in the thinking state enters the activated state and chooses whether to become a follower of the second competitor;
if
Figure FDA0002657084620000021
then node v_s becomes a follower of the second competitor; otherwise, node v_s becomes a follower of the first competitor;
wherein
Figure FDA0002657084620000022
is the sum of the influence of the followers of the first competitor on node v_s at time t + d;
Figure FDA0002657084620000023
is the sum of the influence of the followers of the second competitor on node v_s at time t + d;
Figure FDA0002657084620000024
Figure FDA0002657084620000025
wherein
Figure FDA0002657084620000026
is the set of followers of the first competitor among the in-edge neighbor nodes of node v_s at time t + d;
Figure FDA0002657084620000027
is the set of followers of the second competitor among the in-edge neighbor nodes of node v_s at time t + d; A_{t+d}(v_s) is the set of nodes in the inactive state among the in-edge neighbor nodes of node v_s at time t + d;
step 5.3: updating the BI values of all inactive nodes in set S_3;
step 5.4: repeating steps 5.1 to 5.3 k times,
Figure FDA0002657084620000029
where c is a preset heuristic parameter;
step 5.5: if f(S_2) > f(S_1), outputting the seed set S_2 of the second competitor and ending the calculation; otherwise, executing step 5.6; f(·) is the measure of marginal influence, and f(S_1) and f(S_2) respectively represent the influence range of sets S_1 and S_2 on the social network graph G;
step 5.6: calculating the marginal influence of adding each inactive node to set S_2, and adding the node with the largest marginal influence to set S_2;
Figure FDA0002657084620000028
step 5.7: updating the states of all the nodes in the social network graph G, and returning to the step 5.5;
step 6: outputting the seed set S_2 of the second competitor, i.e., the users targeted for advertising promotion.
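The BI value of step 3 and the selection of step 5.1 can be sketched as follows. The sketch is illustrative only: the formula image for cimp(v) is not reproduced in the text, so it is assumed here to be the min-max normalization of num(v); neighbors are taken to be out-neighbors; and the edge weights are supplied by the caller because the weight formula is likewise not reproduced. The names bi_values and pick_seed are hypothetical.

```python
import math


def bi_values(out_edges, weight, community_of, inactive):
    """Return {node: BI(node)} with BI = outDegree + (1 - e^(-inf)) + cimp.

    out_edges:    node -> list of out-neighbors
    weight:       (u, v) -> influence of u on v (caller-supplied)
    community_of: node -> community label
    inactive:     set of currently inactive nodes
    """
    # num(v): neighbors of v that lie outside v's own community.
    num = {
        v: sum(1 for u in nbrs if community_of[u] != community_of[v])
        for v, nbrs in out_edges.items()
    }
    lo, hi = min(num.values()), max(num.values())
    span = hi - lo
    bi = {}
    for v, nbrs in out_edges.items():
        # inf(v): influence of v on its inactive neighbors.
        inf = sum(weight[(v, u)] for u in nbrs if u in inactive)
        # Assumed cimp(v): min-max normalization of num(v).
        cimp = (num[v] - lo) / span if span else 0.0
        bi[v] = len(nbrs) + (1 - math.exp(-inf)) + cimp
    return bi


def pick_seed(bi, candidates):
    """Step 5.1: the candidate with the largest BI value."""
    return max(candidates, key=lambda v: bi[v])


out_edges = {"a": ["b", "c"], "b": ["c"], "c": []}
weight = {("a", "b"): 0.5, ("a", "c"): 0.5, ("b", "c"): 1.0}
community_of = {"a": 0, "b": 0, "c": 1}
inactive = {"a", "b", "c"}

bi = bi_values(out_edges, weight, community_of, inactive)
seed = pick_seed(bi, inactive)
```

Because step 5.3 updates the BI values after each propagation round, bi_values would be called again with the reduced set of inactive nodes before the next seed is picked.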
CN202010891298.5A 2020-08-30 2020-08-30 Competition influence maximization method considering non-active node and community boundary Active CN112035545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010891298.5A CN112035545B (en) 2020-08-30 2020-08-30 Competition influence maximization method considering non-active node and community boundary


Publications (2)

Publication Number Publication Date
CN112035545A true CN112035545A (en) 2020-12-04
CN112035545B CN112035545B (en) 2023-12-19

Family

ID=73587481

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010891298.5A Active CN112035545B (en) 2020-08-30 2020-08-30 Competition influence maximization method considering non-active node and community boundary

Country Status (1)

Country Link
CN (1) CN112035545B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112929445A (en) * 2021-02-20 2021-06-08 山东英信计算机技术有限公司 Recommendation system-oriented link prediction method, system and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951524A (en) * 2017-03-21 2017-07-14 哈尔滨工程大学 Overlapping community discovery method based on node influence power
US20180018709A1 (en) * 2016-05-31 2018-01-18 Ramot At Tel-Aviv University Ltd. Information spread in social networks through scheduling seeding methods
US20180341696A1 (en) * 2017-05-27 2018-11-29 Hefei University Of Technology Method and system for detecting overlapping communities based on similarity between nodes in social network
CN111445291A (en) * 2020-04-01 2020-07-24 电子科技大学 Method for providing dynamic decision for social network influence maximization problem


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵富强;杨贵军;王双琳;何丽;: "代数连通性在社会网络影响力传播最大化中的应用研究", 计算机应用研究, no. 01 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112929445A (en) * 2021-02-20 2021-06-08 山东英信计算机技术有限公司 Recommendation system-oriented link prediction method, system and medium
CN112929445B (en) * 2021-02-20 2022-06-07 山东英信计算机技术有限公司 Recommendation system-oriented link prediction method, system and medium

Also Published As

Publication number Publication date
CN112035545B (en) 2023-12-19


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant