CN114826678A - Network propagation source positioning method based on seepage process and evolutionary computation - Google Patents
- Publication number
- CN114826678A, CN202210321271.1A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
- H04L63/126—Applying verification of the received information the source of the received data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0823—Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/04—Processing captured monitoring data, e.g. for logfile generation
- H04L43/045—Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1441—Countermeasures against malicious traffic
Abstract
The invention provides a method for locating a network propagation source based on a seepage (percolation) process and evolutionary computation. First, a network data set is input to obtain its node and edge attributes, and the propagation model parameters are initialized. Next, drawing on percolation theory and evolutionary computation, an AEF algorithm iteratively updates an initial observation point sequence to obtain a final sequence, and observation points are placed in the network in that order at a given proportion. A propagation source is then randomly selected and set to the infected state to start the propagation process, which stops once the outbreak is detected to have reached a given range. The target connected components are identified from the information captured by the observation points to obtain a subgraph, and an RIS algorithm is run on the subgraph to detect the propagation source. Finally, the neighbors within a fixed number of hops of the detected propagation source are added to a candidate set, which serves as the range for the subsequent search for the true propagation source. The invention enables fast propagation source localization in large-scale networks, so that the spread of malicious information can be controlled in time and the resulting losses reduced.
Description
Technical Field
The invention belongs to the technical field of network information propagation, and particularly relates to a network propagation source positioning method based on a seepage process and evolutionary computation.
Background
Complex networks of many kinds exist in the world today, such as social networks, power grids, and road traffic networks. Their high interconnectivity and cohesion facilitate information exchange between nodes, but also increase the exposure of the network to various risks. For example, rumors spread rapidly in social networks, computer viruses infect large numbers of hosts in a short time, and infectious diseases break out among populations. Quickly locating the area where the propagation source lies, and thereby limiting the impact of its spread, is therefore of considerable research value and practical significance.
The main task in propagation source localization is to design an estimator that infers the propagation source; the ideal estimator is one that finds the true source. However, because of the complexity of node communication patterns and the uncertainty of the diffusion model, it is theoretically almost impossible for an estimator to infer the true source, even when the underlying network is a tree. The error distance was therefore introduced as a criterion for evaluating estimators: one estimator is considered better than another if its inferred source is closer to the true source. Based on different assumptions about the available information, researchers have developed different methods to minimize the error distance. In practice, however, a further question arises: once an estimator with a small error distance has been obtained, how is the true source actually traced? One can examine the neighborhood of the estimated propagation source more intensively and eventually locate the true source. For a network with a relatively simple structure, a small error distance usually means that only a small number of nodes need this more intensive examination. However, most real-world networks are heterogeneous, that is, a single node may be directly connected to many nodes, so the number of nodes to be examined in the neighborhood may be proportional to the size of the network, which is clearly not feasible in practice.
To date, a great deal of research has addressed the localization of propagation sources in complex networks, and a growing number of algorithms have been proposed to detect the sources of false or malicious information. Propagation source localization algorithms generally fall into three categories. 1) Methods based on complete observation, in which the researcher obtains the state and infection time of every node in the network to detect the propagation source; examples include rumor centrality, the minimum description length method, and source identification by the dynamic age of nodes. 2) Methods based on network snapshot observation, in which the researcher obtains, for each node, whether it has received and spread the information within a unit of time; this condition is easier to satisfy than complete observation. Examples include the Jordan center method and the dynamic message-passing method. 3) Methods based on sensor observation, in which the researcher places a number of observation points in the network as sensors that collect infection information at specific nodes in order to detect the propagation source. Pinto et al. first proposed such a method in 2012, based on two assumptions: that the propagation delays in the network follow a Gaussian distribution, and that information propagates along a traversal tree rooted at a node. The propagation source is then estimated by maximum likelihood from the time at which each observation point first changes state and the direction from which the information arrived. Node centrality is also a practical tool for analyzing network properties, and some algorithms identify propagation sources with centrality measures such as degree centrality, closeness centrality, and betweenness centrality.
Detecting the propagation source node is an important problem, but several difficulties remain. On the one hand, most current methods are designed for tree-structured networks, whereas most real networks are general complex networks. Applying or extending tree-based methods directly to general networks usually reduces detection efficiency and makes accuracy hard to guarantee. On the other hand, most existing methods are designed for small networks and have high computational complexity, which makes them difficult to apply to general large-scale networks.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for locating a network propagation source based on a seepage (percolation) process and evolutionary computation. First, a network data set is input to obtain its node and edge attributes, and the propagation model parameters are initialized. Next, drawing on percolation theory and evolutionary computation, an AEF algorithm iteratively updates an initial observation point sequence to obtain a final sequence, and observation points are placed in the network in that order at a given proportion. A propagation source is then randomly selected and set to the infected state to start the propagation process, which stops once the outbreak is detected to have reached a given range. The target connected components are identified from the information captured by the observation points to obtain a subgraph $V'_c$, and the RIS algorithm is run on $V'_c$ to detect the propagation source. Finally, the neighbors within a fixed number of hops of the detected propagation source are added to a candidate set, which serves as the range for the subsequent search for the true propagation source. The invention enables fast propagation source localization in large-scale networks, so that the spread of malicious information can be controlled in time and the resulting losses reduced.
A network propagation source positioning method based on a seepage process and evolutionary computation is characterized by comprising the following steps:
Step 1: inputting an experimental network data set $G(V, E)$, where $V$ denotes the node set and $E$ the edge set of the network; initializing the edge infection rate $\beta_{uv}$ and the node recovery rate $\gamma_u$ of the fixed propagation model, where $\beta_{uv}$ has value range $[0, 1]$ and $\gamma_u$ has value range $[0, 1]$; determining the outbreak rate $\varepsilon$, with value range $[0, 1]$; initializing all nodes in the network to the susceptible state;
Step 2: constructing an initial sequence $S$ of all nodes of the graph by random ordering or node degree ordering, and updating $S$ with the AEF algorithm; selecting nodes from the updated sequence from front to back, in proportion $q$, as observation points to form the observation point set $O$; marking the observation points on the network, each of which records the absolute time at which it is infected; randomly selecting from $O$ a proportion $R_d = |O_d|/|O|$ of observation points to form the set $O_d$, whose members additionally record the direction from which they were infected; $q$ has value range $[0, 0.2]$ and $R_d$ has value range $[0.001, 1]$;
Step 3: at time $t = 0$, randomly selecting from all nodes in the data set $G(V, E)$ a propagation source $V_s$, setting it to the infected state, and starting the propagation process; during propagation, each observation point records the absolute time of its own infection, and the observation points in $O_d$ also record the direction from which they were infected; a node in the infected state spreads the virus to its susceptible neighbor nodes with edge infection rate $\beta_{uv}$ and, at the same time, enters the recovered state with recovery rate $\gamma_u$; a susceptible node that is infected enters the infected state and becomes a newly infected node; all infected nodes continue to propagate until the number of infected nodes $n_1$ and the number of recovered nodes $n_2$ in the network satisfy $(n_1 + n_2)/n \ge \varepsilon$, at which point the propagation process stops and the resulting distribution of node infection states forms the infection graph $G_I$; where $n$ denotes the number of nodes in the network $G$;
Step 4: taking the observation point set $O$ of step 2 as the removed node set $V_r$, the remaining nodes of the data set $G$ forming the residual node set $V_o$; removing the edges that connect nodes in $V_o$ with nodes in $V_r$ yields a number of connected components of different sizes, each larger than 1, where $c_i$ denotes the $i$-th connected component, $i = 1, 2, \dots, C$, and $C$ denotes the total number of connected components; determining the boundary of connected component $c_i$ according to [formula missing in source], where $u$ denotes any node in the removed node set $V_r$, $v$ denotes any node in connected component $c_i$, and $e_{uv}$ denotes the edge connecting node $u$ and node $v$ in the infection graph $G_I$; determining the corresponding quantity for the removed node set $V_r$ according to [formula missing in source], where $\Gamma(u)$ denotes the set of neighbor nodes of node $u$ in the infection graph $G_I$ and $c_i(v)$ denotes the connected component $c_i$ to which node $v$ belongs;
Step 5: selecting the observation points with the earliest infection time to form the infected observation point subset $O'$; constructing the subgraph $V'_c$ according to [formula missing in source], where $x$ denotes any observation point in the subset $O'$ and $\alpha(x)$ denotes the area covered by the connected components that contain the neighbor nodes of observation point $x$ in the infection graph $G_I$;
Step 6: calculating the relative infection time $t'_x$ according to $t'_x = t_x - t_{\min}$, where $t_x$ denotes the time at which observation point $x$ was infected and $t_{\min}$ is given by [formula missing in source]; running the RIS algorithm on the subgraph $V'_c$ to obtain the detected propagation source;
Step 7: adding the neighbor nodes within a fixed order of the detected propagation source to the candidate set $V_c$, and taking the relative size of the candidate set, $\phi = |V_c|/n$, as the evaluation index; a smaller $\phi$ indicates a smaller range within which the infection has to be contained; where $|V_c|$ denotes the number of nodes in the candidate set $V_c$, and the fixed order is first order or second order.
Further, the specific process of updating the node sequence S by using the AEF algorithm described in step 2 is as follows:
Step a: randomly determining the segment length parameter $n_s$ for sequence segmentation according to [formula missing in source], and segmenting the node sequence $S$ from front to back so that each segment contains $n_s$ nodes; the $k$-th segment is denoted $S_k$, $n$ is the total number of nodes in the sequence $S$, and $k = 1, 2, \dots$;
Step b: updating each segment sequence $S_k$ in parallel according to the following process to obtain the updated segment sequence $S_k$, where the number of updates is [formula missing in source]:
Step S1: initializing the parameters: setting the loop counter $j = n_s$ and the initial intermediate node sequence $S'_k = S_k$, and constructing the initial subgraph $G'_k(V'_k, E'_k)$, where the spliced sequence [symbol missing in source] denotes the concatenation of all segments after $S_k$, $V'_k$ is the node set of the subgraph and contains all nodes of that spliced sequence, $E'_k = E \cap (V'_k \times V'_k)$ is the edge set of the subgraph, and $V'_k \times V'_k$ denotes the set of all possible edges between nodes of the node set; randomly determining the selection count parameter $\delta$ and the node selection parameter $x$, with $\delta \in [1, 50]$ and $x \in (0, 1]$;
Step S2: randomly selecting nodes from the set $\{S'_k(z),\ z \in [\max(j - x \times n_s, 1),\ j]\}$, repeating the selection $\delta$ times; the selected nodes form a candidate set [symbol missing in source], where $S'_k(z)$ denotes the $z$-th node in the sequence $S'_k$;
Step S3: selecting from the candidate set the node $r$ that satisfies [formula missing in source], where $y$ is any node in the candidate set and $\xi(y)$ denotes the connected-component size of node $y$, calculated according to [formula missing in source] or [formula missing in source]; $c(y)$ denotes the set of connected components that include node $y$, $c'_i$ denotes any connected component, and $|c'_i|$ denotes the number of nodes of connected component $c'_i$;
Step S4: updating the node set $V'_k$ of the subgraph $G'_k$ according to $V'_k \leftarrow V'_k \cup \{r\}$, and then, based on the updated node set $V'_k$, updating the edge set $E'_k$ of the subgraph $G'_k$ according to [formula missing in source], where $\{r\}$ denotes the set containing node $r$, $v'$ denotes any node other than $r$ in the updated node set $V'_k$, and $e_{rv'}$ denotes the edge connecting node $r$ with node $v'$ in the original graph $G'_k$; if the sequence $S'_k$ contains a node $S'_k(z)$ with $S'_k(z) = r$, exchanging $S'_k(j)$ and $S'_k(z)$;
Step S5: if $j > 0$, returning to step S2; otherwise, if [condition missing in source], setting $S_k = S'_k$; letting [formula missing in source] and returning to step S1; when [condition missing in source], the sequence $S_k$ obtained is the updated sequence; where $F$ denotes the evaluation index function of a sequence, calculated as $F = \sum_q |c''_{\max}|/n$, $|c''_{\max}|$ denotes the number of nodes of the largest connected component for the sequence at each value of $q$, $n$ denotes the total number of nodes, $q$ ranges over $[1/n_s, 1]$ with step $1/n_s$, and [symbol missing in source] denotes the spliced sequence of two sequences;
Step c: letting $T_p = T_p - 1$ and returning to step a; when the counting variable $T_p = 0$, the final sequence $S$ is obtained; where $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes.
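The evaluation index $F = \sum_q |c''_{\max}|/n$ from step S5 can be illustrated with a short Python sketch. The interpretation used here is an assumption: for each prefix length $i$, the first $i$ nodes of the sequence are treated as removed and the largest connected component of the remaining graph is measured. For simplicity the sketch scores an ordering of all nodes of a whole graph rather than one segment $S_k$ together with its subgraph $G'_k$. The function name `evaluate_sequence` and the union-find bookkeeping are illustrative choices, not taken from the source. Nodes are re-inserted in reverse order, which mirrors the way step S4 adds nodes back to the subgraph.

```python
def evaluate_sequence(adj, sequence):
    """Sum of largest-component fractions over all removal prefixes.

    adj: dict mapping each node to an iterable of neighbours (undirected).
    sequence: an ordering of all nodes in adj; for prefix length i the
    first i nodes are treated as removed (assumed interpretation of q).
    Returns sum over i = 1..n of |largest component| / n.
    """
    n = len(sequence)
    parent = {}
    size = {}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total = 0.0
    largest = 0
    # Walk prefix lengths i = n, n-1, ..., 1. At the top of each iteration
    # the structure holds exactly the nodes sequence[i:], i.e. the graph
    # with the first i nodes removed.
    for i in range(n, 0, -1):
        total += largest / n
        node = sequence[i - 1]
        parent[node] = node
        size[node] = 1
        for nb in adj[node]:
            if nb in parent:
                ra, rb = find(node), find(nb)
                if ra != rb:
                    if size[ra] < size[rb]:
                        ra, rb = rb, ra
                    parent[rb] = ra
                    size[ra] += size[rb]
        largest = max(largest, size[find(node)])
    return total


# Example: on the path 0-1-2-3-4, removing the middle node first keeps the
# remaining components smaller than removing nodes from one end.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
f_centre_first = evaluate_sequence(adj, [2, 0, 4, 1, 3])
f_end_first = evaluate_sequence(adj, [0, 1, 2, 3, 4])
```

Under this interpretation a lower value indicates a sequence whose leading nodes fragment the network more effectively, which is the property the observation point sequence is meant to have.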
Further, the specific process of finding the propagation source by using the RIS algorithm in step 6 is as follows:
Step a: initializing the node set $\Lambda$ as the empty set; for the subgraph $V'_c$, letting $G'(V', E')$ be its reverse network, which satisfies $|V'| = |V'_c|$ and, if edge $e_{ba}$ is contained in the subgraph $V'_c$, then edge $e_{ab} \in E'$; where $|V'_c|$ denotes the number of nodes in the subgraph $V'_c$, $V'$ and $E'$ denote the node set and the edge set of the reverse network, respectively, and $|V'|$ denotes the number of nodes in the reverse network;
Step b: randomly selecting a node $m$ from the infected observation point subset $O'$ and calculating the random walk length $t''$ of the node according to $t'' = t'_0 + t'_m$, where $t'_m$ denotes the relative infection time of node $m$, $t'_0$ is a random number drawn from the interval [interval missing in source], and [symbol missing in source] has value range $[0, 20]$;
Step c: taking node $m$ as the starting point of the random walk, walking to one of the neighbors of node $m$ chosen at random, and then changing the state of node $m$ to recovered;
Step d: continuing the walk for $t''$ steps, letting $v$ denote the last node of the random walk, and updating the node set according to $\Lambda = \Lambda \cup \{v\}$;
Step e: repeating steps b to d $T_\Lambda$ times to obtain the final updated node set $\Lambda$; the node that occurs most often in $\Lambda$ is the detected propagation source; $T_\Lambda$ takes the value $10^6$.
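The reverse random walk in steps a to e can be sketched in Python as follows. This is a simplified illustration rather than the patented procedure: relative infection times are assumed to be non-negative integers, the random offset $t'_0$ is drawn uniformly from $\{0, \dots, \texttt{max\_offset}\}$ (the interval is not recoverable from the text, so `max_offset` is a hypothetical parameter), a walk that reaches a node with no outgoing reverse edge simply stops there, and the state change of node $m$ in step c is not modeled. The function name and the edge-list input format are also illustrative.

```python
import random
from collections import Counter


def reverse_walk_source(edges, observer_times, max_offset, num_walks, rng):
    """Estimate the source by voting over end points of reverse random walks.

    edges: iterable of directed edges (b, a) of the subgraph; the reverse
        network contains the edge (a, b) for each of them.
    observer_times: dict mapping each infected observer m to its relative
        infection time t'_m (assumed to be a non-negative integer).
    max_offset: hypothetical upper bound for the random extra steps t'_0.
    Returns (most_voted_node, votes).
    """
    reverse_adj = {}
    for b, a in edges:
        reverse_adj.setdefault(a, []).append(b)

    observers = list(observer_times)
    votes = Counter()
    for _ in range(num_walks):
        m = rng.choice(observers)
        steps = rng.randint(0, max_offset) + observer_times[m]
        node = m
        for _ in range(steps):
            nbrs = reverse_adj.get(node)
            if not nbrs:
                break  # no reverse edge to follow: the walk ends here
            node = rng.choice(nbrs)
        votes[node] += 1  # the multiset Lambda, kept as a counter
    return votes.most_common(1)[0][0], votes


# Example: infection flowed along 0 -> 1 -> 2 -> 3; observers sit at 2 and 3.
edges = [(0, 1), (1, 2), (2, 3)]
source, votes = reverse_walk_source(
    edges, {3: 3, 2: 2}, max_offset=0, num_walks=50, rng=random.Random(1)
)
```

Because every walk in this toy example follows the only reverse path for exactly as many steps as the observer's relative infection time, all votes land on node 0. The text sets the number of repetitions to $10^6$; the example uses far fewer walks only to keep it quick to run.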
The beneficial effects of the invention are as follows. Because the method combines the network percolation process with evolutionary computation, the sequence of observation points placed in the network is optimized and the size of the connected components obtained by removing the observation point set is suppressed, so that fewer observation points suffice to locate the propagation source and the cost of protecting the network is reduced. By drawing on strategies from the network immunity problem, a small number of nodes are isolated using observation point information to control the spread of an epidemic, which narrows both the range of propagation source localization and the range in which the true propagation source has to be searched. The invention provides technical support for containing malicious propagation under resource constraints and can be used for propagation source localization in large networks.
Drawings
FIG. 1 is a flow chart of the network propagation source positioning method based on the seepage process and evolutionary computation according to the present invention;
FIG. 2 is a schematic diagram of the process of determining a propagation source localization sub-graph according to the present invention;
in the figure, (a) infection graph obtained by the propagation process; (b) subgraph illustration 1 for infection graph (a); (c) subgraph illustration 2 for infection graph (a);
FIG. 3 is a graphical representation of the results of candidate set ratios for different infection rates obtained using different methods in four different networks;
in the figure, (a) results for the ER model network; (b) results for the SF model network; (c) results for the PG network; (d) results for the SCM network;
FIG. 4 is a graph showing the results of candidate set ratios for different observation point ratios using different network immunization methods in two networks;
in the figure, (a) results in the LOCG network with ratio $R_d = 0$; (b) results in the LOCG network with ratio $R_d = 1$; (c) results in the WG network with ratio $R_d = 0$; (d) results in the WG network with ratio $R_d = 1$.
Detailed Description
The present invention will be further described with reference to the following drawings and examples, which include, but are not limited to, the following examples.
As shown in fig. 1, the present invention provides a network propagation source positioning method based on a percolation process and evolutionary computation, which is implemented as follows:
Step 1: inputting an experimental network data set $G(V, E)$, where $V$ denotes the node set and $E$ the edge set of the network; initializing the edge infection rate $\beta_{uv}$ and the node recovery rate $\gamma_u$ of the fixed propagation model, where $\beta_{uv}$ has value range $[0, 1]$ and $\gamma_u$ has value range $[0, 1]$; determining the outbreak rate $\varepsilon$, with value range $[0, 1]$; initializing all nodes in the network to the susceptible state;
Step 2: the propagation source localization problem can be viewed as a network immunity problem, whose objective is to control the spread of an epidemic by isolating a small number of nodes. The problem then becomes: can the network be broken into connected components by removing a few nodes, such that every component is small? For each network, let $q$ be the proportion of the network taken up by the observation point set (the observation point set being the removed node set) and let $q_c$ be the critical threshold of $q$: (1) if $q < q_c$, the graph contains a large connected component with high probability; (2) if $q > q_c$, the graph contains no large connected component with high probability. The configuration of the observation point set therefore plays an important role in suppressing the size of the connected components, and obtaining a good node sequence is the key to the propagation source localization problem. The method constructs an initial sequence $S$ of all nodes of the graph by random ordering or node degree ordering, and updates $S$ with the AEF algorithm, which is based on an evolutionary framework. The AEF procedure is as follows:
Step a: randomly determining the segment length parameter $n_s$ for sequence segmentation according to [formula missing in source], and segmenting the node sequence $S$ from front to back so that each segment contains $n_s$ nodes; the $k$-th segment is denoted $S_k$, $n$ is the total number of nodes in the sequence $S$, and $k = 1, 2, \dots$;
Step b: updating each segment sequence $S_k$ in parallel (the segments are independent and do not influence one another) according to the following process to obtain the updated segment sequence $S_k$, where the number of updates is [formula missing in source]:
Step S1: initializing the parameters: setting the loop counter $j = n_s$ and the initial intermediate node sequence $S'_k = S_k$, and constructing the initial subgraph $G'_k(V'_k, E'_k)$, where the spliced sequence [symbol missing in source] denotes the concatenation of all segments after $S_k$, $V'_k$ is the node set of the subgraph and contains all nodes of that spliced sequence, $E'_k = E \cap (V'_k \times V'_k)$ is the edge set of the subgraph, and $V'_k \times V'_k$ denotes the set of all possible edges between nodes of the node set; randomly determining the selection count parameter $\delta$ and the node selection parameter $x$, with $\delta \in [1, 50]$ and $x \in (0, 1]$;
Step S2: randomly selecting nodes from the set $\{S'_k(z),\ z \in [\max(j - x \times n_s, 1),\ j]\}$, repeating the selection $\delta$ times; the selected nodes form a candidate set [symbol missing in source], where $S'_k(z)$ denotes the $z$-th node in the sequence $S'_k$;
Step S3: selecting from the candidate set the node $r$ that satisfies [formula missing in source], where $y$ is any node in the candidate set and $\xi(y)$ denotes the connected-component size of node $y$, calculated according to [formula missing in source] or [formula missing in source]; $c(y)$ denotes the set of connected components that include node $y$, $c'_i$ denotes any connected component, and $|c'_i|$ denotes the number of nodes of connected component $c'_i$;
Step S4: updating the node set $V'_k$ of the subgraph $G'_k$ according to $V'_k \leftarrow V'_k \cup \{r\}$, and then, based on the updated node set $V'_k$, updating the edge set $E'_k$ of the subgraph $G'_k$ according to [formula missing in source], where $\{r\}$ denotes the set containing node $r$, $v'$ denotes any node other than $r$ in the updated node set $V'_k$, and $e_{rv'}$ denotes the edge connecting node $r$ with node $v'$ in the original graph $G'_k$; if the sequence $S'_k$ contains a node $S'_k(z)$ with $S'_k(z) = r$, exchanging $S'_k(j)$ and $S'_k(z)$;
Step S5: if $j > 0$, returning to step S2; otherwise, if [condition missing in source], setting $S_k = S'_k$; letting [formula missing in source] and returning to step S1; when [condition missing in source], the sequence $S_k$ obtained is the updated sequence; where $F$ denotes the evaluation index function of a sequence, calculated as $F = \sum_q |c''_{\max}|/n$, $|c''_{\max}|$ denotes the number of nodes of the largest connected component for the sequence at each value of $q$, $n$ denotes the total number of nodes, $q$ ranges over $[1/n_s, 1]$ with step $1/n_s$, [symbol missing in source] denotes the spliced sequence of two sequences, and [symbol missing in source] is defined in the same way;
Step c: letting $T_p = T_p - 1$ and returning to step a; when the counting variable $T_p = 0$, the final sequence $S$ is obtained; where $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes.
For the updated node sequence, selecting q nodes as observation points in the order from front to back, forming an observation point set O, marking the observation points on the network, recording the absolute time of the infection of the observation points, and randomly determining the occupation ratio R from the observation point set O d Forming a set O of observation points d Recording the infected direction information of the observation point; q has a value range of [0,0.2 ]],R d Has a value range of [0.001,1 ]];
Step 3: At time t = 0, a propagation source $V_s$ is selected at random from all nodes of the data set G(V, E) and set to the infected state, which starts the propagation process. During propagation, each observation point records the absolute time of its own infection, and the observation points in the set $O_d$ also record the direction from which they were infected. A node u in the infected state transmits the virus to each neighbor v in the susceptible state at the edge infection rate $\beta_{uv}$ and, at the same time, enters the recovered state at the recovery rate $\gamma_u$. A newly infected node enters the infected state and becomes a new spreader. All nodes in the infected state continue to propagate until the number of infected nodes n1 and the number of recovered nodes n2 in the network satisfy (n1 + n2)/n ≥ ε, at which point the propagation process stops; the resulting distribution of node infection states forms the infection graph $G_I$, where n denotes the number of nodes in the network G;
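A minimal simulation of the propagation process of step 3 might look as follows. The sketch assumes a synchronous, discrete-time update (every infected node first tries to infect its susceptible neighbors, then may recover), uniform rates β and γ as in the experiments, and an extra stop when no infected node remains so that the loop always terminates; these choices and all names are illustrative assumptions.

```python
import random

def simulate_sir(adj, source, beta, gamma, epsilon, observers, directional, rng=None):
    """Run SIR from `source` until (infected + recovered) / n >= epsilon."""
    rng = rng or random.Random()
    n = len(adj)
    state = {u: "S" for u in adj}
    state[source] = "I"
    infected_at = {}   # observation point -> absolute infection time
    infected_by = {}   # observation point in O_d -> neighbor that infected it
    if source in observers:
        infected_at[source] = 0
    t = 0
    while True:
        infected = [u for u in adj if state[u] == "I"]
        recovered = sum(1 for u in adj if state[u] == "R")
        # Assumed extra guard: also stop if the epidemic has died out.
        if (len(infected) + recovered) / n >= epsilon or not infected:
            break
        t += 1
        newly = {}
        for u in infected:
            for v in adj[u]:
                if state[v] == "S" and v not in newly and rng.random() < beta:
                    newly[v] = u
        for u in infected:
            if rng.random() < gamma:
                state[u] = "R"
        for v, u in newly.items():
            state[v] = "I"
            if v in observers:
                infected_at[v] = t
                if v in directional:
                    infected_by[v] = u
    return state, infected_at, infected_by
```

With β = 1, γ = 0 and ε = 1 on the path 0-1-2-3 started at node 0, observation points 2 and 3 record infection times 2 and 3, and a directional observation point 3 records node 2 as the infecting neighbor.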
Step 4: The observation point set O of step 2 is taken as the removed node set $V_r$, and the remaining nodes of the data set G form the residual node set $V_o$. After the node set $V_r$ and the edges connecting nodes in $V_o$ to nodes in $V_r$ are removed, a number of connected components of different sizes greater than 1 are obtained; $c_i$ denotes the i-th connected component, i = 1, 2, …, C, where C denotes the total number of connected components. The boundary of each connected component $c_i$ is determined according to a boundary formula [formula missing], where u denotes any node in the removed node set $V_r$, v denotes any node in the connected component $c_i$, and $e_{uv}$ denotes the edge connecting node u and node v in the infection graph $G_I$. The connected-component coverage of the nodes in the removed node set $V_r$ is determined according to a coverage formula [formula missing], where $\Gamma(u)$ denotes the set of neighbors of node u in the infection graph $G_I$ and $c_i(v)$ denotes the connected component $c_i$ to which node v belongs;
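The decomposition of step 4 can be sketched as a breadth-first search over the network with the observation points deleted. The sketch below keeps only components with more than one node, as the step specifies; the function name and the returned `label` map (node to component index) are hypothetical conveniences.

```python
from collections import deque

def components_after_removal(adj, removed):
    """Connected components of size > 1 after deleting the `removed` nodes."""
    removed = set(removed)
    visited = set()
    components, label = [], {}
    for start in adj:
        if start in removed or start in visited:
            continue
        visited.add(start)
        comp, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in removed and v not in visited:
                    visited.add(v)
                    comp.append(v)
                    queue.append(v)
        if len(comp) > 1:                     # singletons are discarded
            for v in comp:
                label[v] = len(components)
            components.append(comp)
    return components, label
```

On the path 0-1-2-3-4-5-6 with observation points 2 and 5 removed, this yields the components [0, 1] and [3, 4]; node 6 is left isolated and is dropped.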
Fig. 2 is a schematic diagram of the process for determining the propagation source positioning subgraph. According to their infection status, nodes are divided into three types: susceptible nodes, infected nodes and recovered nodes, shown in the figure as three different gray levels; the propagation source is drawn as a star-shaped node and the observation points as cross-shaped nodes. Panel (a) is the infection graph, in which the label $t_i$ at an observation point indicates the time at which observation point i was infected; for example, $t_1$ is the time at which observation point 1 was infected. Panel (b) shows the case in which $t_1$ is the earliest infection time: O′ = {1}, and the union of the connected-component coverage areas of the set O′ is the subgraph $V'_c$, shown as the shaded region. Panel (c) shows the case in which $t_1 = t_2$ is the earliest infection time: O′ = {1, 2}, and the subgraph $V'_c$ is again the shaded region.
Step 5: The observation points with the earliest infection time are selected to form the infected observation point subset O′; the subgraph $V'_c$ is constructed according to $V'_c = \bigcup_{x \in O'} \alpha(x)$, where x denotes any observation point in the subset O′ and $\alpha(x)$ denotes the area covered by the connected components in which the neighbors of observation point x in the infection graph $G_I$ are located;
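Step 5 can be illustrated with the following Python sketch, which takes the components and the node-to-component map of the decomposed network as given inputs. It assumes that the subgraph is the union of the components touched by the neighbors of the earliest-infected observation points and that the observation points themselves are not added; both helper names are hypothetical.

```python
def earliest_observers(infected_at):
    """O': observation points whose recorded infection time is minimal."""
    t_min = min(infected_at.values())
    return {x for x, t in infected_at.items() if t == t_min}

def coverage_subgraph(adj_infection, components, label, earliest):
    """Node set of V'_c: union of the components covering neighbors of O'."""
    nodes = set()
    for x in earliest:
        for v in adj_infection[x]:
            if v in label:                    # neighbor lies in a kept component
                nodes.update(components[label[v]])
    return nodes
```

For the path 0-1-2-3-4-5-6 with components [0, 1] and [3, 4], an earliest observation point 2 covers {0, 1, 3, 4}, whereas an earliest observation point 5 covers only {3, 4}.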
Step 6: The relative infection time is calculated according to $t'_x = t_x - t_{\min}$, where $t_x$ denotes the time at which observation point x was infected and $t_{\min}$ is the earliest such time; within the subgraph $V'_c$, the invention uses the RIS algorithm proposed by Borgs et al. to find the propagation source. The specific steps are as follows:
Step a: Initialize the node set Λ as the empty set; for the subgraph $V'_c$, let G′(V′, E′) be its reverse network, which satisfies $|V'| = |V'_c|$ and contains the edge $e_{ab} \in E'$ whenever the edge $e_{ba}$ is contained in the subgraph $V'_c$; here $|V'_c|$ denotes the number of nodes of the subgraph $V'_c$, V′ and E′ denote the node set and the edge set of the reverse network respectively, and |V′| denotes the number of nodes of the reverse network;
Step b: Randomly select a node m from the infected observation point subset O′ and calculate the random-walk length according to $t'' = t'_0 + t'_m$, where $t'_m$ denotes the relative infection time of node m and $t'_0$ is a random number drawn from an interval [interval missing] whose parameter has the value range [0, 20];
Step c: Taking node m as the starting point, begin a random walk by moving to a randomly chosen neighbor of node m, and then change the state of node m to recovered;
Step d: Continue the walk for $t''$ steps; let v denote the last node reached by the random walk, and update the node set according to $\Lambda = \Lambda \cup \{v\}$;
Step e: Repeat steps b to d $T_\Lambda$ times to obtain the final node set Λ; the node that occurs most often in Λ is taken as the propagation source. $T_\Lambda$ is set to $10^6$.
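Steps a to e can be sketched as a vote over the end points of reverse random walks. The sketch is a simplification under stated assumptions: time is discrete, $t'_0$ is drawn as an integer from [0, τ] because the source interval is not legible, the state change of step c is omitted, and a walk stops early at a node without outgoing reverse edges. The names `reverse_network`, `locate_source` and `tau` are hypothetical, and the start nodes are assumed to be present in the reverse network.

```python
import random
from collections import Counter

def reverse_network(edges):
    """Build G' by turning every directed edge (b, a) into (a, b)."""
    rev = {}
    for b, a in edges:
        rev.setdefault(a, []).append(b)
        rev.setdefault(b, [])
    return rev

def locate_source(rev_adj, relative_time, tau, trials, rng=None):
    """Vote for the end node of `trials` reverse random walks."""
    rng = rng or random.Random()
    votes = Counter()                          # plays the role of the set Lambda
    starts = sorted(relative_time)             # infected observation points O'
    for _ in range(trials):
        m = rng.choice(starts)
        # t'' = t'_0 + t'_m, with t'_0 assumed uniform on {0, ..., tau}
        steps = relative_time[m] + rng.randint(0, tau)
        v = m
        for _ in range(steps):
            if not rev_adj[v]:
                break                          # dead end in the reverse network
            v = rng.choice(rev_adj[v])
        votes[v] += 1
    return votes.most_common(1)[0][0], votes
```

For the directed chain s → a → b with a single observation point b whose relative infection time is 2 and τ = 0, every walk ends at s, so s is returned as the estimated source. The method itself uses $T_\Lambda = 10^6$ repetitions; the sketch takes the count as a parameter.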
Step 7: The neighbor nodes within a fixed order of the located propagation source are added to the candidate set $V_c$, and the relative size of the candidate set, $\phi = |V_c|/n$, is used as the evaluation index; a smaller φ indicates a smaller range within which the infection has to be suppressed. Here $|V_c|$ denotes the number of nodes contained in the candidate set $V_c$, and the fixed order is first order or second order.
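The candidate set of step 7 amounts to a depth-limited breadth-first search around the located source. In the sketch below the source itself is counted as part of the candidate set, which is an assumption; the function names are hypothetical.

```python
from collections import deque

def candidate_set(adj, source, order):
    """Nodes within `order` hops of the located source (source included)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == order:
            continue                          # do not expand beyond the fixed order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def relative_size(candidates, n):
    """Evaluation index phi = |V_c| / n."""
    return len(candidates) / n
```

On the path 0-1-2-3-4-5-6 with source 3, the first-order candidate set is {2, 3, 4}, giving φ = 3/7, and the second-order candidate set is {1, 2, 3, 4, 5}.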
To verify the validity of the method of the present invention, experiments were performed on model networks and real networks, and the experimental network data are shown in table 1.
Table 1
Data set | Number of nodes | Number of edges |
---|---|---|
ER | 10000 | 35000 |
SF | 10000 | 40000 |
PG | 4941 | 6594 |
SCM | 7228 | 24784 |
LOCG | 196591 | 950327 |
WG | 875713 | 4322051 |
In the experiments, the propagation source localization algorithm JC (Jordan Center), the CI (Collective Influence) method from the field of network immunization, the MSRG (Min-Sum and Reverse-Greedy) method, and the FINDER (FInding key players in Networks through DEep Reinforcement learning) method are used as comparison methods. The JC algorithm considers all infected nodes and recovered nodes to locate the propagation source, and a candidate set is constructed by ranking the nodes. The relevant parameters of the method are set as follows: $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes; in the RIS algorithm, $T_\Lambda = 10^6$. In the experiments, all edge infection rates $\beta_{uv}$ in the network are set to the same fixed value β, all recovery rates $\gamma_u$ are set to the same fixed value γ = 0.1, and the explosion rate is ε = 0.1.
Fig. 3 shows the candidate set ratios obtained by the different methods for different infection probabilities in four networks. JC denotes the JC algorithm, Hubs_s denotes the observation point selection method based on node degree ranking, and PrEF($R_d$) denotes the PrEF algorithm for a particular value of $R_d$; for example, PrEF(0) means that no direction information is available, while PrEF(1) means that all direction information is known. The abscissa is the infection rate β and the ordinate is the candidate set ratio φ. As can be seen from Fig. 3 (a) and (b), the JC algorithm is an effective propagation source estimator when the propagation process is symmetric (that is, when the infection probability is large), but its performance degrades as the infection probability decreases. In contrast, the method of the present invention is more stable over the whole range of infection probabilities and outperforms the JC algorithm when the infection probability is low; for example, at β = 0.1 in the SF network, φ(PrEF(1)) = 0.0004 whereas φ(JC) = 0.0721. In addition, PrEF(0) performs clearly better in the ER network than in the SF network, indicating that the high-degree nodes of the SF network affect the performance of the PrEF(0) algorithm. The real networks in Fig. 3 (c) and (d) further confirm this conclusion: the Hubs_s algorithm works better than the JC algorithm in the PG network, but in the SCM network only the present method PrEF(1) works, while the other methods fail.
Fig. 4 shows the candidate set ratios obtained by the different methods for different observation point ratios in two large networks, where CI denotes the CI algorithm, MSRG denotes the MSRG algorithm, FINDER denotes the FINDER algorithm, and PrEF denotes the method of the present invention. The abscissa is the observation point ratio q, the ordinate is the candidate set ratio φ, and the comparison experiments fix β = 0.5. Panels (a) and (b) show the LOCG network and panels (c) and (d) show the WG network; panels (a) and (c) use $R_d = 0$, and panels (b) and (d) use $R_d = 1$. It can be seen that as the observation point ratio (removal ratio) approaches 0, the candidate set ratio approaches 1; a method that performs well when $R_d = 0$ also performs well when $R_d = 1$; and, for a given value of q, the PrEF algorithm of the invention yields a smaller candidate set than the CI, MSRG and FINDER algorithms and thus narrows the search range for the propagation source, especially in the WG network.
In summary, the present invention realizes network propagation source positioning, in which the size of the observation point set, the proportion of observation points that record direction information, and the strategy used to generate the observation point set all play a crucial role in narrowing the propagation source search range and improving the positioning efficiency. In particular, the method of the invention performs better across the value range of q and is more robust across different propagation models. By drawing on the network immunization problem, the invention implements the idea of decomposing the network first and then locating the propagation source; it is effective, efficient and stable, and is suitable for locating propagation sources in large-scale networks.
Claims (3)
1. A network propagation source positioning method based on a seepage process and evolutionary computation is characterized by comprising the following steps:
Step 1: Input an experimental network data set G(V, E), where V denotes the set of network nodes and E denotes the set of edges in the network; initialize the edge infection rate $\beta_{uv}$ and the node recovery rate $\gamma_u$ of the fixed propagation model, where the value range of $\beta_{uv}$ is [0, 1] and the value range of $\gamma_u$ is [0, 1]; determine the explosion rate ε, whose value range is [0, 1]; initialize all nodes in the network to the susceptible state;
Step 2: Construct an initial node sequence S of the whole graph by random ordering or node degree ordering, and update the node sequence S with the AEF algorithm; for the updated node sequence, select nodes from front to back in the proportion q as observation points, forming the observation point set O; mark the observation points on the network and record the absolute time at which each observation point is infected; randomly choose a proportion $R_d = |O_d|/|O|$ of the observation points from O to form the set $O_d$, whose observation points additionally record the direction from which they were infected; the value range of q is [0, 0.2] and the value range of $R_d$ is [0.001, 1];
Step 3: At time t = 0, randomly select a propagation source $V_s$ from all nodes of the data set G(V, E) and set it to the infected state to start the propagation process; during propagation, each observation point records the absolute time of its own infection, and the observation points in the set $O_d$ also record the direction from which they were infected; a node u in the infected state transmits the virus to each neighbor v in the susceptible state at the edge infection rate $\beta_{uv}$ and, at the same time, enters the recovered state at the recovery rate $\gamma_u$; a newly infected node enters the infected state and becomes a new spreader; all nodes in the infected state continue to propagate until the number of infected nodes n1 and the number of recovered nodes n2 in the network satisfy (n1 + n2)/n ≥ ε, at which point the propagation process stops, and the resulting distribution of node infection states forms the infection graph $G_I$, where n denotes the number of nodes in the network G;
Step 4: Take the observation point set O of step 2 as the removed node set $V_r$, the remaining nodes of the data set G forming the residual node set $V_o$; after removing the node set $V_r$ and the edges connecting nodes in $V_o$ to nodes in $V_r$, a number of connected components of different sizes greater than 1 are obtained, $c_i$ denoting the i-th connected component, i = 1, 2, …, C, where C denotes the total number of connected components; determine the boundary of each connected component $c_i$ according to a boundary formula [formula missing], where u denotes any node in the removed node set $V_r$, v denotes any node in the connected component $c_i$, and $e_{uv}$ denotes the edge connecting node u and node v in the infection graph $G_I$; determine the connected-component coverage of the nodes in the removed node set $V_r$ according to a coverage formula [formula missing], where $\Gamma(u)$ denotes the set of neighbors of node u in the infection graph $G_I$ and $c_i(v)$ denotes the connected component $c_i$ to which node v belongs;
Step 5: Select the observation points with the earliest infection time to form the infected observation point subset O′; construct the subgraph $V'_c$ according to $V'_c = \bigcup_{x \in O'} \alpha(x)$, where x denotes any observation point in the subset O′ and $\alpha(x)$ denotes the area covered by the connected components in which the neighbors of observation point x in the infection graph $G_I$ are located;
Step 6: Calculate the relative infection time according to $t'_x = t_x - t_{\min}$, where $t_x$ denotes the time at which observation point x was infected and $t_{\min}$ is the earliest such time; within the subgraph $V'_c$, find the propagation source by means of the RIS algorithm;
Step 7: Add the neighbor nodes within a fixed order of the located propagation source to the candidate set $V_c$, and use the relative size of the candidate set, $\phi = |V_c|/n$, as the evaluation index, a smaller φ indicating a smaller range within which the infection has to be suppressed; here $|V_c|$ denotes the number of nodes contained in the candidate set $V_c$, and the fixed order is first order or second order.
2. The method for positioning network propagation sources based on the seepage process and the evolutionary computation as claimed in claim 1, wherein: the specific process of updating the node sequence S by using the AEF algorithm in step 2 is as follows:
Step a: Randomly determine the segment length parameter $n_s$ used for sequence segmentation according to the segment length formula [formula missing], and divide the node sequence S from front to back into segments of $n_s$ nodes each; the k-th segment sequence is denoted $S_k$, N is the total number of nodes contained in the sequence S, and k = 1, 2, …;
Step b: Update all segment sequences $S_k$ in parallel according to the following process to obtain the updated segment sequences $S_k$, where the number of updates is given by a fixed value [value missing]:
Step S1: Initialize the parameters: set the loop counter $j = n_s$ and the initial intermediate node sequence $S'_k = S_k$, and construct the initial subgraph $G'_k(V'_k, E'_k)$ from the concatenation of all sequences after $S_k$ [symbol missing]; $V'_k$ is the node set of the subgraph and contains all nodes of that concatenated sequence, $E'_k = E \cap (V'_k \times V'_k)$ is the edge set of the subgraph, and $V'_k \times V'_k$ denotes the set of all possible edges between nodes of the node set; randomly determine the number-of-selections parameter δ and the node selection parameter x, with δ ∈ [1, 50] and x ∈ (0, 1];
Step S2: Randomly select nodes from the set $\{S'_k(z),\ z \in [\max(j - x \times n_s, 1),\ j]\}$, repeating the selection δ times; the selected nodes form the candidate set, where $S'_k(z)$ denotes the z-th node of the sequence $S'_k$;
Step S3: From the candidate set, select the node r that satisfies the selection condition [formula missing], where y is any node in the candidate set and $\xi(y)$ denotes the connected-component size associated with node y, computed by one of two alternative formulas [formulas missing]; $c(y)$ denotes the set of connected components that include node y, $c'_i$ denotes any connected component, and $|c'_i|$ denotes the number of nodes in $c'_i$;
Step S4: Update the node set $V'_k$ of the subgraph $G'_k$ according to $V'_k \leftarrow V'_k \cup \{r\}$, and then update the edge set $E'_k$ of $G'_k$ from the updated node set [formula missing], where $\{r\}$ denotes the set containing node r, $v'$ denotes any node of the updated node set $V'_k$ other than r, and $e_{rv'}$ denotes the edge connecting node r and node $v'$ in the original graph; if the sequence $S'_k$ contains a node $S'_k(z)$ with $S'_k(z) = r$, swap $S'_k(j)$ and $S'_k(z)$;
Step S5: If j > 0, return to step S2; otherwise, if the condition on F holds [formula missing], set $S_k = S'_k$, apply the accompanying assignment [formula missing], and return to step S1; when the termination condition is met [formula missing], the resulting sequence $S_k$ is the updated sequence; here F denotes the evaluation index function of a sequence, computed as $F = \sum_q |c''_{\max}|/n$, where $|c''_{\max}|$ is the number of nodes in the largest connected component of the sequence at a given value of q, n is the total number of nodes, and q ranges over $[1/n_s, 1]$ in steps of $1/n_s$; the concatenation of two sequences is denoted by a dedicated operator [symbol missing];
Step c: Let $T_p = T_p - 1$ and return to step a; when the counter $T_p$ reaches 0, the final sequence S is obtained; here $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes.
3. The method for positioning network propagation sources based on the seepage process and the evolutionary computation as claimed in claim 1, wherein: the specific process of finding the propagation source by adopting the RIS algorithm in the step 6 is as follows:
Step a: Initialize the node set Λ as the empty set; for the subgraph $V'_c$, let G′(V′, E′) be its reverse network, which satisfies $|V'| = |V'_c|$ and contains the edge $e_{ab} \in E'$ whenever the edge $e_{ba}$ is contained in the subgraph $V'_c$; here $|V'_c|$ denotes the number of nodes of the subgraph $V'_c$, V′ and E′ denote the node set and the edge set of the reverse network respectively, and |V′| denotes the number of nodes of the reverse network;
Step b: Randomly select a node m from the infected observation point subset O′ and calculate the random-walk length according to $t'' = t'_0 + t'_m$, where $t'_m$ denotes the relative infection time of node m and $t'_0$ is a random number drawn from an interval [interval missing] whose parameter has the value range [0, 20];
Step c: Taking node m as the starting point, begin a random walk by moving to a randomly chosen neighbor of node m, and then change the state of node m to recovered;
Step d: Continue the walk for $t''$ steps; let v denote the last node reached by the random walk, and update the node set according to $\Lambda = \Lambda \cup \{v\}$;
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321271.1A CN114826678B (en) | 2022-03-24 | 2022-03-24 | Network propagation source positioning method based on seepage process and evolutionary computation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210321271.1A CN114826678B (en) | 2022-03-24 | 2022-03-24 | Network propagation source positioning method based on seepage process and evolutionary computation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114826678A true CN114826678A (en) | 2022-07-29 |
CN114826678B CN114826678B (en) | 2023-11-17 |
Family
ID=82532878
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210321271.1A Active CN114826678B (en) | 2022-03-24 | 2022-03-24 | Network propagation source positioning method based on seepage process and evolutionary computation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114826678B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006132987A1 (en) * | 2005-06-03 | 2006-12-14 | Board Of Trustees Of Michigan State University | Worm propagation modeling in a mobile ad-hoc network |
US20100023503A1 (en) * | 2008-07-22 | 2010-01-28 | Elumindata, Inc. | System and method for automatically selecting a data source for providing data related to a query |
US20140129190A1 (en) * | 2012-11-08 | 2014-05-08 | Ecole Polytechnique Federale De Lausanne Epfl | Method, apparatus and computer program product for locating a source of diffusion in a network |
CN113852597A (en) * | 2021-08-03 | 2021-12-28 | 中国电子科技集团公司第三十研究所 | Network threat traceability iterative analysis method, computer equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
ZYGMUNT J. HAAS, CORNELL: "The Zone Routing Protocol (ZRP) for Ad Hoc Networks <draft-ietf-manet-zone-zrp-02.txt>", IETF * |
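LIU DONG; ZHAO JING; NIE HAO: "Research on effective observation point deployment strategies in propagation source estimation", Journal of Chinese Information Processing, no. 08 * |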
Also Published As
Publication number | Publication date |
---|---|
CN114826678B (en) | 2023-11-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||