CN114826678A - Network propagation source positioning method based on seepage process and evolutionary computation - Google Patents

Info

Publication number
CN114826678A
Authority
CN
China
Prior art keywords
node
nodes
network
sequence
propagation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210321271.1A
Other languages
Chinese (zh)
Other versions
CN114826678B (en)
Inventor
刘洋
汪小琦
王震
王茜
李学龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Northwestern Polytechnical University
Original Assignee
Northwestern Polytechnical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northwestern Polytechnical University
Priority to CN202210321271.1A
Publication of CN114826678A
Application granted
Publication of CN114826678B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • H04L63/126 Applying verification of the received information the source of the received data
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/08 Configuration management of networks or network elements
    • H04L41/0803 Configuration setting
    • H04L41/0823 Configuration setting characterised by the purposes of a change of settings, e.g. optimising configuration for enhancing reliability
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network
    • H04L41/16 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/04 Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045 Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a network propagation source positioning method based on a percolation (seepage) process and evolutionary computation. First, a network data set is input to obtain its node and edge attributes, and the propagation model parameters are initialized. Next, drawing on percolation theory and evolutionary computation, an AEF algorithm iteratively updates an initial observation point sequence to obtain a final sequence, and a given proportion of observation points is deployed in the network according to that sequence. A propagation source is then selected at random and set to the infected state to start the propagation process, which is stopped once the detected outbreak reaches a given range. The target connected components are identified from the information captured by the observation points to obtain a subgraph, and an RIS algorithm is run on the subgraph to detect the propagation source. Finally, the neighbors within a fixed number of hops of the detected propagation source are added to a candidate set, which serves as the range for the subsequent search for the true propagation source. The invention enables rapid propagation source localization in large-scale networks, so that the spread of malicious information can be controlled in time and the resulting losses reduced.

Description

Network propagation source positioning method based on seepage process and evolutionary computation
Technical Field
The invention belongs to the technical field of network information propagation, and particularly relates to a network propagation source positioning method based on a seepage process and evolutionary computation.
Background
Complex networks of many kinds, such as social networks, power grids and road traffic networks, exist in today's world. Their high interconnectivity and cohesion facilitate information exchange between nodes, but also increase the chance that risks spread through the network: rumors spread rapidly in social networks, computer viruses infect large numbers of hosts in a short time, and infectious diseases break out in populations. Quickly locating the region where the propagation source lies, and thereby limiting the impact of the spread from that source, is therefore of great research value and practical significance.
The main task in the propagation source localization problem is to design an estimator that can infer the propagation source; the ideal estimator is one that finds the true source. However, because of the complexity of node communication patterns and the uncertainty of the diffusion model, it is almost impossible in theory for an estimator to infer the true source, even when the underlying network is a tree. The error distance was therefore introduced as a criterion for evaluating an estimator: one estimator is better than another if its inferred source is closer to the true source. Under different assumptions about the known information, researchers have developed different methods to minimize the error distance. In practice, however, a further question arises: once an estimator with a small error distance has been obtained, how is the true source actually traced? One can examine the neighborhood of the estimated propagation source more intensively and eventually locate the true source. For a network with a relatively simple structure, a small error distance usually means that only a small number of nodes need this more intensive examination. Most real-world networks, however, are heterogeneous, that is, a single node may be directly connected to a very large number of nodes, so the number of nodes in the neighborhood that must be examined may be proportional to the size of the network, which is clearly infeasible in practice.
To date, a great deal of research has addressed locating propagation sources in complex networks, and a growing number of algorithms have been proposed to detect the sources of false or malicious information. Propagation source localization algorithms generally fall into three categories. 1) Methods based on complete observation, in which the researcher obtains the state and infection time of every node in the network in order to detect the propagation source; examples are rumor centrality, the minimum description length method, and source identification by the dynamic age of nodes. 2) Methods based on network snapshot observation, in which the researcher obtains, for a unit of time, which nodes in the network have received and spread the information; this condition is easier to satisfy than complete observation. Examples are the Jordan center method and the dynamic message-passing method. 3) Methods based on sensor observation, in which the researcher deploys a certain number of observation points in the network as sensors and uses the infection information of these specific nodes to detect the propagation source. Pinto et al. first proposed such a method in 2012, based on two assumptions: that network propagation delays follow a Gaussian distribution, and that information propagates along a traversal tree rooted at a node. The propagation source is then estimated by maximum likelihood from the times at which the observation points first change state and the directions from which the information arrived. Node centrality is also a practical tool for analyzing network properties, and some algorithms identify the propagation source with centrality measures such as degree centrality, closeness centrality and betweenness centrality.
Propagation source detection is an important problem, but several difficulties remain. On the one hand, most current methods are designed for tree-structured networks, whereas most networks in practice are general complex networks. Applying or extending a tree-based method directly to a general network usually lowers detection efficiency and makes accuracy hard to guarantee. On the other hand, most existing methods are designed for small networks and have high computational complexity, so they are difficult to apply to large-scale networks in practice.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a network propagation source positioning method based on a percolation (seepage) process and evolutionary computation. First, a network data set is input to obtain its node and edge attributes, and the propagation model parameters are initialized. Next, drawing on percolation theory and evolutionary computation, an AEF algorithm iteratively updates an initial observation point sequence to obtain a final sequence, and a given proportion of observation points is deployed in the network according to that sequence. A propagation source is then selected at random and set to the infected state to start the propagation process, which is stopped once the detected outbreak reaches a given range. The target connected components are identified from the information captured by the observation points to obtain a subgraph $V'_c$, and an RIS algorithm is run on the subgraph $V'_c$ to detect the propagation source. Finally, the neighbors within a fixed number of hops of the detected propagation source are added to a candidate set, which serves as the range for the subsequent search for the true propagation source. The invention enables rapid propagation source localization in large-scale networks, so that the spread of malicious information can be controlled in time and the resulting losses reduced.
A network propagation source positioning method based on a seepage process and evolutionary computation is characterized by comprising the following steps:
Step 1: input an experimental network data set $G(V, E)$, where $V$ is the set of network nodes and $E$ is the set of edges in the network; initialize the parameters of the fixed propagation model, namely the edge infection rate $\beta_{uv}$ and the node recovery rate $\gamma_u$, where $\beta_{uv}$ has value range $[0, 1]$ and $\gamma_u$ has value range $[0, 1]$; determine the outbreak rate $\varepsilon$, which has value range $[0, 1]$; initialize all nodes in the network to the susceptible state;
Step 2: construct an initial node sequence $S$ over all nodes of the graph by random ordering or by node-degree ordering, and update the node sequence $S$ with the AEF algorithm; from the updated node sequence, select a proportion $q$ of the nodes, in order from front to back, as observation points to form the observation point set $O$; mark the observation points on the network, where each records the absolute time at which it is infected; from the observation point set $O$, randomly select a proportion $R_d = |O_d|/|O|$ of observation points to form the set $O_d$, whose members also record the direction from which they were infected; $q$ has value range $[0, 0.2]$ and $R_d$ has value range $[0.001, 1]$;
Step 3: at time $t = 0$, randomly select a propagation source $V_s$ from all nodes of the data set $G(V, E)$ and set it to the infected state, which starts the propagation process. During propagation, every observation point records the absolute time at which it is infected, and the observation points in the set $O_d$ additionally record the direction from which they were infected. A node in the infected state spreads the virus to each neighbor in the susceptible state with edge infection rate $\beta_{uv}$, and at the same time enters the recovered state with recovery rate $\gamma_u$; a node that has been infected enters the infected state and becomes a new infected node. All infected nodes continue this behavior until the number of infected nodes $n_1$ and the number of recovered nodes $n_2$ in the network satisfy $(n_1 + n_2)/n \ge \varepsilon$, at which point the propagation process stops and the resulting distribution of node infection states forms the infection graph $G_I$; here $n$ is the number of nodes in the network $G$;
Step 4: take the observation point set $O$ from step 2 as the removed node set $V_r$; the nodes of the data set $G$ other than the observation points form the remaining node set $V_o$. Removing the node set $V_r$, together with the edges that connect nodes in $V_o$ to nodes in $V_r$, yields several connected components of different sizes, each larger than 1; $c_i$ denotes the $i$-th connected component, $i = 1, 2, \ldots, C$, where $C$ is the total number of connected components. According to
Figure BDA0003563945800000031
determine the boundary of connected component $c_i$,
Figure BDA0003563945800000032
where $u$ is any node in the removed node set $V_r$, $v$ is any node in connected component $c_i$, and $e_{uv}$ is the edge connecting node $u$ and node $v$ in the infection graph $G_I$. According to
Figure BDA0003563945800000033
evaluate the corresponding quantity for the removed node set $V_r$, where $\Gamma(u)$ is the set of neighbors of node $u$ in the infection graph $G_I$ and $c_i(v)$ is the connected component $c_i$ to which node $v$ belongs;
Step 5: select the observation points with the earliest infection time to form the infected observation point subset $O'$; according to the formula
Figure BDA0003563945800000034
construct the subgraph $V'_c$, where $x$ is any observation point in the subset $O'$ and $\alpha(x)$ is the region covered by the connected components that contain the neighbors of observation point $x$ in the infection graph $G_I$;
Step 6: compute the relative infection time $t'_x$ according to $t'_x = t_x - t_{\min}$, where $t_x$ is the time at which observation point $x$ was infected and
Figure BDA0003563945800000035
on the subgraph $V'_c$, use the RIS algorithm to find the propagation source
Figure BDA0003563945800000036
Step 7: add the neighbor nodes within a fixed order of the propagation source
Figure BDA0003563945800000041
to the candidate set $V_c$, and use the relative size of the candidate set, $\varphi = |V_c|/n$, as the evaluation index; a smaller $\varphi$ means a smaller range within which the infection must be suppressed. Here $|V_c|$ is the number of nodes in the candidate set $V_c$, and the fixed order is first order or second order.
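As an illustration of step 7, the following is a minimal Python sketch of how a candidate set and its relative size $\varphi = |V_c|/n$ might be computed with a breadth-first search. The function names, the adjacency-dictionary representation, and the choice to include the estimated source itself in the candidate set are assumptions made for this example, not details taken from the patent.

```python
from collections import deque


def candidate_set(adj, source_estimate, hops=2):
    """Nodes within `hops` edges of the estimated source (estimate included, by assumption)."""
    dist = {source_estimate: 0}
    queue = deque([source_estimate])
    while queue:
        u = queue.popleft()
        if dist[u] == hops:
            continue  # do not expand beyond the fixed order
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def relative_size(candidates, n):
    """Evaluation index phi = |V_c| / n."""
    return len(candidates) / n


# Toy path graph 0-1-2-3-4 (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
first_order = candidate_set(adj, 2, hops=1)
phi = relative_size(first_order, len(adj))
```

On the toy path graph, the first-order candidate set around node 2 contains three of the five nodes.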
Further, the specific process of updating the node sequence S by using the AEF algorithm described in step 2 is as follows:
Step a: according to
Figure BDA0003563945800000042
randomly determine the segment length parameter $n_s$ used to segment the sequence, and split the node sequence $S$ from front to back into segments of $n_s$ nodes each; the $k$-th segment is denoted $S_k$, $n$ is the total number of nodes in the sequence $S$, and $k = 1, 2, \ldots$;
Step b: update every segment $S_k$ in parallel by the following process to obtain the updated segment $S_k$, where the number of updates is
Figure BDA0003563945800000043
Step S1: initialize the parameters: set the loop counter $j = n_s$ and the initial intermediate node sequence $S'_k = S_k$, and construct the initial subgraph $G'_k(V'_k, E'_k)$, where
Figure BDA0003563945800000044
denotes the concatenation of all segments after $S_k$, $V'_k$ is the node set of the subgraph and contains all nodes of the sequence
Figure BDA00035639458000000413
$E'_k = E \cap (V'_k \times V'_k)$ is the edge set of the subgraph, and $V'_k \times V'_k$ is the set of all possible edges between the nodes of $V'_k$; randomly determine the selection count parameter $\delta \in [1, 50]$ and the node selection parameter $x \in (0, 1]$;
Step S2: from the set $\{S'_k(z),\ z \in [\max(j - x \times n_s, 1),\ j]\}$, randomly select a node, repeating the selection $\delta$ times; the selected nodes form the candidate set
Figure BDA0003563945800000045
where $S'_k(z)$ is the $z$-th node of the sequence $S'_k$;
Step S3: from the candidate set
Figure BDA0003563945800000046
select the node that satisfies
Figure BDA0003563945800000047
where $y$ is a node of the candidate set
Figure BDA0003563945800000048
and $\xi(y)$ is the connected-component size associated with node $y$, calculated according to
Figure BDA0003563945800000049
or
Figure BDA00035639458000000410
where $c(y)$ is the set of connected components that include node $y$, $c'_i$ is any such connected component, and $|c'_i|$ is the number of nodes in connected component $c'_i$;
Step S4: update the node set $V'_k$ of the subgraph $G'_k$ according to $V'_k \leftarrow V'_k \cup \{r\}$, and then, based on the updated node set $V'_k$, update the edge set $E'_k$ of the subgraph $G'_k$ according to
Figure BDA00035639458000000411
where $\{r\}$ is the set containing node $r$, $v'$ is any node other than $r$ in the updated node set $V'_k$, and $e_{rv'}$ is the edge connecting node $r$ and node $v'$ in the original graph $G'_k$; if the sequence $S'_k$ contains a node $S'_k(z)$ with $S'_k(z) = r$, swap $S'_k(j)$ and $S'_k(z)$;
Step S5: if $j > 0$, return to step S2; otherwise, if
Figure BDA00035639458000000412
then set $S_k = S'_k$; let
Figure BDA0003563945800000051
and return to step S1; when
Figure BDA0003563945800000052
the sequence $S_k$ obtained is the updated sequence. Here $F$ denotes the evaluation index function of a sequence, computed as $F = \sum_q |c''_{\max}|/n$, where $|c''_{\max}|$ is the number of nodes in the largest connected component of the sequence at a given value of $q$, $n$ is the total number of nodes, and $q$ ranges over $[1/n_s, 1]$ in steps of $1/n_s$;
Figure BDA0003563945800000053
denotes the concatenation of two sequences;
Step c: let $T_p = T_p - 1$ and return to step a; when the counter $T_p = 0$, the final sequence $S$ is obtained. Here $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes.
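To make the evaluation index $F = \sum_q |c''_{\max}|/n$ more concrete, the following Python sketch computes it for a complete removal order on a small graph. It is an illustration under simplifying assumptions: the whole sequence is scored at once (rather than a segment combined with the segments after it), $q$ advances one node at a time (step $1/n$), and the largest-component sizes are obtained by re-inserting nodes in reverse order with a union-find structure. All function names are hypothetical.

```python
def largest_component_curve(adj, order):
    """sizes[k] = size of the largest component after removing the first k nodes of `order`."""
    parent, size = {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    n = len(order)
    sizes = [0] * (n + 1)  # sizes[n] = 0: every node removed
    best = 0
    for k in range(n - 1, -1, -1):  # re-insert order[k], last removed first
        u = order[k]
        parent[u], size[u] = u, 1
        for v in adj[u]:
            if v in parent:  # neighbor already re-inserted
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]
        best = max(best, size[find(u)])
        sizes[k] = best
    return sizes


def evaluation_index(adj, order):
    """F = sum over q of |c_max| / n, with q = 1/n, 2/n, ..., 1 (assumed step)."""
    n = len(order)
    sizes = largest_component_curve(adj, order)
    return sum(sizes[k] for k in range(1, n + 1)) / n


# Toy path graph 0-1-2-3-4 (hypothetical data).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
f_center_first = evaluation_index(adj, [2, 0, 1, 3, 4])
f_end_first = evaluation_index(adj, [0, 1, 2, 3, 4])
```

On the toy path graph, removing the middle node first gives a lower $F$ than removing nodes from one end, which matches the intuition that a better sequence breaks the network into small components sooner.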
Further, the specific process of finding the propagation source by using the RIS algorithm in step 6 is as follows:
Step a: initialize the node set $\Lambda$ as the empty set; for the subgraph $V'_c$, let $G'(V', E')$ be its reverse network, which satisfies $|V'| = |V'_c|$ and in which edge $e_{ab} \in E'$ whenever edge $e_{ba}$ is contained in the subgraph $V'_c$; here $|V'_c|$ is the number of nodes in the subgraph $V'_c$, $V'$ and $E'$ are the node set and edge set of the reverse network, and $|V'|$ is the number of nodes in the reverse network;
Step b: randomly select a node $m$ from the infected observation point subset $O'$ and compute its random-walk step length as $t'' = t'_0 + t'_m$, where $t'_m$ is the relative infection time of node $m$ and $t'_0$ is a random number drawn from the interval
Figure BDA0003563945800000054
where
Figure BDA0003563945800000055
has value range $[0, 20]$;
Step c: take node $m$ as the starting point of the random walk, walk to one of its neighbors chosen at random, and then change the state of node $m$ to recovered;
Step d: the walk lasts $t''$ steps; let $v$ be the last node of the random walk, and update the node set by $\Lambda = \Lambda \cup \{v\}$;
Step e: repeat steps b to d $T_\Lambda$ times to obtain the final node set $\Lambda$; the node that occurs most often in $\Lambda$ is the propagation source
Figure BDA0003563945800000056
$T_\Lambda$ takes the value $10^6$.
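The RIS procedure above can be sketched in Python as a reverse random walk that tallies where walks end. This is a simplified illustration, not the patented implementation: the interval for $t'_0$ is given only as a formula image in the source, so the sketch assumes an integer drawn uniformly from $[0, \texttt{t0\_max}]$; the state change to "recovered" in step c is not tracked; a walk that reaches a node with no reverse neighbors simply stops there; and the function and parameter names are hypothetical.

```python
import random
from collections import Counter


def ris_estimate(sub_adj, rel_times, walks=1000, t0_max=0, seed=0):
    """Estimate the source as the most frequent end point of reverse random walks.

    sub_adj:   directed adjacency of the subgraph (b -> list of a for each edge e_ba)
    rel_times: relative infection time t'_m of each observation point in O'
    """
    rng = random.Random(seed)
    # Reverse network: an edge b -> a in the subgraph becomes a -> b.
    rev = {v: [] for v in sub_adj}
    for b, targets in sub_adj.items():
        for a in targets:
            rev[a].append(b)
    observers = list(rel_times)
    hits = Counter()  # multiset Lambda of walk end points
    for _ in range(walks):
        m = rng.choice(observers)
        steps = rng.randint(0, t0_max) + rel_times[m]  # t'' = t'_0 + t'_m
        v = m
        for _ in range(steps):
            if not rev[v]:
                break  # nowhere left to walk (assumption: stop early)
            v = rng.choice(rev[v])
        hits[v] += 1
    return hits.most_common(1)[0][0], hits


# Toy directed chain 0 -> 1 -> 2 with observers at nodes 1 and 2 (hypothetical data).
sub_adj = {0: [1], 1: [2], 2: []}
estimate, hits = ris_estimate(sub_adj, {1: 1, 2: 2}, walks=200, t0_max=0)
```

On the toy chain, every reverse walk is forced back to node 0, so node 0 collects all the end points; on a real subgraph the tally would be spread over many nodes and the mode taken as the estimate.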
The beneficial effects of the invention are as follows. By combining the network percolation process with evolutionary computation, the invention optimizes the sequence of observation points deployed in the network and suppresses the size of the connected components obtained by removing the observation point set, so that the propagation source can be located with fewer observation points and the cost of protecting the network is reduced. By drawing on strategies from the network immunization problem, a small number of nodes are isolated using the observation point information to control the spread of an epidemic, which narrows both the localization range and the range that must be searched for the true propagation source. The invention provides technical support for suppressing malicious propagation under resource constraints and can be applied to propagation source localization in large networks.
Drawings
FIG. 1 is a flow chart of the network propagation source positioning method based on the seepage process and evolutionary computation according to the present invention;
FIG. 2 is a schematic diagram of the process of determining a propagation source localization sub-graph according to the present invention;
In the figure: (a) infection graph obtained from the propagation process; (b) subgraph illustration 1 of infection graph (a); (c) subgraph illustration 2 of infection graph (a);
FIG. 3 shows the candidate set ratios obtained with different methods at different infection rates in four different networks;
In the figure: (a) results on the ER model network; (b) results on the SF model network; (c) results on the PG network; (d) results on the SCM network;
FIG. 4 shows the candidate set ratios obtained with different network immunization methods at different observation point proportions in two networks;
In the figure: (a) results on the LOCG network with ratio $R_d = 0$; (b) results on the LOCG network with ratio $R_d = 1$; (c) results on the WG network with ratio $R_d = 0$; (d) results on the WG network with ratio $R_d = 1$.
Detailed Description
The invention is further described below with reference to the drawings and embodiments; the invention includes, but is not limited to, the following embodiments.
As shown in fig. 1, the present invention provides a network propagation source positioning method based on a percolation process and evolutionary computation, which is implemented as follows:
Step 1: input an experimental network data set $G(V, E)$, where $V$ is the set of network nodes and $E$ is the set of edges in the network; initialize the parameters of the fixed propagation model, namely the edge infection rate $\beta_{uv}$ and the node recovery rate $\gamma_u$, where $\beta_{uv}$ has value range $[0, 1]$ and $\gamma_u$ has value range $[0, 1]$; determine the outbreak rate $\varepsilon$, which has value range $[0, 1]$; initialize all nodes in the network to the susceptible state;
Step 2: the propagation source localization problem can be viewed as a network immunization problem, whose objective is to control the spread of an epidemic by isolating a small number of nodes. The problem then becomes: can the network be decomposed into connected components by removing only a few nodes, so that every connected component is small? For each network, let $q$ be the proportion of nodes in the observation point set (the observation point set is the removed node set) and let $q_c$ be the critical threshold of $q$: (1) if $q < q_c$, the graph has a large connected component with high probability; (2) if $q > q_c$, the graph has no large connected component with high probability. In general, the choice of the observation point set plays an important role in suppressing the size of the connected components, so obtaining a better node sequence is the core of the propagation source localization problem. The method constructs an initial node sequence $S$ over all nodes of the graph by random ordering or by node-degree ordering, and updates the node sequence $S$ with the AEF algorithm, which is based on an evolutionary framework. The specific AEF process is as follows:
Step a: according to
Figure BDA0003563945800000071
randomly determine the segment length parameter $n_s$ used to segment the sequence, and split the node sequence $S$ from front to back into segments of $n_s$ nodes each; the $k$-th segment is denoted $S_k$, $n$ is the total number of nodes in the sequence $S$, and $k = 1, 2, \ldots$;
Step b: update every segment $S_k$ in parallel (the segments are independent and do not affect one another) by the following process to obtain the updated segment $S_k$, where the number of updates is
Figure BDA00035639458000000713
Step S1: initialize the parameters: set the loop counter $j = n_s$ and the initial intermediate node sequence $S'_k = S_k$, and construct the initial subgraph $G'_k(V'_k, E'_k)$, where
Figure BDA0003563945800000072
denotes the concatenation of all segments after $S_k$, $V'_k$ is the node set of the subgraph and contains all nodes of the sequence
Figure BDA00035639458000000714
$E'_k = E \cap (V'_k \times V'_k)$ is the edge set of the subgraph, and $V'_k \times V'_k$ is the set of all possible edges between the nodes of $V'_k$; randomly determine the selection count parameter $\delta \in [1, 50]$ and the node selection parameter $x \in (0, 1]$;
Step S2: from the set $\{S'_k(z),\ z \in [\max(j - x \times n_s, 1),\ j]\}$, randomly select a node, repeating the selection $\delta$ times; the selected nodes form the candidate set
Figure BDA0003563945800000073
where $S'_k(z)$ is the $z$-th node of the sequence $S'_k$;
Step S3: from the candidate set
Figure BDA0003563945800000074
select the node that satisfies
Figure BDA0003563945800000075
where $y$ is a node of the candidate set
Figure BDA0003563945800000076
and $\xi(y)$ is the connected-component size associated with node $y$, calculated according to
Figure BDA0003563945800000077
or
Figure BDA0003563945800000078
where $c(y)$ is the set of connected components that include node $y$, $c'_i$ is any such connected component, and $|c'_i|$ is the number of nodes in connected component $c'_i$;
Step S4: update the node set $V'_k$ of the subgraph $G'_k$ according to $V'_k \leftarrow V'_k \cup \{r\}$, and then, based on the updated node set $V'_k$, update the edge set $E'_k$ of the subgraph $G'_k$ according to
Figure BDA0003563945800000079
where $\{r\}$ is the set containing node $r$, $v'$ is any node other than $r$ in the updated node set $V'_k$, and $e_{rv'}$ is the edge connecting node $r$ and node $v'$ in the original graph $G'_k$; if the sequence $S'_k$ contains a node $S'_k(z)$ with $S'_k(z) = r$, swap $S'_k(j)$ and $S'_k(z)$;
Step S5: if $j > 0$, return to step S2; otherwise, if
Figure BDA00035639458000000710
then set $S_k = S'_k$; let
Figure BDA00035639458000000711
and return to step S1; when
Figure BDA00035639458000000712
the sequence $S_k$ obtained is the updated sequence. Here $F$ denotes the evaluation index function of a sequence, computed as $F = \sum_q |c''_{\max}|/n$, where $|c''_{\max}|$ is the number of nodes in the largest connected component of the sequence at a given value of $q$, $n$ is the total number of nodes, and $q$ ranges over $[1/n_s, 1]$ in steps of $1/n_s$;
Figure BDA0003563945800000081
denotes the concatenation of two sequences, and likewise for
Figure BDA0003563945800000082
Step c: let $T_p = T_p - 1$ and return to step a; when the counter $T_p = 0$, the final sequence $S$ is obtained. Here $T_p = 5000$ for networks with $n \le 10^5$ nodes, $T_p = 2500$ for networks with $10^5 < n \le 10^6$ nodes, and $T_p = 500$ for networks with $n > 10^6$ nodes.
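The candidate selection in steps S2 and S3 can be illustrated with a short Python sketch. The exact formula for $\xi(y)$ survives only as an image in the source, so the sketch assumes one plausible form: the size of the component that would contain $y$ if $y$ were re-inserted, that is, one plus the sizes of the distinct components adjacent to $y$. It also assumes that the selected node is the candidate with the smallest $\xi$. The names `merged_size` and `pick_node` are hypothetical.

```python
import random


def merged_size(adj, comp_of, comp_size, y):
    """Assumed form of xi(y): 1 + total size of the distinct components adjacent to y."""
    neighbor_comps = {comp_of[v] for v in adj[y] if v in comp_of}
    return 1 + sum(comp_size[c] for c in neighbor_comps)


def pick_node(adj, comp_of, comp_size, window, delta, rng):
    """Draw delta random candidates from the window and keep the one with the smallest xi."""
    candidates = [rng.choice(window) for _ in range(delta)]
    return min(candidates, key=lambda y: merged_size(adj, comp_of, comp_size, y))


# Toy path graph 0-1-2-3-4 (hypothetical data): nodes 0, 1 and 4 are already
# in the subgraph, nodes 2 and 3 are still waiting to be re-inserted.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
comp_of = {0: "A", 1: "A", 4: "B"}   # node -> component label
comp_size = {"A": 2, "B": 1}         # component label -> number of nodes
chosen = pick_node(adj, comp_of, comp_size, [2, 3], delta=64, rng=random.Random(0))
```

In the toy example, re-inserting node 3 would create a component of size 2, whereas re-inserting node 2 would create one of size 3, so node 3 is preferred whenever it appears among the sampled candidates.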
From the updated node sequence, select a proportion $q$ of the nodes, in order from front to back, as observation points to form the observation point set $O$; mark the observation points on the network, where each records the absolute time at which it is infected; from the observation point set $O$, randomly select a proportion $R_d = |O_d|/|O|$ of observation points to form the set $O_d$, whose members also record the direction from which they were infected; $q$ has value range $[0, 0.2]$ and $R_d$ has value range $[0.001, 1]$;
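The observation point placement just described might be sketched in Python as follows. The rounding rules (floor for the number of observation points, ceiling for the direction-recording subset) and the function name `place_observers` are assumptions made for illustration; the source does not specify them.

```python
import math
import random


def place_observers(sequence, q, r_d, seed=0):
    """Take the first q-fraction of the sequence as O and a random r_d-fraction of O as O_d."""
    rng = random.Random(seed)
    count = math.floor(q * len(sequence))          # rounding rule is an assumption
    observers = sequence[:count]                   # set O, in sequence order
    n_dir = min(len(observers), math.ceil(r_d * len(observers)))
    direction_observers = set(rng.sample(observers, n_dir))  # set O_d
    return observers, direction_observers


# Hypothetical updated sequence over 100 nodes.
sequence = list(range(100))
observers, direction_observers = place_observers(sequence, q=0.1, r_d=0.5)
```

With 100 nodes, $q = 0.1$ and $R_d = 0.5$, this yields ten observation points, five of which also record the infection direction.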
Step 3: at time t = 0, randomly select a propagation source v_s from all nodes of the data set G(V, E), set it to the infected state, and start the propagation process; during propagation, each observation point records the absolute time at which it is infected, and the observation points in the set O_d also record the direction from which they are infected; a node in the infected state spreads the virus to its neighbor nodes in the susceptible state with edge infection rate β_uv, and at the same time a node in the infected state enters the recovered state with recovery rate γ_u; a node that has been infected enters the infected state and becomes a new infected node; all nodes in the infected state continue this propagation behavior until the number n1 of infected nodes and the number n2 of recovered nodes in the network satisfy (n1 + n2)/n ≥ ε, at which point the propagation process stops and the resulting distribution of node infection states forms the infection graph G_I; here n denotes the number of nodes in the network G;
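The propagation process of step 3 can be sketched as a discrete-time SIR simulation. The synchronous time step, the uniform rates `beta` and `gamma`, and the function name `simulate_sir` are assumptions made for illustration; the text itself allows per-edge rates β_uv and per-node rates γ_u.

```python
import random


def simulate_sir(adj, source, beta, gamma, epsilon, seed=0):
    """Run SIR spreading from `source` until (infected + recovered) / n >= epsilon.
    Returns the final states, each node's infection time, and the neighbour
    that infected it (the direction information)."""
    rng = random.Random(seed)
    n = len(adj)
    state = {v: "S" for v in adj}
    state[source] = "I"
    infected_at = {source: 0}
    infected_by = {source: None}
    t = 0
    while True:
        infected = [v for v in adj if state[v] == "I"]
        recovered = sum(1 for v in adj if state[v] == "R")
        # Stop at the outbreak threshold, or when the epidemic has died out.
        if (len(infected) + recovered) / n >= epsilon or not infected:
            break
        t += 1
        for u in infected:  # nodes infected in this step start spreading next step
            for v in adj[u]:
                if state[v] == "S" and rng.random() < beta:
                    state[v] = "I"
                    infected_at[v] = t
                    infected_by[v] = u
            if rng.random() < gamma:
                state[u] = "R"
    return state, infected_at, infected_by


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
state, infected_at, infected_by = simulate_sir(path, 0, beta=1.0, gamma=0.0, epsilon=1.0)
print(infected_at, infected_by)
```

In the method only the observation points would retain `infected_at`, and only those in O_d would retain `infected_by`; the sketch records both for every node for simplicity.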
Step 4: take the observation point set O from step 2 as the removed node set V_r; the nodes of the data set G other than the observation points form the remaining node set V_o; after removing the node set V_r together with the edges that connect nodes in V_o to nodes in V_r, several connected components of different sizes, each of size greater than 1, are obtained, where c_i denotes the i-th connected component, i = 1, 2, …, C, and C denotes the total number of connected components; according to
Figure BDA0003563945800000083
determine the boundary of the connected component c_i,
Figure BDA0003563945800000084
where u denotes any node in the removed node set V_r, v denotes any node in the connected component c_i, and e_uv denotes an edge of the infection graph G_I connecting node u and node v; according to
Figure BDA0003563945800000085
determine the coverage area of each node in the removed node set V_r, where Γ(u) denotes the set of neighbor nodes of node u in the infection graph G_I and c_i(v) denotes the connected component c_i to which node v belongs.
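The decomposition in step 4 can be sketched as follows. The sketch assumes the coverage area of a removed node u is the union of the connected components containing its neighbors Γ(u), which is one reading of the definitions of Γ(u) and c_i(v); it also keeps components of every size. The function names are hypothetical.

```python
def components_without(adj, removed):
    """Connected components of the graph after deleting the `removed` nodes.
    Returns the list of components and a node -> component index map."""
    removed = set(removed)
    comp_of, comps = {}, []
    for start in adj:
        if start in removed or start in comp_of:
            continue
        comp_of[start] = len(comps)
        stack, members = [start], []
        while stack:  # iterative depth-first search
            u = stack.pop()
            members.append(u)
            for v in adj[u]:
                if v not in removed and v not in comp_of:
                    comp_of[v] = len(comps)
                    stack.append(v)
        comps.append(members)
    return comps, comp_of


def coverage(adj_infection, u, comps, comp_of):
    """Union of the components that contain a neighbour of removed node u
    (neighbours are taken from the infection graph adjacency)."""
    ids = {comp_of[v] for v in adj_infection[u] if v in comp_of}
    return {w for i in ids for w in comps[i]}


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
comps, comp_of = components_without(path, {2})
print(comps, coverage(path, 2, comps, comp_of))
```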
Fig. 2 is a schematic diagram of the process of determining the propagation source localization subgraph. Nodes are divided into three types according to their infection status, namely susceptible, infected and recovered nodes, shown in the figure as three different gray levels; the propagation source is shown as a star-shaped node and the observation points as cross-shaped nodes. Panel (a) is the infection graph, in which the label t_i at an observation point indicates the time at which observation point i was infected; for example, t_1 is the time at which observation point 1 was infected. Panel (b) shows the case where t_1 is the earliest infection time: O' = {1}, and the union of the connected coverage areas of the set O' is the subgraph V'_c, shown hatched in the figure. Panel (c) shows the case where t_1 = t_2 is the earliest infection time: O' = {1, 2}, and the subgraph V'_c is the shaded part of the figure.
Step 5: select the observation points with the earliest infection time to form the infected observation point subset O'; according to the formula
Figure BDA0003563945800000091
construct the subgraph V'_c, where x denotes any observation point in the subset O', and α(x) denotes the coverage area of the connected components in which the neighbor nodes of observation point x in the infection graph G_I are located;
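A small sketch of step 5, assuming the coverage areas α(x) have already been computed and are passed in as a dictionary (the argument layout and function names are hypothetical). Whether the observation points themselves belong to V'_c is not stated in the text, so the sketch only unions the coverage areas.

```python
def earliest_observers(infection_time):
    """Observers whose recorded infection time equals the minimum (subset O')."""
    t_min = min(infection_time.values())
    return {x for x, t in infection_time.items() if t == t_min}


def localization_subgraph(infection_time, alpha):
    """Node set V'_c: union of the coverage areas alpha[x] over x in O'."""
    nodes = set()
    for x in earliest_observers(infection_time):
        nodes |= alpha[x]
    return nodes


# Observers 1 and 2 are infected first (the situation of panel (c) in Fig. 2).
times = {1: 3, 2: 3, 5: 7}
alpha = {1: {10, 11}, 2: {11, 12}, 5: {20}}
print(localization_subgraph(times, alpha))
```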
Step 6: compute the relative infection time t'_x according to t'_x = t_x − t_min, where t_x denotes the time at which observation point x was infected and
Figure BDA0003563945800000092
in the subgraph V'_c, the invention uses the RIS algorithm proposed by Borgs et al. to find the propagation source
Figure BDA0003563945800000093
The specific steps are as follows:
Step a: initialize the node set Λ as the empty set; for the subgraph V'_c, let G'(V', E') be its reverse network, satisfying |V'| = |V'_c|, and such that if the edge e_ba is contained in the subgraph V'_c, then the edge e_ab ∈ E'; here |V'_c| denotes the number of nodes in the subgraph V'_c, V' and E' denote the node set and edge set of the reverse network respectively, and |V'| denotes the number of nodes in the reverse network;
Step b: randomly select a node m from the infected observation point subset O', and compute the random-walk length t'' according to t'' = t'_0 + t'_m, where t'_m denotes the relative infection time of node m and t'_0 is a random number drawn from the interval
Figure BDA0003563945800000094
where
Figure BDA0003563945800000095
has a value range of [0, 20];
Step c: taking node m as the starting point of the random walk, walk to a randomly chosen neighbor of node m, and then change the state of node m to recovered;
Step d: the walk lasts for t'' steps; let v denote the last node of the random walk, and update the node set Λ according to Λ = Λ ∪ {v};
Step e: repeat steps b to d T_Λ times to obtain the final updated node set Λ; the node that occurs most often in the final node set Λ is the propagation source
Figure BDA0003563945800000096
T_Λ is set to 10^6.
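Steps a to e can be sketched as repeated random walks on the reverse network. Several details are assumptions, because the text does not fix them: t'_0 is drawn uniformly from the integers 0..`t0_max`, a walk stops early when it reaches a node with no reverse edge, the edge list is assumed to include the edges incident to the starting observation points, and the state change in step c is omitted. The function names are hypothetical and the sketch is not presented as the RIS algorithm of Borgs et al. itself.

```python
import random
from collections import Counter


def reverse_network(edges):
    """Each directed edge (b, a) of the subgraph becomes (a, b) in the reverse network."""
    rev = {}
    for b, a in edges:
        rev.setdefault(a, []).append(b)
        rev.setdefault(b, [])
    return rev


def estimate_source(edges, relative_time, t0_max=5, walks=1000, seed=0):
    """Walk backwards from randomly chosen earliest observers and return the
    node at which the walks end most often."""
    rng = random.Random(seed)
    rev = reverse_network(edges)
    starts = sorted(relative_time)  # earliest infected observers O'
    ends = Counter()
    for _ in range(walks):
        m = rng.choice(starts)
        steps = rng.randint(0, t0_max) + relative_time[m]  # t'' = t'_0 + t'_m
        v = m
        for _ in range(steps):
            if not rev.get(v):  # dead end: no incoming edge to follow
                break
            v = rng.choice(rev[v])
        ends[v] += 1
    return ends.most_common(1)[0][0]


# Directed chain 0 -> 1 -> 2 -> 3 with the observer at node 3.
chain = [(0, 1), (1, 2), (2, 3)]
print(estimate_source(chain, {3: 5}, t0_max=0, walks=10))
```

Counting end nodes with a `Counter` matches the description of Λ as a collection in which the most frequently occurring node is selected.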
Step 7: add the neighbor nodes within a fixed order of the propagation source
Figure BDA0003563945800000097
to the candidate set V_c, and use the relative size of the candidate set, φ = |V_c|/n, as the evaluation index; a smaller φ indicates a smaller range within which the infection has to be contained; here |V_c| denotes the number of nodes in the candidate set V_c, and the fixed order is first order or second order.
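The candidate set and its relative size φ can be sketched with a depth-limited breadth-first search. Including the estimated source itself in V_c is an assumption, and the function names are hypothetical.

```python
from collections import deque


def candidate_set(adj, source, order=1):
    """Estimated source plus all nodes within `order` hops of it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == order:  # do not expand beyond the fixed order
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def relative_size(adj, source, order=1):
    """phi = |V_c| / n."""
    return len(candidate_set(adj, source, order)) / len(adj)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(relative_size(path, 2, order=1), relative_size(path, 2, order=2))
```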
To verify the validity of the method of the present invention, experiments were performed on model networks and real networks, and the experimental network data are shown in table 1.
TABLE 1
Data set    Number of nodes    Number of edges
ER          10000              35000
SF          10000              40000
PG          4941               6594
SCM         7228               24784
LOCG        196591             950327
WG          875713             4322051
In the experiments, the following are used as comparison methods: the propagation source localization algorithm JC (Jordan Center), the CI (Collective Influence) method from the field of network immunization, the MSRG (Min-Sum and Reverse-Greedy) method, and the FINDER (FInding key players in Networks through DEep Reinforcement learning) method. The JC algorithm considers all infected and recovered nodes to locate the propagation source, and constructs the candidate set by ranking the nodes. The relevant parameters of the method are set as follows:
Figure BDA0003563945800000101
T_p = 5000 for networks with n ≤ 10^5 nodes, T_p = 2500 for networks with 10^5 < n ≤ 10^6 nodes, and T_p = 500 for networks with n > 10^6 nodes; in the RIS algorithm, T_Λ = 10^6; in the experiments, the edge infection rate β_uv is the same on every edge and equal to a fixed value β, the recovery rate γ_u is likewise the same for every node and equal to the fixed value γ = 0.1, and the explosion rate is ε = 0.1.
Fig. 3 shows the candidate set ratios obtained by different methods in four different networks under different infection probabilities. JC denotes the JC algorithm, Hubs_s denotes the observation point selection method based on node degree ranking, and PrEF(R_d) denotes the PrEF algorithm with a particular value of R_d; for example, PrEF(0) means that no direction information is available, while PrEF(1) means that all direction information is known. The abscissa is the infection rate β and the ordinate is the candidate set ratio φ. As panels (a) and (b) of Fig. 3 show, when the propagation process is symmetric (that is, when the infection probability is large), the JC algorithm is an efficient estimator of the propagation source location, but its performance degrades as the infection probability decreases. In contrast, the method of the invention performs more stably over the whole range of infection probabilities and outperforms the JC algorithm when the infection probability is low; for example, at β = 0.1 in the SF network, φ(PrEF(1)) = 0.0004 while φ(JC) = 0.0721. In addition, PrEF(0) performs clearly better in the ER network than in the SF network, which indicates that the more extreme nodes in the SF network affect the performance of PrEF(0). The real networks in panels (c) and (d) of Fig. 3 further confirm this conclusion: the Hubs_s algorithm works better than the JC algorithm in the PG network, but in the SCM network only the present method, PrEF(1), works, while the other methods fail.
Fig. 4 shows the candidate set ratios obtained by different methods in two large networks for different observation point ratios, where CI denotes the CI algorithm, MSRG the MSRG algorithm, FINDER the FINDER algorithm, and PrEF the method of the invention. The abscissa is the observation point ratio q and the ordinate is the candidate set ratio φ; the comparison experiments fix β = 0.5. Panels (a) and (b) are for the LOCG network and panels (c) and (d) for the WG network; panels (a) and (c) set R_d = 0, and panels (b) and (d) set R_d = 1. It can be seen that as the observation point ratio (removal ratio) approaches 0, the candidate set ratio approaches 1; a method that performs well when R_d = 0 also performs well when R_d = 1; and for a given value of q, the PrEF algorithm of the invention yields a smaller candidate set than the CI, MSRG and FINDER algorithms, narrowing the search range for the propagation source, especially in the WG network.
In summary, the invention realizes network propagation source localization, in which the size of the observation point set, the proportion of observation points that provide direction information, and the strategy used to generate the observation point set all play a crucial role in narrowing the search range for the propagation source and improving localization efficiency. In particular, the method of the invention performs better across the value range of q and is more robust across different propagation models. By drawing on the network immunization problem, the invention implements the idea of locating the propagation source after decomposing the network; it is effective, efficient and stable, and is suitable for propagation source localization in large-scale networks.

Claims (3)

1. A network propagation source positioning method based on a seepage process and evolutionary computation is characterized by comprising the following steps:
Step 1: input the experimental network data set G(V, E), where V denotes the set of network nodes and E denotes the set of edges in the network; initialize the edge infection rate β_uv and the node recovery rate γ_u of the fixed propagation model, where the infection rate β_uv ranges over [0, 1] and the node recovery rate γ_u ranges over [0, 1]; determine the explosion rate ε, which ranges over [0, 1]; initialize all nodes in the network to the susceptible state;
Step 2: construct the initial node sequence S of the whole graph by random ordering or by node-degree ordering, and update the node sequence S with the AEF algorithm; for the updated node sequence, select a fraction q of the nodes as observation points in order from front to back to form the observation point set O, mark the observation points on the network, and record the absolute time at which each observation point is infected; randomly select observation points from O in the proportion R_d = |O_d|/|O| to form the set O_d, and record the direction from which these observation points are infected; q ranges over [0, 0.2] and R_d ranges over [0.001, 1];
Step 3: at time t = 0, randomly select a propagation source v_s from all nodes of the data set G(V, E), set it to the infected state, and start the propagation process; during propagation, each observation point records the absolute time at which it is infected, and the observation points in the set O_d also record the direction from which they are infected; a node in the infected state spreads the virus to its neighbor nodes in the susceptible state with edge infection rate β_uv, and at the same time a node in the infected state enters the recovered state with recovery rate γ_u; a node that has been infected enters the infected state and becomes a new infected node; all nodes in the infected state continue this propagation behavior until the number n1 of infected nodes and the number n2 of recovered nodes in the network satisfy (n1 + n2)/n ≥ ε, at which point the propagation process stops and the resulting distribution of node infection states forms the infection graph G_I; here n denotes the number of nodes in the network G;
Step 4: take the observation point set O from step 2 as the removed node set V_r; the nodes of the data set G other than the observation points form the remaining node set V_o; after removing the node set V_r together with the edges that connect nodes in V_o to nodes in V_r, several connected components of different sizes, each of size greater than 1, are obtained, where c_i denotes the i-th connected component, i = 1, 2, …, C, and C denotes the total number of connected components; according to
Figure FDA0003563945790000011
determine the boundary of the connected component c_i,
Figure FDA0003563945790000012
where u denotes any node in the removed node set V_r, v denotes any node in the connected component c_i, and e_uv denotes an edge of the infection graph G_I connecting node u and node v; according to
Figure FDA0003563945790000013
determine the coverage area of each node in the removed node set V_r, where Γ(u) denotes the set of neighbor nodes of node u in the infection graph G_I and c_i(v) denotes the connected component c_i to which node v belongs;
Step 5: select the observation points with the earliest infection time to form the infected observation point subset O'; according to the formula
Figure FDA0003563945790000021
construct the subgraph V'_c, where x denotes any observation point in the subset O', and α(x) denotes the coverage area of the connected components in which the neighbor nodes of observation point x in the infection graph G_I are located;
Step 6: compute the relative infection time t'_x according to t'_x = t_x − t_min, where t_x denotes the time at which observation point x was infected and
Figure FDA0003563945790000022
in the subgraph V'_c, use the RIS algorithm to find the propagation source
Figure FDA0003563945790000023
Step 7: add the neighbor nodes within a fixed order of the propagation source
Figure FDA0003563945790000024
to the candidate set V_c, and use the relative size of the candidate set, φ = |V_c|/n, as the evaluation index; a smaller φ indicates a smaller range within which the infection has to be contained; here |V_c| denotes the number of nodes in the candidate set V_c, and the fixed order is first order or second order.
2. The method for positioning network propagation sources based on the seepage process and the evolutionary computation as claimed in claim 1, wherein: the specific process of updating the node sequence S by using the AEF algorithm in step 2 is as follows:
Step a: according to
Figure FDA0003563945790000025
randomly determine the segment length parameter n_s used to segment the sequence, and segment the node sequence S from front to back so that each segment contains n_s nodes; the k-th segment is denoted S_k, where n is the total number of nodes in the sequence S and k = 1, 2, …;
Step b: update each segment sequence S_k in parallel according to the following process to obtain the updated segment sequence S_k, where the number of updates is
Figure FDA0003563945790000026
Step S1: initialize the parameters: set the loop counter j = n_s and the initial intermediate node sequence S'_k = S_k, and construct the initial subgraph G'_k(V'_k, E'_k), where
Figure FDA0003563945790000027
denotes the concatenation of all sequences after the sequence S_k, V'_k is the node set of the subgraph and contains all nodes of the sequence
Figure FDA0003563945790000028
E'_k = E ∩ (V'_k × V'_k) is the edge set of the subgraph, and V'_k × V'_k denotes the set of all possible edges between nodes of the node set; randomly determine the selection count parameter δ and the node selection parameter x, with δ ∈ [1, 50] and x ∈ (0, 1];
Step S2: randomly select a node from the set {S'_k(z), z ∈ [max(j − x × n_s, 1), j]}, repeating the selection δ times; the selected nodes form the candidate set
Figure FDA0003563945790000029
where S'_k(z) denotes the z-th node of the sequence S'_k;
Step S3: from the candidate set
Figure FDA00035639457900000210
select the node r that satisfies
Figure FDA00035639457900000211
where y is any node in the candidate set
Figure FDA00035639457900000212
and ξ(y) denotes the connected-component size associated with node y, computed according to
Figure FDA00035639457900000213
or
Figure FDA00035639457900000214
where c(y) denotes the set of connected components containing node y, c'_i denotes any connected component, and |c'_i| denotes the number of nodes in the connected component c'_i;
Step S4: update the node set V'_k of the subgraph G'_k according to V'_k ← V'_k ∪ {r}, and then, based on the updated node set V'_k, update the edge set E'_k of the subgraph G'_k according to
Figure FDA0003563945790000031
where {r} denotes the set containing the node r, v' denotes any node other than r in the updated node set V'_k, and e_rv' denotes the edge of the original graph G'_k that connects node r with node v'; if the sequence S'_k contains a node S'_k(z) with S'_k(z) = r, swap S'_k(j) and S'_k(z);
Step S5: if j > 0, return to step S2; otherwise, if
Figure FDA0003563945790000032
then set S_k = S'_k, let
Figure FDA0003563945790000033
and return to step S1; when
Figure FDA0003563945790000034
the sequence S_k obtained is the updated sequence; here F denotes the evaluation index function of a sequence, computed as F = Σ_q |c''_max| / n, where |c''_max| denotes the number of nodes in the largest connected component of the sequence at each value of q, n denotes the total number of nodes, and q ranges over [1/n_s, 1] with step 1/n_s;
Figure FDA0003563945790000035
denotes the concatenation of two sequences;
Step c: let T_p = T_p − 1 and return to step a; when the counting variable T_p = 0, the final sequence S is obtained; here T_p = 5000 for networks with n ≤ 10^5 nodes, T_p = 2500 for networks with 10^5 < n ≤ 10^6 nodes, and T_p = 500 for networks with n > 10^6 nodes.
3. The method for positioning network propagation sources based on the seepage process and the evolutionary computation as claimed in claim 1, wherein: the specific process of finding the propagation source by adopting the RIS algorithm in the step 6 is as follows:
Step a: initialize the node set Λ as the empty set; for the subgraph V'_c, let G'(V', E') be its reverse network, satisfying |V'| = |V'_c|, and such that if the edge e_ba is contained in the subgraph V'_c, then the edge e_ab ∈ E'; here |V'_c| denotes the number of nodes in the subgraph V'_c, V' and E' denote the node set and edge set of the reverse network respectively, and |V'| denotes the number of nodes in the reverse network;
Step b: randomly select a node m from the infected observation point subset O', and compute the random-walk length t'' according to t'' = t'_0 + t'_m, where t'_m denotes the relative infection time of node m and t'_0 is a random number drawn from the interval
Figure FDA0003563945790000036
where
Figure FDA0003563945790000037
has a value range of [0, 20];
Step c: taking node m as the starting point of the random walk, walk to a randomly chosen neighbor of node m, and then change the state of node m to recovered;
Step d: the walk lasts for t'' steps; let v denote the last node of the random walk, and update the node set Λ according to Λ = Λ ∪ {v};
Step e: repeat steps b to d T_Λ times to obtain the final updated node set Λ; the node that occurs most often in the final node set Λ is the propagation source
Figure FDA0003563945790000041
T_Λ is set to 10^6.
CN202210321271.1A 2022-03-24 2022-03-24 Network propagation source positioning method based on seepage process and evolutionary computation Active CN114826678B (en)

Publications (2)

Publication Number Publication Date
CN114826678A true CN114826678A (en) 2022-07-29
CN114826678B CN114826678B (en) 2023-11-17

Family

ID=82532878



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant