WO2003039070A2 - Method and apparatus for analysing network robustness - Google Patents

Method and apparatus for analysing network robustness

Info

Publication number
WO2003039070A2
WO2003039070A2 (PCT/GB2002/005029)
Authority
WO
WIPO (PCT)
Prior art keywords
network
node failure
performance
simulations
nodes
Prior art date
Application number
PCT/GB2002/005029
Other languages
French (fr)
Other versions
WO2003039070A3 (en)
Inventor
Fabrice Tristan Pierre Saffre
Robert Alan Ghanea-Hercock
Original Assignee
British Telecommunications Public Limited Company
Priority date
Filing date
Publication date
Application filed by British Telecommunications Public Limited Company filed Critical British Telecommunications Public Limited Company
Publication of WO2003039070A2 publication Critical patent/WO2003039070A2/en
Publication of WO2003039070A3 publication Critical patent/WO2003039070A3/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14: Network analysis or design
    • H04L41/145: Network analysis or design involving simulating, designing, planning or modelling of a network
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0645: Management of faults, events, alarms or notifications using root cause analysis, by additionally acting on or stimulating the network after receiving notifications
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/22: Arrangements for maintenance, administration or management of data switching networks comprising specially adapted graphical user interfaces [GUI]

Definitions

  • The network designer can use the analyser to compute statistics about the network's resilience to node failure, in terms of the cohesion of its largest component (which initially includes all nodes).
  • The GUI entries used to do this are shown in figure 8.
  • The analyser then displays the results window shown in figure 9, from which it can be seen that the r2 value for Option 2 correlates closest to the simulation results, indicating that expression [1b] models the network best.
  • The analyser shows that, on average, removing only about 14% of all nodes (equivalent to severing all their links) is enough to reduce the size of the largest component to 50% of the surviving population (Xc ≈ 0.14).
  • In other words, the analyser tells the designer that if 500 nodes out of 3000 are malfunctioning, the largest sub-set of relays that are still interconnected is likely to contain less than half of the 2500 surviving nodes. It is therefore likely that, in this situation, around 1250 operational nodes are cut off from (and unable to exchange any information with) the core of the network.
  • A straightforward way of increasing robustness is to add at least some backup links, so that alternative routes are available between nodes in case the primary (presumably most efficient) path becomes unavailable due to node failure(s).
  • The designer may want to test the influence of doubling the total number of connections (raising it to 5999 links).
  • Doubling the number of links may, however, be an unacceptable solution for financial reasons.
  • The network designer may instead look for alternative ways of improving robustness, perhaps by testing the benefit of partial route redundancy. Again, the analyser allows projections to be made on the basis of another blueprint: for example, adding only 1000 extra connections to the original topology, bringing the total to 3999. The results of this are shown in figure 12. As can be seen, the robustness is not increased in the same proportion as before. However, even though only 33% extra links were created instead of 100%, the critical value Xc is shifted to about 0.44. In other words, the modified network is roughly three times more robust on this measure relative to the original blueprint. Since doubling the number of connections only results in the robustness increasing four times, the second choice may be more cost-effective.
  • The apparatus described above is a combined simulation and analysis tool designed to study topological robustness. It does not take into account other critical aspects of network operation such as traffic or routing management. Its purpose is to provide a suitable way of estimating the speed and profile of the largest component's decay under cumulative node failure, a necessary step in assessing a system's ability to withstand damage.
  • The apparatus that embodies the invention could be a general purpose device having software arranged to provide an embodiment of the invention.
  • The device could be a single device or a group of devices, and the software could be a single program or a set of programs.
  • Any or all of the software used to implement the invention can be contained on various transmission and/or storage media such as a floppy disc, CD-ROM or magnetic tape, so that the program can be loaded onto one or more general purpose devices, or could be downloaded over a network using a suitable transmission medium.
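The blueprint variations discussed above (adding 1000 or 3000 backup links before re-running the analysis) could be prepared programmatically. The following is a minimal sketch, not from the patent; the function name and random placement strategy are assumptions:

```python
import random

def add_backup_links(n_nodes, edges, extra, seed=0):
    """Add `extra` randomly placed backup links (no duplicates or
    self-loops) to an existing edge list, mimicking the partial
    route redundancy experiment described above."""
    rng = random.Random(seed)
    have = {frozenset(e) for e in edges}
    new_edges = list(edges)
    while len(new_edges) < len(edges) + extra:
        a, b = rng.randrange(n_nodes), rng.randrange(n_nodes)
        if a != b and frozenset((a, b)) not in have:
            have.add(frozenset((a, b)))
            new_edges.append((a, b))
    return new_edges
```

The modified edge list would then be written back out in the topology file format and fed to the analyser for comparison against the original blueprint.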


Abstract

A method and apparatus are disclosed which enable the robustness of a network to cumulative node failure to be determined. The system takes as its input a description of a network topology, simulates cumulative node failure, and produces a model of the robustness of the network that can be used as a relative measure against revised or alternative network topologies.

Description

METHOD AND APPARATUS FOR ANALYSING NETWORK ROBUSTNESS
The present invention relates to analysis of the structure of networks. In particular, but not exclusively, the invention relates to assessing the robustness of a network topology when exposed to cumulative node failure. Such node failure may result from a node going out of service as a result of maintenance or a directed attack. In an ad-hoc mobile network, a node may go out of service because it is out of range of nodes that it was previously in communication with.
Different network topologies react differently to node failure and/or broken links (see R Albert, H Jeong, and A-L Barabasi, "Error and attack tolerance of complex networks", Nature 406, pages 376-382 and R Cohen, K Erez, D ben-Avraham and S Havlin, "Resilience of the Internet to random breakdowns", Physical Review Letters 85, pages 4626-4628), and mathematical techniques used in statistical physics can be used to describe the behaviour of such networks in this situation (see D S Callaway, M E J Newman, S H Strogatz, and D J Watts, "Network Robustness and Fragility: Percolation on Random Graphs", Physical Review Letters 85, pages 5468-5471). Also, most artificial networks, including the Internet and the World Wide Web, can be described as complex systems, often featuring "scale-free" properties (see R Albert, H Jeong, and A-L Barabasi, "Diameter of the World-Wide Web", Nature 401, pages 130-131, 1999; M Faloutsos, P Faloutsos, and C Faloutsos, "On Power-Law Relationships of the Internet Topology", ACM SIGCOMM '99, Computer Communications Review 29, pages 251-263; and B Tadic, "Dynamics of directed graphs: the world-wide Web", Physica A 293/1-2, pages 273-284).
As will be appreciated by those skilled in the art, robustness of a wide variety of real distributed architectures (telecommunication and transportation networks, power grids etc.) is a function of their topology and could therefore be evaluated on the basis of their blueprint. Similarly, several alternative designs could be compared before their actual implementation, in order, for example, to balance redundancy costs against increased resilience.
However, one problem that occurs is that efficient quantification and comparison require selecting a consistent set of measurements that can be considered a suitable summary of network behaviour under stress.
According to an embodiment of the present invention there is provided apparatus for determining the response of a network to node failure, said apparatus comprising: means for inputting a representation of a network; means for measuring the performance of the network in simulations of node failure; and means for comparing the performance of the network in simulations to one or more models of network response to node failure.
Embodiments of the present invention provide a network analyser that quantifies a complex network's behaviour when subjected to cumulative node failure. The analyser tests the robustness of any given network topology in an automated fashion, computing the values for a set of global variables after performing a statistical analysis of simulation results. Those variables characterise the decay of the network's largest component and effectively summarise the system's resilience to stress. In addition, the analyser provides a user-friendly interface to specify key simulation parameters and a graphical representation of the results. The results are also made available as text files.
Embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a representation of the topology of a network; Figure 2 is a representation of the topology of the network of figure 1 after being subjected to cumulative node failure;
Figure 3 is a flow diagram illustrating the analysis method used by the analysis apparatus according to an embodiment of the present invention;
Figure 4a & 4b are graphs illustrating specific steps in the analysis illustrated in figure 3; Figure 5 is an annotated screen shot of the graphical user interface (GUI) of the analysis apparatus;
Figures 6 and 9 to 12 are screen shots of the display by the analysis apparatus of the results of its analysis;
Figure 7 is a graph representing features of the network whose analysis results are shown in figure 6; and
Figure 8 is a further screen shot of the GUI showing the inputs used to generate the analysis shown in figure 9.
With reference to figure 1, a network 101 is made up of a number of nodes 103 interconnected by links 105 (nodes 103 may also be referred to as vertices and links 105 as edges). One measure of the effect of cumulative node failures on a network is the relative size (S) of the largest intact component compared to the total number of nodes. For example, the network 101 of figure 1 is fully intact since there is a path between each node and every other node, and so S=1. Figure 2 illustrates the same network 101 after cumulative node failure. This node failure has resulted in only 50% of the remaining nodes (i.e. the nodes left after the failed nodes have been removed) still being connected together in the largest component. The other 50% of the nodes are attached in smaller groups or not attached at all. As a result, S=0.5 (for clarity only the largest component is illustrated in figure 2). For example, a network may have one hundred nodes of which five fail, leaving 95 nodes in the network. For a relatively resilient network topology this could result in a value of S of 0.80. In other words, 80%, or 76, of the remaining 95 nodes would still be connected together. For a relatively brittle network topology this could result in a value of S of 0.20. In other words, 20%, or 19, of the remaining 95 nodes would still be connected together.
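For illustration, the relative size S of the largest component can be computed with a breadth-first search over the surviving nodes. The following is a minimal sketch; the function name and the edge-list representation are not from the patent:

```python
from collections import deque

def relative_largest_component(nodes, edges):
    """Return S: size of the largest connected component divided by
    the number of surviving nodes."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:   # links touching failed nodes are ignored
            adj[a].add(b)
            adj[b].add(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                # BFS over one component
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best / len(nodes) if nodes else 0.0

# An intact 4-node path: every node reachable, so S = 1.0
print(relative_largest_component([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))  # → 1.0
```

Passing only the surviving node list (with the original edge list unchanged) gives S after a round of failures, since links to removed nodes are simply ignored.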
The decay of the average relative size of the largest component <S> of a given network can be modelled using one of two basic non-linear equations. The first equation performs better when modelling networks with a relatively resilient topology and has the form:
<S> = X / (X + e^(βx))    [1a]
where X and β are constants and x is the fraction of nodes which have been disconnected or removed from the original network. If the topology of the network is such that it has a relatively brittle response to cumulative node loss, then it may be modelled better by the expression:
<S> = X / (X + x^β)    [1b]
Equations [1a] and [1b] obey a very similar logic and are relatively efficient in describing the network's behaviour under cumulative node failure. They can be used to discriminate between two qualitatively different categories of architecture. Since expressions [1a] and [1b] give an approximation of the decay of a given network's largest component, the corresponding X and β global variables are a suitable measurement for quantifying its resilience to cumulative node failure.
A further useful indicator derived from an adjusted value of X is Xc. This is defined as the value of x for which the average relative size (<S>) of the largest component is equal to 0.5. In other words Xc is the critical fraction of "missing" nodes above which, on average, less than 50% of the surviving nodes are still interconnected. The value of β provides an approximation of the slope of the curve around the critical value Xc. Xc is defined for networks described by equation [1a] as:
Xc = ln(X) / β    [2a]
and for networks described by equation [1b] as:
Xc = X^(1/β)    [2b]
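As a quick check of these definitions, the two model curves <S> = X / (X + e^(βx)) and <S> = X / (X + x^β), together with their critical fractions, can be evaluated directly. The helper names below are assumptions; by construction each model should return <S> = 0.5 at x = Xc:

```python
import math

def model_resilient(x, X, beta):
    # Expression [1a]: <S> = X / (X + e^(beta * x))
    return X / (X + math.exp(beta * x))

def model_brittle(x, X, beta):
    # Expression [1b]: <S> = X / (X + x ** beta)
    return X / (X + x ** beta)

def xc_resilient(X, beta):
    # Expression [2a]: Xc = ln(X) / beta
    return math.log(X) / beta

def xc_brittle(X, beta):
    # Expression [2b]: Xc = X ** (1 / beta)
    return X ** (1.0 / beta)

# At x = Xc each model returns <S> = 0.5, as the definition requires.
X, beta = 50.0, 8.0
print(model_resilient(xc_resilient(X, beta), X, beta))  # ≈ 0.5
print(model_brittle(xc_brittle(X, beta), X, beta))      # ≈ 0.5
```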
Figure 3 is a flow diagram illustrating the analysis method carried out by a computer program embodying the present invention running on a general purpose computer. The program provides an analysis apparatus that takes as input a description of the topology of the network to be analysed. The topology is described in a text file that lists the total number of nodes, the total number of links and paired node identification numbers (IDs) thereby specifying which nodes are directly connected to which other nodes. An example network of 1000 nodes interconnected by 999 links is set out in Table 1 below (not all the connections are shown).
Table 1: Topology file format
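A topology file of the kind described above could be read with a short helper. The sketch below assumes a whitespace-separated layout (counts first, then ID pairs); the patent does not give the exact file grammar, so the function name and format details are assumptions:

```python
def read_topology(path):
    """Parse a topology file: the first two values are taken to be the
    total node and link counts, followed by pairs of node IDs, one
    pair per link (whitespace-separated layout assumed)."""
    with open(path) as f:
        tokens = f.read().split()
    n_nodes, n_links = int(tokens[0]), int(tokens[1])
    pairs = [(int(tokens[i]), int(tokens[i + 1]))
             for i in range(2, 2 + 2 * n_links, 2)]
    return n_nodes, pairs
```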
After a properly formatted topology file has been generated for analysis, the user can launch the program to perform robustness tests which will start by displaying a graphical user interface (GUI) allowing simulation parameters and topology file name to be entered/modified by the user. The purpose of the simulations is to enable the calculation of the global variables β, X and Xc by performing a statistical analysis on data produced using Monte Carlo techniques for both the random failure and directed attack simulation techniques which will be described in further detail below.
A representation of the GUI is shown in figure 5. The GUI 501 comprises a number of user definable fields, a check box and two buttons in addition to the standard Windows ™ control buttons. The #Sims box 503 enables the user to determine how many simulations should be performed on the supplied topology data. The Sample box 505 enables the user to determine how many points there should be during each simulation where the effect of node losses should be calculated i.e. S measured. The File box 507 enables the user to define the file in which the topology of the network to be analysed is stored. The Seed box 509 is used to define a number that is used by the analyser to initialise its random number generator. The Attack check box 511 enables the user to choose between a random node failure simulation or a directed attack simulation. The Start button 513 begins the simulation process while the Exit button 515 closes the program.
After the simulation phase is over, the analyser creates two separate text files, bearing the same name as the original topology file, but with different extensions. One contains the values for the global variables and a measurement of fitting quality (r2) and has a ".gvr" suffix. The other contains a table of numerical values as shown in table 2 below and has a ".rst" suffix. The first column of table 2 contains the fraction of nodes that have failed, the second contains the corresponding average relative size <S> of the largest component, and the third is the standard deviation for S. The fourth column is the value of <S> as predicted by expression [1a], and the fifth is the value of <S> as predicted by [1b].
Table 2: Example ".rst" file
The method used by the program to produce data of the type shown in table 2 will now be described with reference to figure 3. At step 301, the program is initiated and extracts the topology data from the topology file described above. Processing then moves to step 303, at which the topology data is used to simulate network decay either by random node loss or by directed attack, as determined by the user via the GUI as noted above.
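The random-failure loop of steps 303 and 305 might look like the following sketch. The function names, the sampling scheme (measuring S at evenly spaced removal counts), and the defaults are assumptions, not the patent's implementation:

```python
import random
import statistics
from collections import deque

def largest_component_size(nodes, edges):
    """Size of the largest connected component among the surviving nodes."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        if a in adj and b in adj:      # links touching failed nodes vanish
            adj[a].append(b)
            adj[b].append(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best

def random_failure(nodes, edges, n_sims=5, samples=10, seed=42):
    """Monte Carlo random-failure runs; returns rows of (x, <S>, sdev)
    in the style of table 2, where x is the fraction of nodes removed."""
    rng = random.Random(seed)
    n = len(nodes)
    step = max(1, n // samples)
    rows = {}
    for _ in range(n_sims):
        order = list(nodes)
        rng.shuffle(order)             # one random removal sequence per run
        for removed in range(0, n, step):
            survivors = order[removed:]
            s = largest_component_size(survivors, edges) / len(survivors)
            rows.setdefault(removed / n, []).append(s)
    return [(x, statistics.mean(v), statistics.pstdev(v))
            for x, v in sorted(rows.items())]

# A 10-node ring: before any removal every run gives S = 1 exactly.
ring = [(i, (i + 1) % 10) for i in range(10)]
print(random_failure(list(range(10)), ring)[0])  # → (0.0, 1.0, 0.0)
```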
The random node loss is simulated by the system randomly choosing one or more nodes from the supplied topology and removing them from the network. This is repeated until all nodes have been removed. The number of simulations that are carried out, and the number of nodes that are removed at each iteration, can be varied by the user via the GUI described above. After each round of node removal, the size S of the largest component of the remaining network is calculated by known methods and stored. At step 305, the average value of S between simulations (where there is more than one) is calculated along with its standard deviation SDEV and stored in the manner noted above with reference to table 2. Figure 4a is a graph of <S> (relative size of the largest component) derived from the simulations plotted against x (proportion of the original number of nodes removed). At step 307, for each of the expressions [1a] & [1b] a linear transformation is applied to the values of S. For both expressions [1a] & [1b] the transform is:
S' = ln(1 - S) - ln(S)    [3a]
For expression [1a], the linear transformation given by [3a] above must be plotted against x. For expression [1b], the linear transformation must be plotted against a modified version of x, namely x' where:
x' = ln(x)    [3b]
The points then fall along a straight line for the model that best fits the numerical data. After the transformation provided by expression [3a], the data shown in figure 4a appears as shown in figure 4b. At step 309, the regression of the points shown in figure 4b is calculated, which provides the expression:
S' = Ax + Const [4a]
where A is in fact the constant β and X = exp(-Const). The same regression is also applied to the points produced by expression [3b], which similarly yields the constants β and X via expression [4a]. Having calculated the constants β and X, the analyser then proceeds to calculate S using expressions [1a] & [1b] and stores the results as shown in the fourth and fifth columns of table 2 above. Xc is also calculated using expressions [2a] & [2b] and the results stored as described above.
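Steps 307 to 309 amount to an ordinary least-squares fit on the linearised data. The sketch below (the function name is an assumption) applies transform [3a], regresses against x or x' = ln(x), and recovers β and X = exp(-Const) from noiseless synthetic data generated with model [1a]:

```python
import math

def fit_model(xs, ss, brittle=False):
    """Linearise via S' = ln(1 - S) - ln(S) (expression [3a]) and
    regress against x (model [1a]) or ln(x) (model [1b], expression
    [3b]); returns (beta, X) with X = exp(-Const)."""
    pts = [(math.log(x) if brittle else x, math.log(1 - s) - math.log(s))
           for x, s in zip(xs, ss)
           if 0 < s < 1 and (x > 0 or not brittle)]
    n = len(pts)
    mx = sum(p for p, _ in pts) / n
    my = sum(q for _, q in pts) / n
    beta = (sum((p - mx) * (q - my) for p, q in pts)
            / sum((p - mx) ** 2 for p, _ in pts))
    const = my - beta * mx
    return beta, math.exp(-const)

# Noiseless data from model [1a] with beta = 8, X = 50:
# the fit recovers those values almost exactly.
beta0, X0 = 8.0, 50.0
xs = [i / 20 for i in range(1, 20)]
ss = [X0 / (X0 + math.exp(beta0 * x)) for x in xs]
print(fit_model(xs, ss))
```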
Processing then moves to step 311, where a fitting function is used to compare the results of the calculations of S using expressions [1a] & [1b] against the empirical results for S from the simulations carried out in step 303. The fitting function gives a measure (r2) for each of the curves derived from expressions [1a] & [1b] relative to the curve from the simulation. At step 313, the analyser displays the data it has calculated. An example display is shown in figure 6. The results window 601 is displayed, which includes values for all global variables and a graph showing simulation data (average S +/- standard deviation) as well as the results from each of the expressions [1a] & [1b] (referred to in the window 601 as Option 1 and Option 2). In this example (1000 nodes, 999 links, scale-free) the value for r2 is highest for expression [1b] (Option 2), indicating that expression [1b] is the better model for the network being analysed. As noted above, expression [1b] is typical of a brittle network.
If the "Attack" option is selected using the check box 511 of the GUI 501 shown in figure 5 then the analyser is arranged to remove nodes using a "best guess" strategy in the simulation carried out at step 303 of figure 3. This strategy emulates an attacker who possesses partial information about the network topology, which is used to choose which node to target next. It is modelled by attributing to each surviving node a probability of being selected that is linearly proportional to its degree k (i.e. the number of links it has to other nodes):
Pi = ki / Σj kj [5]
Using equation [5], the analyser recalculates Pi after each attack in order to take into account the changing probability distribution caused by the elimination of one of the nodes. This increased complexity means that testing a network's resilience to directed attack is more computationally intensive and time-consuming than testing for random failure.
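A minimal sketch of this degree-weighted selection is shown below. The function names are illustrative, and the sketch simplifies by tracking only a degree table: a full implementation would also decrement the degrees of the removed node's neighbours, which requires the adjacency list.

```python
import random

def pick_target(degrees, rng):
    """Choose the next node to attack with probability Pi = ki / sum_j(kj) [5]."""
    nodes = list(degrees)
    return rng.choices(nodes, weights=[degrees[n] for n in nodes], k=1)[0]

def simulate_attack(degrees, kills, seed=0):
    """Remove `kills` nodes under the "best guess" strategy, recomputing the
    selection probabilities after every removal, since eliminating a node
    changes the surviving degree distribution."""
    rng = random.Random(seed)
    degrees = dict(degrees)   # work on a copy of the degree table
    removed = []
    for _ in range(min(kills, len(degrees))):
        target = pick_target(degrees, rng)
        removed.append(target)
        del degrees[target]
    return removed
```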
The "Attack" scenario, because of its stochastic nature, can also be used to model special forms of accidental damage where connectivity level is involved. For example, in a network where congestion is a cause for node failure, key relays (highly connected nodes) are more likely to suffer breakdown, which can be modelled using expression [5].
The use of the analyser as a design tool when planning network architecture will now be described with reference to worked examples illustrated in figures 7 to 12. The example network is a relatively large 3000-node system. The cheapest way (from a topological point of view) to have all such nodes interconnected involves 2999 links. They could all be arranged in a single "star" or in a closed "loop", but more realistic architectures would involve inter-connected sub-domains of different sizes and/or topologies. The network used for this example is a scale-free network of the appropriate size (3000 nodes, one link per node except the first) which serves as the basic blueprint. Figure 7 indicates that the example network's topology is scale-free (power law relationship between node frequency and degree). The most highly connected node has a direct link with 45 other nodes, 9 "secondary hubs" have more than 20 connections, and 28 have between 10 and 20 direct "affiliates".
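A blueprint of this kind (every node after the first contributes exactly one link) can be grown by preferential attachment, which is one standard way, assumed here for illustration, of producing a scale-free tree with a few highly connected hubs:

```python
import random
from collections import Counter

def scale_free_tree(n, seed=0):
    """Grow an n-node scale-free tree (one link per node except the first):
    each newcomer attaches to an existing node with probability proportional
    to that node's current degree (preferential attachment)."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    endpoints = [0, 1]   # each node appears once per link it holds, so a
                         # uniform draw from this list is degree-proportional
    for new in range(2, n):
        target = rng.choice(endpoints)
        edges.append((target, new))
        endpoints.extend((target, new))
    return edges

edges = scale_free_tree(3000)                      # 2999 links, as in the example
degree = Counter(v for e in edges for v in e)      # node -> number of links
```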
Using this topology the network designer can use the analyser to compute statistics about its resilience to node failure, in terms of the cohesion of its largest component (initially including all nodes). In this example, the designer wants the analyser to conduct statistics on a series of 100 simulations, "killing" 100 randomly selected nodes (a fraction 1/30 ≈ 0.033 of the population) between successive sample values. The GUI entries to provide this are shown in figure 8. Once the process is complete the analyser displays the results window shown in figure 9, from which it can be seen that the r2 value for Option 2 correlates closest to the simulation results, indicating that expression [1b] models the network best.
As a result of the example network having a tree-like hierarchical structure with no built-in redundancy (1 link per node), it is not very robust to node failure. Indeed, the analyser shows that, on average, removing only about 14% of all nodes (equivalent to severing all their links) is enough to reduce the size of the largest component to 50% of the surviving population (Xc ≈ 0.14). The analyser tells the designer that if 500 nodes out of 3000 are malfunctioning, chances are the largest sub-set of relays that are still interconnected contains fewer than half of the 2500 surviving nodes. In other words, it is likely that in this situation, around 1250 operational nodes are cut off from (and unable to exchange any information with) the core of the network.
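The underlying measurement can be sketched as a single simulation run: repeatedly "kill" a batch of random nodes, track the largest connected component among the survivors with a union-find structure, and record the removed fraction at which that component first drops below half the surviving population. This is an estimate of Xc from one run; the analyser averages over many runs (100 in the example). Function names are illustrative.

```python
import random
from collections import Counter

def largest_component(n, edges, dead):
    """Size of the largest connected component among surviving nodes (union-find)."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a
    for u, v in edges:
        if u not in dead and v not in dead:
            parent[find(u)] = find(v)
    sizes = Counter(find(i) for i in range(n) if i not in dead)
    return max(sizes.values()) if sizes else 0

def critical_fraction(n, edges, step=100, seed=0):
    """Kill `step` random nodes at a time; return the fraction removed (an
    estimate of Xc) when the largest component first falls below half of
    the surviving population."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    dead = set()
    while order:
        for _ in range(min(step, len(order))):
            dead.add(order.pop())
        survivors = n - len(dead)
        if survivors == 0 or largest_component(n, edges, dead) < survivors / 2:
            return len(dead) / n
```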
Testing the same architecture for "attack" (by checking the box 511 in the GUI 501) gives even more concerning results. For example, killing only about 2% of the population (Xc ≈ 0.02), but this time selecting preferentially highly connected nodes, is enough to reach the same situation. So when applied to a typical scale-free architecture, the analyser correctly and automatically predicts the type of network behaviour and summarises it using a set of global variables. When the designer wants to increase the robustness of a planned network, alternative blueprints are produced, then fed into the analyser in order to compare their performance against that of an original or control structure. For example, a straightforward way of increasing robustness is to add at least some backup links, so that alternative routes are available between nodes in case the primary (presumably most efficient) path becomes unavailable due to node failure(s). Continuing the above example, the designer might want to test the influence of doubling the total number of connections (raising it to 5999 links).
The results of this are illustrated in figure 10: with 3000 new connections added to the original blueprint, the network becomes much more resilient to node failure. It now takes about 60% of the nodes to be missing before more than half of the surviving population is cut from the largest component. It is also clear that Option 1 now gives a much better fit than Option 2, which suggests a "qualitative" change in network behaviour. Moreover, the analyser provides additional information in the form of the evolution of the standard deviation around the average value. Indeed, until up to 50% of nodes have failed, the relative size of the largest component appears extremely stable relative to the simulation shown in figure 9. This indicates that the changes to the architecture (doubling the number of links between nodes) have made the reaction of the network to cumulative stress more predictable.
The ability of the network to withstand directed attack is also increased, as shown in figure 11, which illustrates the analysis of the same doubled-link network except with the Attack box 511 checked. Instead of requiring the removal of only about 2% of the nodes, it is now necessary to kill up to 40% to break the largest component, even though the most highly connected vertices are still specifically targeted.
Doubling the number of links may however be an unacceptable solution for financial reasons. The network designer may look for alternative ways of improving robustness, perhaps by testing the benefit of partial route redundancy. Again, the analyser allows projections to be made on the basis of another blueprint, for example one in which only 1000 extra connections are added to the original topology, bringing the total to 3999. The results of this are shown in figure 12. As can be seen, the robustness is not increased in the same proportion as before. However, even though 33% extra links were created instead of 100%, the critical size Xc is shifted to ≈ 0.44. In other words, the modified network is about three times more robust on this measure relative to the original blueprint. Since doubling the number of connections as described above only results in the robustness increasing about four times, the second choice may be more cost-effective.
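The cost-effectiveness argument can be made explicit with the Xc values quoted in the worked example (0.14 for the original 2999-link blueprint, 0.60 after adding 3000 links, 0.44 after adding only 1000):

```python
# Critical fractions (Xc) from the three blueprints in the example:
baseline, doubled, partial = 0.14, 0.60, 0.44

gain_doubled = doubled / baseline   # robustness multiplier for +100% links
gain_partial = partial / baseline   # robustness multiplier for +33% links

# Robustness gained per extra link favours the cheaper option:
per_link_doubled = (doubled - baseline) / 3000
per_link_partial = (partial - baseline) / 1000
```

On this crude per-link measure the 1000-link option delivers roughly twice the robustness gain per link of the 3000-link option, which is the sense in which it may be more cost-effective.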
These results demonstrate that the analyser enables the network designer to obtain valuable and detailed information quickly (including the value of β, which was not discussed in the example but gives a useful indication of how fast the network is likely to collapse when approaching the critical size). The apparatus described above is a combined simulation and analysis tool designed to study topological robustness. It does not take into account other critical aspects of network operation such as traffic or routing management. Its purpose is to provide a suitable way of estimating the speed and profile of the largest component's decay under cumulative node failure, a necessary step in assessing a system's ability to withstand damage.
It will be understood by those skilled in the art that the apparatus that embodies the invention could be a general purpose device having software arranged to provide an embodiment of the invention. The device could be a single device or a group of devices and the software could be a single program or a set of programs. Furthermore, any or all of the software used to implement the invention can be contained on various transmission and/or storage media such as a floppy disc, CD-ROM, or magnetic tape, so that the program can be loaded onto one or more general purpose devices or downloaded over a network using a suitable transmission medium.
Unless the context clearly requires otherwise, throughout the description and the claims, the words "comprise", "comprising" and the like are to be construed in an inclusive as opposed to an exclusive or exhaustive sense; that is to say, in the sense of "including, but not limited to".

Claims
1. Apparatus for determining the response of a network to node failure, said apparatus comprising: means for inputting a representation of a network; means for measuring the performance of the network in simulations of node failure; and means for comparing the performance of the network in simulations to one or more models of network response to node failure.
2. Apparatus according to claim 1 in which the means for measuring the performance of the network is operable to determine two characteristics of the network.
3. Apparatus according to any preceding claim in which the means for measuring the performance of the network is operable to determine a measure (X, Xc) of the decay of the largest component of the network in response to node failure.
4. Apparatus according to any preceding claim in which the means for measuring the performance of the network is operable to determine a measure (β) of the robustness of the network in response to node failure.
5. Apparatus according to any preceding claim in which the means for measuring the performance of the network in simulations of node failure is operable to carry out simulations for a plurality of types of node failure.
6. Apparatus according to claim 5 in which the plurality of types of node failure include node failure resulting from directed attack or from random failure.
7. Apparatus according to any preceding claim in which the means for measuring the performance of the network in simulations of node failure is operable to carry out simulations for a plurality of network types, such as a brittle network or a resilient network.
8. Apparatus according to any preceding claim further comprising means for choosing one of the models as modelling the performance of the network.
9. A method for determining the response of a network to node failure, said method comprising the steps of: determining a representation of a network; measuring the performance of the network in simulations of node failure; and comparing the performance of the network in simulations to one or more models of network response to node failure.
10. A method according to claim 9 in which the measuring step includes measuring the performance of the network to determine two characteristics of the network.
11. A method according to claim 9 or 10 in which the performance of the network is measured to determine a measure (X, Xc) of the decay of the largest component of the network in response to node failure.
12. A method according to any of claims 9 to 11 in which the performance of the network is measured to determine the robustness (β) of the network in response to node failure.
13. A method according to any of claims 9 to 12 in which the performance of the network is simulated for a plurality of types of node failure.
14. A method according to claim 13 in which the plurality of types of node failure include node failure resulting from directed attack or from random failure.
15. A method according to any of claims 9 to 14 in which the simulations of node failure are carried out for a plurality of network types, such as a brittle network or a resilient network.
16. A method according to any of claims 9 to 15 comprising the further step of choosing one of the models as modelling the performance of the network.
17. A computer program or suite of computer programs arranged to enable a computer or computers to provide the functions of the method or apparatus of any preceding claim.
PCT/GB2002/005029 2001-11-01 2002-11-01 Method and apparatus for analysing network robustness WO2003039070A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP01309304.2 2001-11-01
EP01309304 2001-11-01

Publications (2)

Publication Number Publication Date
WO2003039070A2 true WO2003039070A2 (en) 2003-05-08
WO2003039070A3 WO2003039070A3 (en) 2003-08-14

Family

ID=8182413

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2002/005029 WO2003039070A2 (en) 2001-11-01 2002-11-01 Method and apparatus for analysing network robustness

Country Status (1)

Country Link
WO (1) WO2003039070A2 (en)


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DEMPSEY S ET AL: "Predicting FDDI Computer Network Performance Using A Calibrated Software Simulation Model" PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, 1997. IPCCC 1997., IEEE INTERNATIONAL PHOENIX, TEMPE, AZ, USA 5-7 FEB. 1997, NEW YORK, NY, USA,IEEE, US, 5 February 1997 (1997-02-05), pages 1-9, XP010217039 ISBN: 0-7803-3873-1 *
KANT L ET AL: "Modeling and simulation study of the survivability performance of ATM-based restoration strategies for the next generation high-speed networks" COMPUTER COMMUNICATIONS AND NETWORKS, 1999. PROCEEDINGS. EIGHT INTERNATIONAL CONFERENCE ON BOSTON, MA, USA 11-13 OCT. 1999, PISCATAWAY, NJ, USA,IEEE, US, 11 October 1999 (1999-10-11), pages 469-473, XP010359621 ISBN: 0-7803-5794-9 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8018860B1 (en) * 2003-03-12 2011-09-13 Sprint Communications Company L.P. Network maintenance simulator with path re-route prediction
EP1624397A1 (en) * 2004-08-02 2006-02-08 Microsoft Corporation Automatic validation and calibration of transaction-based performance models
CN100465918C (en) * 2004-08-02 2009-03-04 微软公司 Automatic configuration of transaction-based performance models
US7797425B2 (en) 2005-12-22 2010-09-14 Amdocs Systems Limited Method, system and apparatus for communications circuit design
US20180351814A1 (en) * 2015-03-23 2018-12-06 Utopus Insights, Inc. Network management based on assessment of topological robustness and criticality of assets
US10778529B2 (en) * 2015-03-23 2020-09-15 Utopus Insights, Inc. Network management based on assessment of topological robustness and criticality of assets
US11552854B2 (en) 2015-03-23 2023-01-10 Utopus Insights, Inc. Network management based on assessment of topological robustness and criticality of assets
CN112350312A (en) * 2020-10-29 2021-02-09 广东稳峰电力科技有限公司 Power line robustness analysis method and device
CN112350312B (en) * 2020-10-29 2022-10-04 广东稳峰电力科技有限公司 Power line robustness analysis method and device
US20230214304A1 (en) * 2021-12-30 2023-07-06 Juniper Networks, Inc. Dynamic prediction of system resource requirement of network software in a live network using data driven models
US11797408B2 (en) * 2021-12-30 2023-10-24 Juniper Networks, Inc. Dynamic prediction of system resource requirement of network software in a live network using data driven models
US11855866B1 (en) 2022-09-29 2023-12-26 The Mitre Corporation Systems and methods for assessing a computing network's physical robustness

Also Published As

Publication number Publication date
WO2003039070A3 (en) 2003-08-14


Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP